cdxcore.npio
Fast binary disk i/o for numpy arrays.
from cdxcore.npio import to_file, from_file, read_into
from cdxcore.subdir import SubDir
import numpy as np
array = (np.random.normal(size=(1000,3))*100.).astype(np.int32)
file = SubDir("!/test", create_directory=True).full_file_name("test")
to_file( file, array ) # write
test = from_file( file ) # read back
read_into( file, test ) # read into an existing array
When reading with cdxcore.npio.from_file() you can automatically validate the shape and dtype of
the data being read:
test = from_file( file, validate_dtype=np.int32, validate_shape=(1000,3) )
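The effect of validate_dtype and validate_shape can be pictured with plain numpy. The helper below is a hypothetical sketch, not cdxcore's implementation; its name and error messages are assumptions. It relies only on the documented behaviour that a mismatch is reported as an IOError.

```python
import numpy as np

def check_dtype_and_shape(array, validate_dtype=None, validate_shape=None):
    """Hypothetical helper: mimic the documented validation semantics."""
    if validate_dtype is not None and array.dtype != np.dtype(validate_dtype):
        raise IOError(f"dtype mismatch: found {array.dtype}, expected {np.dtype(validate_dtype)}")
    if validate_shape is not None and array.shape != tuple(validate_shape):
        raise IOError(f"shape mismatch: found {array.shape}, expected {tuple(validate_shape)}")
    return array

data = (np.random.normal(size=(1000, 3)) * 100.).astype(np.int32)

# Matching dtype and shape: the array is returned unchanged.
check_dtype_and_shape(data, validate_dtype=np.int32, validate_shape=(1000, 3))

# A wrong expectation is reported as an IOError.
try:
    check_dtype_and_shape(data, validate_dtype=np.float64)
    mismatch_detected = False
except IOError:
    mismatch_detected = True
```

Passing None for either argument skips the corresponding check, which matches the documented defaults.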
Contiguous Arrays
By default, functions in this module assume that data is laid out linearly in memory, also called “c-contiguous”.
This allows writing a contiguous block of data to disk, or reading it back. If an array is not contiguous,
an exception is raised unless an intermediary copy buffer size is set with cont_block_size_mb:
array = np.zeros((4,4), dtype=np.int8)
x = array[:,1]
assert not x.data.contiguous # not contiguous
to_file( file, x, cont_block_size_mb=100 )
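To illustrate why a copy buffer is needed, the sketch below writes a strided view in C order through a bounded intermediary block, using only numpy and an in-memory file. It is an assumption about the general technique, not cdxcore's code; write_in_blocks and its block_size_bytes parameter are hypothetical names.

```python
import io
import numpy as np

def write_in_blocks(f, array, block_size_bytes):
    """Write `array` to the binary file object `f` in C order, copying at most
    `block_size_bytes` at a time (hypothetical helper, not part of cdxcore)."""
    items_per_block = max(1, block_size_bytes // array.itemsize)
    flat = array.flat                              # visits elements in C order
    for start in range(0, array.size, items_per_block):
        stop = min(start + items_per_block, array.size)
        block = flat[start:stop]                   # contiguous copy of one block
        f.write(block.tobytes())

array = np.arange(16, dtype=np.int8).reshape(4, 4)
x = array[:, 1]                                    # strided view: elements 1, 5, 9, 13
assert not x.flags.c_contiguous

buffer = io.BytesIO()
write_in_blocks(buffer, x, block_size_bytes=2)     # two elements per block
restored = np.frombuffer(buffer.getvalue(), dtype=np.int8)
```

The bytes that reach the file are the same as for a contiguous copy of x, but only one block is held in memory at a time.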
Shared Memory
The binary format is compatible with cdxcore.npshm.read_shared_array() which reads
a binary array into shared memory.
Import
from cdxcore.npio import to_file, from_file, read_into
Documentation
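Module Attributes
LINUX_MAX_FILE_BLOCK | The maximum block size in 64 and 32 bit Linux.
Functions
from_file | Read array from disk into a new numpy.ndarray.
read_dtype_and_shape | Read shape and dtype from a numpy binary file by only reading the file header.
read_from_file | Read a numpy.ndarray from disk into an existing array or into a new array.
read_into | Read an array from disk into an existing numpy.ndarray.
to_file | Write a numpy array into a file using binary format.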
- cdxcore.npio.LINUX_MAX_FILE_BLOCK = 2147479552
  The maximum block size in 64 and 32 bit Linux.
- cdxcore.npio.from_file(file, *, validate_dtype=None, validate_shape=None, read_only=False, buffering=-1, cont_block_size_mb=None)
  Read array from disk into a new numpy.ndarray.
  Use cdxcore.npshm.read_shared_array() to create a shared array instead.
  - Parameters:
    - file : str | int
      A file name to be passed to open(), or a file handle from open().
    - validate_dtype : dtype | None, default None
      If not None, check that the returned array has the specified dtype.
    - validate_shape : tuple | None, default None
      If not None, check that the array has the specified shape.
    - read_only : bool, default False
      Whether to clear the writable flag of the array after reading it from disk.
    - buffering : int, default -1
      Buffering strategy. Only used if file is a string and open() is called. Use 0 to turn off buffering. The default, -1, selects the default buffering policy of open().
    - cont_block_size_mb : int | None, default None
      By default this function does not read into arrays which are not c-contiguous (linear in memory). Use this parameter to allocate an intermediary buffer of cont_block_size_mb megabytes to read into non-contiguous arrays.
  - Returns:
    - Array : numpy.ndarray
      Returns newly created numpy array.
  - Raises:
    - EOF : EOFError
      In case the function failed to read the whole file.
    - I/O error : IOError
      In case the function failed to match the desired validate_dtype or validate_shape.
    - Not contiguous : RuntimeError
      Raised if array is not contiguous and cont_block_size_mb is None (its default).
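The read_only parameter is described as clearing the writable flag of the array. Assuming this refers to numpy's flags.writeable, the following short sketch shows what a caller can expect from such an array; it uses plain numpy and does not call cdxcore.

```python
import numpy as np

array = np.arange(6, dtype=np.int32).reshape(2, 3)
array.flags.writeable = False      # assumed equivalent of read_only=True

# In-place modification of a read-only array is rejected by numpy.
try:
    array[0, 0] = 42
    write_rejected = False
except ValueError:
    write_rejected = True

# A copy owns its own memory and is writable again.
editable = array.copy()
editable[0, 0] = 42
```

A read-only array protects data that several consumers share, while any consumer that needs to modify it can take a copy.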
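- cdxcore.npio.read_dtype_and_shape(file, buffering=-1)
  Read shape and dtype from a numpy binary file by only reading the file header.
  - Parameters:
    - file : str | int
      A file name to be passed to open(), or a file handle from open().
    - buffering : int, default -1
      Buffering strategy. Only used if file is a string and open() is called. Use 0 to turn off buffering. The default, -1, selects the default buffering policy of open().
  - Returns:
    - dtype, shape : tuple, type
      Shape and dtype.
  - Raises:
    - EOF : EOFError
      In case the function failed to read the whole header block.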
- cdxcore.npio.read_from_file(file, target, *, read_only=False, buffering=-1, validate_dtype=None, validate_shape=None, cont_block_size_mb=None)
  Read a numpy.ndarray from disk into an existing array or into a new array.
  See cdxcore.npio.read_into() and cdxcore.npio.from_file() for more convenient interfaces for each use case.
  By default this function does not read into non-contiguous arrays. Use cont_block_size_mb to enable an intermediary buffer to do so.
  - Parameters:
    - file : str | int
      A file name to be passed to open(), or a file handle from open().
    - target : np.ndarray | Callable
      Either a numpy.ndarray to write into, or a function which allocates and returns an array for a given shape and dtype. It must have the signature:
      def create( shape : tuple, dtype : type ):
          return np.empty( shape, dtype )
    - read_only : bool, default False
      Whether to clear the writable flag of the array after reading it from disk.
    - buffering : int, default -1
      Buffering strategy. Only used if file is a string and open() is called. Use 0 to turn off buffering. The default, -1, selects the default buffering policy of open().
    - validate_dtype : dtype | None, default None
      If not None, check that the returned array has the specified dtype.
    - validate_shape : tuple | None, default None
      If not None, check that the array has the specified shape.
    - cont_block_size_mb : int | None, default None
      By default this function does not read into arrays which are not c-contiguous (linear in memory). Use this parameter to allocate an intermediary buffer of cont_block_size_mb megabytes to read into non-contiguous arrays.
  - Returns:
    - Array : numpy.ndarray
      The array.
  - Raises:
    - EOF : EOFError
      In case the function failed to read the whole file.
    - I/O error : IOError
      In case the function failed to match the desired validate_dtype or validate_shape, or if the data does not match the geometry of target if provided as a numpy array.
    - Not contiguous : RuntimeError
      Raised if array is not contiguous and cont_block_size_mb is None, its default.
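The two forms of target, an existing array or an allocator callable, can be sketched as follows. The dispatcher resolve_target and the allocator create_zeroed are hypothetical names used only for illustration and say nothing about how cdxcore handles this internally; the allocator follows the create(shape, dtype) signature given above.

```python
import numpy as np

def resolve_target(target, shape, dtype):
    """Hypothetical helper (not part of cdxcore): return the array to fill."""
    if callable(target):
        # Allocator form: let the caller decide how memory is obtained.
        return target(shape, dtype)
    # Array form: the geometry must match the data on disk.
    if target.shape != tuple(shape) or target.dtype != np.dtype(dtype):
        raise IOError("target does not match the geometry of the data on disk")
    return target

def create_zeroed(shape: tuple, dtype: type):
    # Same signature as the documented create(); zero-initialised for the demo.
    return np.zeros(shape, dtype)

existing = np.empty((2, 3), dtype=np.float32)
reused = resolve_target(existing, (2, 3), np.float32)          # same object back
allocated = resolve_target(create_zeroed, (2, 3), np.float32)  # freshly allocated
```

The callable form is useful when the shape and dtype are only known after the file header has been read, for example to allocate from a custom memory pool.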
- cdxcore.npio.read_into(file, array, *, read_only=False, buffering=-1, cont_block_size_mb=None)
  Read an array from disk into an existing numpy.ndarray.
  The receiving array must have the same shape and dtype as the array on disk.
  - Parameters:
    - file : str | int
      A file name to be passed to open(), or a file handle from open().
    - array : np.ndarray
      Target array to write into. This array must have the same shape and dtype as the source data.
    - read_only : bool, default False
      Whether to clear the writable flag of the array after reading it from disk.
    - buffering : int, default -1
      Buffering strategy. Only used if file is a string and open() is called. Use 0 to turn off buffering. The default, -1, selects the default buffering policy of open().
    - cont_block_size_mb : int | None, default None
      By default this function does not read into arrays which are not c-contiguous (linear in memory). Use this parameter to allocate an intermediary buffer of cont_block_size_mb megabytes to read into non-contiguous arrays.
  - Returns:
    - Array : numpy.ndarray
      Returns array with the data read from disk.
  - Raises:
    - EOF : EOFError
      In case the function failed to read the whole file.
    - I/O error : IOError
      In case the data on disk does not match the geometry (shape and dtype) of array.
    - Not contiguous : RuntimeError
      Raised if array is not contiguous and cont_block_size_mb is None (its default).
- cdxcore.npio.to_file(file, array, *, buffering=-1, cont_block_size_mb=None)
  Write a numpy array into a file using binary format.
  This function works for unbuffered files exceeding 2GB, which is the usual limit of an unbuffered write() on Linux. This function will only work with the dtypes contained in cdxcore.npio._DTYPE_TO_CODE.
  By default this function does not write non-contiguous arrays (those not laid out linearly in memory). Use cont_block_size_mb to enable an intermediary buffer to do so.
  Shared Memory
  Use cdxcore.npshm.read_shared_array() to read numpy arrays stored to disk with to_file into shared memory.
  - Parameters:
    - file : str | int
      Filename or an open file handle from open().
    - array : numpy.ndarray
      The array. Objects of type cdxcore.sharedarray.ndsharedarray are identified as numpy.ndarray arrays.
    - buffering : int, default -1
      Buffering strategy. Only used if file is a string and open() is called. Use 0 to turn off buffering. The default, -1, selects the default buffering policy of open().
    - cont_block_size_mb : int | None, default None
      By default this function does not write non-contiguous arrays (those not laid out linearly in memory). Use cont_block_size_mb to enable an intermediary buffer of size cont_block_size_mb megabytes to do so.
  - Raises:
    - I/O error : IOError
      In case the function failed to write the whole file.
    - Value error : ValueError
      In case an array is passed whose dtype is not contained in cdxcore.npio._DTYPE_TO_CODE, which has more than 32k dimensions, or which has an individual dimension longer than 2bn lines.
    - Not contiguous : RuntimeError
      Raised if array is not contiguous and cont_block_size_mb is None, its default.
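The 2GB remark relates to the value of LINUX_MAX_FILE_BLOCK, which equals 2**31 - 4096. A common way to stay below such a per-call limit is to write the data in slices and advance by the number of bytes actually written. The sketch below shows that general pattern with a hypothetical write_all helper and an in-memory file; it is an assumption about the technique and not a copy of cdxcore's implementation.

```python
import io

LINUX_MAX_FILE_BLOCK = 2147479552          # value documented for cdxcore.npio
assert LINUX_MAX_FILE_BLOCK == 2**31 - 4096 == 0x7FFFF000

def write_all(f, data, max_block=LINUX_MAX_FILE_BLOCK):
    """Hypothetical helper: write all of `data`, at most `max_block` bytes per call."""
    view = memoryview(data).cast("B")      # byte view, slicing does not copy
    written = 0
    while written < len(view):
        n = f.write(view[written:written + max_block])
        if not n:                          # None or 0: no progress was made
            raise IOError(f"wrote only {written} of {len(view)} bytes")
        written += n                       # a raw write may be partial
    return written

payload = bytes(range(256)) * 4            # 1024 bytes of test data
sink = io.BytesIO()
total = write_all(sink, payload, max_block=100)   # tiny block size to force several calls
```

Advancing by the return value of write() rather than by the requested slice length keeps the loop correct when an unbuffered file accepts fewer bytes than were offered.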