cdxcore.npio
Fast binary disk i/o for numpy arrays.
from cdxcore.npio import to_file, from_file, read_into
from cdxcore.subdir import SubDir
import numpy as np
array = (np.random.normal(size=(1000,3))*100.).astype(np.int32)
file = SubDir("!/test", create_directory=True).full_file_name("test")
to_file( file, array ) # write
test = from_file( file ) # read back
read_into( file, test ) # read into an existing array
When reading with cdxcore.npio.from_file() you can automatically validate the shape and dtype of the data being read:
test = from_file( file, validate_dtype=np.int32, validate_shape=(1000,3) )
Contiguous Arrays
By default, functions in this module assume that data is laid out linearly in memory, also called “c-contiguous”.
This allows writing a continuous block of data to disk, or reading it back. If an array is not contiguous,
an exception is raised by default. To handle such arrays, set the size of an intermediary copy buffer with cont_block_size_mb:
array = np.zeros((4,4), dtype=np.int8)
x = array[:,1]
assert not x.data.contiguous # not contiguous
to_file( file, x, cont_block_size_mb=100 )
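The idea of an intermediary copy buffer can be illustrated with plain numpy. The following is a minimal sketch, not the cdxcore implementation: `write_in_blocks` is a hypothetical helper, and the block size is given in bytes rather than megabytes to keep the demonstration small.

```python
import io
import numpy as np

def write_in_blocks(stream, array, block_size_bytes):
    """Hypothetical helper: write `array` in C order through a small contiguous buffer."""
    items_per_block = max(1, block_size_bytes // array.itemsize)
    buffer = np.empty(items_per_block, dtype=array.dtype)  # contiguous staging buffer
    for start in range(0, array.size, items_per_block):
        stop = min(start + items_per_block, array.size)
        n = stop - start
        buffer[:n] = array.flat[start:stop]  # gather strided elements into the buffer
        stream.write(buffer[:n].tobytes())   # write one contiguous block

array = np.arange(16, dtype=np.int8).reshape(4, 4)
x = array[:, 1]                    # column view: neighbouring elements are 4 bytes apart
is_contiguous = x.flags.c_contiguous

stream = io.BytesIO()              # stands in for a file on disk
write_in_blocks(stream, x, block_size_bytes=2)
restored = np.frombuffer(stream.getvalue(), dtype=np.int8)
```

A larger buffer means fewer write calls at the cost of more temporary memory, which is presumably the trade-off that cont_block_size_mb controls.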
Shared Memory
The binary format is compatible with cdxcore.npshm.read_shared_array(), which reads a binary array into shared memory.
Import
from cdxcore.npio import to_file, from_file, read_into
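Documentation
Module Attributes
LINUX_MAX_FILE_BLOCK | The maximum block size in 64 and 32 bit Linux.
Functions
from_file() | Read array from disk into a new numpy.ndarray.
read_dtype_and_shape() | Read shape and dtype from a numpy binary file by only reading the file header.
read_from_file() | Read a numpy.ndarray from disk into an existing array or into a new array.
read_into() | Read an array from disk into an existing numpy.ndarray.
to_file() | Write a numpy array into a file using binary format.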
- cdxcore.npio.LINUX_MAX_FILE_BLOCK = 2147479552
The maximum block size in 64 and 32 bit Linux.
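A block limit of this kind is usually respected by splitting a large write into pieces. The sketch below shows the general pattern using only the standard library. `write_chunked` is a hypothetical helper, not part of cdxcore, and the demonstration uses a tiny block size so the loop runs several times.

```python
import io

LINUX_MAX_FILE_BLOCK = 2147479552  # value quoted above; equals 2**31 - 4096

def write_chunked(stream, data, max_block=LINUX_MAX_FILE_BLOCK):
    """Hypothetical helper: write `data` in pieces of at most `max_block` bytes."""
    view = memoryview(data).cast("B")  # byte view, slicing does not copy
    total = 0
    while total < len(view):
        written = stream.write(view[total:total + max_block])
        if not written:
            raise IOError("write() made no progress")
        total += written               # a raw write may be partial, so advance by `written`
    return total

stream = io.BytesIO()                  # stands in for an unbuffered file
payload = bytes(range(10))
written = write_chunked(stream, payload, max_block=4)
```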
- cdxcore.npio.from_file(file, *, validate_dtype=None, validate_shape=None, read_only=False, buffering=-1, cont_block_size_mb=None)
Read an array from disk into a new numpy.ndarray.
Use cdxcore.npshm.read_shared_array() to create a shared array instead.
- Parameters:
- file : str | int
A file name to be passed to open(), or a file handle from open().
- validate_dtype : dtype | None, default None
If not None, check that the returned array has the specified dtype.
- validate_shape : tuple | None, default None
If not None, check that the array has the specified shape.
- read_only : bool, default False
Whether to clear the writable flag of the array after reading it from disk.
- buffering : int, default -1
Buffering strategy. Only used if file is a string and open() is called. Use 0 to turn off buffering. The default, -1, selects the default buffering policy of open().
- cont_block_size_mb : int | None, default None
By default this function does not read into arrays which are not c-contiguous (linear in memory). Use this parameter to allocate an intermediary buffer of cont_block_size_mb megabytes to read into non-contiguous arrays.
- Returns:
- Array : numpy.ndarray
Returns the newly created numpy array.
- Raises:
- EOF : EOFError
In case the function failed to read the whole file.
- I/O error : IOError
In case the data read does not match the desired validate_dtype or validate_shape.
- Not contiguous : RuntimeError
Raised if array is not contiguous and cont_block_size_mb is None (its default).
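The effect of validate_dtype, validate_shape and read_only can be pictured with plain numpy. This is a sketch of the described behaviour under stated assumptions, not cdxcore's code: `check_geometry` is a hypothetical helper. Note that numpy itself spells the flag `writeable`.

```python
import numpy as np

def check_geometry(array, validate_dtype=None, validate_shape=None):
    """Hypothetical helper: raise IOError if dtype or shape differ from what is expected."""
    if validate_dtype is not None and array.dtype != np.dtype(validate_dtype):
        raise IOError(f"dtype mismatch: expected {np.dtype(validate_dtype)}, found {array.dtype}")
    if validate_shape is not None and array.shape != tuple(validate_shape):
        raise IOError(f"shape mismatch: expected {tuple(validate_shape)}, found {array.shape}")
    return array

data = np.zeros((1000, 3), dtype=np.int32)
check_geometry(data, validate_dtype=np.int32, validate_shape=(1000, 3))  # passes

try:
    check_geometry(data, validate_dtype=np.float64)
    mismatch_detected = False
except IOError:
    mismatch_detected = True

# What clearing the writable flag means in numpy terms:
data.flags.writeable = False
try:
    data[0, 0] = 1
    write_blocked = False
except ValueError:                 # numpy rejects assignment to a read-only array
    write_blocked = True
```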
- cdxcore.npio.read_dtype_and_shape(file, buffering=-1)
Read shape and dtype from a numpy binary file by only reading the file header.
- Parameters:
- Returns:
- dtype, shape : tuple, type
Shape and dtype.
- Raises:
- EOF : EOFError
In case the function failed to read the whole header block.
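Reading only a header is cheap because the array payload is never touched. The sketch below demonstrates the principle on an invented toy layout (one byte for a dtype code, a 16-bit dimension count, then one 32-bit integer per dimension). This layout, the code table and the function `toy_read_dtype_and_shape` are assumptions made for illustration. They are not the actual cdxcore file format.

```python
import io
import struct
import numpy as np

TOY_CODES = {0: np.dtype(np.int32), 1: np.dtype(np.float64)}  # hypothetical dtype codes

def _read_exact(stream, n):
    """Read exactly n bytes or raise EOFError."""
    data = stream.read(n)
    if len(data) != n:
        raise EOFError(f"expected {n} header bytes, found {len(data)}")
    return data

def toy_read_dtype_and_shape(stream):
    """Parse the toy header only; the payload that follows is left unread."""
    code, ndim = struct.unpack("<bh", _read_exact(stream, 3))
    shape = struct.unpack(f"<{ndim}i", _read_exact(stream, 4 * ndim))
    return TOY_CODES[code], shape

header = struct.pack("<bh2i", 0, 2, 1000, 3)
payload = np.zeros((1000, 3), dtype=np.int32).tobytes()
stream = io.BytesIO(header + payload)

dtype, shape = toy_read_dtype_and_shape(stream)
bytes_consumed = stream.tell()     # only the header was read

try:
    toy_read_dtype_and_shape(io.BytesIO(header[:5]))  # truncated header
    truncated_detected = False
except EOFError:
    truncated_detected = True
```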
- cdxcore.npio.read_from_file(file, target, *, read_only=False, buffering=-1, validate_dtype=None, validate_shape=None, cont_block_size_mb=None)
Read a numpy.ndarray from disk into an existing array or into a new array.
See cdxcore.npio.read_into() and cdxcore.npio.from_file() for more convenient interfaces for each use case.
By default this function does not read into non-contiguous arrays. Use cont_block_size_mb to enable an intermediary buffer to do so.
- Parameters:
- file : str | int
A file name to be passed to open(), or a file handle from open().
- target : np.ndarray | Callable
Either a numpy.ndarray to write into, or a function which allocates and returns an array for a given shape and dtype. It must have the signature:
def create( shape : tuple, dtype : type ):
    return np.empty( shape, dtype )
- read_only : bool, default False
Whether to clear the writable flag of the array after reading it from disk.
- buffering : int, default -1
Buffering strategy. Only used if file is a string and open() is called. Use 0 to turn off buffering. The default, -1, selects the default buffering policy of open().
- validate_dtype : dtype | None, default None
If not None, check that the returned array has the specified dtype.
- validate_shape : tuple | None, default None
If not None, check that the array has the specified shape.
- cont_block_size_mb : int | None, default None
By default this function does not read into arrays which are not c-contiguous (linear in memory). Use this parameter to allocate an intermediary buffer of cont_block_size_mb megabytes to read into non-contiguous arrays.
- Returns:
- Array : numpy.ndarray
The array.
- Raises:
- EOF : EOFError
In case the function failed to read the whole file.
- I/O error : IOError
In case the data read does not match the desired validate_dtype or validate_shape, or if it does not match the geometry of target when target is provided as a numpy array.
- Not contiguous : RuntimeError
Raised if the target array is not contiguous and cont_block_size_mb is None, its default.
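The two forms of target, an existing array or an allocation callback, can be handled by a small dispatch step. The following sketch shows one plausible way to do this with plain numpy. `resolve_target` is a hypothetical helper and is not taken from cdxcore. Only the `create` signature comes from the description above.

```python
import numpy as np

def resolve_target(target, shape, dtype):
    """Hypothetical helper: return the array to fill for data of the given shape and dtype."""
    if callable(target):
        return target(shape, dtype)            # let the caller allocate the array
    if target.shape != tuple(shape) or target.dtype != np.dtype(dtype):
        raise IOError("target does not match the geometry of the data on disk")
    return target                              # reuse the existing array

def create(shape: tuple, dtype: type):
    return np.empty(shape, dtype)

fresh = resolve_target(create, (2, 3), np.int32)       # allocated through the callback
existing = np.zeros((2, 3), dtype=np.int32)
same = resolve_target(existing, (2, 3), np.int32)      # existing array is passed through

try:
    resolve_target(existing, (3, 2), np.int32)
    mismatch_detected = False
except IOError:
    mismatch_detected = True
```

A callback of this shape lets the caller decide where the memory comes from, for example a pre-allocated pool, without the reader needing to know.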
- cdxcore.npio.read_into(file, array, *, read_only=False, buffering=-1, cont_block_size_mb=None)
Read an array from disk into an existing numpy.ndarray.
The receiving array must have the same shape and dtype as the array on disk.
- Parameters:
- file : str | int
A file name to be passed to open(), or a file handle from open().
- array : np.ndarray
Target array to write into. This array must have the same shape and dtype as the source data.
- read_only : bool, default False
Whether to clear the writable flag of the array after reading it from disk.
- buffering : int, default -1
Buffering strategy. Only used if file is a string and open() is called. Use 0 to turn off buffering. The default, -1, selects the default buffering policy of open().
- cont_block_size_mb : int | None, default None
By default this function does not read into arrays which are not c-contiguous (linear in memory). Use this parameter to allocate an intermediary buffer of cont_block_size_mb megabytes to read into non-contiguous arrays.
- Returns:
- Array : numpy.ndarray
Returns array with the data read from disk.
- Raises:
- EOF : EOFError
In case the function failed to read the whole file.
- I/O error : IOError
In case the data on disk does not match the shape and dtype of array.
- Not contiguous : RuntimeError
Raised if array is not contiguous and cont_block_size_mb is None (its default).
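Reading into a non-contiguous array needs the reverse of a plain block read: bytes arrive as one linear stream and have to be scattered into strided memory. A minimal numpy sketch of that idea follows. `read_into_blocks` is a hypothetical helper with the block size given in bytes, and it does not reflect how cdxcore implements the feature.

```python
import io
import numpy as np

def read_into_blocks(stream, target, block_size_bytes):
    """Hypothetical helper: fill `target` in C order from raw bytes, one block at a time."""
    items_per_block = max(1, block_size_bytes // target.itemsize)
    for start in range(0, target.size, items_per_block):
        stop = min(start + items_per_block, target.size)
        expected = (stop - start) * target.itemsize
        raw = stream.read(expected)
        if len(raw) != expected:
            raise EOFError("file ended before the array was complete")
        # scatter the contiguous block into the (possibly strided) target
        target.flat[start:stop] = np.frombuffer(raw, dtype=target.dtype)
    return target

source = np.arange(4, dtype=np.int16)
stream = io.BytesIO(source.tobytes())      # stands in for the file on disk

array = np.zeros((4, 4), dtype=np.int16)
column = array[:, 1]                       # non-contiguous view into `array`
read_into_blocks(stream, column, block_size_bytes=4)
```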
- cdxcore.npio.to_file(file, array, *, buffering=-1, cont_block_size_mb=None)
Write a numpy array into a file using binary format.
This function works for unbuffered files exceeding 2GB, which is the usual limit of an unbuffered write() on Linux. It only works with the dtypes contained in cdxcore.npio._DTYPE_TO_CODE.
By default this function does not write non-contiguous arrays (those not laid out linearly in memory). Use cont_block_size_mb to enable an intermediary buffer to do so.
Shared Memory
Use cdxcore.npshm.read_shared_array() to read numpy arrays stored to disk with to_file into shared memory.
- Parameters:
- file : str | int
Filename or an open file handle from open().
- array : numpy.ndarray
The array. Objects of type cdxcore.sharedarray.ndsharedarray are identified as numpy.ndarray arrays.
- buffering : int, default -1
Buffering strategy. Only used if file is a string and open() is called. Use 0 to turn off buffering. The default, -1, selects the default buffering policy of open().
- cont_block_size_mb : int | None, default None
By default this function does not write non-contiguous arrays (those not laid out linearly in memory). Use cont_block_size_mb to enable an intermediary buffer of size cont_block_size_mb megabytes to do so.
- Raises:
- I/O error : IOError
In case the function failed to write the whole file.
- Value error : ValueError
In case an array is passed whose dtype is not contained in cdxcore.npio._DTYPE_TO_CODE, which has more than 32k dimensions, or which has an individual dimension longer than 2bn entries.
- Not contiguous : RuntimeError
Raised if array is not contiguous and cont_block_size_mb is None, its default.