cdxcore.npio

Fast binary disk i/o for numpy arrays.

from cdxcore.npio import to_file, from_file, read_into
from cdxcore.subdir import SubDir
import numpy as np

array = (np.random.normal(size=(1000,3))*100.).astype(np.int32)
file  = SubDir("!/test", create_directory=True).full_file_name("test")

to_file( file, array )    # write
test = from_file( file )  # read back
read_into( file, test )   # read into an existing array

When reading with cdxcore.npio.from_file() you can automatically validate the shape and dtype of the data being read:

test = from_file( file, validate_dtype=np.int32, validate_shape=(1000,3) )

Contiguous Arrays

By default, functions in this module assume that data is laid out linearly in memory, also called “C-contiguous”. This allows writing a contiguous block of data to disk, or reading it back. If an array is not contiguous, an exception is raised unless an intermediary copy buffer size is set with cont_block_size_mb:

array = np.zeros((4,4), dtype=np.int8)
x = array[:,1]
assert not x.data.contiguous  # not contiguous
to_file( file, x, cont_block_size_mb=100 )
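
To make the contiguity requirement concrete, here is a minimal sketch using plain NumPy only (it does not call cdxcore). It checks `flags.c_contiguous` and shows one way an intermediary buffer could copy a non-contiguous view block by block. The helper `iter_contiguous_blocks` and its `block_elems` parameter are hypothetical and not part of cdxcore.npio, whose actual buffering strategy may differ.

```python
import io
import numpy as np

def iter_contiguous_blocks(array, block_elems):
    """Yield contiguous 1-D copies of `array` in C order, `block_elems` elements at a time.

    Hypothetical helper for illustration; not part of cdxcore.npio.
    """
    flat = array.flat  # iterates the array in C order, whatever its strides
    for start in range(0, array.size, block_elems):
        stop = min(start + block_elems, array.size)
        # Slicing a flatiter returns a new, contiguous 1-D array (a copy).
        yield flat[start:stop]

array = np.arange(16, dtype=np.int8).reshape(4, 4)
x = array[:, 1]                      # column view: elements are 4 bytes apart
assert array.flags.c_contiguous
assert not x.flags.c_contiguous

# Write the view to an in-memory "file" through a small intermediary buffer.
out = io.BytesIO()
for block in iter_contiguous_blocks(x, block_elems=3):
    assert block.flags.c_contiguous
    out.write(block.tobytes())

written = out.getvalue()
```

Only `block_elems` elements are copied at any one time, which is presumably the reason for exposing a buffer size rather than copying the whole array up front.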

Shared Memory

The binary format is compatible with cdxcore.npshm.read_shared_array() which reads a binary array into shared memory.

Import

from cdxcore.npio import to_file, from_file, read_into

Documentation

Module Attributes

LINUX_MAX_FILE_BLOCK

The maximum block size on 64-bit and 32-bit Linux.

Functions

from_file(file, *[, validate_dtype, ...])

Read array from disk into a new numpy.ndarray.

read_dtype_and_shape(file[, buffering])

Read shape and dtype from a numpy binary file by only reading the file header.

read_from_file(file, target, *[, read_only, ...])

Read a numpy.ndarray from disk into an existing array or into a new array.

read_into(file, array, *[, read_only, ...])

Read an array from disk into an existing numpy.ndarray.

to_file(file, array, *[, buffering, ...])

Write a numpy array into a file using binary format.

cdxcore.npio.LINUX_MAX_FILE_BLOCK = 2147479552

The maximum block size for a single read or write on 64-bit and 32-bit Linux.
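
The value is 0x7FFFF000, i.e. 2**31 - 4096. A writer could stay under such a limit by splitting its buffer into blocks, as in the sketch below. `write_in_blocks` is a hypothetical helper, not the cdxcore implementation, and the demo passes a tiny `max_block` so that it runs quickly.

```python
import os
import tempfile

LINUX_MAX_FILE_BLOCK = 0x7FFFF000  # 2147479552 == 2**31 - 4096

def write_in_blocks(fd, data, max_block=LINUX_MAX_FILE_BLOCK):
    """Write all of `data` to file descriptor `fd`, at most `max_block` bytes per call.

    Hypothetical helper for illustration; not part of cdxcore.npio.
    """
    view = memoryview(data)
    total = 0
    while total < len(view):
        # os.write may write fewer bytes than requested, so advance by its return value.
        total += os.write(fd, view[total:total + max_block])
    return total

payload = bytes(range(256)) * 40          # 10240 bytes of test data
fd, path = tempfile.mkstemp()
try:
    n_written = write_in_blocks(fd, payload, max_block=4096)
finally:
    os.close(fd)

with open(path, "rb") as f:
    read_back = f.read()
os.remove(path)
```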

cdxcore.npio.from_file(file, *, validate_dtype=None, validate_shape=None, read_only=False, buffering=-1, cont_block_size_mb=None)

Read array from disk into a new numpy.ndarray.

Use cdxcore.npshm.read_shared_array() to create a shared array instead.

Parameters:
file : str | int

A file name to be passed to open(), or a file handle from open().

validate_dtype : dtype | None, default None

If not None, check that the returned array has the specified dtype.

validate_shape : tuple | None, default None

If not None, check that the array has the specified shape.

read_only : bool, default False

Whether to clear the writable flag of the array after reading it from disk.

buffering : int, default -1

Buffering strategy. Only used if file is a string and open() is called. Use 0 to turn off buffering. The default, -1, selects the default buffering policy of open().

cont_block_size_mb : int | None, default None

By default this function does not read into arrays which are not C-contiguous (linear in memory). Use this parameter to allocate an intermediary buffer of cont_block_size_mb megabytes to read into non-contiguous arrays.

Returns:
Array : numpy.ndarray

The newly created numpy array.

Raises:
EOF : EOFError

In case the function failed to read the whole file.

I/O error : IOError

In case the data read does not match the desired validate_dtype or validate_shape.

Not contiguous : RuntimeError

Raised if array is not contiguous and cont_block_size_mb is None (its default).
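
The validation and read_only options can be pictured with plain NumPy. The sketch below is an assumption about the behaviour described above, not the cdxcore source: `check_and_freeze` is a hypothetical helper that raises IOError on a dtype or shape mismatch and clears the writable flag on request.

```python
import numpy as np

def check_and_freeze(array, validate_dtype=None, validate_shape=None, read_only=False):
    """Hypothetical stand-in mirroring the documented checks; not part of cdxcore.npio."""
    if validate_dtype is not None and array.dtype != np.dtype(validate_dtype):
        raise IOError(f"dtype mismatch: found {array.dtype}, expected {np.dtype(validate_dtype)}")
    if validate_shape is not None and array.shape != tuple(validate_shape):
        raise IOError(f"shape mismatch: found {array.shape}, expected {tuple(validate_shape)}")
    if read_only:
        array.flags.writeable = False   # later in-place writes raise ValueError
    return array

data = np.zeros((1000, 3), dtype=np.int32)
frozen = check_and_freeze(data, validate_dtype=np.int32, validate_shape=(1000, 3), read_only=True)

# A wrong expected shape is reported as an IOError, as in the Raises section above.
try:
    check_and_freeze(data, validate_shape=(3, 1000))
    mismatch_detected = False
except IOError:
    mismatch_detected = True
```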

cdxcore.npio.read_dtype_and_shape(file, buffering=-1)

Read shape and dtype from a numpy binary file by only reading the file header.

Parameters:
file : str | int

A file name to be passed to open(), or a file handle from open().

buffering : int, default -1

Buffering strategy. Only used if file is a string and open() is called. Use 0 to turn off buffering. The default, -1, selects the default buffering policy of open().

Returns:
dtype, shape : tuple, type

Shape and dtype.

Raises:
EOF : EOFError

In case the function failed to read the whole header block.

cdxcore.npio.read_from_file(file, target, *, read_only=False, buffering=-1, validate_dtype=None, validate_shape=None, cont_block_size_mb=None)

Read a numpy.ndarray from disk into an existing array or into a new array.

See cdxcore.npio.read_into() and cdxcore.npio.from_file() for more convenient interfaces for each use case.

By default this function does not read into non-contiguous arrays. Use cont_block_size_mb to enable an intermediary buffer to do so.

Parameters:
file : str | int

A file name to be passed to open(), or a file handle from open().

target : np.ndarray | Callable

Either a numpy.ndarray to write into, or a function which allocates and returns an array for a given shape and dtype. It must have the signature:

def create( shape : tuple, dtype : type ):
    return np.empty( shape, dtype )

read_only : bool, default False

Whether to clear the writable flag of the array after reading it from disk.

buffering : int, default -1

Buffering strategy. Only used if file is a string and open() is called. Use 0 to turn off buffering. The default, -1, selects the default buffering policy of open().

validate_dtype : dtype | None, default None

If not None, check that the returned array has the specified dtype.

validate_shape : tuple | None, default None

If not None, check that the array has the specified shape.

cont_block_size_mb : int | None, default None

By default this function does not read into arrays which are not C-contiguous (linear in memory). Use this parameter to allocate an intermediary buffer of cont_block_size_mb megabytes to read into non-contiguous arrays.

Returns:
Array : numpy.ndarray

The array.

Raises:
EOF : EOFError

In case the function failed to read the whole file.

I/O error : IOError

In case the data read does not match the desired validate_dtype or validate_shape, or does not match the geometry of target if target is provided as a numpy array.

Not contiguous : RuntimeError

Raised if array is not contiguous and cont_block_size_mb is None (its default).
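
As an illustration of the callable form of target, the sketch below defines an allocator with the documented `(shape, dtype)` signature that carves arrays out of a preallocated byte pool instead of calling np.empty each time. `PooledAllocator` is a hypothetical class, not part of cdxcore; the example exercises the allocator directly and only assumes it could be passed as `target`.

```python
import numpy as np

class PooledAllocator:
    """Callable with the (shape, dtype) signature that reuses one byte pool.

    Hypothetical example; not part of cdxcore.npio.
    """
    def __init__(self, pool_bytes):
        self._pool = np.empty(pool_bytes, dtype=np.uint8)

    def __call__(self, shape, dtype):
        dtype = np.dtype(dtype)
        nbytes = int(np.prod(shape, dtype=np.int64)) * dtype.itemsize
        if nbytes > self._pool.nbytes:
            raise MemoryError(f"pool holds {self._pool.nbytes} bytes, need {nbytes}")
        # Reinterpret the leading bytes of the pool as a C-contiguous array.
        return self._pool[:nbytes].view(dtype).reshape(shape)

alloc = PooledAllocator(pool_bytes=1000 * 3 * 4)
buf = alloc((1000, 3), np.int32)

try:
    alloc((2000, 3), np.int32)          # too large for the pool
    overflow_detected = False
except MemoryError:
    overflow_detected = True
```

Because the returned array is C-contiguous, an allocator of this kind would not need cont_block_size_mb.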

cdxcore.npio.read_into(file, array, *, read_only=False, buffering=-1, cont_block_size_mb=None)

Read an array from disk into an existing numpy.ndarray.

The receiving array must have the same shape and dtype as the array on disk.

Parameters:
file : str | int

A file name to be passed to open(), or a file handle from open().

array : np.ndarray

Target array to write into. This array must have the same shape and dtype as the source data.

read_only : bool, default False

Whether to clear the writable flag of the array after reading it from disk.

buffering : int, default -1

Buffering strategy. Only used if file is a string and open() is called. Use 0 to turn off buffering. The default, -1, selects the default buffering policy of open().

cont_block_size_mb : int | None, default None

By default this function does not read into arrays which are not C-contiguous (linear in memory). Use this parameter to allocate an intermediary buffer of cont_block_size_mb megabytes to read into non-contiguous arrays.

Returns:
Array : numpy.ndarray

Returns array with the data read from disk.

Raises:
EOF : EOFError

In case the function failed to read the whole file.

I/O error : IOError

In case the shape or dtype of the data on disk does not match the geometry of array.

Not contiguous : RuntimeError

Raised if array is not contiguous and cont_block_size_mb is None (its default).
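
Reading into a non-contiguous target through an intermediary buffer can be sketched with plain NumPy as follows. `read_blocks_into` and `block_elems` are hypothetical names, and the in-memory stream of raw bytes stands in for a file; the actual cdxcore file format and buffering logic are not shown here.

```python
import io
import numpy as np

def read_blocks_into(stream, target, block_elems):
    """Fill `target` (any strides) in C order from raw bytes read from `stream`.

    Hypothetical helper for illustration; not part of cdxcore.npio.
    """
    itemsize = target.dtype.itemsize
    flat = target.flat                 # writes through to the underlying memory
    start = 0
    while start < target.size:
        count = min(block_elems, target.size - start)
        raw = stream.read(count * itemsize)
        if len(raw) != count * itemsize:
            raise EOFError("stream ended before the array was filled")
        # Interpret the block as elements and scatter them into the strided target.
        flat[start:start + count] = np.frombuffer(raw, dtype=target.dtype)
        start += count
    return target

source = np.array([10, 20, 30, 40], dtype=np.int16)
stream = io.BytesIO(source.tobytes())

grid = np.zeros((4, 4), dtype=np.int16)
column = grid[:, 2]                    # non-contiguous view into grid
read_blocks_into(stream, column, block_elems=3)
```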

cdxcore.npio.to_file(file, array, *, buffering=-1, cont_block_size_mb=None)

Write a numpy array into a file using binary format.

This function works for unbuffered files exceeding 2 GB, which is the usual limit of a single unbuffered write() on Linux. It only supports the dtypes contained in cdxcore.npio._DTYPE_TO_CODE.

By default this function does not write non-contiguous arrays (those not laid out linearly in memory). Use cont_block_size_mb to enable an intermediary buffer to do so.

Shared Memory

Use cdxcore.npshm.read_shared_array() to read numpy arrays stored to disk with to_file into shared memory.

Parameters:
file : str | int

Filename or an open file handle from open().

array : numpy.ndarray

The array. Objects of type cdxcore.sharedarray.ndsharedarray are identified as numpy.ndarray arrays.

buffering : int, default -1

Buffering strategy. Only used if file is a string and open() is called. Use 0 to turn off buffering. The default, -1, selects the default buffering policy of open().

cont_block_size_mb : int | None, default None

By default this function does not write non-contiguous arrays (those not laid out linearly in memory). Use cont_block_size_mb to enable an intermediary buffer of cont_block_size_mb megabytes to do so.

Raises:
I/O error : IOError

In case the function failed to write the whole file.

Value error : ValueError

In case an array is passed whose dtype is not contained in cdxcore.npio._DTYPE_TO_CODE, which has more than 32k dimensions, or which has an individual dimension longer than 2bn lines.

Not contiguous : RuntimeError

Raised if array is not contiguous and cont_block_size_mb is None (its default).