cdxcore.npshm#

Shared memory numpy arrays.

Overview#

The functions in this module wrap multiprocessing.shared_memory.SharedMemory into a numpy array whose clean-up upon garbage collection depends on the operating system.

In process 1:

from cdxcore.npshm import create_shared_array
import numpy as np

test_name = "Test2121"    
test = create_shared_array( test_name, shape=(10,3), dtype=np.int32, force=True, full=0 )
test[:,1] = 1

In process 2:

from cdxcore.npshm import attach_shared_array
import numpy as np

test_name = "Test2121"    
test = attach_shared_array( test_name, validate_shape=(10,3), validate_dtype=np.int32 )
assert np.all( test[:,1] == 1)
test[:,2] = 2

Back in process 1:

assert np.all( test[:,2] == 2)

Loading binary files#

This module's cdxcore.npshm.read_shared_array() reads numpy arrays stored to disk with cdxcore.npio.to_file() directly into shared memory.

Persistence / Garbage Collection#

The functions here are simplistic wrappers around multiprocessing.shared_memory.SharedMemory on Linux and Windows. The returned object will call multiprocessing.shared_memory.SharedMemory.close() upon garbage collection, but not multiprocessing.shared_memory.SharedMemory.unlink(), at least not by default.

Windows

Under Windows, shared memory semantics are very clean: multiple processes can attach to the same shared memory; once the last process has relinquished its lock, the memory is removed from the system.

Linux

Linux is more complicated: prior to Python 3.13, Python did not track resource usage, and cleaning up shared memory once all processes had terminated was left to the user. From 3.13 onwards, a resource tracker can do this automatically, but the default implementation emits unsightly warning messages if the memory is not also cleaned up manually.

  • Automatic clean-up: The last process should call multiprocessing.shared_memory.SharedMemory.unlink() or, here, cdxcore.npshm.delete_shared_array(). Calling multiprocessing.shared_memory.SharedMemory.unlink() more than once is not guaranteed to work, although it has not caused issues in our experiments under Ubuntu so far. Asking the last process to release a resource is, of course, not very convenient in a truly parallel processing setup.

    To implement this pattern in cdxcore.npshm, the user can either implement the above manually, or specify the unlink keyword when creating or attaching to a shared array.

    From Python 3.13 onwards, multiprocessing.shared_memory.SharedMemory can optionally use a resource tracker to delete shared memory automatically, but it issues a warning when it does so. You still have to perform the clean-up above if you want to avoid the warning.

  • Persistence: Under Linux, you can retain shared memory files in /dev/shm/ after the last process exits. For this use case under Python 3.13 and above, set track to False for all shared memory use.
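The persistence case can be sketched with the standard library alone. This is a minimal sketch, not the implementation used by cdxcore.npshm: the helper open_block is a hypothetical name, and the only assumption is that multiprocessing.shared_memory.SharedMemory accepts the track keyword from Python 3.13 onwards.

```python
import sys
from multiprocessing import shared_memory

def open_block(name, *, create=False, size=0, track=False):
    """Open a shared memory block, passing `track` only where Python supports it.

    Hypothetical helper for illustration; not part of cdxcore.npshm.
    """
    kwargs = {}
    if sys.version_info >= (3, 13):
        # The `track` keyword only exists from Python 3.13 onwards.
        kwargs["track"] = track
    return shared_memory.SharedMemory(name=name, create=create, size=size, **kwargs)
```

On older Python versions the keyword is simply omitted, so the block falls back to that version's default behaviour.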

PS: The amount of shared memory available on Linux is limited by default. Use findmnt -o AVAIL,USED /dev/shm to check the available size. Modify /etc/fstab to amend.
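The same check can be made from Python. The sketch below uses shutil.disk_usage() from the standard library; the helper name shm_space is hypothetical, and the path /dev/shm is assumed to exist only on Linux.

```python
import os
import shutil

def shm_space(path="/dev/shm"):
    """Return (available, used) in bytes for `path`, or None if it is not a directory."""
    if not os.path.isdir(path):
        return None  # e.g. on platforms without /dev/shm
    usage = shutil.disk_usage(path)
    return usage.free, usage.used

space = shm_space()
if space is not None:
    print(f"/dev/shm: {space[0]} bytes available, {space[1]} bytes used")
```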

Please refer to the documentation for full details.

Import#

from cdxcore.npshm import create_shared_array, attach_shared_array, read_shared_array

Documentation#

Module Attributes

ALIGN

Default memory alignment after an internal descriptive header.

Functions

attach_shared_array(name, *[, ...])

Attach to an existing named shared array.

create_shared_array(name, shape, dtype, *[, ...])

Create a new named shared array.

delete_shared_array(name[, raise_on_error, ...])

Deletes the shared array associated with name by calling multiprocessing.shared_memory.SharedMemory.unlink().

is_shared_array(x)

Whether an array is "shared".

read_shared_array(file, name, *[, ...])

Read a numpy array stored on disk in binary format into a new named shared numpy.ndarray using cdxcore.npio.read_into().

cdxcore.npshm.ALIGN = 64#

Default memory alignment after an internal descriptive header. A 64-byte alignment ensures that optimized AVX2, AVX512, etc. instructions can operate on aligned data; see this discussion.
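As an illustration of what such an alignment implies, the following sketch rounds a header size up to the next multiple of ALIGN. The helper aligned_offset is hypothetical; the actual header layout used by cdxcore.npshm is not described here.

```python
ALIGN = 64  # same value as cdxcore.npshm.ALIGN

def aligned_offset(header_size: int, align: int = ALIGN) -> int:
    """Smallest multiple of `align` that is greater than or equal to `header_size`."""
    if header_size < 0 or align <= 0:
        raise ValueError("header_size must be >= 0 and align must be > 0")
    # Ceiling division, then scale back up to a byte offset.
    return -(-header_size // align) * align
```

With this rule, array data that follows a header of any length between 1 and 64 bytes would start at byte offset 64.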

cdxcore.npshm.attach_shared_array(name, *, validate_shape=None, validate_dtype=None, raise_on_error=True, read_only=False, track=None, unlink=False, verbose=None)[source]#

Attach to an existing named shared array.

This function is a simplistic wrapper around creating a numpy.ndarray with an existing multiprocessing.shared_memory.SharedMemory buffer. The returned object will call multiprocessing.shared_memory.SharedMemory.close() upon garbage collection, but not multiprocessing.shared_memory.SharedMemory.unlink() by default.

Linux

Under Linux, the above settings mean that the shared file will reside permanently – and will remain sharable – in /dev/shm/ until it is manually deleted. Call cdxcore.npshm.delete_shared_array() to delete a shared file manually.

From Python 3.13 onwards, the track argument can be used to explicitly delete the shared array upon exit of the last process.

See discussion here

The amount of shared memory available on Linux is limited by default. Use findmnt -o AVAIL,USED /dev/shm to check the available size. Modify /etc/fstab to amend.

Windows

Windows keeps track of access to shared memory and will release it automatically upon garbage collection of the last Python object, or upon destruction of all processes with access to the shared memory block. Therefore the object does not persist between independent runs of your software.

Parameters:
name : str

Name of the array. This must be a valid file name. In Linux, shared memory is managed via /dev/shm/.

validate_shape : tuple | None, default None

Validate that array has this shape, if not None. If the array has a different shape, raise a ValueError.

validate_dtype : dtype | None, default None

Validate that array has this dtype, if not None. If the array has a different dtype, raise a ValueError.

raise_on_error : bool, default True

If an array called name does not exist: if raise_on_error is True, this function raises a FileNotFoundError exception; otherwise it will return None.

This function will always raise a ValueError if the shape or dtype validation fails.

read_only : bool, default False

Whether to set numpy’s writeable flag to False.

track : bool, default True

For Python 3.13 and above only: if set to True, automatically delete the file upon exit of the last Python process. This option is not available prior to Python 3.13 under Linux. On Windows, track is always True.

unlink : bool, default False

Call multiprocessing.shared_memory.SharedMemory.unlink() upon deletion of the numpy array. This has no effect on Windows. On Linux, this means:

  1. The file on /dev/shm/ is deleted, hence no other processes can attach to it.

  2. Existing shared memory instances remain valid.

For expert use only: “trying to access data inside a shared memory block after unlink() may result in memory access errors, depending on platform”; see discussion here. It is recommended to manually call cdxcore.npshm.delete_shared_array() instead.

verbose : cdxcore.verbose.Context | None, default None

If not None print out activity information, typically for debugging.

Returns:
Array : numpy.ndarray like

The array, or None if no array of name exists and raise_on_error is False.

Raises:
File not found : FileNotFoundError

If an array name does not exist (and raise_on_error is True).

Incorrect geometry : ValueError

Raised if validate_shape or validate_dtype do not match the array, regardless of raise_on_error.
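The error handling described above, where raise_on_error only governs the missing-array case while validation failures always raise, can be sketched with the standard library. This is an illustrative sketch under stated assumptions, not the cdxcore.npshm implementation: attach_block and its min_size check are hypothetical stand-ins for the shape and dtype validation.

```python
from multiprocessing import shared_memory

def attach_block(name, *, min_size=None, raise_on_error=True):
    """Attach to an existing block; hypothetical helper, not the cdxcore API."""
    try:
        shm = shared_memory.SharedMemory(name=name, create=False)
    except FileNotFoundError:
        if raise_on_error:
            raise
        return None  # a missing block is reported as None
    if min_size is not None and shm.size < min_size:
        size = shm.size
        shm.close()
        # Validation failures raise regardless of raise_on_error.
        raise ValueError(f"block '{name}' has {size} bytes, expected at least {min_size}")
    return shm
```

The size is compared with "at least" rather than "exactly" because some platforms round the size of a shared memory block up to a multiple of the page size.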

cdxcore.npshm.create_shared_array(name, shape, dtype, *, raise_on_error=True, full=None, force=False, track=None, unlink=False, verbose=None)[source]#

Create a new named shared array.

This function is a simplistic wrapper around creating a numpy.ndarray with a newly created multiprocessing.shared_memory.SharedMemory buffer. The returned object will call multiprocessing.shared_memory.SharedMemory.close() upon garbage collection, but not multiprocessing.shared_memory.SharedMemory.unlink() (by default).

This function can force creation of a new array on Linux only.

Linux

Under Linux, the above settings mean that the shared file will reside permanently – and will remain sharable – in /dev/shm/ until it is manually deleted. Call cdxcore.npshm.delete_shared_array() to delete a shared file manually.

From Python 3.13 onwards, the track argument can be used to explicitly delete the shared array upon exit of the last process.

See discussion here

The amount of shared memory available on Linux is limited by default. Use findmnt -o AVAIL,USED /dev/shm to check the available size. Modify /etc/fstab to amend.

Windows

Windows keeps track of access to shared memory and will release it automatically upon garbage collection of the last Python object, or upon destruction of all processes with access to the shared memory block. Therefore the object does not persist between independent runs of your software.

Parameters:
name : str

Name of the array. This must be a valid file name. In Linux, shared memory is managed via /dev/shm/. This is the pure filename, do not include /dev/shm/.

shape : tuple

Shape of the array.

dtype : dtype | str

Numpy dtype of the array.

raise_on_error : bool, default True

If an array called name already exists: if raise_on_error is True, then this function raises a FileExistsError exception; otherwise it will return None.

full : float | numpy.ndarray | None, default None

Value to fill array with, or None to not fill the array.

force : bool, default False

Whether to attempt to delete any existing array of the same name; this is supported under Linux only. Note that while the file might get deleted, the actual memory is only freed after all references are destroyed.

track : bool, default True

For Python 3.13 and above only: if set to True, automatically delete the file upon exit of the last Python process. This option is not available prior to Python 3.13 under Linux. On Windows, track is always True.

unlink : bool, default False

Call multiprocessing.shared_memory.SharedMemory.unlink() upon deletion of the numpy array. This has no effect on Windows. On Linux, this means:

  1. The file on /dev/shm/ is deleted, hence no other processes can attach to it.

  2. Existing shared memory instances remain valid.

For expert use only: “trying to access data inside a shared memory block after unlink() may result in memory access errors, depending on platform”; see discussion here. It is recommended to manually call cdxcore.npshm.delete_shared_array() instead.

verbose : cdxcore.verbose.Context | None, default None

If not None print out activity information, typically for debugging.

Returns:
Array : numpy.ndarray like

Shared numpy array, or None if the named array exists and if raise_on_error is False.

Raises:
File exists : FileExistsError

If an array name already exists (and raise_on_error is True).
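The effect of force on Linux can be approximated with the standard library as follows. This is a minimal sketch, not the code used by create_shared_array(): create_block is a hypothetical helper that unlinks an existing block of the same name and then retries the creation.

```python
from multiprocessing import shared_memory

def create_block(name, size, force=False):
    """Create a named block; with force=True, replace an existing one (POSIX only)."""
    try:
        return shared_memory.SharedMemory(name=name, create=True, size=size)
    except FileExistsError:
        if not force:
            raise
    # force: remove the existing name, then create a fresh block in its place.
    stale = shared_memory.SharedMemory(name=name, create=False)
    try:
        stale.unlink()
    finally:
        stale.close()
    return shared_memory.SharedMemory(name=name, create=True, size=size)
```

Processes that attached to the old block before it was unlinked keep their own mapping, so they will not see data written to the new block.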

cdxcore.npshm.delete_shared_array(name, raise_on_error=True, *, verbose=None)[source]#

Deletes the shared array associated with name by calling multiprocessing.shared_memory.SharedMemory.unlink().

Linux

Under Linux, calling unlink() prevents further attachments to this file and allows creating a new file in its place. Existing shares remain operational. Note that the file is deleted immediately, not when the last reference is released.

Windows

This function does nothing under Windows.

Parameters:
name : str

Name of the array. This must be a valid file name. In Linux, shared memory is managed via /dev/shm/.

raise_on_error : bool, default True

If the file could not be deleted successfully, raise the respective Exception. If the file did not exist, this function will return successfully.

verbose : cdxcore.verbose.Context | None, default None

If not None print out activity information, typically for debugging.

Returns:
Success : bool

Whether name can now be used for a new shared memory block.
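The semantics described for this function can be sketched with the standard library. The helper delete_block below is hypothetical and only illustrates the documented behaviour: a missing block counts as success, and raise_on_error decides whether a failed unlink() raises or returns False.

```python
from multiprocessing import shared_memory

def delete_block(name, raise_on_error=True):
    """Unlink the named block; return True if `name` is free afterwards."""
    try:
        shm = shared_memory.SharedMemory(name=name, create=False)
    except FileNotFoundError:
        return True  # nothing to delete: the name is already free
    try:
        shm.unlink()
    except OSError:
        if raise_on_error:
            raise
        return False
    finally:
        shm.close()  # always release this process's own handle
    return True
```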

cdxcore.npshm.is_shared_array(x)[source]#

Whether an array is “shared”. This function does not currently work and always returns True.

cdxcore.npshm.read_shared_array(file, name, *, validate_shape=None, validate_dtype=None, accept_existing=True, buffering=-1, read_only=False, return_status=False, track=None, unlink=False, verbose=None)[source]#

Read a numpy array stored on disk in binary format into a new named shared numpy.ndarray using cdxcore.npio.read_into().

If accept_existing is True, this function will first attempt to attach to an existing shared array name.

Parameters:
file : str | int

File from open() or file name.

name : str

Name of the array. This must be a valid file name. In Linux, shared memory is managed via /dev/shm/.

validate_shape : tuple | None, default None

Validate that array has this shape, if not None. If the array has a different shape, raise a ValueError.

validate_dtype : dtype | None, default None

Validate that array has this dtype, if not None. If the array has a different dtype, raise a ValueError.

read_only : bool, default False

Whether to set numpy’s writeable flag to False.

accept_existing : bool, default True

Whether to first try to attach to an existing shared array name. If either validate_shape or validate_dtype is None, then the function will read the array characteristics from the file on disk even if a shared array already exists, to ensure its characteristics match those on disk.

buffering : int, default -1

See open(). Use -1 for default behaviour.

return_status : bool, default False

Whether to return status as well. See below.

verbose : cdxcore.verbose.Context | None, default None

If not None print out activity information, typically for debugging.

Returns:
Array : numpy.ndarray like

If return_status is False, return just the array or None if an error occurred and raise_on_error is False.

( Array, attached ) : numpy.ndarray like, bool

If return_status is True, and if no error occurred, then the function returns a tuple containing the array and a boolean indicating whether the array was attached to an existing shared array (True), or whether a new shared array was created (False). Useful for status messages.

Raises:
File exists : FileExistsError

If an array name already exists, and accept_existing was False (and raise_on_error is True).

Incorrect geometry : ValueError

Raised if validate_shape or validate_dtype do not match the array (and raise_on_error is True).