cdxcore.npshm#
Shared memory numpy arrays.
Overview#
The functions in this module wrap multiprocessing.shared_memory.SharedMemory into a numpy array
with garbage collection clean up, depending on the operating system.
In process 1:
from cdxcore.npshm import create_shared_array
import numpy as np
test_name = "Test2121"
test = create_shared_array( test_name, shape=(10,3), dtype=np.int32, force=True, full=0 )
test[:,1] = 1
In process 2:
from cdxcore.npshm import attach_shared_array
import numpy as np
test_name = "Test2121"
test = attach_shared_array( test_name, validate_shape=(10,3), validate_dtype=np.int32 )
assert np.all( test[:,1] == 1)
test[:,2] = 2
Back in process 1:
assert np.all( test[:,2] == 2)
Loading binary files#
This module’s cdxcore.npshm.read_shared_array() reads numpy arrays stored to disk
with cdxcore.npio.to_file() directly into shared memory.
Persistence / Garbage Collection#
The functions here are simplistic wrappers around multiprocessing.shared_memory.SharedMemory on
Linux and Windows. The returned object will call multiprocessing.shared_memory.SharedMemory.close()
upon garbage collection, but not multiprocessing.shared_memory.SharedMemory.unlink(), at least not by default.
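The difference between close() and unlink() can be illustrated with the standard library alone. The following is a minimal sketch and not the cdxcore implementation; the function name close_vs_unlink is hypothetical, and the final lookup failure is what one would expect on a POSIX system such as Linux.

```python
from multiprocessing import shared_memory


def close_vs_unlink():
    """Hypothetical demo: close() detaches one handle, unlink() removes the block."""
    owner = shared_memory.SharedMemory(create=True, size=16)
    name = owner.name
    owner.buf[:4] = b"abcd"

    # A second handle to the same block, as another process would obtain it.
    other = shared_memory.SharedMemory(name=name)
    seen_by_other = bytes(other.buf[:4])
    other.close()                          # detaches 'other' only

    still_readable = bytes(owner.buf[:4])  # the block itself is untouched
    owner.close()
    owner.unlink()                         # only now is the name removed

    try:
        leftover = shared_memory.SharedMemory(name=name)
    except FileNotFoundError:
        name_removed = True
    else:
        leftover.close()
        name_removed = False
    return seen_by_other, still_readable, name_removed


if __name__ == "__main__":
    print(close_vs_unlink())
```

Closing a handle never removes the data for other users; only unlink() frees the name, which is why the wrappers here leave that step to the caller by default.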
Windows
Under Windows, shared memory semantics are very clean: multiple processes can attach to the same shared memory; once the last process has released its handle, the memory is removed from the system.
Linux
Linux is more complicated: prior to Python 3.12, Python did not track resource usage, and cleaning up shared memory upon termination of all processes was left to the user. From 3.13 onwards, this is partly automated, but the default implementation produces unsightly warning messages if memory is not also cleaned up manually.
- Auto-clean up: The last process should call multiprocessing.shared_memory.SharedMemory.unlink() or, here, cdxcore.npshm.delete_shared_array(). Calling multiprocessing.shared_memory.SharedMemory.unlink() more than once is not guaranteed to work, even though in our experiments under Ubuntu that has not caused issues so far. Asking the last process to close a resource is not very convenient in a truly parallel processing setup.
  To implement this pattern in cdxcore.npshm, the user can either implement the above manually, or specify the unlink keyword when creating or attaching to a shared memory block.
  From Python 3.13 onwards multiprocessing.shared_memory.SharedMemory optionally makes use of a resource tracker to automatically delete shared memory, but it will issue a warning if it does so. You still have to perform the above clean up if you want to avoid the warning.
- Persistence: Under Linux, you can retain shared memory files in /dev/shm/ after the last process exits. For this use case under Python 3.13, set track to False for all shared memory use.
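The persistence case can be sketched directly against multiprocessing.shared_memory. This is an illustration under stated assumptions, not the cdxcore code: the helper names open_block and persistence_demo are hypothetical, the track keyword is only passed on Python 3.13 and above where SharedMemory accepts it, and re-attaching after all handles are closed assumes a POSIX system.

```python
import os
import sys
from multiprocessing import shared_memory


def open_block(name, create=False, size=0, track=True):
    """Hypothetical helper: pass 'track' only where SharedMemory accepts it (3.13+)."""
    kwargs = {"track": track} if sys.version_info >= (3, 13) else {}
    return shared_memory.SharedMemory(name=name, create=create, size=size, **kwargs)


def persistence_demo(name=None):
    name = name or f"npshm_persist_{os.getpid()}"

    # Create, write, and detach without unlinking.
    first = open_block(name, create=True, size=8, track=False)
    first.buf[0] = 42
    first.close()

    # On Linux the block is still present in /dev/shm/ and can be re-attached.
    second = open_block(name, track=False)
    value = second.buf[0]
    second.close()
    second.unlink()  # explicit clean up by the "last" user
    return value


if __name__ == "__main__":
    print(persistence_demo())
```

The explicit unlink() at the end plays the role of the "last process" clean up described above; omitting it would leave the file in /dev/shm/.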
PS: The amount of shared memory available on Linux is limited by default. Use findmnt -o AVAIL,USED /dev/shm to check the available size. Modify /etc/fstab to amend.
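The same check can be made from Python with the standard library. A small sketch, assuming the shared memory mount lives at /dev/shm; the function name shm_usage is hypothetical.

```python
import os
import shutil


def shm_usage(path="/dev/shm"):
    """Return (total, used, free) in bytes for the mount, or None if it is absent."""
    if not os.path.isdir(path):
        return None  # e.g. Windows, or a system without /dev/shm
    return shutil.disk_usage(path)


if __name__ == "__main__":
    usage = shm_usage()
    if usage is None:
        print("no /dev/shm on this system")
    else:
        print(f"free: {usage.free / 2**20:.0f} MiB of {usage.total / 2**20:.0f} MiB")
```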
Please refer to the documentation for full details.
Import#
from cdxcore.npshm import create_shared_array, attach_shared_array, read_shared_array
Documentation#
Module Attributes
ALIGN | Default memory alignment after an internal descriptive header.
Functions
attach_shared_array | Attach to an existing named shared array.
create_shared_array | Create a new named shared array.
delete_shared_array | Deletes the shared array associated with name.
 | Whether an array is "shared".
read_shared_array | Read a shared array from disk into a new named shared numpy.ndarray.
- cdxcore.npshm.ALIGN = 64#
Default memory alignment after an internal descriptive header. A 64-byte alignment ensures that optimized AVX2, AVX512, etc. instructions can operate on aligned data; see this discussion.
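To make the alignment rule concrete, the sketch below rounds a header size up to the next multiple of the alignment, which is where the array data would start. The header layout used by cdxcore is not described here, so aligned_offset and its arguments are hypothetical.

```python
ALIGN = 64  # mirrors the documented default of cdxcore.npshm.ALIGN


def aligned_offset(header_size, align=ALIGN):
    """Smallest multiple of 'align' that is >= header_size (hypothetical helper)."""
    if header_size < 0:
        raise ValueError("header_size must be non-negative")
    if align <= 0 or align & (align - 1):
        raise ValueError("align must be a positive power of two")
    return (header_size + align - 1) & ~(align - 1)


if __name__ == "__main__":
    for size in (0, 1, 64, 65):
        print(size, "->", aligned_offset(size))
```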
Attach to an existing named shared array.
This function is a simplistic wrapper around creating a numpy.ndarray with an existing multiprocessing.shared_memory.SharedMemory buffer. The returned object will call multiprocessing.shared_memory.SharedMemory.close() upon garbage collection, but not multiprocessing.shared_memory.SharedMemory.unlink() by default.
Linux
Under Linux, the above settings mean that the shared file will reside permanently – and will remain sharable – in /dev/shm/ until it is manually deleted. Call cdxcore.npshm.delete_shared_array() to delete a shared file manually.
From Python 3.13 onwards, the track argument can be used to explicitly delete the shared array upon exit of the last process.
See discussion here
The amount of shared memory available on Linux is limited by default. Use findmnt -o AVAIL,USED /dev/shm to check the available size. Modify /etc/fstab to amend.
Windows
Windows keeps track of access to shared memory and will release it automatically upon garbage collection of the last Python object, or upon destruction of all processes with access to the shared memory block. Therefore the object does not persist between independent runs of your software.
- Parameters:
  - name : str
    Name of the array. This must be a valid file name. In Linux, shared memory is managed via /dev/shm/.
  - validate_shape : tuple | None, default None
    Validate that the array has this shape, if not None. If the array has a different shape, raise a ValueError.
  - validate_dtype : dtype | None, default None
    Validate that the array has this dtype, if not None. If the array has a different dtype, raise a ValueError.
  - raise_on_error : bool, default True
    If an array called name does not exist: if raise_on_error is True, this function raises a FileNotFoundError exception; otherwise it will return None.
    This function will always raise a ValueError if the shape or dtype validation fails.
  - read_only : bool, default False
    Whether to set numpy’s writeable flag to False.
  - track : bool, default True
    For Python 3.13 and above only: if set to True, automatically delete the file upon exit of the last Python process. This option is not available prior to Python 3.13 under Linux. For Windows, track is always True.
  - unlink : bool, default False
    Call multiprocessing.shared_memory.SharedMemory.unlink() upon deletion of the numpy array. This has no effect on Windows. On Linux, this means:
    - The file on /dev/shm/ is deleted, hence no other processes can attach to it.
    - Existing shared memory instances remain valid.
    For expert use only: “trying to access data inside a shared memory block after unlink() may result in memory access errors, depending on platform”; see discussion here. It is recommended to manually call cdxcore.npshm.delete_shared_array() instead.
  - verbose : cdxcore.verbose.Context | None, default None
    If not None, print out activity information, typically for debugging.
- Returns:
  - Array : numpy.ndarray like
    The array, or None if no array of name exists and raise_on_error is False.
- Raises:
  - File not found : FileNotFoundError
    If an array name does not exist (and raise_on_error is True).
  - Incorrect geometry : ValueError
    Raised if validate_shape or validate_dtype do not match the array (and raise_on_error is True).
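The attach step can be pictured as wrapping an existing shared memory buffer in a numpy.ndarray. The sketch below is an assumption-laden illustration rather than the cdxcore implementation: attach_view is a hypothetical helper, the caller supplies shape and dtype instead of reading them from the internal header, and validation is reduced to a size check.

```python
import math
from multiprocessing import shared_memory

import numpy as np


def attach_view(name, shape, dtype, read_only=False):
    """Hypothetical helper: wrap an existing shared memory block in an ndarray."""
    shm = shared_memory.SharedMemory(name=name)  # FileNotFoundError if absent
    dtype = np.dtype(dtype)
    needed = math.prod(shape) * dtype.itemsize
    if shm.size < needed:
        shm.close()
        raise ValueError(f"block '{name}' holds {shm.size} bytes, need {needed}")
    arr = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
    if read_only:
        arr.flags.writeable = False
    return shm, arr  # keep 'shm' alive for as long as 'arr' is in use


def attach_demo():
    owner = shared_memory.SharedMemory(create=True, size=10 * 3 * 4)
    src = np.ndarray((10, 3), dtype=np.int32, buffer=owner.buf)
    src[:] = 0
    src[:, 1] = 1

    shm, view = attach_view(owner.name, (10, 3), np.int32, read_only=True)
    column_ok = bool(np.all(view[:, 1] == 1))
    try:
        view[0, 0] = 5
        write_blocked = False
    except ValueError:
        write_blocked = True
    try:
        attach_view(owner.name, (1000, 1000), np.int32)
        too_large_rejected = False
    except ValueError:
        too_large_rejected = True

    # Arrays must be released before the underlying buffers can be closed.
    del src, view
    shm.close()
    owner.close()
    owner.unlink()
    return column_ok, write_blocked, too_large_rejected


if __name__ == "__main__":
    print(attach_demo())
```

Note that the arrays are deleted before close() is called: closing a SharedMemory block while a numpy array still exports its buffer raises a BufferError.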
Create a new named shared array.
This function is a simplistic wrapper around creating a numpy.ndarray with a newly created multiprocessing.shared_memory.SharedMemory buffer. The returned object will call multiprocessing.shared_memory.SharedMemory.close() upon garbage collection, but not multiprocessing.shared_memory.SharedMemory.unlink() (by default).
This function can force creation of a new array on Linux only.
Linux
Under Linux, the above settings mean that the shared file will reside permanently – and will remain sharable – in /dev/shm/ until it is manually deleted. Call cdxcore.npshm.delete_shared_array() to delete a shared file manually.
From Python 3.13 onwards, the track argument can be used to explicitly delete the shared array upon exit of the last process.
See discussion here
The amount of shared memory available on Linux is limited by default. Use findmnt -o AVAIL,USED /dev/shm to check the available size. Modify /etc/fstab to amend.
Windows
Windows keeps track of access to shared memory and will release it automatically upon garbage collection of the last Python object, or upon destruction of all processes with access to the shared memory block. Therefore the object does not persist between independent runs of your software.
- Parameters:
  - name : str
    Name of the array. This must be a valid file name. In Linux, shared memory is managed via /dev/shm/. This is the bare file name; do not include /dev/shm/.
  - shape : tuple
    Shape of the array.
  - dtype : dtype | str
    Numpy dtype of the array.
  - raise_on_error : bool, default True
    If an array called name already exists: if raise_on_error is True, then this function raises a FileExistsError exception; otherwise it will return None.
  - full : float | numpy.ndarray | None, default None
    Value to fill the array with, or None to not fill the array.
  - force : bool, default False
    Whether to attempt to delete any existing arrays under Linux only. Note that while the file might get deleted, the actual memory is only freed after all references are destroyed.
  - track : bool, default True
    For Python 3.13 and above only: if set to True, automatically delete the file upon exit of the last Python process. This option is not available prior to Python 3.13 under Linux. For Windows, track is always True.
  - unlink : bool, default False
    Call multiprocessing.shared_memory.SharedMemory.unlink() upon deletion of the numpy array. This has no effect on Windows. On Linux, this means:
    - The file on /dev/shm/ is deleted, hence no other processes can attach to it.
    - Existing shared memory instances remain valid.
    For expert use only: “trying to access data inside a shared memory block after unlink() may result in memory access errors, depending on platform”; see discussion here. It is recommended to manually call cdxcore.npshm.delete_shared_array() instead.
  - verbose : cdxcore.verbose.Context | None, default None
    If not None, print out activity information, typically for debugging.
- Returns:
  - Array : numpy.ndarray like
    Shared numpy array, or None if the named array exists and if raise_on_error is False.
- Raises:
  - File exists : FileExistsError
    If an array name already exists (and raise_on_error is True).
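The interplay of raise_on_error and force described above can be sketched on a raw shared memory block. This is a hypothetical stand-in for the documented behaviour, not the cdxcore code; create_block and create_demo are illustrative names, and the force path relies on unlink(), which only has an effect on POSIX systems.

```python
import os
from multiprocessing import shared_memory


def create_block(name, size, force=False, raise_on_error=True):
    """Hypothetical helper mirroring the documented create semantics."""
    try:
        return shared_memory.SharedMemory(name=name, create=True, size=size)
    except FileExistsError:
        if force:
            # Remove the existing name, then create a fresh block in its place.
            stale = shared_memory.SharedMemory(name=name)
            stale.close()
            stale.unlink()  # no effect on Windows
            return shared_memory.SharedMemory(name=name, create=True, size=size)
        if raise_on_error:
            raise
        return None


def create_demo():
    name = f"npshm_create_{os.getpid()}"
    first = create_block(name, 64)
    refused = create_block(name, 64, raise_on_error=False)  # name is taken
    replaced = create_block(name, 64, force=True)           # unlink, then recreate
    outcome = (refused is None, replaced is not None)
    first.close()
    replaced.close()
    replaced.unlink()
    return outcome


if __name__ == "__main__":
    print(create_demo())
```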
Deletes the shared array associated with name by calling multiprocessing.shared_memory.SharedMemory.unlink().
Linux
Under Linux, calling unlink() will prevent further attachments to this file, and allows creating a new file in its place. Existing shares remain operational. Note that the file is deleted immediately, not once the last reference has been deleted.
Windows
This function does nothing under Windows.
- Parameters:
  - name : str
    Name of the array. This must be a valid file name. In Linux, shared memory is managed via /dev/shm/.
  - raise_on_error : bool, default True
    If the file could not be deleted successfully, raise the respective exception. If the file did not exist, this function will return successfully.
  - verbose : cdxcore.verbose.Context | None, default None
    If not None, print out activity information, typically for debugging.
- Returns:
  - Success : bool
    Whether name can now be used for a new shared memory block.
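The Linux behaviour described for deletion (immediate removal of the name, existing shares staying operational, and success when nothing exists) can be sketched as follows. The names delete_block and delete_demo are hypothetical, and the sketch assumes a POSIX system; it does not reproduce the cdxcore implementation.

```python
from multiprocessing import shared_memory


def delete_block(name, raise_on_error=True):
    """Hypothetical helper: unlink 'name'; a missing block counts as success."""
    try:
        shm = shared_memory.SharedMemory(name=name)
    except FileNotFoundError:
        return True  # nothing to delete, the name is already free
    try:
        shm.close()
        shm.unlink()
    except OSError:
        if raise_on_error:
            raise
        return False
    return True


def delete_demo():
    holder = shared_memory.SharedMemory(create=True, size=8)
    name = holder.name
    holder.buf[0] = 9

    deleted = delete_block(name)
    still_valid = holder.buf[0] == 9    # the existing mapping keeps working
    deleted_again = delete_block(name)  # second call finds nothing to delete
    holder.close()
    return deleted, still_valid, deleted_again


if __name__ == "__main__":
    print(delete_demo())
```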
Whether an array is “shared”. This function does not currently work and always returns True.
Read a shared array from disk into a new named shared numpy.ndarray in binary format using cdxcore.npio.read_into().
If accept_existing is True, this function will first attempt to attach to an existing shared array name.
- Parameters:
  - file : str | int
    File from open() or file name.
  - name : str
    Name of the array. This must be a valid file name. In Linux, shared memory is managed via /dev/shm/.
  - validate_shape : tuple | None, default None
    Validate that the array has this shape, if not None. If the array has a different shape, raise a ValueError.
  - validate_dtype : dtype | None, default None
    Validate that the array has this dtype, if not None. If the array has a different dtype, raise a ValueError.
  - read_only : bool, default False
    Whether to set numpy’s writeable flag to False.
  - accept_existing : bool, default True
    Whether to first try to attach to an existing shared array name. If either validate_shape or validate_dtype is None, then the function will read the array characteristics from the file on disk, even if a shared array already exists, to ensure its characteristics match those on disk.
  - buffering : int, default -1
    See open(). Use -1 for default behaviour.
  - return_status : bool, default False
    Whether to return status as well. See below.
  - verbose : cdxcore.verbose.Context | None, default None
    If not None, print out activity information, typically for debugging.
- Returns:
  - Array : numpy.ndarray like
    If return_status is False, return just the array, or None if an error occurred and raise_on_error is False.
  - ( Array, attached ) : numpy.ndarray like, bool
    If return_status is True, and if no error occurred, then the function returns a tuple containing the array and a boolean indicating whether the array was attached to an existing shared array (True), or whether a new shared array was created (False). Useful for status messages.
- Raises:
  - File exists : FileExistsError
    If an array name already exists, and accept_existing was False (and raise_on_error is True).
  - Incorrect geometry : ValueError
    Raised if validate_shape or validate_dtype do not match the array (and raise_on_error is True).
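The attach-or-read pattern and the meaning of the attached flag can be sketched with raw bytes. This illustration does not reproduce the on-disk format written by cdxcore.npio.to_file(); it assumes a plain binary file, and read_into_shared and read_demo are hypothetical names.

```python
import os
import tempfile
from multiprocessing import shared_memory


def read_into_shared(path, name, accept_existing=True, buffering=-1):
    """Hypothetical helper: return (block, attached) as described above."""
    if accept_existing:
        try:
            return shared_memory.SharedMemory(name=name), True
        except FileNotFoundError:
            pass  # nothing to attach to, so fall through and create the block
    size = os.path.getsize(path)
    shm = shared_memory.SharedMemory(name=name, create=True, size=size)
    with open(path, "rb", buffering=buffering) as f:
        f.readinto(shm.buf[:size])  # read straight into shared memory, no copy
    return shm, False


def read_demo():
    payload = bytes(range(12))
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
    name = f"npshm_read_{os.getpid()}"
    try:
        created, attached_first = read_into_shared(tmp.name, name)
        again, attached_second = read_into_shared(tmp.name, name)
        same = bytes(again.buf[:12]) == payload
        again.close()
        created.close()
        created.unlink()
    finally:
        os.remove(tmp.name)
    return attached_first, attached_second, same


if __name__ == "__main__":
    print(read_demo())
```

The first call creates the block and reports attached as False; the second finds the existing block and reports True, which is the distinction return_status exposes.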