cdxcore.subdir
Utilities for file i/o, directory management and streamlined versioned caching.
Overview
The key idea is to provide transparent, concise pickle access to the file system via the cdxcore.subdir.SubDir class.
Key design features:
- Simple path construction via the () operator. By default, directories which do not exist yet are only created upon writing a first file.
- Files managed by cdxcore.subdir.SubDir all have the same extension.
- Files support “fast versioning”: the version of a file can be read without having to read the entire file.
- cdxcore.subdir.SubDir.cache() implements a convenient versioned caching framework.
Directories
The core of the framework is the cdxcore.subdir.SubDir class, which represents a directory with files of a given extension.
Simply write:
from cdxcore.subdir import SubDir
subdir = SubDir("my_directory") # relative to current working directory
subdir = SubDir("./my_directory") # relative to current working directory
subdir = SubDir("~/my_directory") # relative to home directory
subdir = SubDir("!/my_directory") # relative to default temp directory
subdir = SubDir("?!/my_directory") # relative to a temporary temp directory; this directory will be cleared upon (orderly) exit of SubDir.
Note that my_directory will not be created if it does not exist yet. It will be created the first time a file is written to it.
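The prefix conventions above amount to a simple string expansion. The sketch below uses a hypothetical helper expand_prefix built on os.path and tempfile; it only mirrors the documented meaning of '~', '!' and '.', deliberately omits the '?' temporary-directory case, and is not cdxcore's actual implementation.

```python
import os
import tempfile


def expand_prefix(name: str) -> str:
    """Hypothetical helper: turn a SubDir-style name into an absolute path.

    Mirrors the documented prefixes only; the '?' case is deliberately omitted.
    """
    if name.startswith("~"):
        # '~' -> relative to the user's home directory
        return os.path.join(os.path.expanduser("~"), name[1:].lstrip("/"))
    if name.startswith("!"):
        # '!' -> relative to the system default temp directory
        return os.path.join(tempfile.gettempdir(), name[1:].lstrip("/"))
    # './name', 'name' or an absolute path -> resolved against the current working directory
    return os.path.abspath(name)


print(expand_prefix("~/my_directory"))
print(expand_prefix("!/my_directory"))
```

Note that no directory is created here; as described above, creation is deferred until the first write.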
You can specify a parent for relative path names:
from cdxcore.subdir import SubDir
subdir = SubDir("my_directory", "~") # relative to home directory
subdir = SubDir("my_directory", "!") # relative to default temp directory
subdir = SubDir("my_directory", ".") # relative to current directory
subdir2 = SubDir("my_directory", subdir) # subdir2 is relative to `subdir`
Change the extension to “bin”:
from cdxcore.subdir import SubDir
subdir = SubDir("~/my_directory;*.bin")
subdir = SubDir("~/my_directory", ext="bin")
subdir = SubDir("my_directory", "~", ext="bin")
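The ";*.bin" suffix is a compact way of passing ext inside the name. A minimal sketch of how such a pattern could be split off, using a hypothetical helper split_name_ext; letting an explicit ext argument win over the pattern is an assumption of this sketch, not documented behaviour.

```python
from typing import Optional, Tuple


def split_name_ext(name: str, ext: Optional[str] = None) -> Tuple[str, Optional[str]]:
    """Hypothetical helper: separate a trailing ';*.<ext>' pattern from a directory name.

    Assumption for this sketch: an explicit ext argument takes precedence over the pattern.
    """
    marker = ";*."
    if marker in name:
        name, pattern_ext = name.split(marker, 1)
        return name, ext if ext is not None else pattern_ext
    return name, ext
```

All three constructor calls above would then resolve to the same (directory, extension) pair.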
You can turn off extension management by setting the extension to "":
from cdxcore.subdir import SubDir
subdir = SubDir("~/my_directory", ext="")
You can also use cdxcore.subdir.SubDir.__call__() to generate sub directories:
from cdxcore.subdir import SubDir
parent = SubDir("~/parent")
subdir = parent("subdir")
Be aware that when the operator cdxcore.subdir.SubDir.__call__() is called with two arguments (a name and a default value), it reads a file instead of returning a sub directory.
You can obtain a list of all sub directories in a directory by using cdxcore.subdir.SubDir.sub_dirs().
The list of files with the corresponding extension is accessible via cdxcore.subdir.SubDir.files().
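As a rough picture of what sub_dirs() and files() report, the sketch below lists a directory with os.scandir. The helper names list_sub_dirs and list_files are hypothetical, and returning file names without their extension is an assumption based on the iteration examples in the Reading Files section, where names taken from the directory are passed straight back to read().

```python
import os
import tempfile
from typing import List


def list_sub_dirs(directory: str) -> List[str]:
    # Names of immediate sub directories; a directory that does not exist yet has none.
    if not os.path.isdir(directory):
        return []
    return sorted(e.name for e in os.scandir(directory) if e.is_dir())


def list_files(directory: str, ext: str = ".pck") -> List[str]:
    # Files carrying the managed extension, returned without it (assumption).
    # ext="" switches extension management off and lists every file as-is.
    if not os.path.isdir(directory):
        return []
    return sorted(
        e.name[: -len(ext)] if ext else e.name
        for e in os.scandir(directory)
        if e.is_file() and e.name.endswith(ext)
    )


demo = tempfile.mkdtemp()
os.mkdir(os.path.join(demo, "sub"))
for fname in ("a.pck", "b.pck", "notes.txt"):
    open(os.path.join(demo, fname), "w").close()
print(list_sub_dirs(demo), list_files(demo))
```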
File Format
cdxcore.subdir.SubDir supports file i/o with a number of different file formats:
- “PICKLE”: standard pickling with default extension “pck”.
- “JSON_PICKLE”: uses the jsonpickle package; default extension “jpck”. The advantage of this format over “PICKLE” is that it is somewhat human-readable. However, jsonpickle uses compressed formats for complex objects such as numpy arrays, hence readability is somewhat limited. Using “JSON_PICKLE” comes at the cost of slower i/o.
- “JSON_PLAIN”: an output-only format to generate human readable files which (usually) cannot be loaded back from disk. In this mode SubDir calls cdxcore.util.plain() to convert objects into plain Python objects before using json to write them to disk. That means that deserialized data does not have the correct object structure for being restored properly. However, such files are much easier to read. The default extension is “json”.
- “BLOSC”: uses blosc to read/write compressed binary data. The blosc compression algorithm is very fast, hence using this mode will not usually lead to notably slower performance than using “PICKLE”, but it will generate smaller files, depending on your data structure. The default extension is “zbsc”.
- “GZIP”: uses gzip to read/write compressed binary data. The default extension is “pgz”.
Summary of properties:
Format | Restores objects | Human readable | Speed | Compression | Extension |
---|---|---|---|---|---|
PICKLE | yes | no | high | no | .pck |
JSON_PLAIN | no | yes | low | no | .json |
JSON_PICKLE | yes | limited | low | no | .jpck |
BLOSC | yes | no | high | yes | .zbsc |
GZIP | yes | no | high | yes | .pgz |
You may specify the file format when instantiating cdxcore.subdir.SubDir:
from cdxcore.subdir import SubDir
subdir = SubDir("~/my_directory", fmt=SubDir.PICKLE)
subdir = SubDir("~/my_directory", fmt=SubDir.JSON_PICKLE)
...
If ext is not specified, the extension defaults to the default extension of the requested format.
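The rule "the format decides the extension unless one was given" can be sketched as a lookup. Fmt is a hypothetical stand-in for cdxcore.subdir.Format (its member values are not the library's), and treating None and "*" alike is an assumption drawn from the ext semantics listed for __call__ further down.

```python
from enum import Enum, auto
from typing import Optional


class Fmt(Enum):
    # Hypothetical mirror of cdxcore.subdir.Format; the member values here are not the library's.
    PICKLE = auto()
    JSON_PICKLE = auto()
    JSON_PLAIN = auto()
    BLOSC = auto()
    GZIP = auto()


# Default extensions as listed in the summary table.
DEFAULT_EXT = {
    Fmt.PICKLE: "pck",
    Fmt.JSON_PICKLE: "jpck",
    Fmt.JSON_PLAIN: "json",
    Fmt.BLOSC: "zbsc",
    Fmt.GZIP: "pgz",
}


def effective_ext(ext: Optional[str], fmt: Fmt = Fmt.PICKLE) -> str:
    """Return the extension (with leading '.') a directory would use."""
    if ext is None or ext == "*":
        # No explicit choice: fall back to the format's default extension.
        return "." + DEFAULT_EXT[fmt]
    if ext == "":
        # Extension management switched off.
        return ""
    return ext if ext.startswith(".") else "." + ext
```

Returning the extension with a leading '.' follows the documented return value of auto_ext().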
Reading Files
To read the data contained in a file from our subdirectory with its reference extension, use cdxcore.subdir.SubDir.read():
from cdxcore.subdir import SubDir
subdir = SubDir("!/test")
data = subdir.read("file") # returns the default `None` if file.pck is not found
data = subdir.read("file", default=[]) # returns the default [] if file.pck is not found
This function will return the “default” (which in turn defaults to None) if “file.pck” does not exist.
You can opt to raise an error instead of returning a default by using raise_on_error=True:
data = subdir.read("file", raise_on_error=True) # raises 'KeyError' if not found
When calling read() you may specify an alternative extension:
data = subdir.read("file", ext="bin") # change extension to "bin"
data = subdir.read("file.bin", ext="") # no automatic extension
Specifying a different format for cdxcore.subdir.SubDir.read() only changes the extension automatically if you have not overwritten it before:
subdir = SubDir("!/test") # default format PICKLE with extension pck
data = subdir.read("file", fmt=SubDir.JSON_PICKLE ) # uses "jpck" extension
subdir = SubDir("!/test", ext="bin") # user-specified extension
data = subdir.read("file", fmt=SubDir.JSON_PICKLE ) # keeps using "bin"
You can also use cdxcore.subdir.SubDir.__call__() to read files, in which case you must specify a default value (if you don’t, the operator will return a sub directory):
data = subdir("file", None) # returns None if file is not found
You can also use item notation to access files. In this case, though, an error will be thrown if the file does not exist:
data = subdir['file'] # raises KeyError if file is not found
You can read a range of files in one function call:
data = subdir.read( ["file1", "file2"] ) # returns list
Finally, you can also iterate through all existing files using iterators:
# manual loading
for file in subdir:
    data = subdir.read(file)
    ...
# automatic loading, with "None" as a default
for file, data in subdir.items():
    ...
To obtain a list of all files in our directory which have the correct extension, use cdxcore.subdir.SubDir.files().
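The default/raise_on_error semantics of read() can be pictured with plain pickle files. read_file below is a hypothetical stand-in, not SubDir.read(): it only handles the PICKLE case, ignores versions, and raises KeyError for a missing file as documented above for raise_on_error=True.

```python
import os
import pickle
import tempfile
from typing import Any


def read_file(directory: str, file: str, default: Any = None,
              raise_on_error: bool = False, ext: str = ".pck") -> Any:
    """Hypothetical stand-in for SubDir.read(): return default when the file is missing."""
    path = os.path.join(directory, file + ext)
    if not os.path.exists(path):
        if raise_on_error:
            raise KeyError(file)
        return default
    with open(path, "rb") as f:
        return pickle.load(f)


demo = tempfile.mkdtemp()
with open(os.path.join(demo, "file.pck"), "wb") as f:
    pickle.dump([1, 2, 3], f)
print(read_file(demo, "file"), read_file(demo, "missing", default=[]))
```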
Writing Files
Writing files mirrors reading them:
from cdxcore.subdir import SubDir
subdir = SubDir("!/test")
subdir.write("file", data)
subdir['file'] = data
You may specify a different extension:
subdir.write("file", data, ext="bin")
You can also specify a file cdxcore.subdir.Format. The extension will be changed automatically if you have not set it manually:
subdir = SubDir("!/test")
subdir.write("file", data, fmt=SubDir.JSON_PICKLE ) # will write to "file.jpck"
To write several files at once, write:
subdir.write(["file1", "file2"], [data1, data2])
Note that when writing to a file, cdxcore.subdir.SubDir.write() will first write to a temporary file, and then rename this file into the target file name.
The temporary file name is generated by applying cdxcore.uniquehash.unique_hash48() to the target file name, the current time, the process and thread ID, as well as the machine’s UUID.
This is done to reduce collisions between processes/machines accessing the same files, potentially across a network.
It does not remove collision risk entirely, though.
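The write-to-temporary-then-rename pattern can be sketched with the standard library. This is not SubDir.write(): hashlib.sha256 truncated to 48 hex characters stands in for unique_hash48(), uuid.getnode() stands in for the machine's UUID, and temp_name_for / atomic_write are hypothetical names.

```python
import hashlib
import os
import pickle
import tempfile
import threading
import time
import uuid
from typing import Any


def temp_name_for(target: str) -> str:
    """Hypothetical: hash target name, time, process id, thread id and a machine id.

    sha256 truncated to 48 hex characters stands in for unique_hash48(),
    and uuid.getnode() stands in for the machine's UUID.
    """
    seed = f"{target}|{time.time_ns()}|{os.getpid()}|{threading.get_ident()}|{uuid.getnode()}"
    return hashlib.sha256(seed.encode()).hexdigest()[:48]


def atomic_write(path: str, obj: Any) -> None:
    directory = os.path.dirname(os.path.abspath(path))
    os.makedirs(directory, exist_ok=True)
    tmp = os.path.join(directory, temp_name_for(path) + ".tmp")
    try:
        with open(tmp, "wb") as f:
            pickle.dump(obj, f)
        # os.replace overwrites the target; on POSIX the rename is atomic within one file system.
        os.replace(tmp, path)
    except BaseException:
        # Never leave a half-written temporary file behind.
        if os.path.exists(tmp):
            os.remove(tmp)
        raise


demo = tempfile.mkdtemp()
target = os.path.join(demo, "file.pck")
atomic_write(target, {"a": 1})
with open(target, "rb") as f:
    loaded = pickle.load(f)
```

Readers of file.pck therefore either see the old complete file or the new complete file, never a partially written one.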
Filenames
cdxcore.subdir.SubDir transparently handles directory access and extensions.
That means users usually only work with file names which contain neither a directory nor an extension.
To obtain the fully qualified filename for a given “file”, use cdxcore.subdir.SubDir.full_file_name().
Reading and Writing Versioned Files
cdxcore.subdir.SubDir supports versioned files.
If versions are used, then they must be used for both reading and writing.
cdxcore.version.version() provides a standard decorator framework for defining versions for classes and functions, including version dependencies.
If a version is provided for cdxcore.subdir.SubDir.write(), then SubDir will write the version in a block ahead of the main content of the file.
In case of the PICKLE format, this is a byte string. In case of JSON_PLAIN and JSON_PICKLE, this is a line of text starting with # ahead of the main content. (Note that this violates the JSON file format.)
Writing a short version block ahead of the main data allows cdxcore.subdir.SubDir.read() to read this version information back quickly without reading the entire file.
read() does so if it is called with a version parameter.
In this case it will compare the read version with the provided version, and only return the main content of the file if the versions match.
Use cdxcore.subdir.SubDir.is_version() to check whether a given file has a specific version.
Like read(), this function only reads as much of the file as is required to obtain the version, and will be much faster than reading the whole file.
Important: if a file was written with a version, then it has to be read with a test version, too.
You can specify version="*" for cdxcore.subdir.SubDir.read() to match any version.
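For the JSON formats the version block is described as a line of text starting with # ahead of the main content. A minimal sketch of that idea, assuming a first line of the form "# <version>" followed by the JSON body; the exact header syntax cdxcore writes is not specified here, and all helper names are hypothetical.

```python
import json
import os
import tempfile
from typing import Any


def write_json_versioned(path: str, obj: Any, version: str) -> None:
    with open(path, "w", encoding="utf-8") as f:
        f.write("# " + version + "\n")  # assumed header layout
        json.dump(obj, f)


def read_json_version(path: str) -> str:
    # Only the first line is read; the JSON body is never parsed.
    with open(path, "r", encoding="utf-8") as f:
        return f.readline().rstrip("\n")[2:]


def read_json_versioned(path: str, version: str, default: Any = None) -> Any:
    with open(path, "r", encoding="utf-8") as f:
        found = f.readline().rstrip("\n")[2:]
        if version != "*" and found != version:
            return default
        return json.load(f)  # parses only the remainder after the header line


demo = tempfile.mkdtemp()
path = os.path.join(demo, "test.json")
write_json_versioned(path, [1, 2, 3], "0.0.1")
```

Because of the header line, passing the whole file to a standard JSON parser would fail, which is the format violation mentioned above.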
Examples:
Writing a versioned file:
from cdxcore.subdir import SubDir
sub_dir = SubDir("!/test_version")
sub_dir.write("test", [1,2,3], version="0.0.1" )
To read [1,2,3] from “test” we need to use the correct version:
_ = sub_dir.read("test", version="0.0.1")
The following will not read “test” as the versions do not match:
_ = sub_dir.read("test", version="0.0.2") # returns the default; with delete_wrong_version=True (the default) "test" is also deleted
By default cdxcore.subdir.SubDir.read() will not fail if a version mismatch is encountered; rather, it will attempt to delete the file and then return the default value.
This can be turned off by setting the keyword delete_wrong_version to False.
You can ignore the version used to write a file by using "*" as version:
_ = sub_dir.read("test", version="*")
Note that reading a file which has been written with a version will fail if the version keyword is omitted: SubDir only adds version information to a file if required, so read() without a version expects a file without a version block.
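The "fast versioning" idea for pickle files can be illustrated by pickling two objects in sequence: a short version string, then the payload. pickle.load deserializes exactly one object per call, so the version can be checked without unpickling the payload. The layout and the helper names are assumptions for illustration; the actual PICKLE version block is a byte string whose format is not documented here, and this sketch does not delete files on a mismatch.

```python
import os
import pickle
import tempfile
from typing import Any


def write_versioned(path: str, obj: Any, version: str) -> None:
    # Assumed layout: a small pickled version string, then the pickled payload.
    with open(path, "wb") as f:
        pickle.dump(version, f)
        pickle.dump(obj, f)


def read_version(path: str) -> str:
    # pickle.load deserializes exactly one object, so the payload is never unpickled.
    with open(path, "rb") as f:
        return pickle.load(f)


def read_versioned(path: str, version: str, default: Any = None) -> Any:
    with open(path, "rb") as f:
        found = pickle.load(f)
        if version != "*" and found != version:
            return default  # version mismatch: the payload is skipped entirely
        return pickle.load(f)


demo = tempfile.mkdtemp()
path = os.path.join(demo, "test.pck")
write_versioned(path, [1, 2, 3], "0.0.1")
```

This mirrors the examples above: "0.0.1" and "*" return [1, 2, 3], while "0.0.2" returns the default.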
Test existence of Files
To test existence of ‘file’ in a directory, use one of:
subdir.exists('file')
'file' in subdir
Deleting files
To delete a ‘file’, use any of the following:
subdir.delete("file")
del subdir['file']
Both are silent and will not throw errors if the file does not exist.
In order to throw an error use:
subdir.delete('file', raise_on_error=True)
A few member functions assist in deleting a number of files:
- cdxcore.subdir.SubDir.delete_all_files(): deletes all files in the directory with a matching extension. Does not delete sub directories, or files with extensions different from our own.
- cdxcore.subdir.SubDir.delete_all_content(): deletes all files with our extension, including those in all sub-directories. If a sub-directory is left empty by delete_all_content(), it is deleted, too.
- cdxcore.subdir.SubDir.delete_everything(): deletes everything, not just files with matching extensions.
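The difference between delete_all_files() and delete_all_content() can be sketched with pathlib. Both helpers below are hypothetical simplifications: they take the extension as a plain argument and, unlike the library, do not handle errors or raise_on_error.

```python
import os
import tempfile
from pathlib import Path


def delete_all_files(directory: str, ext: str = ".pck") -> None:
    # Top level only: sub directories and files with other extensions are left alone.
    for p in Path(directory).glob("*" + ext):
        if p.is_file():
            p.unlink()


def delete_all_content(directory: str, ext: str = ".pck") -> None:
    root = Path(directory)
    for sub in [p for p in root.iterdir() if p.is_dir()]:
        delete_all_content(str(sub), ext)
        if not any(sub.iterdir()):
            sub.rmdir()  # remove sub directories that ended up empty
    delete_all_files(directory, ext)


demo = tempfile.mkdtemp()
for rel in ("a.pck", "b.txt", os.path.join("sub", "c.pck"), os.path.join("keep", "d.txt")):
    full = os.path.join(demo, rel)
    os.makedirs(os.path.dirname(full), exist_ok=True)
    open(full, "w").close()

delete_all_files(demo)
after_files = sorted(os.listdir(demo))
delete_all_content(demo)
after_content = sorted(os.listdir(demo))
```

After the first call only a.pck is gone; the second call also removes sub/c.pck and the then-empty sub, but keeps keep/d.txt and b.txt.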
Caching
A cdxcore.subdir.SubDir object offers an advanced context for caching calls to collections.abc.Callable objects with cdxcore.subdir.SubDir.cache().
from cdxcore.subdir import SubDir
cache = SubDir("!/.cache")
cache.delete_all_content() # for illustration
@cache.cache("0.1")
def f(x,y):
    return x*y
_ = f(1,2) # function gets computed and the result cached
_ = f(1,2) # restore result from cache
_ = f(2,2) # different parameters: compute and store result
This involves keying the cache by the function name and its current parameters using cdxcore.uniquehash.UniqueHash, and monitoring the function's version using cdxcore.version.version(). The caching behaviour itself can be controlled by specifying the desired cdxcore.subdir.CacheMode.
See cdxcore.subdir.SubDir.cache() for the full feature set.
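The keying idea (qualified function name plus bound arguments, checked against a version) can be sketched without cdxcore. simple_cache and call_id are hypothetical; repr() of the bound arguments is a crude stand-in for cdxcore.uniquehash.UniqueHash, and the last_cached attribute only loosely mirrors CacheInfo.last_cached.

```python
import functools
import hashlib
import inspect
import os
import pickle
import tempfile
from typing import Any, Callable


def call_id(F: Callable, args: tuple, kwargs: dict) -> str:
    # Bind to the signature so that f(1, 2) and f(1, y=2) produce the same ID.
    bound = inspect.signature(F).bind(*args, **kwargs)
    bound.apply_defaults()
    # repr() is a crude stand-in for a proper unique hash of the arguments.
    key = F.__module__ + "." + F.__qualname__ + repr(sorted(bound.arguments.items()))
    return hashlib.sha256(key.encode()).hexdigest()[:16]


def simple_cache(directory: str, version: str) -> Callable:
    """Hypothetical decorator: cache results on disk keyed by call ID and version."""
    def decorate(F: Callable) -> Callable:
        @functools.wraps(F)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            path = os.path.join(directory, call_id(F, args, kwargs) + ".pck")
            if os.path.exists(path):
                with open(path, "rb") as f:
                    stored_version, result = pickle.load(f)
                if stored_version == version:
                    wrapper.last_cached = True
                    return result
            result = F(*args, **kwargs)  # compute, then store with the current version
            os.makedirs(directory, exist_ok=True)
            with open(path, "wb") as f:
                pickle.dump((version, result), f)
            wrapper.last_cached = False
            return result
        wrapper.last_cached = False
        return wrapper
    return decorate


calls = []


@simple_cache(tempfile.mkdtemp(), "0.1")
def f(x, y):
    calls.append((x, y))
    return x * y
```

Calling f(1, 2) twice computes once; f(1, y=2) hits the same entry, and f(2, 2) computes a new one. Bumping the version string invalidates all stored results.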
Import
from cdxcore.subdir import SubDir
Documentation
Functions
|
Create a root directory for versioned caching on disk using |
Classes
|
Wrapper for a cached function. |
|
Central control parameters for caching. |
|
Information on functions decorated with cdxcore.subdir.SubDir.cache(). |
|
A class which encodes standard behaviour of a caching strategy. |
Utility class to track caching and be able to delete all dependent objects. |
|
|
File formats for cdxcore.subdir.SubDir. |
|
|
Exceptions
Exception raised in case a file was read which had a version, but no test version was provided. |
- class cdxcore.subdir.CacheCallable(subdir, *, version=None, dependencies, label=None, uid=None, name=None, exclude_args=None, include_args=None, exclude_arg_types=None, version_auto_class=True, name_of_name_arg='name')
Bases: object
Wrapper for a cached function.
This is the wrapper returned by cdxcore.subdir.SubDir.cache().
- Attributes:
cache_controller
Returns the cdxcore.subdir.CacheController
cache_mode
Returns the cdxcore.subdir.CacheMode of the underlying cdxcore.subdir.CacheController
debug_verbose
Returns the debug cdxcore.verbose.Context used to print caching information, or None
global_exclude_arg_types
Returns exclude_arg_types of the underlying cdxcore.subdir.CacheController
labelledFileName
Returns labelledFileName() of the underlying cdxcore.subdir.CacheController
uid_or_label
ID or label
unique
Whether the ID is unique
uniqueFileName
Returns uniqueFileName() of the underlying cdxcore.subdir.CacheController
Methods
__call__(F)
Decorate F as a cacheable callable.
- __call__(F)
Decorate F as a cacheable callable. See cdxcore.subdir.SubDir.cache() for documentation.
- property cache_controller: CacheController
Returns the cdxcore.subdir.CacheController
- property cache_mode: CacheMode
Returns the cdxcore.subdir.CacheMode of the underlying cdxcore.subdir.CacheController
- property debug_verbose: Context
Returns the debug cdxcore.verbose.Context used to print caching information, or None
- property global_exclude_arg_types: list[type]
Returns exclude_arg_types of the underlying cdxcore.subdir.CacheController
- property labelledFileName: Callable
Returns labelledFileName() of the underlying cdxcore.subdir.CacheController
- property uniqueFileName: Callable
Returns uniqueFileName() of the underlying cdxcore.subdir.CacheController
- class cdxcore.subdir.CacheController(*, exclude_arg_types=[<class 'cdxcore.verbose.Context'>], cache_mode='on', max_filename_length=48, hash_length=8, debug_verbose=None, keep_last_arguments=False)
Bases: object
Central control parameters for caching.
When a parameter object of this type is assigned to a cdxcore.subdir.SubDir, it is passed on when sub-directories are created. This way all SubDir objects have the same caching behaviour.
See cdxcore.subdir.CacheController for a list of control parameters.
- Parameters:
- exclude_arg_types : list[type], optional
List of types to exclude from producing unique IDs from function arguments. Defaults to [Context].
- cache_mode : CacheMode, default ON
Top level cache control. Set to “OFF” to turn off all caching.
- max_filename_length : int, default 48
Maximum filename length. If unique IDs exceed this length, a hash of length hash_length will be integrated into the file name. See cdxcore.uniquehash.NamedUniqueHash.
- hash_length : int, default 8
Length of the hash used to make sure each filename is unique. See cdxcore.uniquehash.NamedUniqueHash.
- debug_verbose : cdxcore.verbose.Context | None, default None
If not None, print caching process messages to this object.
- keep_last_arguments : bool, default False
Keep a dictionary of all parameters as string representations after each function call. If the function F was decorated using cdxcore.subdir.SubDir.cache(), you can access this information via F.cache_info.last_arguments.
Note that strings are limited to 100 characters per argument to avoid memory overload when large objects are passed.
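To picture how max_filename_length and hash_length interact, here is a hypothetical scheme: short labels are kept as-is, long ones are truncated and suffixed with a short hash so that distinct calls still map to distinct names. The exact rule applied by cdxcore.uniquehash.NamedUniqueHash may differ (for example, it may always append the hash); bounded_filename is an illustration only.

```python
import hashlib


def bounded_filename(label: str, max_filename_length: int = 48, hash_length: int = 8) -> str:
    """Hypothetical scheme: keep short labels, otherwise truncate and append a hash."""
    if len(label) <= max_filename_length:
        return label
    digest = hashlib.sha256(label.encode()).hexdigest()[:hash_length]
    keep = max_filename_length - hash_length - 1  # room for "_" plus the hash
    return label[:keep] + "_" + digest


print(bounded_filename("f(x=1,y=2)"))
print(bounded_filename("g(" + "x" * 100 + ")"))
```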
- class cdxcore.subdir.CacheInfo(name, F, keep_last_arguments)
Bases: object
Information on functions decorated with cdxcore.subdir.SubDir.cache().
Functions decorated with cdxcore.subdir.SubDir.cache() will have a member cache_info of this type.
- arguments
Last arguments used. This member is only present if keep_last_arguments was set to True for the relevant cdxcore.subdir.CacheController.
- filename
Unique filename of the last function call.
- label
Label of the last function call.
- last_cached
Whether the last function call restored data from disk.
- name
Decoded name of the function.
- signature
inspect.signature() of the function.
- version
Last version used.
- class cdxcore.subdir.CacheMode(mode=None)
Bases: object
A class which encodes standard behaviour of a caching strategy.
Summary mechanics:
Action | on | gen | off | update | clear | readonly |
---|---|---|---|---|---|---|
load cache from disk if exists | x | x | | | | x |
write updates to disk | x | x | | x | | |
delete existing object | | | | | x | |
delete existing object if incompatible | x | | | x | x | |
Standard Caching Semantics
Assuming we wish to cache results from calling a function f in a file named filename in a directory directory, then this is the CacheMode waterfall:
from cdxcore.subdir import SubDir, CacheMode

def cache_f( filename : str, directory : SubDir, version : str, cache_mode : CacheMode ):
    if cache_mode.delete:
        directory.delete(filename)
    if cache_mode.read:
        r = directory.read(filename, default=None, version=version,
                           raise_on_error=False, delete_wrong_version=cache_mode.del_incomp )
        if r is not None:
            return r
    r = f(...) # compute result; f is the function being cached
    if cache_mode.write:
        directory.write(filename, r, version=version, raise_on_error=False )
    return r
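The same waterfall can be exercised end to end with a dict standing in for the directory. The (read, write, delete, del_incomp) flags below are a reading of the summary table and of CacheMode.HELP, not the library's source; cached_call is hypothetical and, unlike cache_f, treats a stored None as a valid hit because it checks key presence.

```python
from typing import Any, Callable, Dict, Tuple

# (read, write, delete, del_incomp) per mode, following the summary table.
FLAGS: Dict[str, Tuple[bool, bool, bool, bool]] = {
    "on":       (True,  True,  False, True),
    "gen":      (True,  True,  False, False),
    "off":      (False, False, False, False),
    "update":   (False, True,  False, True),
    "clear":    (False, False, True,  True),
    "readonly": (True,  False, False, False),
}


def cached_call(store: Dict[str, Tuple[str, Any]], key: str, version: str,
                mode: str, compute: Callable[[], Any]) -> Any:
    """Same waterfall as cache_f, with a dict standing in for the directory."""
    read, write, delete, del_incomp = FLAGS[mode]
    if delete:
        store.pop(key, None)
    if read and key in store:
        stored_version, value = store[key]
        if stored_version == version:
            return value
        if del_incomp:
            del store[key]  # incompatible version: discard the stale entry
    value = compute()
    if write:
        store[key] = (version, value)
    return value


store: Dict[str, Tuple[str, Any]] = {}
first = cached_call(store, "k", "1", "on", lambda: 42)
again = cached_call(store, "k", "1", "on", lambda: 0)  # served from the store
```

For example, "gen" recomputes on a version change but keeps nothing stale around only because the new result overwrites it, while "readonly" recomputes without touching the store.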
See cdxcore.subdir.SubDir.cache() for a comprehensive implementation.
- Parameters:
- mode : str, optional
Which mode to use: "on", "gen", "off", "update", "clear" or "readonly".
The default is None, in which case "on" is used.
- Attributes:
del_incomp
Whether to delete existing data if it is not compatible or has the wrong version.
delete
Whether to delete existing data.
is_clear
Whether this cache mode is CLEAR.
is_gen
Whether this cache mode is GEN.
is_off
Whether this cache mode is OFF.
is_on
Whether this cache mode is ON.
is_readonly
Whether this cache mode is READONLY.
is_update
Whether this cache mode is UPDATE.
read
Whether to load any existing cached data.
write
Whether to cache newly computed data to disk.
- CLEAR = 'clear'
- GEN = 'gen'
- HELP = "'on' for standard caching; 'gen' for caching but keep existing incompatible files; 'off' to turn off; 'update' to overwrite any existing cache; 'clear' to clear existing caches; 'readonly' to read existing caches but not write new ones"
Standard config help text, to be used with cdxcore.config.Config.__call__() as follows:
from cdxcore.config import Config
from cdxcore.subdir import CacheMode
def get_cache_mode( config : Config ) -> CacheMode:
    return CacheMode( config("cache_mode", "on", CacheMode.MODES, CacheMode.HELP) )
- MODES = ['on', 'gen', 'off', 'update', 'clear', 'readonly']
List of available modes in text form. This list can be used as cast parameter when calling cdxcore.config.Config.__call__():
from cdxcore.config import Config
from cdxcore.subdir import CacheMode
def get_cache_mode( config : Config ) -> CacheMode:
    return CacheMode( config("cache_mode", "on", CacheMode.MODES, CacheMode.HELP) )
- OFF = 'off'
- ON = 'on'
- READONLY = 'readonly'
- UPDATE = 'update'
- class cdxcore.subdir.CacheTracker
Bases: object
Utility class to track caching and be able to delete all dependent objects.
Methods
Delete all tracked files
- class cdxcore.subdir.Format(*values)
Bases: Enum
File formats for cdxcore.subdir.SubDir.
Format | Restores objects | Human readable | Speed | Compression | Extension |
---|---|---|---|---|---|
PICKLE | yes | no | high | no | .pck |
JSON_PLAIN | no | yes | low | no | .json |
JSON_PICKLE | yes | limited | low | no | .jpck |
BLOSC | yes | no | high | yes | .zbsc |
GZIP | yes | no | high | yes | .pgz |
- BLOSC = 3
blosc binary compressed format.
- JSON_PICKLE = 1
jsonpickle format.
- JSON_PLAIN = 2
json format.
- class cdxcore.subdir.SubDir(name, parent=None, *, ext=None, fmt=None, create_directory=None, cache_controller=None, delete_everything=False, delete_everything_upon_exit=False)
Bases: object
SubDir implements a transparent i/o interface for storing data in files.
Directories
Instantiate a SubDir with a directory name. There are some pre-defined relative system paths the name can refer to:
from cdxcore.subdir import SubDir
parent = SubDir("!/subdir")        # relative to system temp directory
parent = SubDir("~/subdir")        # relative to user home directory
parent = SubDir("./subdir")        # relative to current working directory (explicit)
parent = SubDir("subdir")          # relative to current working directory (implicit)
parent = SubDir("/tmp/subdir")     # absolute path (linux)
parent = SubDir("C:/temp/subdir")  # absolute path (windows)
parent = SubDir("")                # current working directory
Sub-directories can be generated in a number of ways:
subDir = parent('subdir')               # using __call__
subDir = SubDir('subdir', parent)       # explicit constructor
subDir = SubDir('subdir', parent="!/")  # explicit constructor with parent being a string
Files managed by SubDir will usually have the same extension. This extension can be specified with ext, or as part of the directory string:
subDir = SubDir("~/subdir", ext="bin")  # set extension to 'bin'
subDir = SubDir("~/subdir;*.bin")       # set extension to 'bin'
Leaving the extension as the default None allows SubDir to automatically use the extension associated with any specified format.
Copy Constructor
The constructor is shallow.
File I/O
Write data with cdxcore.subdir.SubDir.write():
subDir.write('item3', item3)  # explicit
subDir['item1'] = item1        # dictionary style
Note that cdxcore.subdir.SubDir.write() can write to multiple files at the same time.
Read data with cdxcore.subdir.SubDir.read():
item = subDir('item', 'i1')       # returns 'i1' if not found
item = subDir.read('item')        # returns None if not found
item = subDir.read('item', 'i2')  # returns 'i2' if not found
item = subDir['item']             # raises a KeyError if not found
Treat files in a directory like dictionaries:
for file in subDir:
    data = subDir[file]
    f(file, data)
for file, data in subDir.items():
    f(file, data)
Delete items:
del subDir['item']           # silently fails if 'item' does not exist
subDir.delete('item')        # silently fails if 'item' does not exist
subDir.delete('item', True)  # raises a KeyError if 'item' does not exist
Cleaning up:
parent.delete_all_content()  # silently deletes all files with matching extensions, and sub directories
File Format
SubDir supports a number of file formats via cdxcore.subdir.Format. Those can be controlled with the fmt keyword in various functions, not least cdxcore.subdir.SubDir:
subdir = SubDir("!/.test", fmt=SubDir.JSON_PICKLE)
See cdxcore.subdir.Format for supported formats.
- Parameters:
- name : str
Name of the directory.
The name may start with any of the following special characters:
- '.' for current directory.
- '~' for home directory.
- '!' for system default temp directory.
- '?' for a temporary temp directory. In this case delete_everything_upon_exit is always True.
The directory name may also contain a formatting string for defining ext on the fly: for example, use "!/test;*.bin" to specify a directory "test" in the default temp directory with extension "bin".
The directory name can be set to None, in which case it is always empty and attempts to write to it fail with EOFError.
- parent : str | SubDir | None, default None
Parent directory. If parent is a cdxcore.subdir.SubDir, then its parameters are used as default values here.
- ext : str | None, default None
Extension for files managed by this SubDir. All files will share the same extension.
If set to "", no extension is assigned to this directory. That means, for example, that cdxcore.subdir.SubDir.files() returns all files contained in the directory, not just files with a specific extension.
If None, use an extension depending on fmt:
- ‘pck’ for the default PICKLE format.
- ‘json’ for JSON_PLAIN.
- ‘jpck’ for JSON_PICKLE.
- ‘zbsc’ for BLOSC.
- ‘pgz’ for GZIP.
- fmt : cdxcore.subdir.Format | None, default Format.PICKLE
One of the cdxcore.subdir.Format codes. If ext is left as None, then setting a format will also set the corresponding ext.
- create_directory : bool | None, default False
Whether to create the directory upon creation of the SubDir object; otherwise it will be created upon the first cdxcore.subdir.SubDir.write().
Set to None to use the setting of the parent directory, or False if no parent is specified.
- cache_controller : cdxcore.subdir.CacheController | None, default None
An object which fine-tunes the behaviour of cdxcore.subdir.SubDir.cache(). See that function’s documentation for further details.
- delete_everything : bool, default False
Delete all contents in the newly defined sub directory upon creation.
- delete_everything_upon_exit : bool, default False
Delete all contents of the current directory when self is deleted. This is always True if the "?/" prefix was used.
Note, however, that this will only be executed once the object is garbage collected.
The default is, for good reason, False.
- Attributes:
cache_controller
Returns an assigned cdxcore.subdir.CacheController, or None
cache_mode
Returns the cdxcore.subdir.CacheMode associated with the underlying cache controller
existing_path
Return current path, including trailing '/'.
ext
Returns the common extension of the files in this directory, including leading '.'.
fmt
Returns current cdxcore.subdir.Format.
is_none
Whether this object is None or not.
path
Return current path, including trailing '/'.
Methods
Format(*values)
The same as cdxcore.subdir.Format for convenience.
__call__(element[, default, raise_on_error, ...])
Read either data from a file, or return a new sub directory.
as_format(format_name)
Converts a named format into the respective format code.
auto_ext([ext_or_fmt])
Computes the effective extension based on the input ext_or_fmt, and the current settings for self.
auto_ext_fmt(*[, ext, fmt])
Computes the effective extension and format based on inputs ext and fmt, each of which defaults to the respective values of self.
cache([version, dependencies, label, uid, ...])
Advanced versioned caching for callables.
cache_class([version, name, dependencies, ...])
Short-cut for cdxcore.subdir.SubDir.cache() applied to classes with a reduced number of available parameters.
cache_init([label, uid, exclude_args, ...])
Short-cut for cdxcore.subdir.SubDir.cache() applied to decorating __init__ with a reduced number of available parameters.
Creates the current directory if it doesn't exist yet.
delete(file[, raise_on_error, ext])
Deletes file.
delete_all_content([delete_self, ...])
Deletes all valid keys and subdirectories in this sub directory.
delete_all_files([raise_on_error, ext])
Deletes all valid keys in this sub directory with the correct extension.
delete_everything([keep_directory])
Deletes the entire sub directory with all contents.
exists(file, *[, ext])
Checks whether a file exists.
expand_std_root(name)
Expands name by a standardized root directory if provided:
file_size(file, *[, ext])
Returns the file size of a file.
files(*[, ext])
Returns a list of files in this subdirectory with the current extension, or the specified extension.
full_file_name(file, *[, ext])
Returns fully qualified file name.
full_temp_file_name([file, ext, ...])
Returns a fully qualified unique temporary file name with path and extension.
get_creation_time(file, *[, ext])
Returns the creation time of a file.
get_last_access_time(file, *[, ext])
Returns the last access time of a file.
get_last_modification_time(file, *[, ext])
Returns the last modification time of a file.
get_version(file[, raise_on_error, ext, fmt])
Returns a version stored in a file.
is_version(file[, version, raise_on_error, ...])
Tests the version of a file.
items(*[, ext, raise_on_error])
Dictionary-style iterable of filenames and their content.
Whether the current directory exists
read(file[, default, raise_on_error, ...])
Read data from a file if the file exists, or return default.
read_string(file[, default, raise_on_error, ext])
Reads text from a file.
remove_bad_file_characters(file[, by])
Replaces invalid characters in a filename using the map by.
rename(source, target, *[, ext])
Rename a file.
sub_dirs()
Retrieve a list of all sub directories.
temp_dir()
Return system temp directory.
temp_file_name([file])
Returns a unique temporary file name.
Return a temporary temp directory name using tempfile.mkdtemp().
user_dir()
Return current working directory.
Return current working directory.
write(file, obj[, raise_on_error, version, ...])
Writes an object to file.
write_string(file, line[, raise_on_error, ext])
Writes a line of text into a file.
RETURN_SUB_DIRECTORY
- DEFAULT_FORMAT = 0
Default cdxcore.subdir.Format: Format.PICKLE
- class Format(*values)
Bases: Enum
The same as cdxcore.subdir.Format for convenience.
- BLOSC = 3
blosc binary compressed format.
- JSON_PICKLE = 1
jsonpickle format.
- JSON_PLAIN = 2
json format.
- __call__(element, default=<class 'cdxcore.subdir.SubDir.__RETURN_SUB_DIRECTORY'>, raise_on_error=False, *, version=None, ext=None, fmt=None, delete_wrong_version=True, create_directory=None)
Read either data from a file, or return a new sub directory.
If only the element argument is used, then this function returns a new sub directory named element.
If both element and default arguments are used, then this function attempts to read the file element from disk, returning default if it does not exist.
Assume we have a subdirectory sd:
from cdxcore.subdir import SubDir
sd = SubDir("!/test")
Reading files:
x = sd('file', None)                  # reads 'file' with default value None
x = sd('sd/file', default=1)          # reads 'file' from sub directory 'sd' with default value 1
x = sd('file', default=1, ext="tmp")  # reads 'file.tmp' with default value 1
Create sub directory:
sd2 = sd("subdir")                       # creates and returns handle to subdirectory 'subdir'
sd2 = sd("subdir1/subdir2")              # creates and returns handle to subdirectory 'subdir1/subdir2'
sd2 = sd("subdir1/subdir2", ext=".tmp")  # creates and returns handle to subdirectory 'subdir1/subdir2' with extension "tmp"
sd2 = sd(ext=".tmp")                     # returns handle to current subdirectory with extension "tmp"
- Parameters:
- element : str
File or directory name, or a list thereof.
- default : optional
If specified, this function reads element with read( element, default, *args, **kwargs ).
If default is not specified, then this function returns a new sub-directory by calling SubDir(element, parent=self, ext=ext, fmt=fmt).
- create_directory : bool, default None
When creating sub-directories:
Whether or not to instantly create the sub-directory. The default, None, is to inherit the behaviour from self.
- raise_on_error : bool, default False
When reading files:
Whether to raise an exception if reading an existing file failed. By default this function fails silently and returns default.
- version : str | None, default None
When reading files:
If not None, specifies the version of the current code base.
In this case, this version will be compared to the version of the file being read. If they do not match, read fails (either by returning default or throwing a cdxcore.version.VersionError exception).
You can specify version "*" to accept any version. Note that this is distinct from using None, which stipulates that the file should not have version information.
- delete_wrong_version : bool, default True
When reading files:
If True, and if a wrong version was found, delete the file.
- ext : str | None, default None
When reading files:
Extension to be used, or a list thereof if element is a list. Defaults to the extension of self.
Semantics:
- None to use the default extension of self.
- "*" to use the extension implied by fmt.
- "" to turn off extension management.
When creating sub-directories:
Extension for the new subdirectory; set to None to inherit the parent’s extension.
- fmt : cdxcore.subdir.Format | None, default None
When reading files:
File format or None to use the directory’s default. Note that fmt cannot be a list even if element is. Unless ext or the SubDir’s extension is "*", changing the format does not automatically change the extension.
When creating sub-directories:
Format for the new sub-directory; set to None to inherit the parent’s format.
- Returns:
- Object : type | SubDir
Either the value in the file, a new sub directory, or lists thereof.
- static as_format(format_name)
Converts a named format into the respective format code.
Example:
format = SubDir.as_format( config("format", "pickle", SubDir.FORMAT_NAMES, "File format") )
- auto_ext(ext_or_fmt=None)
Computes the effective extension based on the input ext_or_fmt and the current settings of self.
If ext_or_fmt is set to "*", then the extension associated with the format of self is returned.
- Parameters:
- ext_or_fmt : str | cdxcore.subdir.Format | None, default None
An extension or a format.
- Returns:
- ext : str
The extension with leading '.'.
- auto_ext_fmt(*, ext=None, fmt=None)
Computes the effective extension and format based on the inputs ext and fmt, each of which defaults to the respective value of self.
Resolves an ext of "*" into the extension associated with fmt.
- Returns:
- (ext, fmt) : tuple
Here ext contains the leading '.' and fmt is of type cdxcore.subdir.Format.
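The "*" resolution rule can be illustrated with a small stdlib-only sketch. Format, FORMAT_EXT and resolve_ext_fmt below are hypothetical stand-ins, not cdxcore's API; the extension table is an assumption apart from ".pck" for pickle, which matches the trace output shown elsewhere in this documentation.

```python
from enum import Enum


class Format(Enum):  # hypothetical stand-in for cdxcore.subdir.Format
    PICKLE = 0
    JSON = 1


# Hypothetical format -> extension table (only ".pck" is suggested by the docs).
FORMAT_EXT = {Format.PICKLE: "pck", Format.JSON: "json"}


def resolve_ext_fmt(ext=None, fmt=None, *, default_ext="pck", default_fmt=Format.PICKLE):
    """Return (ext, fmt); ext carries a leading '.', or is "" if extension management is off."""
    fmt = default_fmt if fmt is None else fmt   # None: inherit the directory's format
    ext = default_ext if ext is None else ext   # None: inherit the directory's extension
    if ext == "*":
        ext = FORMAT_EXT[fmt]                   # "*": use the extension implied by fmt
    if ext == "":
        return "", fmt                          # "": no automatic extension
    return ("" if ext.startswith(".") else ".") + ext, fmt
```

For example, resolve_ext_fmt("*", Format.JSON) yields (".json", Format.JSON), while an explicit ".bin" is kept regardless of the format.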
- cache(version=None, *, dependencies=None, label=None, uid=None, name=None, exclude_args=None, include_args=None, exclude_arg_types=None, version_auto_class=True)
Advanced versioned caching for callables.
Versioned caching is based on the following two simple principles:
Unique Call IDs:
When a function is called with some parameters, the wrapper computes a unique call ID from the qualified name of the function and its runtime functional parameters (i.e. those which alter the outcome of the function). The first time a function is called with a given unique call ID, the result is stored on disk. If the function is called with the same call ID again, the result is read from disk and returned.
By default, cdxcore.uniquehash.NamedUniqueHash is used to compute unique call IDs.
Code Version:
Each function has a version, which includes dependencies on other functions or classes. If the version of data on disk does not match the current version, the file is deleted and the generating function is called again. This way your code drives updates to data generated with cached functions.
Behind the scenes this is implemented using cdxcore.version.version(), which means that the version of a cached function can also depend on versions of non-cached functions or other objects.
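The two principles can be illustrated with a minimal, stdlib-only sketch. toy_cache is a hypothetical decorator, not cdxcore's implementation: it keys results on the qualified function name plus a sha256 digest of the pickled arguments (a crude stand-in for cdxcore.uniquehash.NamedUniqueHash), stores the code version next to the result, and deletes and recomputes files whose stored version does not match.

```python
import functools
import hashlib
import os
import pickle
import tempfile


def toy_cache(directory, version):
    """Hypothetical sketch: cache results per (qualified name, arguments), invalidated by version."""
    def decorator(func):
        name = f"{func.__qualname__}@{func.__module__}"

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Unique call ID: qualified name plus a hash of the arguments.
            key = pickle.dumps((name, args, sorted(kwargs.items())))
            uid = hashlib.sha256(key).hexdigest()[:16]
            path = os.path.join(directory, f"{name} {uid}.pck")
            if os.path.exists(path):
                with open(path, "rb") as fh:
                    stored_version, result = pickle.load(fh)
                if stored_version == version:
                    wrapper.last_cached = True   # same call ID and version: reuse
                    return result
                os.remove(path)                  # wrong version: delete and recompute
            result = func(*args, **kwargs)
            os.makedirs(directory, exist_ok=True)
            with open(path, "wb") as fh:
                pickle.dump((version, result), fh)
            wrapper.last_cached = False
            return result

        wrapper.last_cached = None
        return wrapper
    return decorator


cache_dir = tempfile.mkdtemp()

@toy_cache(cache_dir, "0.1")
def f(x, y):
    return x * y
```

Pickling the arguments is not a reliable unique hash in general (for example, dicts with equal contents but different insertion order pickle differently), and the sketch ignores version dependencies, which cdxcore handles via cdxcore.version.version().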
Caching Functions
Caching a simple function f is straightforward:
from cdxcore.subdir import SubDir
cache = SubDir("!/.cache")
cache.delete_all_content()   # for illustration

@cache.cache("0.1")
def f(x,y):
    return x*y

_ = f(1,2)   # function gets computed and the result cached
_ = f(1,2)   # restore result from cache
_ = f(2,2)   # different parameters: compute and store result
Cache another function g which calls f, and whose version therefore depends on f's version:
@cache.cache("0.1", dependencies=[f])
def g(x,y):
    return f(x,y)**2
Debugging
When using automated caching it is important to understand how changes in parameters and in the version of a function affect caching. To this end, cdxcore.subdir.SubDir.cache() supports a tracing mechanism via a cdxcore.subdir.CacheController:
from cdxcore.subdir import SubDir, CacheController
from cdxcore.verbose import Context

ctrl  = CacheController( debug_verbose=Context("all") )
cache = SubDir("!/.cache", cache_controller=ctrl )
cache.delete_all_content()   # <- delete previous cached files, for this example only

@cache.cache("0.1")
def f(x,y):
    return x*y

_ = f(1,2)   # function gets computed and the result cached
_ = f(1,2)   # restore result from cache
_ = f(2,2)   # different parameters: compute and store result
This produces the following trace:
00: cache(f@__main__): function registered for caching into 'C:/Users/hans/AppData/Local/Temp/.cache/'.
00: cache(f@__main__): called 'f@__main__' version 'version 0.1' and wrote result into 'C:/Users/hans/AppData/Local/Temp/.cache/f@__main__ 668a6b111549e288.pck'.
00: cache(f@__main__): read 'f@__main__' version 'version 0.1' from cache 'C:/Users/hans/AppData/Local/Temp/.cache/f@__main__ 668a6b111549e288.pck'.
00: cache(f@__main__): called 'f@__main__' version 'version 0.1' and wrote result into 'C:/Users/hans/AppData/Local/Temp/.cache/f@__main__ b5609542d7da0b04.pck'.
Non-Functional Parameters
A function may have non-functional parameters which do not alter the function's outcome. Examples are debug flags:
from cdxcore.subdir import SubDir
cache = SubDir("!/.cache")

@cache.cache("0.1", dependencies=[f], exclude_args='debug')
def g(x,y,debug):   # <-- 'debug' is a non-functional parameter
    if debug:
        print(f"g(x={x},y={y})")
    return f(x,y)**2
You can define certain types as non-functional for all functions wrapped by cdxcore.subdir.SubDir.cache() when constructing the cdxcore.subdir.CacheController parameter for cdxcore.subdir.SubDir:
from cdxcore.subdir import SubDir, CacheController

class Debugger:
    def output( self, message ):
        print(message)

ctrl  = CacheController(exclude_arg_types=[Debugger])   # <- exclude 'Debugger' parameters from hashing
cache = SubDir("!/.cache", cache_controller=ctrl)

@cache.cache("0.1", dependencies=[f])
def g(x,y,debugger : Debugger):   # <-- 'debugger' is a non-functional parameter
    debugger.output(f"g(x={x},y={y})")
    return f(x,y)**2
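A minimal sketch of the filtering step, assuming arguments are matched by name after binding them to the function's signature with inspect. functional_arguments, Debugger and g are hypothetical illustration names; whether cdxcore also includes default values, as apply_defaults() does here, is an assumption.

```python
import inspect


def functional_arguments(func, args, kwargs, *, include_args=None, exclude_args=None, exclude_arg_types=None):
    """Hypothetical helper: the arguments of one call that should enter the unique call ID."""
    if isinstance(include_args, str):
        include_args = [include_args]
    if isinstance(exclude_args, str):
        exclude_args = [exclude_args]            # allow exclude_args='debug' as in the examples
    excluded_types = tuple(exclude_arg_types or ())
    bound = inspect.signature(func).bind(*args, **kwargs)
    bound.apply_defaults()                       # parameters left at their defaults count too
    selected = {}
    for name, value in bound.arguments.items():
        if include_args is not None and name not in include_args:
            continue                             # not in the include list
        if exclude_args is not None and name in exclude_args:
            continue                             # excluded by name
        if excluded_types and isinstance(value, excluded_types):
            continue                             # excluded by type
        selected[name] = value
    return selected


class Debugger:                                  # illustrative non-functional type
    def output(self, message):
        print(message)


def g(x, y, debug=False, debugger=None):
    return x * y
```

Only the returned dictionary would be hashed; everything filtered out can change freely without affecting the cache file name.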
Unique IDs and File Naming
By default, the unique call ID of a decorated function is generated from its fully qualified name and a unique hash of its functional parameters.
Key default behaviours of cdxcore.uniquehash.NamedUniqueHash:
- NamedUniqueHash hashes objects via their __dict__ or __slots__ members. This can be overridden for a class by implementing __unique_hash__; see cdxcore.uniquehash.NamedUniqueHash.
- Function members of objects, and any members whose names start with '_', are not hashed unless this behaviour is changed using cdxcore.subdir.CacheController.
- Numpy arrays and pandas frames are hashed using their byte representation. That is slow and not recommended. It is better to identify numpy/pandas inputs via the characteristics that generated them.
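The first two default rules can be mimicked with a toy hasher that walks an object's __dict__ and skips members starting with '_' as well as function members. toy_object_hash and Params are hypothetical; cdxcore's NamedUniqueHash handles many more types (including __slots__, numpy and pandas) and produces different hash values.

```python
import hashlib


def toy_object_hash(obj) -> str:
    """Hypothetical sketch of the default rules: hash public, non-callable members only."""
    h = hashlib.sha256()

    def feed(value):
        if isinstance(value, (bool, int, float, str, type(None))):
            h.update(f"{type(value).__name__}:{value!r};".encode())
        elif isinstance(value, (list, tuple)):
            h.update(b"[")
            for item in value:
                feed(item)
            h.update(b"]")
        elif isinstance(value, dict):
            h.update(b"{")
            for key in sorted(value, key=str):
                if str(key).startswith("_") or callable(value[key]):
                    continue                     # skip '_' members and function members
                feed(key)
                feed(value[key])
            h.update(b"}")
        else:
            h.update(type(value).__qualname__.encode())
            feed(dict(getattr(value, "__dict__", {})))   # objects are hashed via their __dict__

    feed(obj)
    return h.hexdigest()[:16]


class Params:                                    # illustrative input class
    def __init__(self, x):
        self.x = x
        self._scratch = []                       # private member: ignored by the sketch


p1, p2, p3 = Params(1), Params(1), Params(2)
p2._scratch.append(99)                           # differs only in a '_' member
p2.callback = lambda: None                       # ... and in a function member
```

Here p1 and p2 hash identically, while p3 differs; this is also why state kept in '_' members does not invalidate the cache unless the default behaviour is changed.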
Either way, hashes are not particularly human-readable. It is often useful to have unique IDs, and therefore file names, which carry some context information.
This can be achieved by using label:
from cdxcore.subdir import SubDir, CacheController
from cdxcore.verbose import Context

ctrl  = CacheController( debug_verbose=Context("all") )
cache = SubDir("!/.cache", cache_controller=ctrl )
cache.delete_all_content()   # for illustration

@cache.cache("0.1")   # <- no label
def f1(x,y):
    return x*y

@cache.cache("0.1", label="f2({x},{y})")   # <- label is a string passed to str.format()
def f2(x,y):
    return x*y
We can also use a function to generate a label. In that case all parameters to the function, including its name, are passed to the label function. In the example below we absorb any parameters we are not interested in with **_:
@cache.cache("0.1", label=lambda x,y,**_: f"h({x},{y})", exclude_args='debug')
def h(x,y,debug=False):
    if debug:
        print(f"h(x={x},y={y})")
    return x*y
Calling these functions,
f1(1,1)
f2(1,1)
h(1,1)
we obtain:
00: cache(f1@__main__): function registered for caching into 'C:/Users/hans/AppData/Local/Temp/.cache/'.
00: cache(f2@__main__): function registered for caching into 'C:/Users/hans/AppData/Local/Temp/.cache/'.
00: cache(h@__main__): function registered for caching into 'C:/Users/hans/AppData/Local/Temp/.cache/'.
00: cache(f1@__main__): called 'f1@__main__' version 'version 0.1' and wrote result into 'C:/Users/hans/AppData/Local/Temp/.cache/f1@__main__ ef197d80d6a0bbb0.pck'.
00: cache(f2@__main__): called 'f2(1,1)' version 'version 0.1' and wrote result into 'C:/Users/hans/AppData/Local/Temp/.cache/f2(1,1) bdc3cd99157c10f7.pck'.
00: cache(h@__main__): called 'h(1,1)' version 'version 0.1' and wrote result into 'C:/Users/hans/AppData/Local/Temp/.cache/h(1,1) d3fdafc9182070f4.pck'.
Note that the file names f2(1,1) bdc3cd99157c10f7.pck and h(1,1) d3fdafc9182070f4.pck for the f2 and h function calls are now easier to read, as they are composed of the label of the function call and a terminal hash key. The trailing hash is appended because we do not assume that the label returned by label is unique. Therefore, a hash generated from the label itself and all pertinent arguments is appended to the file name.
If we know how to generate truly unique IDs which are always valid file names, then we can use uid instead of label:
@cache.cache("0.1", uid=lambda x,y,**_: f"h2({x},{y})", exclude_args='debug')
def h2(x,y,debug=False):
    if debug:
        print(f"h2(x={x},y={y})")
    return x*y

h2(1,1)
yields:
00: cache(h2@__main__): function registered for caching into 'C:/Users/hans/AppData/Local/Temp/.cache/'.
00: cache(h2@__main__): called 'h2(1,1)' version 'version 0.1' and wrote result into 'C:/Users/hans/AppData/Local/Temp/.cache/h2(1,1).pck'.
In particular, the file name is now h2(1,1).pck without any hash. If uid is used, the parameters of the function are not hashed. Like label, the parameter uid can also be a str.format() string or a callable.
Controlling which Parameters to Hash
To specify which parameters are pertinent for identifying a unique ID, use:
- include_args: list of function arguments to include. If None, all parameters are used as input to the next step.
- exclude_args: list of function arguments to exclude, if not None.
- exclude_arg_types: a list of types to exclude. This is helpful if control flow is managed with dedicated data types. An example of such a type is cdxcore.verbose.Context, which is used to print hierarchical output messages. Types can be excluded globally using a cdxcore.subdir.CacheController when constructing a cdxcore.subdir.SubDir.
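The naming rules described in this section (a hash suffix for label or the plain function name, no hash for uid) can be summarized in a short sketch. cache_file_name is a hypothetical helper; the 16-character sha256 suffix only mimics the length seen in the trace output, and its values will not match cdxcore's.

```python
import hashlib


def cache_file_name(name, arguments, *, label=None, uid=None, ext=".pck"):
    """Hypothetical sketch of the naming rules for cached calls."""
    if label is not None and uid is not None:
        raise ValueError("cannot specify both 'label' and 'uid'")
    if uid is not None:
        return uid + ext                         # uid is trusted to be unique: no hash
    text = name if label is None else label      # fall back to the function's name
    # label (or name) is not assumed unique: append a hash of it and of the arguments
    payload = repr((text, sorted(arguments.items()))).encode()
    return f"{text} {hashlib.sha256(payload).hexdigest()[:16]}{ext}"
```

The arguments passed in would be those left after applying include_args, exclude_args and exclude_arg_types.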
Numpy/Pandas
Numpy/pandas data should not be hashed to identify unique call IDs. Instead, use the defining characteristics that were used to generate the data frames.
For example:
from cdxcore.pretty import PrettyObject
from cdxcore.subdir import SubDir
cache = SubDir("!/.cache")
cache.delete_all_content()   # for illustration

@cache.cache("0.1")
def load_src( src_def ):
    result = ...   # placeholder: load the data described by 'src_def'
    return result

# ignore 'data'. It is uniquely identified by 'src_def' -->
@cache.cache("0.1", dependencies=[load_src], exclude_args=['data'])
def statistics( stats_def, src_def, data ):
    stats = ...    # placeholder: compute statistics using 'data'
    return stats

src_def = PrettyObject()
src_def.start = "2010-01-01"
src_def.end   = "2025-01-01"
src_def.x     = 0.1

stats_def = PrettyObject()
stats_def.lambda_ = 0.1   # 'lambda' is a reserved keyword and cannot be an attribute name
stats_def.window  = 100

data  = load_src( src_def )
stats = statistics( stats_def, src_def, data )
While instructive, this setup is not optimal: we do not need to load data at all if stats can be restored from the cache (unless we need data further on).
Consider therefore:
@cache.cache("0.1")
def load_src( src_def ):
    result = ...   # placeholder: load the data described by 'src_def'
    return result

@cache.cache("0.1", dependencies=[load_src])
def statistics_only( stats_def, src_def ):
    data  = load_src( src_def )   # <-- embed call to load_src() here
    stats = ...                   # placeholder: compute statistics using 'data'
    return stats

stats = statistics_only( stats_def, src_def )
Caching Member Functions
You can cache member functions like any other function. Note that cdxcore.version.version() information is inherited by default, i.e. member functions depend on the version of their defining class, and class versions depend on their base classes' versions:
from cdxcore.subdir import SubDir, version
cache = SubDir("!/.cache")
cache.delete_all_content()   # for illustration

@version("0.1")
class A(object):
    def __init__(self, x):
        self.x = x

    @cache.cache("0.1")
    def f(self, y):
        return self.x*y

a = A(x=1)
_ = a.f(y=1)   # compute f and store result
_ = a.f(y=1)   # load result back from disk
a.x = 2
_ = a.f(y=1)   # 'a' changed: compute f and store result
b = A(x=2)
_ = b.f(y=1)   # same unique call ID as previous call -> restore result from disk
WARNING
cdxcore.uniquehash.UniqueHash does not by default process members of objects or dictionaries which start with "_". This behaviour can be changed using cdxcore.subdir.CacheController. For reasonably complex objects it is recommended to implement a custom hashing function for your objects:
__unique_hash__( self, uniqueHash : UniqueHash, debug_trace : DebugTrace )
This function is described at cdxcore.uniquehash.UniqueHash.
Caching Bound Member Functions
Caching bound member functions is technically quite different from caching a function of a class in general, but is also supported:
from cdxcore.subdir import SubDir, CacheController
from cdxcore.verbose import Context

cache = SubDir("!/.cache", cache_controller=CacheController(debug_verbose=Context("all")))
cache.delete_all_content()   # for illustration

class A(object):
    def __init__(self,x):
        self.x = x
    def f(self,y):
        return self.x*y

a = A(x=1)
f = cache.cache("0.1", label=lambda self, y, **_: f"a.f({y})")(a.f)   # <- decorate bound 'f'
r = f(y=2)
In this case the function f is bound to a. The object is added as self to the function parameter list even though the bound function's parameter list does not include self. This, together with the comments on hashing objects above, ensures that (hashed) changes to a will be reflected in the unique call ID for the member function.
Caching Classes
Classes can also be cached. In this case the creation of objects is cached, i.e. a call to the class constructor restores the respective object from disk.
This is done in two steps:
- First, the class itself is decorated using cdxcore.subdir.SubDir.cache() to provide version information at class level. Only version information is provided here. You can use cdxcore.subdir.SubDir.cache_class() as an alias.
- Secondly, decorate __init__. You do not need to specify a version for __init__ as its version usually coincides with the version of the class. At __init__ you define how unique IDs are generated from the parameters passed to object construction. You can use cdxcore.subdir.SubDir.cache_init() as an alias.
Simple example:
from cdxcore.subdir import SubDir
cache = SubDir("!/.cache")
cache.delete_all_content()   # for illustration

@cache.cache_class("0.1")
class A(object):
    @cache.cache_init(exclude_args=['debug'])
    def __init__(self, x, debug=False):
        if debug:
            print("__init__",x)
        self.x = x

a = A(1)   # caches 'a'
b = A(1)   # reads the cached object into 'b'
Technical Comments
The function __init__ does not actually return a value; for this reason, behind the scenes it is actually __new__ which is being decorated. Attempting to cache-decorate __new__ manually will lead to an exception.
A nuance of __init__ compared to ordinary member functions is that the self parameter is non-functional (in the sense that it is an empty object when __init__ is called). self is therefore automatically excluded from computing a unique call ID. That also means self is not part of the arguments passed to uid:
@cache.cache_class("0.1")
class A(object):
    @cache.cache_init(uid=lambda x, debug: f"A.__init__(x={x})")   # <-- 'self' is not passed to the lambda function; no need to add **_
    def __init__(self, x, debug):
        if debug:
            print("__init__",x)
        self.x = x
Decorating classes with __slots__ does not yet work.
See also
For project-wide use it is usually convenient to control caching at the level of a project-wide cache root directory. The class cdxcore.subdir.VersionedCacheRoot is a thin convenience wrapper around a cdxcore.subdir.SubDir with a cdxcore.subdir.CacheController.
The idea is to have a central file, cache.py, which contains the central root for caching. We recommend using an environment variable to control the location of this directory outside the code. Here is an example with an environment variable PROJECT_CACHE_DIR:
# file cache.py
import os
from cdxcore.subdir import VersionedCacheRoot

cache_root = VersionedCacheRoot( os.getenv("PROJECT_CACHE_DIR", "!/.cache") )
In a particular project file, say pipeline.py, create a file-local cache directory and use it:
# file pipeline.py
from cache import cache_root

cache_dir = cache_root("pipeline")

@cache_dir.cache("0.1")
def f(x):
    return x+2

@cache_dir.cache("0.1", dependencies=[f])
def g(x):
    return f(x)**2

# ...
In case you have issues with caching, you can use the central root directory to turn on tracing:
# file cache.py
import os
from cdxcore.subdir import VersionedCacheRoot
from cdxcore.verbose import Context

cache_root = VersionedCacheRoot(
    os.getenv("PROJECT_CACHE_DIR", "!/.cache"),
    debug_verbose=Context.all   # turn full tracing on
)
- Parameters:
- version : str | None, default None
Version of the function.
If None, then F must be decorated manually with cdxcore.version.version().
If set, the function F is automatically first decorated with cdxcore.version.version() for you.
- dependencies : list[type] | None, default None
A list of version dependencies, either by reference or by name. See cdxcore.version.version() for details on name lookup if strings are used.
- label : str | Callable | None, default None
Specify a human-readable label for the function call given its parameters. This label is used to generate the cache file name, and is also printed when tracing hashing operations. Labels are not assumed to be unique, hence a unique hash of the label and the parameters to this function will be appended to generate the actual cache file name.
Use uid instead if label represents valid unique file names. You cannot specify both uid and label. If neither uid nor label is present, name will be used.
Usage:
  - If label is a plain string without {} formatting: use this string as-is.
  - If label is a string with {} formatting, then label.format( name=name, **parameters ) will be used to generate the actual label.
  - If label is a Callable, then label( name=name, **parameters ) will be called to generate the actual label.
See above for examples.
label cannot be used alongside uid.
- uid : str | Callable | None, default None
Alternative to label which is assumed to generate a unique cache file name. It has the same semantics as label. When used, parameters to the decorated function are not hashed, as the uid is assumed to be already unique. The string must be a valid file name.
Use label if the ID is not unique. You cannot specify both uid and label. If neither uid nor label is present, name will be used (as a non-unique label).
- name : str | None, default None
Name of this function, which is used either on its own if neither label nor uid is used, or which is passed as the parameter name to either the callable or the formatting operator. See above for more details.
If name is not specified, it defaults to __qualname__ expanded by the name of the module the function is defined in.
- include_args : list[str] | None, default None
List of arguments to include in generating a unique ID, or None for all.
- exclude_args : list[str] | None, default None
List of arguments to exclude from generating a unique ID. Examples of such non-functional arguments are workflow controls (debugging) and i/o elements.
- exclude_arg_types : list[type] | None, default None
List of parameter types to exclude from generating a unique ID. Examples of such non-functional arguments are workflow controls (debugging) and i/o elements.
- version_auto_class : bool, default True
Whether to automatically add version dependencies on base classes or, for member functions, on containing classes. This is the auto_class parameter for cdxcore.version.version().
- Returns:
- Decorated F : Callable
A decorator cache(F) whose __call__ implements the cached call to F.
This callable has a member cache_info of type cdxcore.subdir.CacheInfo which can be used to access information on caching activity.
Information available at any time after decoration:
  - F.cache_info.name: qualified name of the function.
  - F.cache_info.signature: signature of the function.
Additional information available during a call to a decorated function F, and thereafter:
  - F.cache_info.version: unique version string reflecting all dependencies.
  - F.cache_info.filename: unique file name used for caching logic during the last function call.
  - F.cache_info.label: last label generated, or None.
  - F.cache_info.arguments: arguments parsed to create a unique call ID, or None.
Additional information available after a call to F:
  - F.cache_info.last_cached: whether the last function call returned a cached object.
The decorated F() has additional function parameters, namely:
  - override_cache_mode : CacheMode | None, default None
  Allows overriding the CacheMode temporarily; in particular you can set it to "off".
  - track_cached_files : cdxcore.subdir.CacheTracker | None, default None
  Allows passing a cdxcore.subdir.CacheTracker object to keep track of all files used (loaded from or saved to). The function cdxcore.subdir.CacheTracker.delete_cache_files() can be used to delete all files involved in caching.
  - return_cache_uid : bool, default False
  If True, then the decorated function returns a tuple uid, result, where uid is the unique file name generated for this function call, and result is the actual result from the function, cached or not.
Usage:
from cdxcore.subdir import SubDir
cache_dir = SubDir("!/.cache")

@cache_dir.cache()
def f(x, y):
    return x*y

uid, xy = f( x=1, y=2, return_cache_uid=True )
This pattern is thread-safe, unlike the following:
xy  = f( x=1, y=2 )
uid = f.cache_info.filename   # refers to the last call, which may have been made by another thread
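The label semantics listed under Parameters amount to a small dispatch on the type of label, with uid following the same rules. resolve_label is a hypothetical helper written against that description, not cdxcore code.

```python
def resolve_label(label, name, **parameters):
    """Hypothetical sketch of the documented label/uid semantics."""
    if label is None:
        return name                                   # neither label nor uid: fall back to name
    if callable(label):
        return label(name=name, **parameters)         # callable: called with name and all parameters
    if "{" in label:
        return label.format(name=name, **parameters)  # format string: str.format() with name and parameters
    return label                                      # plain string: used as-is
```

Because name is always passed, a callable label needs either a name parameter or a catch-all such as **_, as in the examples above.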
- cache_class(version=None, *, name=None, dependencies=None, version_auto_class=True)
Short-cut for cdxcore.subdir.SubDir.cache() applied to classes, with a reduced number of available parameters.
Example:
from cdxcore.subdir import SubDir
cache = SubDir("!/.cache")

@cache.cache_class("0.1")
class A(object):
    @cache.cache_init(exclude_args=['debug'])
    def __init__(self, x, debug):
        if debug:
            print("__init__",x)
        self.x = x
- property cache_controller
Returns the assigned cdxcore.subdir.CacheController, or None.
- cache_init(label=None, uid=None, exclude_args=None, include_args=None, exclude_arg_types=None)
Short-cut for cdxcore.subdir.SubDir.cache() applied to decorating __init__, with a reduced number of available parameters.
Example:
from cdxcore.subdir import SubDir
cache = SubDir("!/.cache")

@cache.cache_class("0.1")
class A(object):
    @cache.cache_init(exclude_args=['debug'])
    def __init__(self, x, debug):
        if debug:
            print("__init__",x)
        self.x = x
- property cache_mode
Returns the cdxcore.subdir.CacheMode associated with the underlying cache controller.
- delete(file, raise_on_error=False, *, ext=None)
Deletes file.
This function will quietly fail if file does not exist, unless raise_on_error is set to True.
- Parameters:
- file
Filename, or list of filenames.
- raise_on_error : bool, default False
If False, do not throw KeyError if the file does not exist or another error occurs.
- ext : str | None, default None
Extension, or list thereof if file is a list. Use:
  - None for the directory default.
  - "" to not use an automatic extension.
  - "*" to use the extension associated with the format of the directory.
- delete_all_content(delete_self=False, raise_on_error=False, *, ext=None)
Deletes all valid keys and subdirectories in this sub-directory.
Does not delete files with other extensions. Use cdxcore.subdir.SubDir.delete_everything() if the aim is to delete, well, everything.
- Parameters:
- delete_self : bool
Whether to delete the directory itself as well, or only its contents.
- raise_on_error : bool
False for silent failure.
- ext : str | None, default None
Extension for keys, or None for the directory's default. Use "" to match all files regardless of extension.
- delete_all_files(raise_on_error=False, *, ext=None)
Deletes all valid keys in this sub-directory with the correct extension.
- Parameters:
- raise_on_error : bool
Set to False to quietly ignore errors.
- ext : str | None, default None
Extension to be used:
  - None for the directory default.
  - "" to not use an automatic extension.
  - "*" to use the extension associated with the format of the directory.
- delete_everything(keep_directory=True)
Deletes the entire sub-directory with all its contents.
WARNING: deletes all files and sub-directories, not just those with the present extension. If keep_directory is False, the directory referred to by this object will also be deleted. In this case, self will be set to None.
- property existing_path : str
Returns the current path, including trailing '/'.
existing_path ensures that the directory structure exists (or raises an exception). Use cdxcore.subdir.SubDir.path if creation on the fly is not desired.
- exists(file, *, ext=None)
Checks whether a file exists.
- Parameters:
- file
Filename, or list of filenames.
- ext : str | None, default None
Extension to be used:
  - None for the directory default.
  - "" to not use an automatic extension.
  - "*" to use the extension associated with the format of the directory.
- Returns:
- Status : bool
If file is a string, returns True or False; otherwise returns a list of bool values.
- static expand_std_root(name)
Expands name by a standardized root directory if one is indicated. The first character of name can be any of:
  - "!" is expanded into cdxcore.subdir.SubDir.temp_dir().
  - "." is expanded into cdxcore.subdir.SubDir.working_dir().
  - "~" is expanded into cdxcore.subdir.SubDir.user_dir().
If none of these matches the first character, name is returned as is.
This function does not support "?".
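A stdlib-only approximation of the prefix expansion. The mapping of "!", "." and "~" to tempfile.gettempdir(), os.getcwd() and os.path.expanduser("~") is an assumption (the trace output above suggests the system temp directory for "!"); expand_std_root_sketch is a hypothetical name, and requiring a separator after the prefix is a design choice of the sketch so that "../x" is left alone.

```python
import os
import tempfile


def expand_std_root_sketch(name: str) -> str:
    """Hypothetical stand-in for SubDir.expand_std_root() built on stdlib directories."""
    roots = {
        "!": tempfile.gettempdir,                # assumed counterpart of SubDir.temp_dir()
        ".": os.getcwd,                          # assumed counterpart of SubDir.working_dir()
        "~": lambda: os.path.expanduser("~"),    # assumed counterpart of SubDir.user_dir()
    }
    prefix = name[:1]
    # only expand a bare prefix or a prefix followed by a separator, so "../x" stays untouched
    if prefix not in roots or (len(name) > 1 and name[1] not in "/\\"):
        return name
    rest = name[2:]
    return os.path.join(roots[prefix](), rest) if rest else roots[prefix]()
```

Names with a leading "?" are returned unchanged, mirroring the note that "?" is not supported here.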
- property ext : str
Returns the common extension of the files in this directory, including the leading '.'. Resolves "*" into the extension associated with the current cdxcore.subdir.Format.
- file_size(file, *, ext=None)
Returns the size of a file.
See comments on os.path.getsize() for system compatibility information.
- Parameters:
- file : str
Filename, or list of filenames.
- ext : str
Extension, or list thereof if file is a list. Use None for the directory default. Use "" for no automatic extension.
- Returns:
- File size of file, or None if an error occurred.
- files(*, ext=None)
Returns a list of files in this sub-directory with the current extension, or with the specified extension.
In other words, if the extension is ".pck", and the files are "file1.pck", "file2.pck", "file3.bin", then this function will return [ "file1", "file2" ].
If ext is:
  - None, then the directory's default extension will be used.
  - "", then this function will return all files in this directory.
  - "*", then the extension corresponding to the current format will be used.
This function ignores directories. Use cdxcore.subdir.SubDir.sub_dirs() to retrieve those.
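The filtering described above can be sketched with os.listdir. list_files is hypothetical; in particular, returning full names when ext is "" is an assumption about what "all files" means, and the demo directory is created only for illustration.

```python
import os
import tempfile


def list_files(directory: str, ext: str = ".pck") -> list:
    """Hypothetical sketch of SubDir.files(): names of files with 'ext', extension stripped."""
    result = []
    for entry in sorted(os.listdir(directory)):
        if not os.path.isfile(os.path.join(directory, entry)):
            continue                             # directories are ignored
        if ext == "":
            result.append(entry)                 # "": all files, names returned unchanged
        elif entry.endswith(ext):
            result.append(entry[: -len(ext)])    # strip the extension
    return result


demo_dir = tempfile.mkdtemp()
for file_name in ("file1.pck", "file2.pck", "file3.bin"):
    open(os.path.join(demo_dir, file_name), "w").close()
os.mkdir(os.path.join(demo_dir, "archive.pck"))  # a directory: ignored even though it ends in .pck
```

With this setup, list_files(demo_dir) returns the two ".pck" stems, as in the example above.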
- property fmt : Format
Returns the current cdxcore.subdir.Format.
- full_file_name(file, *, ext=None)
Returns the fully qualified file name.
The function tests that file does not contain directory information.
- Parameters:
- file : str
Core file name without path or extension.
- ext : str | None, default None
If not None, use this extension rather than cdxcore.subdir.SubDir.ext.
- Returns:
- Filename : str | None
Fully qualified system file name. If self is None, then this function returns None; if file is None, then this function also returns None.
- full_temp_file_name(file=None, *, ext=None, create_directory=False)
Returns a fully qualified unique temporary file name with path and extension.
The file name is generated by applying a unique hash to the current directory, file, the current process and thread IDs, and datetime.datetime.now(). If file is not None, it will be used as a label.
This function returns the fully qualified file name. Use cdxcore.subdir.SubDir.temp_file_name() to obtain only the file name.
- Parameters:
- file : str | None, default None
An optional file. If provided, cdxcore.uniquehash.named_unique_filename48_8() is used to generate the temporary file name, which means that a portion of file will head the returned temporary name. If file is None, cdxcore.uniquehash.unique_hash48() is used to generate a 48 character hash.
- ext : str | None, default None
Extension to use, or None for the extension of self.
- Returns:
- Temporary file name : str
The fully qualified file name.
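A minimal sketch of the same idea with hashlib. temp_file_name_sketch is hypothetical and only loosely follows the 48-character naming suggested by the function names above; the exact layout cdxcore uses is not reproduced. Two calls from the same thread within the clock resolution of datetime.now() could still collide, so the sketch does not guarantee uniqueness.

```python
import datetime
import hashlib
import os
import threading
from typing import Optional


def temp_file_name_sketch(directory: str, label: Optional[str] = None, ext: str = ".pck") -> str:
    """Hypothetical sketch: hash directory, label, process/thread IDs and the current time."""
    seed = repr((directory, label, os.getpid(), threading.get_ident(),
                 datetime.datetime.now().isoformat()))
    digest = hashlib.sha256(seed.encode()).hexdigest()   # 64 hex characters
    if label is None:
        stem = digest[:48]                                # plain 48 character hash
    else:
        stem = f"{label[:39]} {digest[:8]}"               # readable head plus a short hash
    return os.path.join(directory, stem + ext)
```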
- get_creation_time(file, *, ext=None)
Returns the creation time of a file.
See comments on os.path.getctime() for system compatibility information.
- Parameters:
- file
Filename, or list of filenames.
- ext : str | None, default None
Extension to be used:
  - None for the directory default.
  - "" to not use an automatic extension.
  - "*" to use the extension associated with the format of the directory.
- Returns:
- Datetime : datetime.datetime
A single datetime if file is a string, otherwise a list of datetimes. Returns None if an error occurred.
- get_last_access_time(file, *, ext=None)
Returns the last access time of a file.
See comments on os.path.getatime() for system compatibility information.
- Parameters:
- file
Filename, or list of filenames.
- ext : str | None, default None
Extension to be used:
  - None for the directory default.
  - "" to not use an automatic extension.
  - "*" to use the extension associated with the format of the directory.
- Returns:
- Datetime : datetime.datetime
A single datetime if file is a string, otherwise a list of datetimes. Returns None if an error occurred.
- get_last_modification_time(file, *, ext=None)
Returns the last modification time of a file.
See comments on os.path.getmtime() for system compatibility information.
- Parameters:
- file
Filename, or list of filenames.
- ext : str | None, default None
Extension to be used:
  - None for the directory default.
  - "" to not use an automatic extension.
  - "*" to use the extension associated with the format of the directory.
- Returns:
- Datetime : datetime.datetime
A single datetime if file is a string, otherwise a list of datetimes. Returns None if an error occurred.
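The three time functions share the same shape: a single datetime for a string, a list for a list, and None on error. A stdlib sketch for the modification time, with modification_time as a hypothetical name (os.path.getctime or os.path.getatime would be swapped in for the other two):

```python
import datetime
import os


def modification_time(path_or_paths):
    """Hypothetical sketch: datetime for a single path, list for a list, None on error."""
    def one(path):
        try:
            return datetime.datetime.fromtimestamp(os.path.getmtime(path))
        except OSError:
            return None                          # e.g. the file does not exist
    if isinstance(path_or_paths, str):
        return one(path_or_paths)                # strings are treated as a single file name
    return [one(p) for p in path_or_paths]
```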
- get_version(file, raise_on_error=False, *, ext=None, fmt=None)
Returns the version stored in a file.
This requires that the file has previously been saved with a version. Otherwise this function will have unpredictable results.
- Parameters:
- file : str
A filename, or a list thereof.
- raise_on_error : bool
Whether to raise an exception if accessing an existing file failed (e.g. if it is a directory). By default this function fails silently and returns the default.
- delete_wrong_version : bool, default True
If True, and if a wrong version was found, delete file.
- ext : str | None, default None
Extension override, or a list thereof if file is a list. Set to:
  - None to use the directory's default.
  - "*" to use the extension implied by fmt.
  - "" for no extension.
- fmt : cdxcore.subdir.Format | None, default None
File format, or None to use the directory's default. Note that fmt cannot be a list even if file is.
- Returns:
- version : str
The version.
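The "fast versioning" mentioned in the module overview means that the version can be read without loading the whole file. One way to achieve this is a small header in front of the payload; the layout below (4-byte little-endian length, UTF-8 version string, pickled payload) is an assumption for illustration and not cdxcore's on-disk format. write_versioned, read_version and read_versioned are hypothetical names.

```python
import io
import pickle
import struct


def write_versioned(fh, version: str, obj) -> None:
    """Hypothetical layout: 4-byte length, UTF-8 version string, then the pickled payload."""
    encoded = version.encode("utf-8")
    fh.write(struct.pack("<I", len(encoded)))
    fh.write(encoded)
    pickle.dump(obj, fh)


def read_version(fh) -> str:
    """Reads only the header; the (possibly large) payload is never touched."""
    (length,) = struct.unpack("<I", fh.read(4))
    return fh.read(length).decode("utf-8")


def read_versioned(fh, expected: str):
    """Reads the payload only if the stored version matches; "*" accepts any version."""
    version = read_version(fh)
    if expected != "*" and version != expected:
        raise ValueError(f"version mismatch: file has {version!r}, expected {expected!r}")
    return pickle.load(fh)


buffer = io.BytesIO()
write_versioned(buffer, "0.1", list(range(1000)))
```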
- property is_none : bool
Whether this object is None or not. For such a SubDir object no files exist, and writing any file will fail.
- is_version(file, version=None, raise_on_error=False, *, ext=None, fmt=None, delete_wrong_version=True)
Tests the version of a file.
- Parameters:
- file : str
A filename, or a list thereof.
- version : str
Specifies the version to compare the file's version with. You can use "*" to match any version.
- raise_on_error : bool
Whether to raise an exception if accessing an existing file failed (e.g. if it is a directory). By default this function fails silently and returns the default.
- delete_wrong_version : bool, default True
If True, and if a wrong version was found, delete file.
- ext : str | None, default None
Extension override, or a list thereof if file is a list. Set to:
  - None to use the directory's default.
  - "*" to use the extension implied by fmt.
  - "" for no extension.
- fmt : cdxcore.subdir.Format | None, default None
File format, or None to use the directory's default. Note that fmt cannot be a list even if file is.
- Returns:
- Status : bool
Returns True only if the file exists, has version information, and its version is equal to version.
- items(*, ext=None, raise_on_error=False)
Dictionary-style iterable of file names and their content.
Usage:
from cdxcore.subdir import SubDir
subdir = SubDir("!")
for file, data in subdir.items():
    print( file, str(data)[:100] )
- Parameters:
- ext : str | None, default None
Extension, or None for the directory's current extension. Use "" for all file extensions.
- Returns:
- Iterable
An iterable generator.
- property path : str
Returns the current path, including trailing '/'.
Note that the path may not exist yet. If existence is required, consider using cdxcore.subdir.SubDir.existing_path.
- read(file, default=None, raise_on_error=False, *, version=None, delete_wrong_version=True, ext=None, fmt=None)
Read data from a file if the file exists, or return default.
Supports file containing directory information.
Supports file (as well as default and ext) being iterable. Examples:
from cdxcore.subdir import SubDir
files = ['file1', 'file2']
sd = SubDir("!/test")
sd.read( files )          # both files are using default None
sd.read( files, 1 )       # both files are using default 1
sd.read( files, [1,2] )   # files use defaults 1 and 2, respectively
sd.read( files, [1] )     # produces error as len(files) != len([1])
Strings are iterable but are treated as a single value. Therefore:
sd.read( files, '12' )        # the default value '12' is used for both files
sd.read( files, ['1','2'] )   # use defaults '1' and '2', respectively
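The rule for pairing file with default can be written as a small helper. broadcast is a hypothetical name and not part of cdxcore. It treats str and bytes as single values and raises ValueError on a length mismatch; the exception type cdxcore actually uses is not documented here. Judging by the write() examples, the same rule appears to govern obj there.

```python
from typing import Any, List, Sequence

def broadcast(files: Sequence[str], values: Any) -> List[Any]:
    """Return one value per file, following the list/default rules described above."""
    # Strings and bytes are iterable, but count as a single value.
    if isinstance(values, (str, bytes)) or not hasattr(values, "__iter__"):
        return [values] * len(files)
    values = list(values)
    if len(values) != len(files):
        # assumption: ValueError; the exception type cdxcore uses is not documented here
        raise ValueError(f"len(files)={len(files)} != len(values)={len(values)}")
    return values

files = ["file1", "file2"]
print(broadcast(files, 1))           # [1, 1]
print(broadcast(files, "12"))        # ['12', '12']
print(broadcast(files, ["1", "2"]))  # ['1', '2']
```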
- Parameters:
- file : str
A file name or a list thereof. file may contain subdirectories.
- default
Default value, or default values if file is a list.
- raise_on_error : bool, default False
Whether to raise an exception if reading an existing file failed. By default this function fails silently and returns the default.
- version : str | None, default None
If not None, specifies the version of the current code base. In this case, this version will be compared to the version of the file being read. If they do not match, read fails (either by returning default or by raising a cdxcore.version.VersionError exception).
You can specify version "*" to accept any version. Note that this is distinct from using None, which stipulates that the file should not have version information.
- delete_wrong_version : bool, default True
If True, and if a wrong version was found, delete the file.
- ext : str | None, default None
Extension overwrite, or a list thereof if file is a list. Use None to use the directory’s default; '*' to use the extension implied by fmt; "" to turn off extension management.
- fmt : cdxcore.subdir.Format | None, default None
File cdxcore.subdir.Format or None to use the directory’s default.
Note: fmt cannot be a list even if file is. Unless ext or the SubDir’s extension is '*', changing the format does not automatically change the extension.
- Returns:
- Content : type | list
For a single file, returns the content of the file if successfully read, or default otherwise. If file is a list, this function returns a list of contents.
- Raises:
- Version error : cdxcore.version.VersionError
If the file’s version did not match the version provided.
- Version present : cdxcore.subdir.VersionPresentError
Raised when reading, without specifying version, a file which has a version.
- I/O errors : Exception
Various standard I/O errors are raised as usual.
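The interplay between version=None, "*" and a concrete version string can be summarised as a single check. This sketch covers only the documented semantics. VersionError and VersionPresentError below are local stand-ins for cdxcore.version.VersionError and cdxcore.subdir.VersionPresentError, and check_version is a hypothetical helper. Treating an unversioned file as a mismatch when a version is requested is an assumption.

```python
from typing import Optional

class VersionError(Exception):
    """Local stand-in for cdxcore.version.VersionError."""

class VersionPresentError(RuntimeError):
    """Local stand-in for cdxcore.subdir.VersionPresentError (which also derives from RuntimeError)."""

def check_version(file_version: Optional[str], requested: Optional[str]) -> None:
    """Raise if a file carrying `file_version` may not be read with `requested`."""
    if requested is None:
        # None stipulates that the file must not carry version information.
        if file_version is not None:
            raise VersionPresentError(f"file has version {file_version!r}, but none was requested")
        return
    if file_version is None:
        # Assumption: a versioned read of an unversioned file counts as a mismatch.
        raise VersionError(f"file has no version, expected {requested!r}")
    if requested != "*" and file_version != requested:
        raise VersionError(f"file version {file_version!r} != expected {requested!r}")

check_version("1.0", "1.0")  # matching versions: no exception
check_version("2.3", "*")    # "*" accepts any version
```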
- read_string(file, default=None, raise_on_error=False, *, ext=None)
Reads text from a file and removes trailing EOLs.
Returns the read string, or a list of strings if file was iterable.
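"Removes trailing EOLs" can be illustrated with a minimal stand-alone sketch. The function below is hypothetical and not cdxcore's implementation. It assumes UTF-8 text and returns default for a missing file; list handling and raise_on_error are left out.

```python
import os
import tempfile
from typing import Optional

def read_string(path: str, default: Optional[str] = None) -> Optional[str]:
    """Return the file's text without trailing end-of-line characters, or `default`."""
    if not os.path.isfile(path):
        return default
    with open(path, "r", encoding="utf-8") as f:  # assumption: UTF-8 text
        return f.read().rstrip("\r\n")            # strips every trailing '\r' and '\n'

p = os.path.join(tempfile.mkdtemp(), "note.txt")
with open(p, "w", encoding="utf-8") as f:
    f.write("hello\n\n")
print(repr(read_string(p)))  # 'hello'
```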
- static remove_bad_file_characters(file, by='default')
Replaces invalid characters in a filename using the map by.
See cdxcore.util.fmt_filename() for documentation and further options.
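The effect of a character map such as by can be sketched with str.translate. The map BAD_CHARS below is invented for illustration. It is not the 'default' map of cdxcore.util.fmt_filename(), whose actual replacements may differ.

```python
# Hypothetical replacement map; cdxcore's 'default' map is documented with cdxcore.util.fmt_filename().
BAD_CHARS = {"/": "_", "\\": "_", ":": ";", "*": "+", "?": "!",
             '"': "'", "<": "(", ">": ")", "|": "!"}

def remove_bad_file_characters(file: str, by: dict = BAD_CHARS) -> str:
    """Replace each character that is a key of `by` with its mapped value."""
    return file.translate(str.maketrans(by))

print(remove_bad_file_characters("2024/01/31 f(x)?"))  # 2024_01_31 f(x)!
```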
- rename(source, target, *, ext=None)
Rename a file.
This function will raise an exception if not successful.
- Parameters:
- source, target : str
Filenames.
- ext : str
Extension. Use None for the directory default. Use "" for no automatic extension.
- sub_dirs()
Retrieve a list of all subdirectories.
If self does not refer to an existing directory, then this function returns an empty list.
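The following is a minimal stdlib sketch of the behaviour described for sub_dirs(), including the empty list for a missing directory. The function is hypothetical. Returning bare, sorted names rather than full paths is an assumption.

```python
import os
import tempfile
from typing import List

def sub_dirs(path: str) -> List[str]:
    """Names of the immediate subdirectories of `path`; [] if `path` is not an existing directory."""
    if not os.path.isdir(path):
        return []
    with os.scandir(path) as it:
        return sorted(entry.name for entry in it if entry.is_dir())

root = tempfile.mkdtemp()
for name in ("b", "a"):
    os.mkdir(os.path.join(root, name))
open(os.path.join(root, "file.pck"), "wb").close()
print(sub_dirs(root))  # ['a', 'b']
```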
- static temp_dir()
Return the system temp directory. Short-cut to tempfile.gettempdir(). The result contains a trailing '/'.
- temp_file_name(file=None)
Returns a unique temporary file name.
The file name is generated by applying a unique hash to the current directory, file, the current process and thread IDs, and datetime.datetime.now(). If file is not None, it will be used as a label.
This function returns just the file name. Use cdxcore.subdir.SubDir.full_temp_file_name() to get a full temporary file name including path and extension.
- Parameters:
- file : str | None, default None
An optional file. If provided, cdxcore.uniquehash.named_unique_filename48_8() is used to generate the temporary file name, which means that a portion of file will head the returned temporary name. If file is None, cdxcore.uniquehash.unique_hash48() is used to generate a 48 character hash.
- Returns:
- Temporary file name : str
The file name.
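The hashing idea can be sketched with hashlib. Everything here is hypothetical: SHA-256 stands in for the cdxcore.uniquehash functions, and the name lengths are chosen for illustration only. With no label the result is 48 hex characters. With a label, at most 8 label characters, an underscore and 40 hash characters are returned. None of these lengths are taken from cdxcore.

```python
import datetime
import hashlib
import os
import tempfile
import threading
from typing import Optional

def temp_file_name(directory: str, file: Optional[str] = None) -> str:
    """Hash directory, label, process id, thread id and current time into a file name."""
    h = hashlib.sha256()
    parts = (directory, file or "", os.getpid(), threading.get_ident(),
             datetime.datetime.now().isoformat())
    for part in parts:
        h.update(str(part).encode("utf-8"))
        h.update(b"\0")  # separator, so ("ab", "c") and ("a", "bc") hash differently
    digest = h.hexdigest()  # 64 hex characters
    if file is None:
        return digest[:48]
    # With a label, a portion of `file` heads the name (lengths chosen for illustration only).
    return f"{file[:8]}_{digest[:40]}"

name = temp_file_name(tempfile.gettempdir(), "results")
print(name.startswith("results_"), len(name))  # True 48
```

Including the process and thread IDs keeps names from concurrent workers apart even if their timestamps coincide.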
- static temp_temp_dir()
Return a temporary temp directory name using tempfile.mkdtemp(). Note that this function will return a different directory upon every call.
It is strongly recommended to clean up after usage, for example using the pattern:
from cdxcore.subdir import SubDir
import shutil
tmp_dir = SubDir.temp_temp_dir()   # create before `try` so `finally` never sees an undefined name
try:
    ...                            # work with tmp_dir
finally:
    shutil.rmtree(tmp_dir)
The result contains a trailing '/'.
- static user_dir()
Return the current user’s home directory. Short-cut for os.path.expanduser() with parameter '~'. The result contains a trailing '/'.
- static working_dir()
Return the current working directory. Short-cut for os.getcwd(). The result contains a trailing '/'.
- write(file, obj, raise_on_error=True, *, version=None, ext=None, fmt=None)
Writes an object to file.
Supports file containing directories.
Supports file being a list. In this case, if obj is an iterable, it is considered the list of values for the elements of file. If obj is not iterable, it will be written into all files from file. As with read(), strings are treated as single values:
from cdxcore.subdir import SubDir
keys = ['file1', 'file2']
sd = SubDir("!/test")
sd.write( keys, 1 )       # works, writes 1 in both files
sd.write( keys, [1,2] )   # works, writes 1 and 2, respectively
sd.write( keys, "12" )    # works, writes '12' in both files
sd.write( keys, [1] )     # produces error as len(keys) != len(obj)
If the current directory is None, then the function raises an EOFError exception.
- Parameters:
- file : str
Core filename, or list thereof.
- obj
Object to write, or list thereof if file is a list.
- raise_on_error : bool, default True
If False, this function will return False upon failure.
- version : str | None, default None
If not None, specifies the version of the code which generated obj. This version will be written to the beginning of the file.
- ext : str | None, default None
Extension, or list thereof if file is a list. Use None to use the directory’s default extension. Use "*" to use the extension implied by fmt.
- fmt : cdxcore.subdir.Format | None, default None
File format or None to use the directory’s default. Note that fmt cannot be a list even if file is. Note also that unless ext or the SubDir’s extension is '*', changing the format does not automatically change the extension used.
- Returns:
- Success : bool
Boolean to indicate success if raise_on_error is False.
- write_string(file, line, raise_on_error=True, *, ext=None)
Writes a line of text into a file.
Supports file containing directories.
Supports file being a list. In this case, line can either be the same value for all files or a list, too.
If the current directory is None, then the function raises an EOFError exception.
- exception cdxcore.subdir.VersionPresentError
Bases: RuntimeError
Exception raised in case a file was read which had a version, but no test version was provided.
- cdxcore.subdir.VersionedCacheRoot(directory, *, ext=None, fmt=None, create_directory=False, **controller_kwargs)
Create a root directory for versioned caching on disk using cdxcore.subdir.SubDir.cache().
Usage:
In a central file, define a root directory for all caching activity:
from cdxcore.subdir import VersionedCacheRoot
vroot = VersionedCacheRoot("!/cache")
Create sub-directories as suitable, for example:
vtest = vroot("test")
Use these for caching:
@vtest.cache("1.0")
def f1( x=1, y=2 ):
    print(x,y)

@vtest.cache("1.0", dps=[f1])   # dps lists f1, which f2 calls
def f2( x=1, y=2, z=3 ):
    f1( x,y )
    print(z)
- Parameters:
- directory : str
Name of the root directory for caching.
As with SubDir, the following short-cuts are supported:
"!/dir" creates dir in the temporary directory.
"~/dir" creates dir in the home directory.
"./dir" creates dir relative to the current directory.
- ext : str | None, default None
Extension, which will automatically be appended to file names. The default value depends on fmt; for Format.PICKLE it is “pck”.
- fmt : cdxcore.subdir.Format | None, default None
File format; if ext is not specified, the format drives the extension, too. The default None becomes Format.PICKLE.
- create_directory : bool, default False
Whether to create the directory immediately when the root is created, rather than upon the first write.
- controller_kwargs : dict
Parameters passed to cdxcore.subdir.CacheController. Common parameters used:
exclude_arg_types: list of types or names of types to exclude when auto-generating function signatures from function arguments. An example is cdxcore.verbose.Context, which is used to print progress messages.
max_filename_length: maximum filename length.
hash_length: length used for hashes, see cdxcore.uniquehash.UniqueHash.
debug_verbose: set to Context.all (after from cdxcore.verbose import Context) to turn on tracing of all caching operations.
- Returns:
- Root : cdxcore.subdir.SubDir
A root directory suitable for caching.
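To make the caching pattern concrete without relying on cdxcore itself, here is a self-contained sketch of a versioned disk cache. It is not SubDir.cache(): versioned_cache, its dependencies argument and the full_version attribute are hypothetical. Keys are a SHA-256 of the function name and repr() of the arguments, so f(1, 2) and f(x=1, y=2) get different entries and arguments need a stable repr. The version is stored next to the result rather than in a fast-versioning header. The sketch does show two ideas from above. First, results are reused only while the stored version matches. Second, a function's effective version includes the versions of the functions it depends on, similar in spirit to dps=[f1].

```python
import functools
import hashlib
import os
import pickle
from typing import Callable, Sequence

def versioned_cache(directory: str, version: str, dependencies: Sequence[Callable] = ()):
    """Hypothetical stand-in for SubDir.cache(): cache results on disk per (version, arguments)."""
    def decorator(f: Callable) -> Callable:
        # The effective version also covers dependencies, so bumping a dependency's
        # version invalidates the cached results of functions that depend on it.
        full_version = version + "".join("|" + getattr(d, "full_version", "") for d in dependencies)

        @functools.wraps(f)
        def wrapper(*args, **kwargs):
            key = repr((f.__module__, f.__qualname__, args, sorted(kwargs.items())))
            name = hashlib.sha256(key.encode("utf-8")).hexdigest()[:32] + ".pck"
            path = os.path.join(directory, name)
            if os.path.isfile(path):
                with open(path, "rb") as fh:
                    stored_version, value = pickle.load(fh)
                if stored_version == full_version:
                    return value          # cache hit with matching version
            value = f(*args, **kwargs)    # miss or stale version: recompute
            os.makedirs(directory, exist_ok=True)  # directory is created on first write
            with open(path, "wb") as fh:
                pickle.dump((full_version, value), fh)
            return value

        wrapper.full_version = full_version
        return wrapper
    return decorator
```

Used like the example above, f1 would be declared with @versioned_cache(path, "1.0") and f2 with @versioned_cache(path, "1.0", dependencies=[f1]). If f1 is bumped to "1.1" and f2 is then decorated again, f2's effective version becomes "1.0|1.1", so its old cache entries are no longer used.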