cdxcore.subdir#

Utilities for file i/o, directory management and streamlined versioned caching.

Overview#

The key idea is to provide transparent, concise pickle access to the file system via the cdxcore.subdir.SubDir class.

Key design features:

  • Simple path construction via () operator. By default directories which do not exist yet are only created upon writing a first file.

  • Files managed by cdxcore.subdir.SubDir all have the same extension.

  • Files support “fast versioning”: the version of a file can be read without having to read the entire file.

  • cdxcore.subdir.SubDir.cache() implements a convenient versioned caching framework.

Directories#

The core of the framework is the cdxcore.subdir.SubDir class which represents a directory with files of a given extension.

Simply write:

from cdxcore.subdir import SubDir
subdir = SubDir("my_directory")      # relative to current working directory
subdir = SubDir("./my_directory")    # relative to current working directory
subdir = SubDir("~/my_directory")    # relative to home directory
subdir = SubDir("!/my_directory")    # relative to default temp directory
subdir = SubDir("?!/my_directory")   # relative to a temporary temp directory; this directory will be cleared upon (orderly) exit of ``SubDir``.

Note that my_directory will not be created if it does not exist yet. It will be created the first time we write a file.
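To make the prefix rules above concrete, here is a rough standard-library sketch of how such prefixes could be mapped to root directories. The helper expand_root is hypothetical and written only for this illustration; it is not part of cdxcore and may differ from what SubDir does internally. Like SubDir, it only computes a path and does not create anything on disk.

```python
import os
import tempfile

def expand_root(name: str) -> str:
    """Hypothetical helper: map a leading '~', '!' or '.' to a root directory."""
    roots = {
        "~": os.path.expanduser("~"),   # home directory
        "!": tempfile.gettempdir(),     # default temp directory
        ".": os.getcwd(),               # current working directory
    }
    head, sep, tail = name.partition("/")
    if sep and head in roots:
        return os.path.join(roots[head], tail)
    if os.path.isabs(name):
        return name
    # plain relative names are taken relative to the current working directory
    return os.path.join(os.getcwd(), name)

print(expand_root("!/my_directory"))
print(expand_root("my_directory"))
```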

You can specify a parent for relative path names:

from cdxcore.subdir import SubDir
subdir = SubDir("my_directory", "~")      # relative to home directory
subdir = SubDir("my_directory", "!")      # relative to default temp directory
subdir = SubDir("my_directory", ".")      # relative to current directory
subdir2 = SubDir("my_directory", subdir)  # subdir2 is relative to `subdir`

Change the extension to “bin”:

from cdxcore.subdir import SubDir
subdir = SubDir("~/my_directory;*.bin")     
subdir = SubDir("~/my_directory", ext="bin")    
subdir = SubDir("my_directory", "~", ext="bin")    

You can turn off extension management by setting the extension to "":

from cdxcore.subdir import SubDir
subdir = SubDir("~/my_directory", ext="")

You can also use cdxcore.subdir.SubDir.__call__() to generate sub directories:

from cdxcore.subdir import SubDir
parent = SubDir("~/parent")
subdir = parent("subdir")

Be aware that when cdxcore.subdir.SubDir.__call__() is called with a second argument (a default value), it reads a file instead of returning a sub directory.

You can obtain a list of all sub directories in a directory by using cdxcore.subdir.SubDir.sub_dirs(). The list of files with the corresponding extension is accessible via cdxcore.subdir.SubDir.files().

File Format#

cdxcore.subdir.SubDir supports file i/o with a number of different file formats:

  • “PICKLE”: standard pickling with default extension “pck”.

  • “JSON_PICKLE”: uses the jsonpickle package; default extension “jpck”. The advantage of this format over “PICKLE” is that it is somewhat human-readable. However, jsonpickle uses compressed formats for complex objects such as numpy arrays, hence readability is somewhat limited. Using “JSON_PICKLE” comes at the cost of slower i/o speed.

  • “JSON_PLAIN”: an output-only format which generates human-readable files that (usually) cannot be loaded back from disk. In this mode SubDir calls cdxcore.util.plain() to convert objects into plain Python objects before using json to write them to disk. That means that deserialized data does not have the correct object structure for being restored properly. However, such files are much easier to read.

  • “BLOSC”: uses blosc to read/write compressed binary data. The blosc compression algorithm is very fast, hence using this mode will not usually lead to notably slower performance than using “PICKLE” but will generate smaller files, depending on your data structure. The default extension for “BLOSC” is “zbsc”.

  • “GZIP”: uses gzip to read/write compressed binary data. The default extension is “pgz”.

Summary of properties:

Format       Restores objects  Human readable  Speed  Compression  Extension
-----------  ----------------  --------------  -----  -----------  ---------
PICKLE       yes               no              high   no           .pck
JSON_PLAIN   no                yes             low    no           .json
JSON_PICKLE  yes               limited         low    no           .jpck
BLOSC        yes               no              high   yes          .zbsc
GZIP         yes               no              high   yes          .pgz
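The "Restores objects" and "Compression" columns can be illustrated with the standard library alone. The sketch below does not use SubDir. It round-trips the same object through pickle, gzip-compressed pickle and plain json, as a rough analogue of the PICKLE, GZIP and JSON_PLAIN formats; the sample data is made up for the illustration.

```python
import gzip
import json
import pickle

data = {"point": (1, 2), "values": [0.5] * 1000}

# pickle: binary, restores the original object structure
raw = pickle.dumps(data)
assert pickle.loads(raw) == data

# gzip on top of pickle: same round trip, smaller for repetitive data like this
packed = gzip.compress(raw)
assert pickle.loads(gzip.decompress(packed)) == data

# plain json: readable text, but the tuple comes back as a list
text = json.dumps(data)
restored = json.loads(text)

print(len(raw), len(packed), type(restored["point"]).__name__)
```

The json round trip loses the tuple type, which is the kind of structural loss the "Restores objects: no" entry for JSON_PLAIN refers to.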

You may specify the file format when instantiating cdxcore.subdir.SubDir:

from cdxcore.subdir import SubDir
subdir = SubDir("~/my_directory", fmt=SubDir.PICKLE)
subdir = SubDir("~/my_directory", fmt=SubDir.JSON_PICKLE)
...

If ext is not specified, the extension defaults to the default extension of the requested format.

Reading Files#

To read the data contained in a file from our subdirectory with its reference extension use cdxcore.subdir.SubDir.read():

from cdxcore.subdir import SubDir
subdir = SubDir("!/test")

data = subdir.read("file")                 # returns the default `None` if file.pck is not found
data = subdir.read("file", default=[])     # returns the default [] if file.pck is not found

This function will return the “default” (which in turn defaults to None) if “file.pck” does not exist. You can opt to raise an error instead of returning a default by using raise_on_error=True:

data = subdir.read("file", raise_on_error=True)  # raises 'KeyError' if not found

When calling read() you may specify an alternative extension:

data = subdir.read("file", ext="bin")     # change extension to "bin"
data = subdir.read("file.bin", ext="")    # no automatic extension

Specifying a different format for cdxcore.subdir.SubDir.read() only changes the extension automatically if you have not overwritten it before:

subdir = SubDir("!/test")                              # default format PICKLE with extension pck
data   = subdir.read("file", fmt=SubDir.JSON_PICKLE )  # uses the "jpck" extension of JSON_PICKLE

subdir = SubDir("!/test", ext="bin")                   # user-specified extension
data   = subdir.read("file", fmt=SubDir.JSON_PICKLE )  # keeps using "bin"
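The extension rule described above can be summarised as a small lookup. The following is only a sketch of that rule under stated assumptions: effective_ext and DEFAULT_EXT are hypothetical names, formats are represented by plain strings, and the default extensions are copied from the format summary table. The actual resolution inside SubDir may handle more cases.

```python
from typing import Optional

# Default extensions as listed in the format summary table.
DEFAULT_EXT = {
    "PICKLE": "pck",
    "JSON_PLAIN": "json",
    "JSON_PICKLE": "jpck",
    "BLOSC": "zbsc",
    "GZIP": "pgz",
}

def effective_ext(fmt: str, dir_ext: Optional[str] = None, call_ext: Optional[str] = None) -> str:
    """Hypothetical helper: an explicit extension wins, otherwise the format decides."""
    if call_ext is not None:      # extension passed to the read/write call
        return call_ext
    if dir_ext is not None:       # extension set by the user on the directory
        return dir_ext
    return DEFAULT_EXT[fmt]       # nothing overwritten: follow the format

print(effective_ext("JSON_PICKLE"))                 # format default
print(effective_ext("JSON_PICKLE", dir_ext="bin"))  # user-specified extension is kept
```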

You can also use the cdxcore.subdir.SubDir.__call__() to read files, in which case you must specify a default value (if you don’t, then the operator will return a sub directory):

data = subdir("file", None)   # returns None if file is not found

You can also use item notation to access files. In this case, though, an error will be thrown if the file does not exist:

data = subdir['file']   # raises KeyError if file is not found

You can read a range of files in one function call:

data = subdir.read( ["file1", "file2"] )   # returns list

Finally, you can also iterate through all existing files using iterators:

# manual loading
for file in subdir:
    data = subdir.read(file)
    ...

# automatic loading, with "None" as a default
for file, data in subdir.items():
    ...

To obtain a list of all files in our directory which have the correct extension, use cdxcore.subdir.SubDir.files().

Writing Files#

Writing files mirrors reading them:

from cdxcore.subdir import SubDir
subdir = SubDir("!/test")

subdir.write("file", data)
subdir['file'] = data

You may specify a different extension:

subdir.write("file", data, ext="bin")

You can also specify a file cdxcore.subdir.Format. The extension will be changed automatically if you have not set it manually:

subdir = SubDir("!/test")
subdir.write("file", data, fmt=SubDir.JSON_PICKLE )   # will write to "file.jpck"

To write several files at once, write:

subdir.write(["file1", "file2"], [data1, data2])

Note that when writing to a file, cdxcore.subdir.SubDir.write() will first write to a temporary file, and then rename this file to the target file name. The temporary file name is generated by applying cdxcore.uniquehash.unique_hash48() to the target file name, current time, process and thread ID, as well as the machine’s UUID. This is done to reduce collisions between processes/machines accessing the same files, potentially across a network. It does not remove collision risk entirely, though.
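The write-then-rename pattern described above can be sketched with the standard library. This is not the SubDir implementation: atomic_write is a hypothetical helper, and a random uuid4 name stands in for the unique_hash48()-based temporary name.

```python
import os
import pickle
import tempfile
import uuid

def atomic_write(path: str, obj) -> None:
    """Pickle obj to a temporary file in the target directory, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    os.makedirs(directory, exist_ok=True)
    # uuid4 stands in for the hash-based temporary name described above
    tmp_path = os.path.join(directory, f".{uuid.uuid4().hex}.tmp")
    try:
        with open(tmp_path, "wb") as f:
            pickle.dump(obj, f)
        # atomic on POSIX when both paths are on the same file system
        os.replace(tmp_path, path)
    finally:
        if os.path.exists(tmp_path):   # only true if the write or the rename failed
            os.remove(tmp_path)

target = os.path.join(tempfile.mkdtemp(), "file.pck")
atomic_write(target, [1, 2, 3])
with open(target, "rb") as f:
    print(pickle.load(f))
```

Writing the temporary file into the target directory keeps the rename on one file system, so readers see either the old file or the complete new one.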

Filenames#

cdxcore.subdir.SubDir transparently handles directory access and extensions. That means a user usually only uses file names which contain neither. To obtain the fully qualified file name for a given “file”, use cdxcore.subdir.SubDir.full_file_name().

Reading and Writing Versioned Files#

cdxcore.subdir.SubDir supports versioned files. If versions are used, then they must be used for both reading and writing. cdxcore.version.version() provides a standard decorator framework for defining versions for classes and functions, including version dependencies.

If a version is provided for cdxcore.subdir.SubDir.write() then SubDir will write the version in a block ahead of the main content of the file. In case of the PICKLE format, this is a byte string. In case of JSON_PLAIN and JSON_PICKLE this is a line of text starting with # ahead of the file content. (Note that this violates the JSON file format.)

Writing a short version block ahead of the main data allows cdxcore.subdir.SubDir.read() to read this version information back quickly without reading the entire file. read() does so if it is called with a version parameter. In this case it will compare the version read from the file with the provided version, and only return the main content of the file if the versions match.

Use cdxcore.subdir.SubDir.is_version() to check whether a given file has a specific version. Like read(), this function only reads the version block and will be much faster than reading the whole file.
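The idea of a version block ahead of the payload can be sketched as follows. This is a simplified illustration, not the on-disk layout used by SubDir: the three helper functions are hypothetical, the version is stored as a single text line, and a mismatch simply returns the default without deleting the file.

```python
import os
import pickle
import tempfile

def write_versioned(path: str, obj, version: str) -> None:
    """Write a short version line first, followed by the pickled payload."""
    with open(path, "wb") as f:
        f.write(version.encode("utf-8") + b"\n")
        pickle.dump(obj, f)

def read_version(path: str) -> str:
    """Read only the first line; the payload is never loaded."""
    with open(path, "rb") as f:
        return f.readline().rstrip(b"\n").decode("utf-8")

def read_versioned(path: str, version: str, default=None):
    """Return the payload if the stored version matches (or version is '*'), else default."""
    with open(path, "rb") as f:
        stored = f.readline().rstrip(b"\n").decode("utf-8")
        if version != "*" and stored != version:
            return default
        return pickle.load(f)

path = os.path.join(tempfile.mkdtemp(), "test.pck")
write_versioned(path, [1, 2, 3], "0.0.1")
print(read_version(path))
print(read_versioned(path, "0.0.1"))
print(read_versioned(path, "0.0.2"))
```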

Important: if a file was written with a version, then it has to be read again with a test version. You can specify version="*" for cdxcore.subdir.SubDir.read() to match any version.

Examples:

Writing a versioned file:

from cdxcore.subdir import SubDir
sub_dir = SubDir("!/test_version")
sub_dir.write("test", [1,2,3], version="0.0.1" )

To read [1,2,3] from “test” we need to use the correct version:

_ = sub_dir.read("test", version="0.0.1") 

The following will not read “test” as the versions do not match:

_ = sub_dir.read("test", version="0.0.2")

By default cdxcore.subdir.SubDir.read() will not fail if a version mismatch is encountered; rather it will attempt to delete the file and then return the default value.

This can be turned off with the keyword delete_wrong_version set to False.

You can ignore the version used when writing a file by using "*" as version:

_ = sub_dir.read("test", version="*")

Note that reading a file which was written with a version will fail if read() is called without the version keyword: SubDir only adds the version block to a file when a version is provided, so without the keyword read() does not expect one.

Test existence of Files#

To test existence of ‘file’ in a directory, use one of:

subdir.exists('file')
'file' in subdir

Deleting files#

To delete a ‘file’, use any of the following:

subdir.delete("file")
del subdir['file']

All of these are silent, and will not throw errors if file does not exist. In order to throw an error use:

subdir.delete('file', raise_on_error=True)

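A few member functions assist in deleting a number of files:

  • cdxcore.subdir.SubDir.delete_all_files() deletes all files in the directory with the correct extension.

  • cdxcore.subdir.SubDir.delete_all_content() deletes all such files and all sub directories.

  • cdxcore.subdir.SubDir.delete_everything() deletes the entire directory with all its contents.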

Caching#

A cdxcore.subdir.SubDir object offers an advanced context for caching calls to collections.abc.Callable objects with cdxcore.subdir.SubDir.cache().

from cdxcore.subdir import SubDir
cache   = SubDir("!/.cache")
cache.delete_all_content()   # for illustration

@cache.cache("0.1")
def f(x,y):
    return x*y

_ = f(1,2)    # function gets computed and the result cached
_ = f(1,2)    # restore result from cache
_ = f(2,2)    # different parameters: compute and store result

This involves keying the cache by the function name and its current parameters using cdxcore.uniquehash.UniqueHash, and monitoring the function's version using cdxcore.version.version(). The caching behaviour itself can be controlled by specifying the desired cdxcore.subdir.CacheMode.
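For readers who want to see the mechanics without cdxcore, the following is a minimal standard-library sketch of the same idea: key the cache file by function name and arguments, and store the version alongside the result. The decorator disk_cache and the attribute last_cached are hypothetical names for this illustration; hashing repr() of the arguments is a simplification of what UniqueHash does and only works for arguments with a stable repr.

```python
import functools
import hashlib
import os
import pickle
import tempfile

def disk_cache(directory: str, version: str):
    """Hypothetical decorator: cache results on disk, keyed by name and arguments."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            key = repr((func.__qualname__, args, sorted(kwargs.items())))
            digest = hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]
            path = os.path.join(directory, f"{func.__name__}_{digest}.pck")
            if os.path.exists(path):
                with open(path, "rb") as f:
                    stored_version, result = pickle.load(f)
                if stored_version == version:    # only reuse matching versions
                    wrapper.last_cached = True
                    return result
            result = func(*args, **kwargs)       # compute and store
            os.makedirs(directory, exist_ok=True)
            with open(path, "wb") as f:
                pickle.dump((version, result), f)
            wrapper.last_cached = False
            return result
        wrapper.last_cached = False
        return wrapper
    return decorate

cache_dir = tempfile.mkdtemp()

@disk_cache(cache_dir, "0.1")
def f(x, y):
    return x * y

f(1, 2)
print(f.last_cached)   # first call: computed
f(1, 2)
print(f.last_cached)   # second call: restored from disk
```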

See cdxcore.subdir.SubDir.cache() for full feature set.

Import#

from cdxcore.subdir import SubDir

Documentation#

Functions

VersionedCacheRoot(directory, *[, ext, fmt, ...])

Create a root directory for versioned caching on disk using cdxcore.subdir.SubDir.cache().

Classes

CacheCallable(subdir, *[, version, label, ...])

Wrapper for a cached function.

CacheController(*[, exclude_arg_types, ...])

Central control parameters for caching.

CacheInfo(name, F, keep_last_arguments)

Information on functions decorated with cdxcore.subdir.SubDir.cache().

CacheMode([mode])

A class which encodes standard behaviour of a caching strategy.

CacheTracker()

Utility class to track caching and be able to delete all dependent objects.

Format(*values)

File formats for cdxcore.subdir.SubDir.

SubDir(name[, parent, ext, fmt, ...])

SubDir implements a transparent i/o interface for storing data in files.

Exceptions

VersionPresentError

Exception raised in case a file was read which had a version, but no test version was provided.

class cdxcore.subdir.CacheCallable(subdir, *, version=None, dependencies, label=None, uid=None, name=None, exclude_args=None, include_args=None, exclude_arg_types=None, version_auto_class=True, name_of_name_arg='name')[source]#

Bases: object

Wrapper for a cached function.

This is the wrapper returned by cdxcore.subdir.SubDir.cache().

Attributes:
cache_controller

Returns the cdxcore.subdir.CacheController

cache_mode

Returns the cdxcore.subdir.CacheMode of the underlying cdxcore.subdir.CacheController

debug_verbose

Returns the debug cdxcore.verbose.Context used to print caching information, or None

global_exclude_arg_types

Returns exclude_arg_types of the underlying cdxcore.subdir.CacheController

labelledFileName

Returns labelledFileName() of the underlying cdxcore.subdir.CacheController

uid_or_label

ID or label

unique

Whether the ID is unique

uniqueFileName

Returns uniqueFileName() of the underlying cdxcore.subdir.CacheController

Methods

__call__(F)

Decorate F as cachable callable.

__call__(F)[source]#

Decorate F as cachable callable. See cdxcore.subdir.SubDir.cache() for documentation.

property cache_controller: CacheController#

Returns the cdxcore.subdir.CacheController

property cache_mode: CacheMode#

Returns the cdxcore.subdir.CacheMode of the underlying cdxcore.subdir.CacheController

property debug_verbose: Context#

Returns the debug cdxcore.verbose.Context used to print caching information, or None

property global_exclude_arg_types: list[type]#

Returns exclude_arg_types of the underlying cdxcore.subdir.CacheController

property labelledFileName: Callable#

Returns labelledFileName() of the underlying cdxcore.subdir.CacheController

property uid_or_label: Callable#

ID or label

property unique: bool#

Whether the ID is unique

property uniqueFileName: Callable#

Returns uniqueFileName() of the underlying cdxcore.subdir.CacheController

class cdxcore.subdir.CacheController(*, exclude_arg_types=[<class 'cdxcore.verbose.Context'>], cache_mode='on', max_filename_length=48, hash_length=8, debug_verbose=None, keep_last_arguments=False)[source]#

Bases: object

Central control parameters for caching.

When a parameter object of this type is assigned to a cdxcore.subdir.SubDir, then it is passed on when sub-directories are created. This way all SubDir have the same caching behaviour.

See cdxcore.subdir.CacheController for a list of control parameters.

Parameters:
exclude_arg_types : list[type], optional

List of types to exclude from producing unique ids from function arguments.

Defaults to [Context].

cache_mode : CacheMode, default ON

Top level cache control. Set to “OFF” to turn off all caching.

max_filename_length : int, default 48

Maximum filename length. If a unique id exceeds this length, a hash of length hash_length will be integrated into the file name. See cdxcore.uniquehash.NamedUniqueHash.

hash_length : int, default 8

Length of the hash used to make sure each filename is unique. See cdxcore.uniquehash.NamedUniqueHash.

debug_verbose : cdxcore.verbose.Context | None, default None

If not None print caching process messages to this object.

keep_last_arguments : bool, default False

Keep a dictionary of all parameters as string representations after each function call. If the function F was decorated using cdxcore.subdir.SubDir.cache(), you can access this information via F.cache_info.last_arguments.

Note that strings are limited to 100 characters per argument to avoid memory overload when large objects are passed.
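The interplay of max_filename_length and hash_length can be pictured with a small sketch. It assumes a plausible scheme (truncate the id and append a hash of the full id) and uses sha256 as a stand-in; bounded_filename is a hypothetical helper, and the actual behaviour of cdxcore.uniquehash.NamedUniqueHash may differ.

```python
import hashlib

def bounded_filename(unique_id: str, max_filename_length: int = 48, hash_length: int = 8) -> str:
    """Hypothetical helper: keep short ids as they are, shorten long ones and append a hash."""
    if len(unique_id) <= max_filename_length:
        return unique_id
    # hash the full id so that ids sharing a long prefix still map to different names
    digest = hashlib.sha256(unique_id.encode("utf-8")).hexdigest()[:hash_length]
    keep = max_filename_length - hash_length - 1   # leave room for the separator
    return f"{unique_id[:keep]}_{digest}"

print(bounded_filename("f 1 2"))
print(bounded_filename("f " + "x" * 100))
```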

class cdxcore.subdir.CacheInfo(name, F, keep_last_arguments)[source]#

Bases: object

Information on functions decorated with cdxcore.subdir.SubDir.cache().

Functions decorated with cdxcore.subdir.SubDir.cache() will have a member cache_info of this type.

arguments#

Last arguments used. This member is only present if keep_last_arguments was set to True for the relevant cdxcore.subdir.CacheController.

filename#

Unique filename of the last function call.

label#

Label of the last function call.

last_cached#

Whether the last function call restored data from disk.

name#

Decoded name of the function.

signature#

inspect.signature() of the function.

version#

Last version used.

class cdxcore.subdir.CacheMode(mode=None)[source]#

Bases: object

A class which encodes standard behaviour of a caching strategy.

Summary mechanics:

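Action                                  on   gen  off  update  clear  readonly
--------------------------------------  ---  ---  ---  ------  -----  --------
load cache from disk if exists          x    x    -    -       -      x
write updates to disk                   x    x    -    x       -      -
delete existing object                  -    -    -    -       x      -
delete existing object if incompatible  x    -    -    x       x      -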

Standard Caching Semantics

Assuming we wish to cache results from calling a function f in a file named filename in a directory directory, then this is the CacheMode waterfall:

def cache_f( filename : str, directory : SubDir, version : str, cache_mode : CacheMode ):
    if cache_mode.delete:
        directory.delete(filename)
    if cache_mode.read:
        r = directory.read(filename,
                           default=None,  
                           version=version,
                           raise_on_error=False,
                           delete_wrong_version=cache_mode.del_incomp
                           )
        if r is not None:
            return r

    r = f(...) # compute result

    if cache_mode.write:
        directory.write(filename,
                        r,
                        version=version,
                        raise_on_error=False
                        )

    return r

See cdxcore.subdir.SubDir.cache() for a comprehensive implementation.
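The waterfall can be tried out end to end with a small stand-alone model. The sketch below is an assumption-laden illustration, not the CacheMode class: ModeFlags, FLAGS and cached_call are hypothetical names, the flag values per mode are inferred from the mode descriptions in this section and should be checked against CacheMode itself, a dict stands in for the directory, and version handling is left out (so del_incomp is listed but unused).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModeFlags:
    read: bool        # load an existing cache entry
    write: bool       # store newly computed results
    delete: bool      # always delete the existing entry first
    del_incomp: bool  # delete entries with a wrong version (unused in this sketch)

# Assumed mapping, inferred from the mode descriptions; verify against CacheMode.
FLAGS = {
    "on":       ModeFlags(read=True,  write=True,  delete=False, del_incomp=True),
    "gen":      ModeFlags(read=True,  write=True,  delete=False, del_incomp=False),
    "off":      ModeFlags(read=False, write=False, delete=False, del_incomp=False),
    "update":   ModeFlags(read=False, write=True,  delete=False, del_incomp=True),
    "clear":    ModeFlags(read=False, write=False, delete=True,  del_incomp=True),
    "readonly": ModeFlags(read=True,  write=False, delete=False, del_incomp=False),
}

def cached_call(mode: str, store: dict, key: str, compute):
    """Run the delete / read / compute / write waterfall against a dict."""
    flags = FLAGS[mode]
    if flags.delete:
        store.pop(key, None)
    if flags.read and key in store:
        return store[key]
    result = compute()
    if flags.write:
        store[key] = result
    return result

store = {}
print(cached_call("on", store, "k", lambda: 1))       # computed and stored
print(cached_call("on", store, "k", lambda: 2))       # restored from the store
print(cached_call("update", store, "k", lambda: 3))   # recomputed and overwritten
```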

Parameters:
mode : str, optional

Which mode to use: "on", "gen", "off", "update", "clear" or "readonly".

The default is None in which case "on" is used.

Attributes:
del_incomp

Whether to delete existing data if it is not compatible or has the wrong version.

delete

Whether to delete existing data.

is_clear

Whether this cache mode is CLEAR.

is_gen

Whether this cache mode is GEN.

is_off

Whether this cache mode is OFF.

is_on

Whether this cache mode is ON.

is_readonly

Whether this cache mode is READONLY.

is_update

Whether this cache mode is UPDATE.

read

Whether to load any existing cached data.

write

Whether to cache newly computed data to disk.

CLEAR = 'clear'#
GEN = 'gen'#
HELP = "'on' for standard caching; 'gen' for caching but keep existing incompatible files; 'off' to turn off; 'update' to overwrite any existing cache; 'clear' to clear existing caches; 'readonly' to read existing caches but not write new ones"#

Standard config help text, to be used with cdxcore.config.Config.__call__() as follows:

from cdxcore.config import Config
from cdxcore.subdir import CacheMode

def get_cache_mode( config : Config ) -> CacheMode:
    return CacheMode( config("cache_mode", "on", CacheMode.MODES, CacheMode.HELP) )
MODES = ['on', 'gen', 'off', 'update', 'clear', 'readonly']#

List of available modes in text form. This list can be used as cast parameter when calling cdxcore.config.Config.__call__():

from cdxcore.config import Config
from cdxcore.subdir import CacheMode

def get_cache_mode( config : Config ) -> CacheMode:
    return CacheMode( config("cache_mode", "on", CacheMode.MODES, CacheMode.HELP) )
OFF = 'off'#
ON = 'on'#
READONLY = 'readonly'#
UPDATE = 'update'#
property del_incomp: bool#

Whether to delete existing data if it is not compatible or has the wrong version.

property delete: bool#

Whether to delete existing data.

property is_clear: bool#

Whether this cache mode is CLEAR.

property is_gen: bool#

Whether this cache mode is GEN.

property is_off: bool#

Whether this cache mode is OFF.

property is_on: bool#

Whether this cache mode is ON.

property is_readonly: bool#

Whether this cache mode is READONLY.

property is_update: bool#

Whether this cache mode is UPDATE.

property read: bool#

Whether to load any existing cached data.

property write: bool#

Whether to cache newly computed data to disk.

class cdxcore.subdir.CacheTracker[source]#

Bases: object

Utility class to track caching and be able to delete all dependent objects.

Methods

delete_cache_files()

Delete all tracked files

delete_cache_files()[source]#

Delete all tracked files

class cdxcore.subdir.Format(*values)[source]#

Bases: Enum

File formats for cdxcore.subdir.SubDir.

Format       Restores objects  Human readable  Speed  Compression  Extension
-----------  ----------------  --------------  -----  -----------  ---------
PICKLE       yes               no              high   no           .pck
JSON_PLAIN   no                yes             low    no           .json
JSON_PICKLE  yes               limited         low    no           .jpck
BLOSC        yes               no              high   yes          .zbsc
GZIP         yes               no              high   yes          .pgz

BLOSC = 3#

blosc binary compressed format.

GZIP = 4#

gzip binary compressed format.

JSON_PICKLE = 1#

jsonpickle format.

JSON_PLAIN = 2#

json format.

PICKLE = 0#

Standard binary pickle format.

class cdxcore.subdir.SubDir(name, parent=None, *, ext=None, fmt=None, create_directory=None, cache_controller=None, delete_everything=False, delete_everything_upon_exit=False)[source]#

Bases: object

SubDir implements a transparent i/o interface for storing data in files.

Directories

Instantiate a SubDir with a directory name. There are some pre-defined relative system paths the name can refer to:

from cdxcore.subdir import SubDir
parent  = SubDir("!/subdir")         # relative to system temp directory
parent  = SubDir("~/subdir")         # relative to user home directory
parent  = SubDir("./subdir")         # relative to current working directory (explicit)
parent  = SubDir("subdir")           # relative to current working directory (implicit)
parent  = SubDir("/tmp/subdir")      # absolute path (linux)
parent  = SubDir("C:/temp/subdir")   # absolute path (windows)
parent  = SubDir("")                 # current working directory

Sub-directories can be generated in a number of ways:

subDir = parent('subdir')              # using __call__
subDir = SubDir('subdir', parent)      # explicit constructor
subDir = SubDir('subdir', parent="!/") # explicit constructor with parent being a string

Files managed by SubDir will usually have the same extension. This extension can be specified with ext, or as part of the directory string:

subDir = SubDir("~/subdir", ext="bin") # set extension to 'bin'
subDir = SubDir("~/subdir;*.bin")      # set extension to 'bin'

Leaving the extension as default None allows SubDir to automatically use the extension associated with any specified format.

Copy Constructor

The constructor is shallow.

File I/O

Write data with cdxcore.subdir.SubDir.write():

subDir.write('item3',item3)          # explicit
subDir['item1'] = item1              # dictionary style

Note that cdxcore.subdir.SubDir.write() can write to multiple files at the same time.

Read data with cdxcore.subdir.SubDir.read():

item = subDir('item', 'i1')          # returns 'i1' if not found.
item = subDir.read('item')           # returns None if not found
item = subDir.read('item','i2')      # returns 'i2' if not found
item = subDir['item']                # raises a KeyError if not found

Treat files in a directory like dictionaries:

for file in subDir:
    data = subDir[file]
    f(file, data)

for file, data in subDir.items():
    f(file, data)

Delete items:

del subDir['item']                    # silently fails if 'item' does not exist
subDir.delete('item')                 # silently fails if 'item' does not exist
subDir.delete('item', True)           # raises a KeyError if 'item' does not exist

Cleaning up:

parent.delete_all_content()        # silently deletes all files with matching extensions, and sub directories.

File Format

SubDir supports a number of file formats via cdxcore.subdir.Format. These can be selected with the fmt keyword in various functions, not least the constructor of cdxcore.subdir.SubDir:

subdir = SubDir("!/.test", fmt=SubDir.JSON_PICKLE)

See cdxcore.subdir.Format for supported formats.

Parameters:
name : str

Name of the directory.

The name may start with any of the following special characters:

  • '.' for current directory.

  • '~' for home directory.

  • '!' for system default temp directory.

  • '?' for a temporary temp directory. In this case delete_everything_upon_exit is always True.

The directory name may also contain a formatting string for defining ext on the fly: for example use "!/test;*.bin" to specify a directory "test" in the user’s temp directory with extension "bin".

The directory name can be set to None in which case it is always empty and attempts to write to it fail with EOFError.

parent : str | SubDir | None, default None

Parent directory.

If parent is a cdxcore.subdir.SubDir then its parameters are used as default values here.

ext : str | None, default None

Extension for files managed by this SubDir. All files will share the same extension.

If set to "" no extension is assigned to this directory. That means, for example, that cdxcore.subdir.SubDir.files() returns all files contained in the directory, not just files with a specific extension.

If None, use an extension depending on fmt:

  • ‘pck’ for the default PICKLE format.

  • ‘json’ for JSON_PLAIN.

  • ‘jpck’ for JSON_PICKLE.

  • ‘zbsc’ for BLOSC.

  • ‘pgz’ for GZIP.

fmt : cdxcore.subdir.Format | None, default Format.PICKLE

One of the cdxcore.subdir.Format codes. If ext is left as None then setting a format will also set the corresponding ext.

create_directory : bool | None, default False

Whether to create the directory upon creation of the SubDir object; otherwise it will be created upon first cdxcore.subdir.SubDir.write().

Set to None to use the setting of the parent directory, or False if no parent is specified.

cache_controller : cdxcore.subdir.CacheController | None, default None

An object which fine-tunes the behaviour of cdxcore.subdir.SubDir.cache(). See that function’s documentation for further details. Default is None.

delete_everything : bool, default False

Delete all contents in the newly defined sub directory upon creation.

delete_everything_upon_exit : bool, default False

Delete all contents of the current directory when self is deleted. This is always True if the "?/" prefix was used.

Note, however, that this will only be executed once the object is garbage collected.

Default is, for some good reason, False.

Attributes:
cache_controller

Returns an assigned cdxcore.subdir.CacheController, or None

cache_mode

Returns the cdxcore.subdir.CacheMode associated with the underlying cache controller

existing_path

Return current path, including trailing '/'.

ext

Returns the common extension of the files in this directory, including leading '.'.

fmt

Returns current cdxcore.subdir.Format.

is_none

Whether this object is None or not.

path

Return current path, including trailing '/'.

Methods

Format(*values)

The same as cdxcore.subdir.Format for convenience

__call__(element[, default, raise_on_error, ...])

Read either data from a file, or return a new sub directory.

as_format(format_name)

Converts a named format into the respective format code.

auto_ext([ext_or_fmt])

Computes the effective extension based on the input ext_or_fmt and the current settings of self.

auto_ext_fmt(*[, ext, fmt])

Computes the effective extension and format based on inputs ext and fmt, each of which defaults to the respective values of self.

cache([version, dependencies, label, uid, ...])

Advanced versioned caching for callables.

cache_class([version, name, dependencies, ...])

Short-cut for cdxcore.subdir.SubDir.cache() applied to classes with a reduced number of available parameters.

cache_init([label, uid, exclude_args, ...])

Short-cut for cdxcore.subdir.SubDir.cache() applied to decorating __init__ with a reduced number of available parameters.

create_directory()

Creates the current directory if it doesn't exist yet.

delete(file[, raise_on_error, ext])

Deletes file.

delete_all_content([delete_self, ...])

Deletes all valid keys and subdirectories in this sub directory.

delete_all_files([raise_on_error, ext])

Deletes all valid keys in this sub directory with the correct extension.

delete_everything([keep_directory])

Deletes the entire sub directory with all contents.

exists(file, *[, ext])

Checks whether a file exists.

expand_std_root(name)

Expands name by a standardized root directory if provided:

file_size(file, *[, ext])

Returns the file size of a file.

files(*[, ext])

Returns a list of files in this subdirectory with the current extension, or the specified extension.

full_file_name(file, *[, ext])

Returns fully qualified file name.

full_temp_file_name([file, ext, ...])

Returns a fully qualified unique temporary file name with path and extension

get_creation_time(file, *[, ext])

Returns the creation time of a file.

get_last_access_time(file, *[, ext])

Returns the last access time of a file.

get_last_modification_time(file, *[, ext])

Returns the last modification time of a file.

get_version(file[, raise_on_error, ext, fmt])

Returns a version stored in a file.

is_version(file[, version, raise_on_error, ...])

Tests the version of a file.

items(*[, ext, raise_on_error])

Dictionary-style iterable of filenames and their content.

path_exists()

Whether the current directory exists

read(file[, default, raise_on_error, ...])

Read data from a file if the file exists, or return default.

read_string(file[, default, raise_on_error, ext])

Reads text from a file.

remove_bad_file_characters(file[, by])

Replaces invalid characters in a filename using the map by.

rename(source, target, *[, ext])

Rename a file.

sub_dirs()

Retrieve a list of all sub directories.

temp_dir()

Return system temp directory.

temp_file_name([file])

Returns a unique temporary file name.

temp_temp_dir()

Return a temporary temp directory name using tempfile.mkdtemp().

user_dir()

Return current working directory.

working_dir()

Return current working directory.

write(file, obj[, raise_on_error, version, ...])

Writes an object to file.

write_string(file, line[, raise_on_error, ext])

Writes a line of text into a file.

RETURN_SUB_DIRECTORY

DEFAULT_FORMAT = 0#

Default cdxcore.subdir.Format: Format.PICKLE

class Format(*values)#

Bases: Enum

The same as cdxcore.subdir.Format for convenience

BLOSC = 3#

blosc binary compressed format.

GZIP = 4#

gzip binary compressed format.

JSON_PICKLE = 1#

jsonpickle format.

JSON_PLAIN = 2#

json format.

PICKLE = 0#

Standard binary pickle format.

__call__(element, default=<class 'cdxcore.subdir.SubDir.__RETURN_SUB_DIRECTORY'>, raise_on_error=False, *, version=None, ext=None, fmt=None, delete_wrong_version=True, create_directory=None)[source]#

Read either data from a file, or return a new sub directory.

If only the element argument is used, then this function returns a new sub directory named element.

If both element and default arguments are used, then this function attempts to read the file element from disk, returning default if it does not exist.

Assume we have a subdirectory sd:

from cdxcore.subdir import SubDir
sd  = SubDir("!/test")

Reading files:

x   = sd('file', None)                   # reads 'file' with default value None
x   = sd('sd/file', default=1)           # reads 'file' from sub directory 'sd' with default value 1
x   = sd('file', default=1, ext="tmp")   # reads 'file.tmp' with default value 1

Create sub directory:

sd2 = sd("subdir")                       # creates and returns handle to subdirectory 'subdir'
sd2 = sd("subdir1/subdir2")              # creates and returns handle to subdirectory 'subdir1/subdir2'
sd2 = sd("subdir1/subdir2", ext=".tmp")  # creates and returns handle to subdirectory 'subdir1/subdir2' with extension "tmp"
sd2 = sd(ext=".tmp")                     # returns handle to current subdirectory with extension "tmp"
Parameters:
elementstr

File or directory name, or a list thereof.

defaultoptional

If specified, this function reads element with read( element, default, *args, **kwargs ).

If default is not specified, then this function returns a new sub-directory by calling SubDir(element,parent=self,ext=ext,fmt=fmt).

create_directorybool, default None

When creating sub-directories:

Whether or not to instantly create the sub-directory. The default, None, is to inherit the behaviour from self.

raise_on_errorbool, default False

When reading files:

Whether to raise an exception if reading an existing file failed. By default this function fails silently and returns default.

versionstr | None, default None

When reading files:

If not None, specifies the version of the current code base.

In this case, this version will be compared to the version of the file being read. If they do not match, read fails (either by returning default or throwing a cdxcore.version.VersionError exception).

You can specify version "*" to accept any version. Note that this is distinct from using None, which stipulates that the file should not have version information.

delete_wrong_versionbool, default True.

When reading files:

If True, and if a wrong version was found, delete the file.

extstr | None, default is None.

When reading files:

Extension to be used, or a list thereof if element is a list. Defaults to the extension of self.

Semantics:

  • None to use the default extension of self.

  • "*" to use the extension implied by fmt.

  • "" to turn off extension management.

When creating sub-directories:

Extension for the new subdirectory; set to None to inherit the parent’s extension.

fmtcdxcore.subdir.Format | None, default None

When reading files:

File format or None to use the directory’s default. Note that fmt cannot be a list even if element is. Unless ext or the SubDir’s extension is "*", changing the format does not automatically change the extension.

When creating sub-directories:

Format for the new sub-directory; set to None to inherit the parent’s format.

Returns:
Objecttype | SubDir

Either the value in the file, a new sub directory, or lists thereof.
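The dual behaviour described above (sub-directory handle when no default is given, file read when a default is given) relies on telling "no default supplied" apart from "default is None". The following is a minimal standard-library sketch of that pattern, not the cdxcore implementation; the names MiniDir and _NO_DEFAULT are hypothetical and exist only for this illustration.

```python
import os
import pickle
import tempfile

_NO_DEFAULT = object()   # hypothetical sentinel meaning "no default was supplied"

class MiniDir:
    """Hypothetical stand-in for a directory handle; not the cdxcore implementation."""
    def __init__(self, path, ext=".pck"):
        self.path, self.ext = path, ext

    def __call__(self, element, default=_NO_DEFAULT):
        if default is _NO_DEFAULT:
            # No default given: interpret 'element' as a sub-directory name.
            return MiniDir(os.path.join(self.path, element), self.ext)
        # Default given (even if it is None): interpret 'element' as a file name.
        filename = os.path.join(self.path, element + self.ext)
        try:
            with open(filename, "rb") as fh:
                return pickle.load(fh)
        except OSError:
            return default       # missing or unreadable file: fail silently

root = MiniDir(tempfile.mkdtemp())
sub  = root("subdir")            # returns a new MiniDir; nothing is created on disk
val  = root("missing", None)     # file does not exist: returns the default, None
```

Using a dedicated sentinel object rather than None as the "not supplied" marker is what allows None itself to be a legitimate default value.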

static as_format(format_name)[source]#

Converts a named format into the respective format code.

Example:

format = SubDir.as_format( config("format", "pickle", SubDir.FORMAT_NAMES, "File format") )    
auto_ext(ext_or_fmt=None)[source]#

Computes the effective extension based on the input ext_or_fmt and the current settings for self.

If ext_or_fmt is set to "*" then the extension associated to the format of self is returned.

Parameters:
ext_or_fmtstr | cdxcore.subdir.Format | None, default None

An extension or a format.

Returns:
extstr

The extension with leading '.'.

auto_ext_fmt(*, ext=None, fmt=None)[source]#

Computes the effective extension and format based on inputs ext and fmt, each of which defaults to the respective values of self.

Resolves an ext of "*" into the extension associated with fmt.

Returns:
(ext, fmt)tuple

Here ext contains the leading '.' and fmt is of type cdxcore.subdir.Format.
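As an illustration of the resolution rules above, here is a small sketch under stated assumptions. The Format values are those listed in this document; the extension table is hypothetical apart from ".pck", which is the only extension that appears in this document, and resolve_ext_fmt is an illustrative name rather than a cdxcore function.

```python
from enum import Enum

class Format(Enum):
    # Values as listed in this document.
    PICKLE = 0
    JSON_PICKLE = 1
    JSON_PLAIN = 2
    BLOSC = 3
    GZIP = 4

# Hypothetical mapping for illustration only; just ".pck" is taken from this document.
EXTENSIONS = {
    Format.PICKLE: ".pck",
    Format.JSON_PICKLE: ".jpck",
    Format.JSON_PLAIN: ".json",
    Format.BLOSC: ".zbsc",
    Format.GZIP: ".pgz",
}

def resolve_ext_fmt(ext=None, fmt=None, *, default_ext=".pck", default_fmt=Format.PICKLE):
    """Return (ext, fmt); None falls back to the defaults, "*" follows the format."""
    fmt = default_fmt if fmt is None else fmt
    ext = default_ext if ext is None else ext
    if ext == "*":
        ext = EXTENSIONS[fmt]                 # extension implied by the format
    elif ext != "" and not ext.startswith("."):
        ext = "." + ext                       # normalize to a leading '.'
    return ext, fmt
```

In this sketch an empty string is passed through unchanged, mirroring the "turn off extension management" semantics described for ext elsewhere in this section.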

cache(version=None, *, dependencies=None, label=None, uid=None, name=None, exclude_args=None, include_args=None, exclude_arg_types=None, version_auto_class=True)[source]#

Advanced versioned caching for callables.

Versioned caching is based on the following two simple principles:

  1. Unique Call IDs:

    When a function is called with some parameters, the wrapper computes a unique ID based on the qualified name of the function and on its runtime functional parameters (i.e. those which alter the outcome of the function). When a function is called for the first time with a given unique call ID, the result of the call is stored on disk. If the function is called with the same call ID again, the result is read from disk and returned.

    To compute unique call IDs cdxcore.uniquehash.NamedUniqueHash is used by default.

  2. Code Version:

    Each function has a version, which includes dependencies on other functions or classes. If the version of data on disk does not match the current version, it is deleted and the generating function is called again. This way you can use your code to drive updates to data generated with cached functions.

    Behind the scenes this is implemented using cdxcore.version.version() which means that the version of a cached function can also depend on versions of non-cached functions or other objects.
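The two principles can be sketched with the standard library alone. The decorator below is a deliberately simplified illustration and not how cdxcore implements caching: simple_versioned_cache is a hypothetical name, the call ID uses a plain SHA-256 of repr() instead of cdxcore.uniquehash, and versions are compared as plain strings without dependency tracking.

```python
import functools
import hashlib
import os
import pickle
import tempfile

def simple_versioned_cache(directory, version):
    """Hypothetical decorator: cache results on disk by call ID, guarded by a version."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # 1. Unique call ID: function name plus a hash of the arguments.
            key = repr((args, sorted(kwargs.items()))).encode()
            call_id = f"{func.__name__} {hashlib.sha256(key).hexdigest()[:16]}"
            filename = os.path.join(directory, call_id + ".pck")
            # 2. Code version: only accept files written by the same version.
            if os.path.exists(filename):
                with open(filename, "rb") as fh:
                    stored_version, result = pickle.load(fh)
                if stored_version == version:
                    wrapper.last_cached = True
                    return result
                os.remove(filename)           # stale version: delete and recompute
            result = func(*args, **kwargs)
            with open(filename, "wb") as fh:
                pickle.dump((version, result), fh)
            wrapper.last_cached = False
            return result
        wrapper.last_cached = None
        return wrapper
    return decorate

cache_dir = tempfile.mkdtemp()

@simple_versioned_cache(cache_dir, "0.1")
def product(x, y):
    return x * y

first  = product(1, 2)     # computed and written to disk
second = product(1, 2)     # same call ID and version: read from disk
```

Changing the version string passed to the decorator invalidates existing files for that function, which is the mechanism that lets code changes drive data updates.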

Caching Functions#

Caching a simple function f is straightforward:

from cdxcore.subdir import SubDir
cache   = SubDir("!/.cache")
cache.delete_all_content()   # for illustration

@cache.cache("0.1")
def f(x,y):
    return x*y

_ = f(1,2)    # function gets computed and the result cached
_ = f(1,2)    # restore result from cache
_ = f(2,2)    # different parameters: compute and store result

Cache another function g which calls f, and whose version therefore depends on f’s version:

@cache.cache("0.1", dependencies=[f])
def g(x,y):
    return f(x,y)**2

Debugging

When using automated caching it is important to understand how changes in parameters and in the version of a function affect caching. To this end, cdxcore.subdir.SubDir.cache() supports a tracing mechanism via the use of a cdxcore.subdir.CacheController:

from cdxcore.subdir import SubDir, CacheController, Context

ctrl    = CacheController( debug_verbose=Context("all") )
cache   = SubDir("!/.cache", cache_controller=ctrl )
cache.delete_all_content()   # <- delete previous cached files, for this example only

@cache.cache("0.1")
def f(x,y):
    return x*y

_ = f(1,2)    # function gets computed and the result cached
_ = f(1,2)    # restore result from cache
_ = f(2,2)    # different parameters: compute and store result

This produces the following trace:

00: cache(f@__main__): function registered for caching into 'C:/Users/hans/AppData/Local/Temp/.cache/'.
00: cache(f@__main__): called 'f@__main__' version 'version 0.1' and wrote result into 'C:/Users/hans/AppData/Local/Temp/.cache/f@__main__ 668a6b111549e288.pck'.
00: cache(f@__main__): read 'f@__main__' version 'version 0.1' from cache 'C:/Users/hans/AppData/Local/Temp/.cache/f@__main__ 668a6b111549e288.pck'.
00: cache(f@__main__): called 'f@__main__' version 'version 0.1' and wrote result into 'C:/Users/hans/AppData/Local/Temp/.cache/f@__main__ b5609542d7da0b04.pck'.

Non-Functional Parameters

A function may have non-functional parameters which do not alter the function’s outcome. Debug flags are a typical example:

from cdxcore.subdir import SubDir
cache   = SubDir("!/.cache")

@cache.cache("0.1", dependencies=[f], exclude_args='debug')
def g(x,y,debug): # <-- 'debug' is a non-functional parameter
    if debug:
        print(f"g(x={x},y={y})")  
    return f(x,y)**2

You can define certain types as non-functional for all functions wrapped by cdxcore.subdir.SubDir.cache() by specifying them when constructing the cdxcore.subdir.CacheController passed to cdxcore.subdir.SubDir:

from cdxcore.subdir import SubDir, CacheController

class Debugger:
    def output( self, message ):
        print(message)

ctrl    = CacheController(exclude_arg_types=[Debugger])   # <- exclude 'Debugger' parameters from hashing
cache   = SubDir("!/.cache", cache_controller=ctrl)       # <- pass the controller to the cache directory

@cache.cache("0.1", dependencies=[f])
def g(x,y,debugger : Debugger): # <-- 'debugger' is a non-functional parameter
    debugger.output(f"g(x={x},y={y})")  
    return f(x,y)**2

Unique IDs and File Naming

The unique call ID of a decorated function is by default generated from its fully qualified name and a unique hash of its functional parameters.

Key default behaviours of cdxcore.uniquehash.NamedUniqueHash:

  • The NamedUniqueHash hashes objects via their __dict__ or __slots__ members. This can be overridden for a class by implementing __unique_hash__; see cdxcore.uniquehash.NamedUniqueHash.

  • Function members of objects or any members starting with ‘_’ are not hashed unless this behaviour is changed using cdxcore.subdir.CacheController.

  • NumPy arrays and pandas data frames are hashed using their byte representation. That is slow and not recommended. It is better to identify NumPy/pandas inputs via the characteristics from which they were generated.

Either way, hashes are not particularly human readable. It is often useful to have unique IDs and therefore filenames which carry some context information.

This can be achieved by using label:

from cdxcore.subdir import SubDir, CacheController
from cdxcore.verbose import Context
ctrl    = CacheController( debug_verbose=Context("all") )
cache   = SubDir("!/.cache", cache_controller=ctrl )
cache.delete_all_content()   # for illustration

@cache.cache("0.1")                     # <- no label
def f1(x,y):
    return x*y

@cache.cache("0.1", label="f2({x},{y})") # <- label uses a string to be passed to str.format()
def f2(x,y):
    return x*y

We can also use a function to generate a label. In that case all parameters to the decorated function, including its name, are passed to the label function. In the example below we absorb any parameters we are not interested in with **_:

@cache.cache("0.1", label=lambda x,y,**_: f"h({x},{y})", exclude_args='debug') 
def h(x,y,debug=False):
    if debug:
        print(f"h(x={x},y={y})")  
    return x*y

We obtain:

f1(1,1)
f2(1,1)
h(1,1)        

00: cache(f1@__main__): function registered for caching into 'C:/Users/hans/AppData/Local/Temp/.cache/'.
00: cache(f2@__main__): function registered for caching into 'C:/Users/hans/AppData/Local/Temp/.cache/'.
00: cache(h@__main__): function registered for caching into 'C:/Users/hans/AppData/Local/Temp/.cache/'.
00: cache(f1@__main__): called 'f1@__main__' version 'version 0.1' and wrote result into 'C:/Users/hans/AppData/Local/Temp/.cache/f1@__main__ ef197d80d6a0bbb0.pck'.
00: cache(f2@__main__): called 'f2(1,1)' version 'version 0.1' and wrote result into 'C:/Users/hans/AppData/Local/Temp/.cache/f2(1,1) bdc3cd99157c10f7.pck'.
00: cache(h@__main__): called 'h(1,1)' version 'version 0.1' and wrote result into 'C:/Users/hans/AppData/Local/Temp/.cache/h(1,1) d3fdafc9182070f4.pck'.            

Note that the file names f2(1,1) bdc3cd99157c10f7.pck and h(1,1) d3fdafc9182070f4.pck for the f2 and h function calls are now easier to read as they consist of the label of the function and a trailing hash key. The trailing hash is appended because we do not assume that the label returned by label is unique. Therefore, a hash generated from the label itself and all pertinent arguments is appended to the filename.

If we know how to generate truly unique IDs which are always valid filenames, then we can use uid instead of label:

@cache.cache("0.1", uid=lambda x,y,**_: f"h2({x},{y})", exclude_args='debug') 
def h2(x,y,debug=False):
    if debug:
        print(f"h2(x={x},y={y})")  
    return x*y
h2(1,1)

yields:

00: cache(h2@__main__): function registered for caching into 'C:/Users/hans/AppData/Local/Temp/.cache/'.
00: cache(h2@__main__): called 'h2(1,1)' version 'version 0.1' and wrote result into 'C:/Users/hans/AppData/Local/Temp/.cache/h2(1,1).pck'.            

In particular, the filename is now h2(1,1).pck without any hash. If uid is used, the parameters of the function are not hashed. Like label, the parameter uid can also be a str.format() string or a callable.

Controlling which Parameters to Hash

To specify which parameters are pertinent for identifying a unique id, use:

  • include_args: list of function arguments to include. If None, all parameters are used as input to the next step.

  • exclude_args: list of function arguments to exclude, if not None.

  • exclude_arg_types: a list of types to exclude. This is helpful if control flow is managed with dedicated data types. An example of such a type is cdxcore.verbose.Context which is used to print hierarchical output messages. Types can be globally excluded using a cdxcore.subdir.CacheController when constructing cdxcore.subdir.SubDir.
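The order of the three filters can be sketched as follows. This is an illustrative standard-library version only; select_hash_arguments is a hypothetical helper and the actual selection logic inside cdxcore may differ in detail.

```python
import inspect

def select_hash_arguments(func, args, kwargs, *,
                          include_args=None, exclude_args=None, exclude_arg_types=()):
    """Hypothetical helper: return the arguments that would enter the call ID."""
    bound = inspect.signature(func).bind(*args, **kwargs)
    bound.apply_defaults()                    # treat defaulted parameters like passed ones
    selected = dict(bound.arguments)
    if include_args is not None:              # step 1: keep only the listed names
        selected = {k: v for k, v in selected.items() if k in include_args}
    if exclude_args is not None:              # step 2: drop the listed names
        selected = {k: v for k, v in selected.items() if k not in exclude_args}
    if exclude_arg_types:                     # step 3: drop values of excluded types
        types = tuple(exclude_arg_types)
        selected = {k: v for k, v in selected.items() if not isinstance(v, types)}
    return selected

class Debugger:
    pass

def g(x, y, debug=False, debugger=None):
    return x * y

kept = select_hash_arguments(g, (1, 2), {"debug": True, "debugger": Debugger()},
                             exclude_args=["debug"], exclude_arg_types=[Debugger])
```

Here only x and y remain, so two calls that differ solely in debug or debugger would map to the same call ID.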

Numpy/Pandas

NumPy/pandas data should not be hashed for identifying unique call IDs. Instead, use the defining characteristics from which the data was generated.

For example:

from cdxcore.pretty import PrettyObject
from cdxcore.subdir import SubDir
cache   = SubDir("!/.cache")
cache.delete_all_content()   # for illustration

@cache.cache("0.1")
def load_src( src_def ):
    result = ...   # placeholder: load the data described by 'src_def'
    return result

# ignore 'data'. It is uniquely identified by 'src_def' -->
@cache.cache("0.1", dependencies=[load_src], exclude_args=['data'])  
def statistics( stats_def, src_def, data ):
    stats = ...    # placeholder: compute statistics from 'data'
    return stats

src_def = PrettyObject()
src_def.start = "2010-01-01"
src_def.end = "2025-01-01"
src_def.x = 0.1

stats_def = PrettyObject()
stats_def.lambda_ = 0.1   # 'lambda' is a reserved word in Python
stats_def.window = 100

data  = load_src( src_def )
stats = statistics( stats_def, src_def, data )

While instructive, this case is not optimal: we do not really need to load data if we can restore stats from the cache (unless we need data further on).

Consider therefore:

@cache.cache("0.1")
def load_src( src_def ):
    result = ...   # placeholder: load the data described by 'src_def'
    return result

# 'data' is no longer a parameter: it is only loaded if the statistics are not cached -->
@cache.cache("0.1", dependencies=[load_src])  
def statistics_only( stats_def, src_def ):
    data  = load_src( src_def )    # <-- embed call to load_src() here
    stats = ...                    # placeholder: compute statistics from 'data'
    return stats

stats = statistics_only( stats_def, src_def )

Caching Member Functions#

You can cache member functions like any other function. Note that cdxcore.version.version() information is by default inherited, i.e. member functions will be dependent on the version of their defining class, and class versions will be dependent on their base classes’ versions:

from cdxcore.subdir import SubDir, version
cache   = SubDir("!/.cache")
cache.delete_all_content()   # for illustration

@version("0.1")
class A(object):
    def __init__(self, x):
        self.x = x

    @cache.cache("0.1")
    def f(self, y):
        return self.x*y

a = A(x=1)
_ = a.f(y=1)   # compute f and store result
_ = a.f(y=1)   # load result back from disk
a.x = 2
_ = a.f(y=1)   # 'a' changed: compute f and store result
b = A(x=2)
_ = b.f(y=1)   # same unique call ID as previous call -> restore result from disk

WARNING: cdxcore.uniquehash.UniqueHash does not by default process members of objects or dictionaries which start with a “_”. This behaviour can be changed using cdxcore.subdir.CacheController. For reasonably complex objects it is recommended to implement a custom hashing function for your objects:

__unique_hash__( self, uniqueHash : UniqueHash, debug_trace : DebugTrace  )

This function is described at cdxcore.uniquehash.UniqueHash.
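To make the consequence of the warning concrete, the sketch below mimics the default rule (skip members starting with "_" and function members) using only the standard library. It is an approximation for illustration; public_state and state_hash are hypothetical names and the real hashing is performed by cdxcore.uniquehash.

```python
import hashlib

def public_state(obj):
    """Hypothetical helper: members that would contribute to the hash by default."""
    return {k: v for k, v in sorted(vars(obj).items())
            if not k.startswith("_") and not callable(v)}

def state_hash(obj):
    return hashlib.sha256(repr(public_state(obj)).encode()).hexdigest()[:16]

class A:
    def __init__(self, x):
        self.x = x
        self._scratch = 0        # private member: ignored by the default rule

a, b = A(1), A(1)
b._scratch = 99
same_before = state_hash(a) == state_hash(b)   # private difference is not detected
b.x = 2
same_after = state_hash(a) == state_hash(b)    # public difference is detected
```

If the outcome of a cached member function depends on state held in private members, the default rule would therefore treat distinct objects as identical, which is why a custom __unique_hash__ is recommended for such classes.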

Caching Bound Member Functions#

Caching bound member functions is technically quite different to caching a function of a class in general, but also supported:

from cdxcore.subdir import SubDir, CacheController
from cdxcore.verbose import Context
cache   = SubDir("!/.cache", cache_controller=CacheController(debug_verbose=Context("all")))
cache.delete_all_content()   # for illustration

class A(object):
    def __init__(self,x):
        self.x = x
    def f(self,y):
        return self.x*y

a = A(x=1)
f = cache.cache("0.1", uid=lambda self, y : f"a.f({y})")(a.f)  # <- decorate bound 'f'.
r = f(y=2)

In this case the function f is bound to a. The object is added as self to the function parameter list even though the bound function parameter list does not include self. This, together with the comments on hashing objects above, ensures that (hashed) changes to a will be reflected in the unique call ID for the member function.
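The following standard-library sketch shows how the object behind a bound method can be recovered and re-inserted as self into the argument list. It illustrates the Python mechanics only; call_arguments is a hypothetical helper and not part of cdxcore.

```python
import inspect

class A:
    def __init__(self, x):
        self.x = x
    def f(self, y):
        return self.x * y

a = A(x=1)
bound = a.f                      # bound method: its signature no longer lists 'self'

def call_arguments(method, *args, **kwargs):
    """Hypothetical helper: rebuild the full argument list including 'self'."""
    func, owner = method.__func__, method.__self__
    return dict(inspect.signature(func).bind(owner, *args, **kwargs).arguments)

params = call_arguments(bound, y=2)
visible = list(inspect.signature(bound).parameters)
```

With self present in params, hashing the arguments also hashes the state of a, so changes to the object alter the call ID.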

Caching Classes#

Classes can also be cached. In this case the creation of an object is cached, i.e. a call to the class constructor restores the respective object from disk.

This is done in two steps:

  1. First, the class itself is decorated using cdxcore.subdir.SubDir.cache() to provide version information at class level. Only version information is provided here.

    You can use cdxcore.subdir.SubDir.cache_class() as an alias.

  2. Secondly, decorate __init__. You do not need to specify a version for __init__ as its version usually coincides with the version of the class. At __init__ you define how unique IDs are generated from the parameters passed to object construction.

    You can use cdxcore.subdir.SubDir.cache_init() as an alias.

Simple example:

from cdxcore.subdir import SubDir
cache   = SubDir("!/.cache")
cache.delete_all_content()   # for illustration

@cache.cache_class("0.1")
class A(object):

    @cache.cache_init(exclude_args=['debug'])
    def __init__(self, x, debug):
        if debug:
            print("__init__",x)
        self.x = x

a = A(1, debug=False)    # caches 'a'
b = A(1, debug=False)    # reads the cached object into 'b'

Technical Comments

The function __init__ does not actually return a value; for this reason behind the scenes it is actually __new__ which is being decorated. Attempting to cache-decorate __new__ manually will lead to an exception.

A nuance of __init__ compared to ordinary member functions is that the self parameter is non-functional (in the sense that it is an empty object when __init__ is called). self is therefore automatically excluded from computing a unique call ID. That also means self is not part of the arguments passed to uid:

@cache.cache_class("0.1")
class A(object):

    @cache.cache_init(uid=lambda x, debug: f"A.__init__(x={x})")  # <-- 'self' is not passed to the lambda function; no need to add **_
    def __init__(self, x, debug):
        if debug:
            print("__init__",x)
        self.x = x

Decorating classes with __slots__ does not yet work.

See also#

For project-wide use it is usually convenient to control caching at the level of a project-wide cache root directory. The class cdxcore.subdir.VersionedCacheRoot is a thin convenience wrapper around a cdxcore.subdir.SubDir with a cdxcore.subdir.CacheController.

The idea is to have a central file, cache.py, which contains the central root for caching. We recommend using an environment variable to be able to control the location of this directory outside the code. Here is an example with an environment variable PROJECT_CACHE_DIR:

# file cache.py

from cdxcore.subdir import VersionedCacheRoot
import os

cache_root = VersionedCacheRoot(
                   os.getenv("PROJECT_CACHE_DIR", "!/.cache")
                   )

In a particular project file, say pipeline.py, create a file-local cache directory and use it:

# file pipeline.py

from cache import cache_root

cache_dir = cache_root("pipeline")

@cache_dir.cache("0.1")
def f(x):
    return x+2

@cache_dir.cache("0.1", dependencies=[f])
def g(x):
    return f(x)**2

# ...

In case you have issues with caching you can use the central root directory to turn on tracing:

from cdxcore.subdir import VersionedCacheRoot
from cdxcore.verbose import Context
import os

cache_root = VersionedCacheRoot(
                   os.getenv("PROJECT_CACHE_DIR", "!/.cache"),
                   debug_verbose=Context.all    # turn full tracing on
                   )
Parameters:
versionstr | None, default None

Version of the function.

dependencieslist[type] | None, default None

A list of version dependencies, either by reference or by name. See cdxcore.version.version() for details on name lookup if strings are used.

labelstr | Callable | None, default None

Specify a human-readable label for the function call given its parameters. This label is used to generate the cache file name, and is also printed when tracing hashing operations. Labels are not assumed to be unique, hence a unique hash of the label and the parameters to this function will be appended to generate the actual cache file name.

Use uid instead if label represents valid unique filenames. You cannot specify both uid and label. If neither uid nor label is present, name will be used.

Usage:

  • If label is a plain string without {} formatting: use this string as-is.

  • If label is a string with {} formatting, then label.format( name=name, **parameters ) will be used to generate the actual label.

  • If label is a Callable then label( name=name, **parameters ) will be called to generate the actual label.

See above for examples.

label cannot be used alongside uid.
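The three usage cases for label can be sketched as below. This is an illustration under assumptions: resolve_label and cache_file_stem are hypothetical helpers, and the trailing hash uses SHA-256 of repr() rather than the hashing cdxcore actually applies, so the digests will not match those in the trace output above.

```python
import hashlib

def resolve_label(label, name, parameters):
    """Hypothetical helper covering callable, format-string and plain-string labels."""
    if callable(label):
        return label(name=name, **parameters)
    if "{" in label:
        return label.format(name=name, **parameters)
    return label                                   # plain string: used as-is

def cache_file_stem(label, name, parameters):
    """Labels are not assumed unique, so a hash of label and parameters is appended."""
    text = resolve_label(label, name, parameters)
    seed = repr((text, sorted(parameters.items()))).encode()
    return f"{text} {hashlib.sha256(seed).hexdigest()[:16]}"

params = {"x": 1, "y": 1}
from_string   = resolve_label("f2({x},{y})", "f2@__main__", params)
from_callable = resolve_label(lambda x, y, **_: f"h({x},{y})", "h@__main__", params)
stem          = cache_file_stem("f2({x},{y})", "f2@__main__", params)
```

The **_ in the lambda absorbs the name keyword, which is passed alongside the function parameters.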

uidstr | Callable | None, default None

Alternative to label which is assumed to generate a unique cache file name. It has the same semantics as label. When used, parameters to the decorated function are not hashed as the uid is assumed to be already unique. The resulting string must be a valid file name.

Use label if the id is not unique. You cannot specify both uid and label. If neither uid nor label is present, name will be used (as a non-unique label).

namestr | None, default None

Name of this function, which is either used on its own if neither label nor uid is specified, or passed as the parameter name to the callable or the formatting operator. See above for more details.

If name is not specified it defaults to __qualname__ expanded by the module name the function is defined in.

include_argslist[str] | None, default None

List of arguments to include in generating a unique ID, or None for all.

exclude_argslist[str] | None, default None

List of arguments to exclude from generating a unique ID. Examples of such non-functional arguments are workflow controls (debugging) and i/o elements.

exclude_arg_typeslist[type] | None, default None

List of parameter types to exclude from generating a unique ID. Examples of such non-functional arguments are workflow controls (debugging) and i/o elements.

version_auto_classbool, default True

Whether to automatically add version dependencies on base classes or, for member functions, on containing classes. This is the auto_class parameter for cdxcore.version.version().

Returns:
Decorated F: Callable

A decorator cache(F) whose __call__ implements the cached call to F.

This callable has a member cache_info of type cdxcore.subdir.CacheInfo which can be used to access information on caching activity.

  • Information available at any time after decoration:

    • F.cache_info.name : qualified name of the function

    • F.cache_info.signature : signature of the function

  • Additional information available during a call to a decorated function F, and thereafter:

    • F.cache_info.version : unique version string reflecting all dependencies.

    • F.cache_info.filename : unique filename used for caching logic during the last function call.

    • F.cache_info.label : last label generated, or None.

    • F.cache_info.arguments : arguments parsed to create a unique call ID, or None.

  • Additional information available after a call to F:

    • F.cache_info.last_cached : whether the last function call returned a cached object.

The decorated F() has additional function parameters, namely:

  • override_cache_mode : CacheMode | None, default None

    Allows overriding the CacheMode temporarily, in particular you can set it to "off".

  • track_cached_files : cdxcore.subdir.CacheTracker | None, default None

    Allows passing a cdxcore.subdir.CacheTracker object to keep track of all files used (loaded from or saved to). The function cdxcore.subdir.CacheTracker.delete_cache_files() can be used to delete all files involved in caching.

  • return_cache_uid : bool, default False

    If True, then the decorated function will return a tuple uid, result where uid is the unique filename generated for this function call, and where result is the actual result from the function, cached or not.

    Usage:

    from cdxcore.subdir import SubDir
    cache_dir = SubDir("!/.cache")
    
    @cache_dir.cache()
    def f(x, y):
        return x*y
    
    uid, xy = f( x=1, y=2, return_cache_uid=True )
    

    This pattern is thread-safe, in contrast to reading the filename from the shared cache_info after the call:

    xy = f( x=1, y=2 )
    uid = f.cache_info.filename
    
cache_class(version=None, *, name=None, dependencies=None, version_auto_class=True)[source]#

Short-cut for cdxcore.subdir.SubDir.cache() applied to classes with a reduced number of available parameters.

Example:

cache   = SubDir("!/.cache")

@cache.cache_class("0.1")
class A(object):

    @cache.cache_init(exclude_args=['debug'])
    def __init__(self, x, debug):
        if debug:
            print("__init__",x)
        self.x = x
property cache_controller#

Returns an assigned cdxcore.subdir.CacheController, or None

cache_init(label=None, uid=None, exclude_args=None, include_args=None, exclude_arg_types=None)[source]#

Short-cut for cdxcore.subdir.SubDir.cache() applied to decorating __init__ with a reduced number of available parameters.

Example:

cache   = SubDir("!/.cache")

@cache.cache_class("0.1")
class A(object):

    @cache.cache_init(exclude_args=['debug'])
    def __init__(self, x, debug):
        if debug:
            print("__init__",x)
        self.x = x
property cache_mode#

Returns the cdxcore.subdir.CacheMode associated with the underlying cache controller

create_directory()[source]#

Creates the current directory if it doesn’t exist yet.

delete(file, raise_on_error=False, *, ext=None)[source]#

Deletes file.

This function will quietly fail if file does not exist unless raise_on_error is set to True.

Parameters:
file

filename, or list of filenames

raise_on_errorbool, default False

If False, do not throw KeyError if file does not exist or another error occurs.

extstr | None, default None

Extension, or list thereof if file is a list.

Use

  • None for the directory default.

  • "" to not use an automatic extension.

  • "*" to use the extension associated with the format of the directory.

delete_all_content(delete_self=False, raise_on_error=False, *, ext=None)[source]#

Deletes all valid keys and subdirectories in this sub directory.

Does not delete files with other extensions. Use cdxcore.subdir.SubDir.delete_everything() if the aim is to delete, well, everything.

Parameters:
delete_self: bool

Whether to delete the directory itself as well, or only its contents.

raise_on_error: bool

False for silent failure

extstr | None, default None

Extension for keys, or None for the directory’s default. Use "" to match all files regardless of extension.

delete_all_files(raise_on_error=False, *, ext=None)[source]#

Deletes all valid keys in this sub directory with the correct extension.

Parameters:
raise_on_errorbool

Set to False to quietly ignore errors.

extstr | None, default None

Extension to be used:

  • None for the directory default.

  • "" to not use an automatic extension.

  • "*" to use the extension associated with the format of the directory.

delete_everything(keep_directory=True)[source]#

Deletes the entire sub directory with all contents.

WARNING: deletes all files and sub-directories, not just those with the present extension. If keep_directory is False, the directory referred to by this object will also be deleted. In this case, self will be set to None.

property existing_path: str#

Return current path, including trailing '/'.

existing_path ensures that the directory structure exists (or raises an exception). Use cdxcore.subdir.SubDir.path() if creation on the fly is not desired.

exists(file, *, ext=None)[source]#

Checks whether a file exists.

Parameters:
file

Filename, or list of filenames.

extstr | None, default None

Extension to be used:

  • None for the directory default.

  • "" to not use an automatic extension.

  • "*" to use the extension associated with the format of the directory.

Returns:
Statusbool

If file is a string, returns True or False, else it will return a list of bool values.

static expand_std_root(name)[source]#

Expands name by a standardized root directory if provided:

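The first character of name can be one of:

  • "!" for the default temp directory.

  • "." for the current working directory.

  • "~" for the user's home directory.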

If neither of these matches the first character, name is returned as is.

This function does not support "?".

property ext: str#

Returns the common extension of the files in this directory, including leading '.'. Resolves "*" into the extension associated with the current cdxcore.subdir.Format.

file_size(file, *, ext=None)[source]#

Returns the file size of a file.

See comments on os.path.getsize() for system compatibility information.

Parameters:
filestr

Filename, or list of filenames.

extstr

Extension, or list thereof if file is a list.

  • Use None for the directory default.

  • Use "" for no automatic extension.

Returns:
File size of file, or None if an error occurred.
files(*, ext=None)[source]#

Returns a list of files in this subdirectory with the current extension, or the specified extension.

In other words, if the extension is “.pck”, and the files are “file1.pck”, “file2.pck”, “file3.bin” then this function will return [ “file1”, “file2” ]

If ext is:

  • None, then the directory’s default extension will be used.

  • "" then this function will return all files in this directory.

  • "*" then the extension corresponding to the current format will be used.

This function ignores directories. Use cdxcore.subdir.SubDir.sub_dirs() to retrieve those.
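The listing behaviour described above can be approximated with the standard library as follows. This is a sketch for illustration, with list_files as a hypothetical helper; in particular, whether cdxcore returns names with or without their extension when ext is "" is an assumption here.

```python
import os
import tempfile

def list_files(directory, ext=".pck"):
    """Hypothetical helper: file names with the given extension, extension stripped."""
    names = []
    for entry in sorted(os.listdir(directory)):
        if not os.path.isfile(os.path.join(directory, entry)):
            continue                          # ignore sub-directories
        if ext == "":
            names.append(entry)               # assumption: all files, names unchanged
        elif entry.endswith(ext):
            names.append(entry[:-len(ext)])   # strip the managed extension
    return names

directory = tempfile.mkdtemp()
for name in ("file1.pck", "file2.pck", "file3.bin"):
    open(os.path.join(directory, name), "w").close()
os.mkdir(os.path.join(directory, "nested.pck"))   # a directory, hence not listed

with_default = list_files(directory)
everything   = list_files(directory, "")
```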

property fmt: Format#

Returns current cdxcore.subdir.Format.

full_file_name(file, *, ext=None)[source]#

Returns fully qualified file name.

The function tests that file does not contain directory information.

Parameters:
filestr

Core file name without path or extension.

extstr | None, default None

If not None, use this extension rather than cdxcore.subdir.SubDir.ext.

Returns:
Filenamestr | None

Fully qualified system file name. If self is None, then this function returns None; if file is None then this function also returns None.

full_temp_file_name(file=None, *, ext=None, create_directory=False)[source]#

Returns a fully qualified unique temporary file name with path and extension

The file name is generated by applying a unique hash to the current directory, file, the current process and thread IDs, and datetime.datetime.now().

If file is not None it will be used as a label.

This function returns the fully qualified file name. Use cdxcore.subdir.SubDir.temp_file_name() to obtain only a file name.

Parameters:
filestr | None, default None

An optional file. If provided, cdxcore.uniquehash.named_unique_filename48_8() is used to generate the temporary file which means that a portion of file will head the returned temporary name.

If file is None, cdxcore.uniquehash.unique_hash48() is used to generate a 48 character hash.

extstr | None, default None

Extension to use, or None for the extension of self.

Returns:
Temporary file namestr

The fully qualified file name.
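A rough standard-library sketch of the naming scheme described above follows. It is not the cdxcore implementation: unique_temp_name is a hypothetical helper, SHA-256 stands in for the cdxcore.uniquehash functions, and the 32-character cap on the label part and the "_" separator are assumptions made for this example.

```python
import datetime
import hashlib
import os
import threading

def unique_temp_name(directory, file=None, ext=".pck"):
    """Hypothetical helper: hash directory, label, process, thread and current time."""
    seed = repr((directory, file, os.getpid(), threading.get_ident(),
                 datetime.datetime.now().isoformat()))
    digest = hashlib.sha256(seed.encode()).hexdigest()
    # Without a label use a 48 character hash; with a label let it head the name.
    stem = digest[:48] if file is None else f"{file[:32]}_{digest[:8]}"
    return os.path.join(directory, stem + ext)

plain   = unique_temp_name("cache")
labeled = unique_temp_name("cache", "results")
```

Including the process and thread IDs keeps names apart across concurrent writers; two calls from the same thread within the clock's resolution could still collide in this simplified version.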

get_creation_time(file, *, ext=None)[source]#

Returns the creation time of a file.

See comments on os.path.getctime() for system compatibility information.

Parameters:
file

Filename, or list of filenames.

extstr | None, default None

Extension to be used:

  • None for the directory default.

  • "" to not use an automatic extension.

  • "*" to use the extension associated with the format of the directory.

Returns:
Datetimedatetime.datetime

A single datetime if file is a string, otherwise a list of datetimes. Returns None if an error occurred.
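The scalar-or-list convention and the None-on-error behaviour can be sketched with os.path.getctime() directly. The helpers creation_time and creation_times are hypothetical and only illustrate the documented contract.

```python
import datetime
import os
import tempfile

def creation_time(filename):
    """Return the creation time as datetime, or None if the file cannot be accessed."""
    try:
        return datetime.datetime.fromtimestamp(os.path.getctime(filename))
    except OSError:
        return None

def creation_times(files):
    """Accept a single file name or a list of file names."""
    if isinstance(files, str):
        return creation_time(files)
    return [creation_time(f) for f in files]

directory = tempfile.mkdtemp()
existing  = os.path.join(directory, "data.pck")
missing   = os.path.join(directory, "missing.pck")
open(existing, "wb").close()

single  = creation_times(existing)              # a datetime
several = creation_times([existing, missing])   # [datetime, None]
```

Note that on Unix systems os.path.getctime() reports the time of the last metadata change rather than the creation time, which is the compatibility caveat referred to above.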

get_last_access_time(file, *, ext=None)[source]#

Returns the last access time of a file.

See comments on os.path.getatime() for system compatibility information.

Parameters:
file

Filename, or list of filenames.

extstr | None, default None

Extension to be used:

  • None for the directory default.

  • "" to not use an automatic extension.

  • "*" to use the extension associated with the format of the directory.

Returns:
Datetimedatetime.datetime

A single datetime if file is a string, otherwise a list of datetimes. Returns None if an error occurred.

get_last_modification_time(file, *, ext=None)[source]#

Returns the last modification time of a file.

See comments on os.path.getmtime() for system compatibility information.

Parameters:
file

Filename, or list of filenames.

extstr | None, default None

Extension to be used:

  • None for the directory default.

  • "" to not use an automatic extension.

  • "*" to use the extension associated with the format of the directory.

Returns:
Datetimedatetime.datetime

A single datetime if file is a string, otherwise a list of datetimes. Returns None if an error occurred.
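The three timestamp accessors above differ only in the os.path function they refer to. The following is a minimal standard-library sketch of that pattern, not the cdxcore implementation: the helper names file_time and file_times are hypothetical, and returning None per failing file in a list is an assumption.

```python
import datetime
import os
import tempfile

def file_time(path, getter=os.path.getmtime):
    """Return the timestamp of `path` as a datetime, or None on error."""
    try:
        return datetime.datetime.fromtimestamp(getter(path))
    except OSError:
        return None

def file_times(paths, getter=os.path.getmtime):
    """Accept a single file name or a list of file names."""
    if isinstance(paths, str):
        return file_time(paths, getter)
    return [file_time(p, getter) for p in paths]

with tempfile.TemporaryDirectory() as tmp:
    existing = os.path.join(tmp, "data.pck")
    missing = os.path.join(tmp, "missing.pck")
    with open(existing, "wb") as f:
        f.write(b"payload")

    single = file_times(existing)                      # one datetime
    several = file_times([existing, missing])          # second entry is None
    created = file_times(existing, os.path.getctime)   # platform dependent meaning
    accessed = file_times(existing, os.path.getatime)
```

Note that os.path.getctime() reports the creation time on Windows but the last metadata change on most Unix systems, which is why the documentation points to the os.path comments.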

get_version(file, raise_on_error=False, *, ext=None, fmt=None)[source]#

Returns a version stored in a file.

This requires that the file has previously been saved with a version. Otherwise this function will have unpredictable results.

Parameters:
filestr

A filename, or a list thereof.

raise_on_errorbool

Whether to raise an exception if accessing an existing file failed (e.g. if it is a directory). By default this function fails silently and returns the default.

delete_wrong_versionbool, default True

If True, and if a wrong version was found, delete file.

extstr | None, default None

Extension overwrite, or a list thereof if file is a list.

Set to:

  • None to use directory’s default.

  • "*" to use the extension implied by fmt.

  • "" for no extension.

fmtcdxcore.subdir.Format | None, default None

File format or None to use the directory’s default. Note that fmt cannot be a list even if file is.

Returns:
versionstr

The version.

property is_none: bool#

Whether this object is None or not. For such a SubDir object no files exist, and writing any file will fail.

is_version(file, version=None, raise_on_error=False, *, ext=None, fmt=None, delete_wrong_version=True)[source]#

Tests the version of a file.

Parameters:
filestr

A filename, or a list thereof.

versionstr

Specifies the version to compare the file’s version with.

You can use "*" to match any version.

raise_on_errorbool

Whether to raise an exception if accessing an existing file failed (e.g. if it is a directory). By default this function fails silently and returns the default.

delete_wrong_versionbool, default True

If True, and if a wrong version was found, delete file.

extstr | None, default None

Extension overwrite, or a list thereof if file is a list.

Set to:

  • None to use directory’s default.

  • "*" to use the extension implied by fmt.

  • "" for no extension.

fmtcdxcore.subdir.Format | None, default None

File format or None to use the directory’s default. Note that fmt cannot be a list even if file is.

Returns:
Statusbool

Returns True only if the file exists, has version information, and its version is equal to version.
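To illustrate why a version check does not need to load the whole file, here is a minimal sketch of "fast versioning" with the standard library. It assumes a hypothetical layout in which the version is stored as a single text line ahead of the pickled payload; the actual cdxcore file layout is not documented here and may differ. The function names are illustrative only.

```python
import os
import pickle
import tempfile

def write_versioned(path, obj, version):
    """Write the version as a header line, followed by the pickled object."""
    with open(path, "wb") as f:
        f.write(version.encode("utf-8") + b"\n")
        pickle.dump(obj, f)

def read_version(path):
    """Read only the header line; the payload is never unpickled."""
    try:
        with open(path, "rb") as f:
            return f.readline().rstrip(b"\n").decode("utf-8")
    except OSError:
        return None

def is_version(path, version):
    """True only if the file exists and its version matches; "*" matches any."""
    found = read_version(path)
    if found is None:
        return False
    return version == "*" or found == version

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pck")
    write_versioned(path, {"weights": [1, 2, 3]}, "1.0")

    same = is_version(path, "1.0")
    other = is_version(path, "2.0")
    any_version = is_version(path, "*")
    absent = is_version(os.path.join(tmp, "missing.pck"), "*")
```

This sketch does not delete files with a mismatching version; the delete_wrong_version behaviour described above would add an os.remove() call in that branch.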

items(*, ext=None, raise_on_error=False)[source]#

Dictionary-style iterable of filenames and their content.

Usage:

subdir = SubDir("!")
for file, data in subdir.items():
    print( file, str(data)[:100] )
Parameters:
extstr | None, default None

Extension or None for the directory’s current extension. Use "" to match files with any extension.

Returns:
Iterable

An iterable generator
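A dictionary-style generator of this kind can be sketched with the standard library as follows. This is an illustration under stated assumptions, not the cdxcore code: iter_items is a hypothetical name, files are assumed to be plain pickles with extension "pck", and the yielded key is assumed to be the file name without its extension.

```python
import os
import pickle
import tempfile

def iter_items(directory, ext="pck"):
    """Yield (name without extension, unpickled content) for matching files."""
    suffix = "." + ext if ext else ""
    if not os.path.isdir(directory):
        return
    for entry in sorted(os.listdir(directory)):
        full = os.path.join(directory, entry)
        if not os.path.isfile(full) or not entry.endswith(suffix):
            continue
        name = entry[: len(entry) - len(suffix)] if suffix else entry
        with open(full, "rb") as f:
            yield name, pickle.load(f)

with tempfile.TemporaryDirectory() as tmp:
    for name, value in [("a", 1), ("b", [1, 2])]:
        with open(os.path.join(tmp, name + ".pck"), "wb") as f:
            pickle.dump(value, f)
    # A file with a different extension is skipped by the default filter.
    with open(os.path.join(tmp, "notes.txt"), "w", encoding="utf-8") as f:
        f.write("not a pickle")

    collected = dict(iter_items(tmp))
    nothing = list(iter_items(os.path.join(tmp, "does_not_exist")))
```

Because the function is a generator, files are only read as the loop advances, so the results must be consumed while the directory still exists.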

property path: str#

Return current path, including trailing '/'.

Note that the path may not exist yet. If existence is required, consider using cdxcore.subdir.SubDir.existing_path().

path_exists()[source]#

Whether the current directory exists

read(file, default=None, raise_on_error=False, *, version=None, delete_wrong_version=True, ext=None, fmt=None)[source]#

Read data from a file if the file exists, or return default.

  • Supports file containing directory information.

  • Supports file (and default as well as ext) being iterable. Examples:

    from cdxcore.subdir import SubDir
    files = ['file1', 'file2']
    sd = SubDir("!/test")
    
    sd.read( files )          # both files use the default None
    sd.read( files, 1 )       # both files use the default 1
    sd.read( files, [1,2] )   # files use defaults 1 and 2, respectively
    
    sd.read( files, [1] )     # produces an error as len(files) != len([1])
    

    Strings are iterable but are treated as a single value. Therefore:

    sd.read( files, '12' )      # the default value '12' is used for both files
    sd.read( files, ['1','2'] ) # use defaults '1' and '2', respectively
    
Parameters:
filestr

A file name or a list thereof. file may contain subdirectories.

default

Default value, or default values if file is a list.

raise_on_errorbool, default False

Whether to raise an exception if reading an existing file failed. By default this function fails silently and returns the default.

versionstr | None, default None

If not None, specifies the version of the current code base.

In this case, this version will be compared to the version of the file being read. If they do not match, read fails (either by returning default or throwing a cdxcore.version.VersionError exception).

You can specify version "*" to accept any version. Note that this is distinct from using None, which stipulates that the file should not have version information.

delete_wrong_versionbool, default True

If True, and if a wrong version was found, delete the file.

extstr | None, default None

Extension overwrite, or a list thereof if file is a list.

Use:

  • None to use directory’s default.

  • '*' to use the extension implied by fmt.

  • "" to turn off extension management.

fmtcdxcore.subdir.Format | None, default None

File cdxcore.subdir.Format or None to use the directory’s default.

Note:

  • fmt cannot be a list even if file is.

  • Unless ext or the SubDir’s extension is '*', changing the format does not automatically change the extension.

Returns:
Contenttype | list

For a single file, returns the content of the file if successfully read, or default otherwise. If file is a list, this function returns a list of contents.

Raises:
Version errorcdxcore.version.VersionError:

If the file’s version did not match the version provided.

Version presentcdxcore.subdir.VersionPresentError:

Raised when a file that contains version information is read without specifying a version.

I/O errorsException

Various standard I/O errors are raised as usual.
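The way read() pairs a list of files with default, as shown in the examples further up, can be sketched as a small broadcasting helper. This is an illustrative assumption about the rule, not the cdxcore code: broadcast_defaults is a hypothetical name and the use of ValueError for a length mismatch is assumed.

```python
from collections.abc import Iterable

def broadcast_defaults(files, default):
    """Pair each file name with its default value."""
    if isinstance(files, str):
        return [(files, default)]
    files = list(files)
    # Strings are iterable, but they count as one value shared by all files.
    if isinstance(default, str) or not isinstance(default, Iterable):
        return [(f, default) for f in files]
    defaults = list(default)
    if len(defaults) != len(files):
        raise ValueError(f"expected {len(files)} defaults, found {len(defaults)}")
    return list(zip(files, defaults))

files = ["file1", "file2"]
shared_none = broadcast_defaults(files, None)
shared_int = broadcast_defaults(files, 1)
shared_str = broadcast_defaults(files, "12")
per_file = broadcast_defaults(files, [1, 2])

try:
    broadcast_defaults(files, [1])
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True
```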

read_string(file, default=None, raise_on_error=False, *, ext=None)[source]#

Reads text from a file. Removes trailing EOLs.

Returns the read string, or a list of strings if file was iterable.
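As a rough standard-library equivalent for a single file, the following sketch reads text and strips trailing end-of-line characters. The name read_text and the UTF-8 encoding are assumptions made for the example.

```python
import os
import tempfile

def read_text(path, default=None):
    """Return the file's text without trailing EOLs, or `default` on failure."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            return f.read().rstrip("\r\n")
    except OSError:
        return default

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "note.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("first line\nsecond line\n\n")

    text = read_text(path)
    fallback = read_text(os.path.join(tmp, "missing.txt"), default="n/a")
```

Only trailing line breaks are removed; line breaks inside the text are kept.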

static remove_bad_file_characters(file, by='default')[source]#

Replaces invalid characters in a filename using the map by.

See cdxcore.util.fmt_filename() for documentation and further options.

rename(source, target, *, ext=None)[source]#

Rename a file.

This function will raise an exception if not successful.

Parameters:
source, targetstr

Filenames.

extstr

Extension.

  • Use None for the directory default.

  • Use "" for no automatic extension.

sub_dirs()[source]#

Retrieve a list of all sub directories.

If self does not refer to an existing directory, then this function returns an empty list.

static temp_dir()[source]#

Return system temp directory. Short-cut to tempfile.gettempdir(). Result contains trailing '/'.

temp_file_name(file=None)[source]#

Returns a unique temporary file name.

The file name is generated by applying a unique hash to the current directory, file, the current process and thread IDs, and datetime.datetime.now().

If file is not None it will be used as a label.

This function returns just the file name. Use cdxcore.subdir.SubDir.full_temp_file_name() to get a full temporary file name including path and extension.

Parameters:
filestr | None, default None

An optional file. If provided, cdxcore.uniquehash.named_unique_filename48_8() is used to generate the temporary file which means that a portion of file will head the returned temporary name.

If file is None, cdxcore.uniquehash.unique_hash48() is used to generate a 48 character hash.

Returns:
Temporary file namestr

The file name.
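The naming scheme described above can be approximated with hashlib. This sketch is not the algorithm of cdxcore.uniquehash: the function temp_name, the use of SHA-256, and the "label_hash" layout with an 8-character hash are assumptions chosen to mirror the description.

```python
import datetime
import hashlib
import os
import threading

def temp_name(directory, label=None, length=48):
    """Hash directory, label, process ID, thread ID and the current time."""
    seed = "|".join([
        directory,
        label or "",
        str(os.getpid()),
        str(threading.get_ident()),
        datetime.datetime.now().isoformat(),
    ])
    digest = hashlib.sha256(seed.encode("utf-8")).hexdigest()  # 64 hex characters
    if label is None:
        return digest[:length]
    # Keep a readable head taken from the label, followed by a short hash.
    head = label[: length - 9]
    return f"{head}_{digest[:8]}"

anonymous = temp_name("/tmp/my_directory/")
labelled = temp_name("/tmp/my_directory/", "results")
```

Including the process and thread IDs keeps names apart when several workers create temporary files in the same directory at the same moment.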

static temp_temp_dir()[source]#

Return a temporary temp directory name using tempfile.mkdtemp(). Note that this function will return a different directory upon every function call.

It is strongly recommended to clean up after usage, for example using the pattern:

from cdxcore.subdir import SubDir
import shutil

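tmp_dir = SubDir.temp_temp_dir()   # create first, so that `tmp_dir` is defined in `finally`
try:
    ...                            # work with tmp_dir
finally:
    shutil.rmtree(tmp_dir)         # remove the directory and its content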

Result contains trailing '/'.
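Where the SubDir wrapper is not needed, the same clean-up guarantee is available from the standard library alone. The sketch below uses tempfile.TemporaryDirectory, which removes the directory when the with block exits; it is shown as an alternative pattern and is independent of cdxcore.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    marker = os.path.join(tmp_dir, "scratch.txt")
    with open(marker, "w", encoding="utf-8") as f:
        f.write("temporary data")
    existed_inside = os.path.isfile(marker)

# The directory and its content are gone once the block has exited.
removed_after = not os.path.exists(tmp_dir)
```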

static user_dir()[source]#

Return the user’s home directory. Short-cut for os.path.expanduser() with parameter '~'. Result contains trailing '/'.

static working_dir()[source]#

Return current working directory. Short-cut for os.getcwd(). Result contains trailing '/'.

write(file, obj, raise_on_error=True, *, version=None, ext=None, fmt=None)[source]#

Writes an object to file.

  • Supports file containing directories.

  • Supports file being a list. In this case, if obj is an iterable it is considered the list of values for the elements of file. If obj is not iterable, it will be written into all files from file:

    from cdxcore.subdir import SubDir
    
    keys = ['file1', 'file2']
    sd = SubDir("!/test")
    sd.write( keys, 1 )               # works, writes '1' in both files.
    sd.write( keys, [1,2] )           # works, writes 1 and 2, respectively
    sd.write( keys, "12" )            # works, writes '12' in both files
    sd.write( keys, [1] )             # produces error as len(keys) != len(obj)
    

If the current directory is None, then the function raises an EOFError exception.

Parameters:
filestr

Core filename, or list thereof.

obj

Object to write, or list thereof if file is a list.

raise_on_errorbool

If False, this function will return False upon failure.

versionstr | None, default None

If not None, specifies the version of the code which generated obj. This version will be written to the beginning of the file.

extstr | None, default None

Extension, or list thereof if file is a list.

  • Use None to use directory’s default extension.

  • Use "*" to use the extension implied by fmt.

fmtcdxcore.subdir.Format | None, default None

File format or None to use the directory’s default. Note that fmt cannot be a list even if file is. Note that unless ext or the SubDir’s extension is ‘*’, changing the format does not automatically change the extension used.

Returns:
Successbool

Boolean to indicate success if raise_on_error is False.
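The interplay between raise_on_error, the return value, and creating directories on first write can be sketched as follows. This is a simplified stand-in using plain pickle files, not the cdxcore implementation; write_object and the fixed "pck" extension are assumptions for the example.

```python
import os
import pickle
import tempfile

def write_object(directory, file, obj, raise_on_error=True):
    """Pickle `obj` to directory/file.pck, creating directories on demand."""
    path = os.path.join(directory, file + ".pck")
    try:
        payload = pickle.dumps(obj)          # fail before touching the disk
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(payload)
        return True
    except Exception:
        if raise_on_error:
            raise
        return False

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "not_yet_created")

    ok = write_object(target, "sub/file1", {"x": 1})
    created = os.path.isfile(os.path.join(target, "sub", "file1.pck"))

    # A lambda cannot be pickled, so this write fails and returns False.
    failed = write_object(target, "file2", lambda: 0, raise_on_error=False)
    left_behind = os.path.exists(os.path.join(target, "file2.pck"))
```

Serializing before opening the file is a design choice of this sketch: a failed write then leaves no partial file behind.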

write_string(file, line, raise_on_error=True, *, ext=None)[source]#

Writes a line of text into a file.

  • Supports file containing directories.

  • Supports file being a list. In this case, line can either be the same value for all files or a list, too.

If the current directory is None, then the function raises an EOFError exception.

exception cdxcore.subdir.VersionPresentError[source]#

Bases: RuntimeError

Exception raised in case a file was read which had a version, but no test version was provided.

cdxcore.subdir.VersionedCacheRoot(directory, *, ext=None, fmt=None, create_directory=False, **controller_kwargs)[source]#

Create a root directory for versioned caching on disk using cdxcore.subdir.SubDir.cache().

Usage:

In a central file, define a root directory for all caching activity:

from cdxcore.subdir import VersionedCacheRoot
vroot = VersionedCacheRoot("!/cache")

Create sub-directories as suitable, for example:

vtest = vroot("test")

Use these for caching:

@vtest.cache("1.0")
def f1( x=1, y=2 ):
    print(x,y)

@vtest.cache("1.0", dps=[f1])
def f2( x=1, y=2, z=3 ):
    f1( x,y )
    print(z)
Parameters:
directorystr

Name of the root directory for caching.

As with SubDir, the following short-cuts are supported:

  • "!/dir" creates dir in the temporary directory.

  • "~/dir" creates dir in the home directory.

  • "./dir" creates dir relative to the current directory.

extstr | None, default None

Extension, which will automatically be appended to file names. The default value depends on fmt; for Format.PICKLE it is “pck”.

fmtcdxcore.subdir.Format | None, default None

File format; if ext is not specified, the format drives the extension, too. The default None becomes Format.PICKLE.

create_directorybool, default False

Whether to create the directory on disk when the root object is constructed, rather than upon the first write.

controller_kwargs: dict

Parameters passed to cdxcore.subdir.CacheController.

Common parameters used:

  • exclude_arg_types: list of types or names of types to exclude when auto-generating function signatures from function arguments. An example is cdxcore.verbose.Context which is used to print progress messages.

  • max_filename_length: maximum filename length.

  • hash_length: length used for hashes, see cdxcore.uniquehash.UniqueHash.

  • debug_verbose: set to Context.all (after from cdxcore.verbose import Context) to trace all caching operations.

Returns:
Rootcdxcore.subdir.SubDir

A root directory suitable for caching.
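To make the role of the version string and the dps argument in the usage example more concrete, here is a small in-memory sketch of versioned caching with dependencies. It is a hypothetical stand-in, not how cdxcore.subdir.SubDir.cache() is implemented: the decorator cached, the full_version format and the misses counter are invented for illustration, results live in a dict rather than on disk, and arguments are assumed to be hashable.

```python
import functools

def cached(version, dps=()):
    """Hypothetical in-memory stand-in for a versioned cache decorator."""
    def decorate(func):
        # The full version includes the versions of all dependencies, so a
        # change to a dependency invalidates results cached for this function.
        parts = [version] + [dep.full_version for dep in dps]
        full_version = f"{func.__name__}:" + "/".join(parts)
        store = {}

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            key = (full_version, args, tuple(sorted(kwargs.items())))
            if key not in store:
                store[key] = func(*args, **kwargs)
                wrapper.misses += 1
            return store[key]

        wrapper.full_version = full_version
        wrapper.misses = 0
        return wrapper
    return decorate

@cached("1.0")
def f1(x=1, y=2):
    return x + y

@cached("1.0", dps=[f1])
def f2(x=1, y=2, z=3):
    return f1(x, y) * z

first = f2(1, 2, 3)    # computed
second = f2(1, 2, 3)   # served from the cache
```

Bumping the version of f1 changes the full version of f2 as well, which is the effect the dps list is meant to achieve in the usage example above.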