cdxcore.subdir

Functions

VersionedCacheRoot(directory, *[, ext, fmt, ...])

Create a root directory for versioned caching on disk using cdxcore.subdir.SubDir.cache().

Classes

CacheController(*[, exclude_arg_types, ...])

Central control parameters for caching.

CacheInfo(name, idversion, keep_last_arguments)

Information on functions decorated with cdxcore.subdir.SubDir.cache().

CacheMode([mode])

A class which encodes standard behaviour of a caching strategy.

CacheTracker()

Utility class to track caching and be able to delete all dependent objects.

Format(*values)

General purpose file formats for cdxcore.subdir.SubDir.

SubDir(name[, parent, ext, fmt, ...])

SubDir implements a transparent i/o interface for storing data in files.

Exceptions

VersionPresentError

Exception raised in case a file was read which had a version, but no test version was provided.

class cdxcore.subdir.CacheController(*, exclude_arg_types=[<class 'cdxcore.verbose.Context'>], cache_mode='on', max_filename_length=48, hash_length=8, debug_verbose=None, keep_last_arguments=False)

Bases: object

Central control parameters for caching.

When a parameter object of this type is assigned to a cdxcore.subdir.SubDir, it is passed on when sub-directories are created. This way, all sub-directories share the same caching behaviour.

Parameters:
exclude_arg_types : list[type | str], optional

List of types, or names of types, to exclude when producing unique IDs from function arguments. Strings are compared to type(arg).__name__.

Defaults to [Context].

cache_mode : CacheMode, default ON

Top level cache control. Set to “OFF” to turn off all caching.

max_filename_length : int, default 48

Maximum filename length. If a unique ID exceeds this length, a hash of length hash_length is integrated into the file name. See cdxcore.uniquehash.NamedUniqueHash.

hash_length : int, default 8

Length of the hash used to make sure each filename is unique. See cdxcore.uniquehash.NamedUniqueHash.

debug_verbose : cdxcore.verbose.Context | None, default None

If not None print caching process messages to this object.

keep_last_arguments : bool, default False

Keep a dictionary of all parameters as string representations after each function call. If the function F was decorated using cdxcore.subdir.SubDir.cache(), you can access this information via F.cache_info.last_arguments.

Note that strings are limited to 100 characters per argument to avoid memory overload when large objects are passed.
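To make the interplay of max_filename_length and hash_length concrete, here is a minimal sketch of how an over-long unique ID could be shortened by truncating it and appending a hash of the full ID. The function shorten_id and the choice of SHA-256 are assumptions for illustration only; cdxcore.uniquehash.NamedUniqueHash may build its names differently.

```python
import hashlib


def shorten_id(unique_id: str, max_filename_length: int = 48, hash_length: int = 8) -> str:
    """Hypothetical illustration only; not cdxcore's actual naming scheme."""
    if len(unique_id) <= max_filename_length:
        return unique_id  # short IDs are used verbatim
    # Hash the FULL id so that two long IDs sharing a prefix still get different names.
    digest = hashlib.sha256(unique_id.encode("utf-8")).hexdigest()[:hash_length]
    keep = max_filename_length - hash_length - 1  # leave room for a separating space
    return f"{unique_id[:keep]} {digest}"


print(shorten_id("f@__main__"))                     # short: unchanged
print(shorten_id("g@__main__(" + "x" * 100 + ")"))  # long: truncated prefix plus hash
```

Hashing the full ID rather than the truncated prefix is what keeps the shortened names distinct.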

class cdxcore.subdir.CacheInfo(name, idversion, keep_last_arguments)

Bases: PrettyObject

Information on functions decorated with cdxcore.subdir.SubDir.cache().

Functions decorated with cdxcore.subdir.SubDir.cache() have a member cache_info of this type.

filename

Unique filename of the last function call.

label

Label of the last function call.

last_arguments

Last arguments used. This member is only present if keep_last_arguments was set to True for the relevant cdxcore.subdir.CacheController.

last_cached

Whether the last function call restored data from disk.

name

Decoded name of the function.

path

Fully qualified path where the file was stored.

unique_id

Unique ID of the last function call.

version

(Hash) version used. This is equal to F.version.unique_id64.

class cdxcore.subdir.CacheMode(mode=None)

Bases: object

A class which encodes standard behaviour of a caching strategy.

Summary mechanics:

Action                                   on   gen  off  update  clear  readonly
load cache from disk if exists           x    x    -    -       -      x
write updates to disk                    x    x    -    x       -      -
delete existing object                   -    -    -    -       x      -
delete existing object if incompatible   x    -    -    x       x      -

Standard Caching Semantics

Assuming we wish to cache the result of calling a function f in a file named filename in a directory directory, the CacheMode waterfall is:

def cache_f( filename : str, directory : SubDir, version : str, cache_mode : CacheMode ):
    if cache_mode.delete:
        directory.delete(filename)
    if cache_mode.read:
        r = directory.read(filename,
                           default=None,  
                           version=version,
                           raise_on_error=False,
                           delete_wrong_version=cache_mode.del_incomp
                           )
        if r is not None:
            return r

    r = f(...) # compute result

    if cache_mode.write:
        directory.write(filename,
                        r,
                        version=version,
                        raise_on_error=False
                        )

    return r

See cdxcore.subdir.SubDir.cache() for a comprehensive implementation.
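To experiment with the waterfall without touching the disk, the sketch below runs the same logic against an in-memory dictionary. Flags, MODE_FLAGS, DictStore and cached_call are hypothetical stand-ins for cdxcore.subdir.CacheMode and cdxcore.subdir.SubDir; MODE_FLAGS restates the summary table above, but none of this is cdxcore's actual implementation.

```python
from typing import Any, Callable, Dict, NamedTuple, Optional, Tuple


class Flags(NamedTuple):
    read: bool        # load cache from disk if it exists
    write: bool       # write newly computed results
    delete: bool      # delete any existing object first
    del_incomp: bool  # delete an existing object with the wrong version


# Hypothetical restatement of the CacheMode summary table, keyed by mode name.
MODE_FLAGS: Dict[str, Flags] = {
    "on":       Flags(read=True,  write=True,  delete=False, del_incomp=True),
    "gen":      Flags(read=True,  write=True,  delete=False, del_incomp=False),
    "off":      Flags(read=False, write=False, delete=False, del_incomp=False),
    "update":   Flags(read=False, write=True,  delete=False, del_incomp=True),
    "clear":    Flags(read=False, write=False, delete=True,  del_incomp=True),
    "readonly": Flags(read=True,  write=False, delete=False, del_incomp=False),
}


class DictStore:
    """Hypothetical in-memory stand-in for a SubDir: filename -> (version, value)."""

    def __init__(self) -> None:
        self.files: Dict[str, Tuple[str, Any]] = {}

    def delete(self, filename: str) -> None:
        self.files.pop(filename, None)  # silently ignore missing files

    def read(self, filename: str, version: str, delete_wrong_version: bool) -> Optional[Any]:
        if filename not in self.files:
            return None
        file_version, value = self.files[filename]
        if file_version != version:
            if delete_wrong_version:
                self.delete(filename)
            return None
        return value

    def write(self, filename: str, value: Any, version: str) -> None:
        self.files[filename] = (version, value)


def cached_call(f: Callable[[], Any], filename: str, store: DictStore,
                version: str, mode: str) -> Any:
    """The CacheMode waterfall, run against a DictStore."""
    flags = MODE_FLAGS[mode]
    if flags.delete:
        store.delete(filename)
    if flags.read:
        r = store.read(filename, version, delete_wrong_version=flags.del_incomp)
        if r is not None:
            return r
    r = f()  # compute the result
    if flags.write:
        store.write(filename, r, version)
    return r


store = DictStore()
cached_call(lambda: 42, "f", store, "0.1", "on")       # computed and stored
cached_call(lambda: 0, "f", store, "0.1", "readonly")  # returns the stored 42; the lambda is not called
```

As in the original waterfall, a cached value of None is indistinguishable from "not found" and triggers recomputation.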

Parameters:
modestr, optional

Which mode to use: "on", "gen", "off", "update", "clear" or "readonly".

The default is None in which case "on" is used.

CLEAR = 'clear'
GEN = 'gen'
HELP = "'on' for standard caching; 'gen' for caching but keep existing incompatible files; 'off' to turn off; 'update' to overwrite any existing cache; 'clear' to clear existing caches; 'readonly' to read existing caches but not write new ones"

Standard config help text, to be used with cdxcore.config.Config.__call__() as follows:

from cdxcore.config import Config
from cdxcore.subdir import CacheMode

def get_cache_mode( config : Config ) -> CacheMode:
    return CacheMode( config("cache_mode", "on", CacheMode.MODES, CacheMode.HELP) )

MODES = ['on', 'gen', 'off', 'update', 'clear', 'readonly']

List of available modes in text form. This list can be used as cast parameter when calling cdxcore.config.Config.__call__():

from cdxcore.config import Config
from cdxcore.subdir import CacheMode

def get_cache_mode( config : Config ) -> CacheMode:
    return CacheMode( config("cache_mode", "on", CacheMode.MODES, CacheMode.HELP) )

OFF = 'off'
ON = 'on'
READONLY = 'readonly'
UPDATE = 'update'
property del_incomp: bool

Whether to delete existing data if it is not compatible or has the wrong version.

property delete: bool

Whether to delete existing data.

property is_clear: bool

Whether this cache mode is CLEAR.

property is_gen: bool

Whether this cache mode is GEN.

property is_off: bool

Whether this cache mode is OFF.

property is_on: bool

Whether this cache mode is ON.

property is_readonly: bool

Whether this cache mode is READONLY.

property is_update: bool

Whether this cache mode is UPDATE.

property read: bool

Whether to load any existing cached data.

property write: bool

Whether to cache newly computed data to disk.

class cdxcore.subdir.CacheTracker

Bases: object

Utility class to track caching and be able to delete all dependent objects.

delete_cache_files()

Delete all tracked files.

class cdxcore.subdir.Format(*values)

Bases: Enum

General purpose file formats for cdxcore.subdir.SubDir.

Format           Restores objects  Human readable  Speed  Compression  Extension  Types
PICKLE           yes               no              high   no           .pck       all
JSON_PLAIN       no                yes             low    no           .json      all
JSON_PICKLE      yes               limited         low    no           .jpck      all
BLOSC            yes               no              high   yes          .zbsc      all
GZIP             yes               no              high   yes          .pgz       all
POLARS_PARQUET   yes               no              high   yes          .prq       polars

cdxcore.subdir.SubDir supports POLARS_PARQUET for reading and writing polars.DataFrame objects to parquet files. Version information is stored in the file metadata. When this format is used, the object passed to cdxcore.subdir.SubDir.write() must be a polars data frame.
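The "Restores objects" and "Compression" columns can be illustrated with the standard library alone: plain JSON loses Python types such as tuples, pickle round-trips them, and gzip compresses a byte stream. This assumes, for illustration, that GZIP amounts to gzip applied to pickled bytes; it does not reproduce cdxcore's exact on-disk layout.

```python
import gzip
import json
import pickle

obj = {"point": (1, 2), "label": "x"}

# JSON_PLAIN-style round trip: human readable, but the tuple comes back as a list.
from_json = json.loads(json.dumps(obj))

# PICKLE-style round trip: binary, and the original Python types are restored.
from_pickle = pickle.loads(pickle.dumps(obj))

# GZIP-style round trip (assumption: gzip applied to pickled bytes).
compressed = gzip.compress(pickle.dumps(obj))
from_gzip = pickle.loads(gzip.decompress(compressed))

print(from_json["point"], from_pickle["point"], from_gzip["point"])  # [1, 2] (1, 2) (1, 2)
```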

BLOSC = 3

blosc binary compressed format.

GZIP = 4

gzip binary compressed format.

JSON_PICKLE = 1

jsonpickle format.

JSON_PLAIN = 2

json format.

PICKLE = 0

Standard binary pickle format.

POLARS_PARQUET = 10

Parquet format for polars.DataFrame objects.

class cdxcore.subdir.SubDir(name, parent=None, *, ext=None, fmt=None, create_directory=None, cache_controller=None, delete_everything=False, delete_everything_upon_exit=False)

Bases: object

SubDir implements a transparent i/o interface for storing data in files.

Directories

Instantiate a SubDir with a directory name. There are some pre-defined relative system paths the name can refer to:

from cdxcore.subdir import SubDir
parent  = SubDir("!/subdir")         # relative to system temp directory
parent  = SubDir("~/subdir")         # relative to user home directory
parent  = SubDir("./subdir")         # relative to current working directory (explicit)
parent  = SubDir("subdir")           # relative to current working directory (implicit)
parent  = SubDir("/tmp/subdir")      # absolute path (linux)
parent  = SubDir("C:/temp/subdir")   # absolute path (windows)
parent  = SubDir("")                 # current working directory

Sub-directories can be generated in a number of ways:

subDir = parent('subdir')              # using __call__
subDir = SubDir('subdir', parent)      # explicit constructor
subDir = SubDir('subdir', parent="!/") # explicit constructor with parent being a string

Files managed by SubDir will usually have the same extension. This extension can be specified with ext, or as part of the directory string:

subDir = SubDir("~/subdir", ext="bin") # set extension to 'bin'
subDir = SubDir("~/subdir;*.bin")      # set extension to 'bin'

Leaving the extension as default None allows SubDir to automatically use the extension associated with any specified format.
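The "name;*.ext" convention amounts to splitting the directory string at the semicolon. The helper split_name_ext is hypothetical and only covers the documented examples; cdxcore's own parsing may accept further variants.

```python
from typing import Optional, Tuple


def split_name_ext(name: str) -> Tuple[str, Optional[str]]:
    """Hypothetical: split 'dir;*.ext' into ('dir', 'ext'); without ';' return ('dir', None)."""
    if ";" not in name:
        return name, None  # extension left to ext= or to the format's default
    directory, pattern = name.split(";", 1)
    if not pattern.startswith("*."):
        raise ValueError(f"expected a pattern of the form '*.ext', got {pattern!r}")
    return directory, pattern[2:]


print(split_name_ext("~/subdir;*.bin"))  # ('~/subdir', 'bin')
print(split_name_ext("~/subdir"))        # ('~/subdir', None)
```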

Copy Constructor

The constructor is shallow.

File I/O

Write data with cdxcore.subdir.SubDir.write():

subDir.write('item3',item3)          # explicit
subDir['item1'] = item1              # dictionary style

Note that cdxcore.subdir.SubDir.write() can write to multiple files at the same time.

Read data with cdxcore.subdir.SubDir.read():

item = subDir('item', 'i1')          # returns 'i1' if not found
item = subDir.read('item')           # returns None if not found
item = subDir.read('item', 'i2')     # returns 'i2' if not found
item = subDir['item']                # raises a KeyError if not found

Treat files in a directory like dictionaries:

for file in subDir:
    data = subDir[file]
    f(file, data)

for file, data in subDir.items():
    f(file, data)

Delete items:

del subDir['item']                    # silently fails if 'item' does not exist
subDir.delete('item')                 # silently fails if 'item' does not exist
subDir.delete('item', True)           # raises a KeyError if 'item' does not exist

Cleaning up:

parent.delete_all_content()        # silently deletes all files with matching extensions, and sub directories.

File Format

SubDir supports a number of file formats via cdxcore.subdir.Format. These can be controlled with the fmt keyword in various functions, not least the cdxcore.subdir.SubDir constructor:

subdir = SubDir("!/.test", fmt=SubDir.JSON_PICKLE)

See cdxcore.subdir.Format for supported formats.

Polars

A SubDir can read and write polars.DataFrame if the format is set to cdxcore.subdir.Format.POLARS_PARQUET:

import polars as pl
import numpy as np
from cdxcore.subdir import SubDir

x = np.linspace(0,1,5)
y = np.sin(x)
df = pl.DataFrame({"x":pl.Series(x,pl.Float32), "y":pl.Series(y,pl.Float32)})

sub = SubDir("!/polars", fmt=SubDir.POLARS_PARQUET)
sub.write("test", df)
r = sub.read("test", raise_on_error=True)
assert np.all(r == df)

Version handling is supported with parquet files.

Parameters:
name : str

Name of the directory.

The name may start with any of the following special characters:

  • '.' for current directory.

  • '~' for home directory.

  • '!' for the system default temp directory. Note that, unless an administrator-imposed policy removes them, sub-directories of ! are permanent.

  • '?' for a temporary temp directory; see cdxcore.subdir.SubDir.temp_temp_dir() regarding semantics.

    Most importantly, every SubDir will be constructed with a different (truly) temporary sub directory. If used, delete_everything_upon_exit is always True.

The directory name may also contain a formatting string for defining ext on the fly: for example use "!/test;*.bin" to specify a directory "test" in the user’s temp directory with extension "bin".

The directory name can be set to None in which case it is always empty and attempts to write to it fail with EOFError.
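The special leading characters map onto well-known locations. The function resolve_root below is a hypothetical sketch using only the standard library; it assumes "!" means tempfile.gettempdir() and "~" means os.path.expanduser("~"), and it deliberately leaves out "?" (a fresh temporary directory) and the None case.

```python
import os
import tempfile


def resolve_root(name: str) -> str:
    """Hypothetical sketch: expand a leading '!', '~' or '.' path component."""
    head, _, rest = name.replace("\\", "/").partition("/")
    bases = {
        "!": tempfile.gettempdir(),    # system default temp directory
        "~": os.path.expanduser("~"),  # user home directory
        ".": os.getcwd(),              # current working directory (explicit)
    }
    if head in bases:
        return os.path.normpath(os.path.join(bases[head], rest))
    # Absolute paths are kept by os.path.join; other names hang off the cwd.
    return os.path.normpath(os.path.join(os.getcwd(), name))


print(resolve_root("!/subdir"))
print(resolve_root("subdir"))
```

Matching the whole first path component (rather than the first character) keeps names such as ".cache" from being mistaken for the "." prefix.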

parent : str | SubDir | None, default None

Parent directory.

If parent is a cdxcore.subdir.SubDir then its parameters are used as default values.

ext : str | None, default None

Extension for files managed by this SubDir. All files managed by self will share the same extension.

If set to "" no extension is assigned to this directory. That means that all files are considered. For example, cdxcore.subdir.SubDir.files() then returns all files contained in the directory, not just files with a specific extension.

If ext is None, use parent.ext if a parent was provided, or otherwise the extension defined by fmt:

  • ‘pck’ for the default PICKLE format.

  • ‘json’ for JSON_PLAIN.

  • ‘jpck’ for JSON_PICKLE.

  • ‘zbsc’ for BLOSC.

  • ‘pgz’ for GZIP.

  • ‘prq’ for POLARS_PARQUET.

fmt : cdxcore.subdir.Format | None, default Format.PICKLE

One of the cdxcore.subdir.Format codes.

If ext is left as None and parent is None, then setting a format also sets the corresponding ext.

create_directory : bool | None, default None

Whether to create the directory upon creation of the SubDir object; otherwise it will be created upon the first cdxcore.subdir.SubDir.write().

Set to None to use the setting of the parent directory, or False if no parent is specified.

cache_controller : cdxcore.subdir.CacheController | None, default None

An object which fine-tunes the behaviour of cdxcore.subdir.SubDir.cache(). See the cdxcore.subdir.CacheController documentation for further details.

delete_everything : bool, default False

Delete all contents in the newly defined sub-directory upon creation.

delete_everything_upon_exit : bool, default False

Delete all contents of the current directory when self is deleted. This is always True if the "?/" prefix was used.

Note, however, that this will only be executed once the object is garbage collected.

The default is, for good reason, False.

DEFAULT_FORMAT = 0

Default cdxcore.subdir.Format: Format.PICKLE

class Format(*values)

Bases: Enum

The same as cdxcore.subdir.Format, provided for convenience.

BLOSC = 3

blosc binary compressed format.

GZIP = 4

gzip binary compressed format.

JSON_PICKLE = 1

jsonpickle format.

JSON_PLAIN = 2

json format.

PICKLE = 0

Standard binary pickle format.

POLARS_PARQUET = 10

__call__(element=None, default=<object object>, raise_on_error=False, *, version=None, ext=None, fmt=None, delete_wrong_version=True, create_directory=None)

Either read data from a file, or return a new sub-directory.

If only the element argument is used, then this function returns a new sub directory named element.

If both element and default arguments are used, then this function attempts to read the file element from disk, returning default if it does not exist.

Assume we have a subdirectory sd:

from cdxcore.subdir import SubDir
sd  = SubDir("!/test")

Reading files:

x   = sd('file', None)                   # reads 'file' with default value None
x   = sd('sd/file', default=1)           # reads 'file' from sub directory 'sd' with default value 1
x   = sd('file', default=1, ext="tmp")   # reads 'file.tmp' with default value 1

Create sub directory:

sd2 = sd("subdir")                       # creates and returns handle to subdirectory 'subdir'
sd2 = sd("subdir1/subdir2")              # creates and returns handle to subdirectory 'subdir1/subdir2'
sd2 = sd("subdir1/subdir2", ext=".tmp")  # creates and returns handle to subdirectory 'subdir1/subdir2' with extension "tmp"
sd2 = sd(ext=".tmp")                     # returns handle to current subdirectory with extension "tmp"
Parameters:
element : str | None

File or directory name, or a list thereof.

element can be None if default is left at its dummy value SubDir.RET_SUB_DIR (the default), in which case __call__ refers to the current directory.

default : Any, default SubDir.RET_SUB_DIR

If specified, this function reads element with read( element, default, *args, **kwargs ).

If default is not specified and left at the dummy value SubDir.RET_SUB_DIR, then this function returns a new sub-directory by calling SubDir(element,parent=self,ext=ext,fmt=fmt).

create_directory : bool, default None

When creating sub-directories:

Whether or not to instantly create the sub-directory. The default, None, is to inherit the behaviour from self.

raise_on_error : bool, default False

When reading files:

Whether to raise an exception if reading an existing file failed. By default this function fails silently and returns default.

version : str | None, default None

When reading files:

If not None, specifies the version of the current code base.

In this case, this version will be compared to the version of the file being read. If they do not match, read fails (either by returning default or throwing a cdxcore.version.VersionError exception).

You can specify version "*" to accept any version. Note that this is distinct from using None, which stipulates that the file should not have version information.
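The three cases can be summarised as a small predicate. version_accepts is a hypothetical helper that assumes a file without version information reports None; the real check happens inside cdxcore.subdir.SubDir.read(), which may raise cdxcore.version.VersionError or cdxcore.subdir.VersionPresentError instead of returning False.

```python
from typing import Optional


def version_accepts(requested: Optional[str], file_version: Optional[str]) -> bool:
    """Hypothetical: may a file written with file_version be read with version=requested?"""
    if requested == "*":
        return True                   # '*' accepts any version (assumed: also none)
    if requested is None:
        return file_version is None   # None: the file must not carry a version
    return requested == file_version  # otherwise the versions must match exactly
```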

delete_wrong_version : bool, default True

When reading files:

If True, and if a wrong version was found, delete the file.

ext : str | None, default None

When reading files:

Extension to be used, or a list thereof if element is a list. Defaults to the extension of self.

Semantics:

  • None to use the default extension of self.

  • "*" to use the extension implied by fmt.

  • "" to turn off extension management.

When creating sub-directories:

Extension for the new subdirectory; set to None to inherit the parent’s extension.

fmt : cdxcore.subdir.Format | None, default None

When reading files:

File format or None to use the directory’s default. Note that fmt cannot be a list even if element is. Unless ext or the SubDir’s extension is "*", changing the format does not automatically change the extension.

When creating sub-directories:

Format for the new sub-directory; set to None to inherit the parent’s format.

Returns:
Object : type | SubDir

Either the value in the file, a new sub directory, or lists thereof.

static as_format(format_name)

Converts a named format into the respective format code.

Example:

format = SubDir.as_format( config("format", "pickle", SubDir.FORMAT_NAMES, "File format") )    

auto_ext(ext_or_fmt=None)

Computes the effective extension based on the input ext_or_fmt and the current settings of self.

If ext_or_fmt is set to "*" then the extension associated to the format of self is returned.

Parameters:
ext_or_fmt : str | cdxcore.subdir.Format | None, default None

An extension or a format.

Returns:
ext : str

The extension with leading '.'.

auto_ext_fmt(*, ext=None, fmt=None)

Computes the effective extension and format based on inputs ext and fmt, each of which defaults to the respective values of self.

Resolves an ext of "*" into the extension associated with fmt.

Returns:
(ext, fmt) : tuple

Here ext contains the leading '.' and fmt is of type cdxcore.subdir.Format.
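The resolution rules for ext and fmt can be sketched as follows. FORMAT_EXT restates the default extensions listed for the SubDir constructor, with formats represented as plain strings for simplicity; resolve_ext_fmt is a hypothetical helper, not the implementation or signature of cdxcore.subdir.SubDir.auto_ext_fmt(). How an empty ext is reported is an assumption.

```python
from typing import Optional, Tuple

# Default extensions per format, as listed for the SubDir constructor.
FORMAT_EXT = {
    "PICKLE": "pck",
    "JSON_PLAIN": "json",
    "JSON_PICKLE": "jpck",
    "BLOSC": "zbsc",
    "GZIP": "pgz",
    "POLARS_PARQUET": "prq",
}


def resolve_ext_fmt(ext: Optional[str], fmt: Optional[str],
                    self_ext: str, self_fmt: str) -> Tuple[str, str]:
    """Hypothetical: None falls back to self; '*' means the extension implied by fmt."""
    fmt = self_fmt if fmt is None else fmt
    ext = self_ext if ext is None else ext
    if ext == "*":
        ext = FORMAT_EXT[fmt]
    if ext and not ext.startswith("."):
        ext = "." + ext  # resolved extensions carry a leading '.'
    return ext, fmt      # assumption: "" is passed through, meaning no extension


print(resolve_ext_fmt("*", "GZIP", "pck", "PICKLE"))   # ('.pgz', 'GZIP')
print(resolve_ext_fmt(None, "GZIP", "pck", "PICKLE"))  # ('.pck', 'GZIP')
```

The second call mirrors the rule for reading files: changing fmt alone does not change the extension unless "*" is involved.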

cache(version=None, *, dependencies=None, label=None, uid=None, name=None, in_sub_dir=None, exclude_args=None, include_args=None, exclude_arg_types=None, version_auto_class=True, name_of_func_name_arg='func_name')

Advanced versioned caching for callables.

Versioned caching is based on the following two simple principles:

  1. Unique Call IDs:

    When a function is called with some parameters, the wrapper identifies a unique ID based on the qualified name of the function and on its runtime functional parameters (i.e., those which alter the outcome of the function). When a function is called for the first time with a given unique call ID, it stores the result of the call to disk. If the function is called with the same call ID again, the result is read from disk and returned.

    To compute unique call IDs cdxcore.uniquehash.NamedUniqueHash is used by default.

  2. Code Version:

    Each function has a version, which includes dependencies on other functions or classes. If the version of data on disk does not match the current version, it is deleted and the generating function is called again. This way you can use your code to drive updates to data generated with cached functions.

    Behind the scenes this is implemented using cdxcore.version.version() which means that the version of a cached function can also depend on versions of non-cached functions or other objects.
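The two principles can be condensed into a toy decorator. This is a minimal sketch under deliberate simplifications, not cdxcore's implementation: the call ID is a SHA-256 hash of the function's module, qualified name and repr() of its arguments (cdxcore uses cdxcore.uniquehash.NamedUniqueHash), results are pickled into a throw-away directory, and the version is stored next to each result so that a version change forces recomputation. toy_cache is a hypothetical name.

```python
import functools
import hashlib
import os
import pickle
import tempfile


def toy_cache(version: str, cache_dir: str):
    """Hypothetical sketch of versioned caching; not cdxcore's implementation."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Principle 1: a unique call ID from the qualified name and the arguments.
            key = repr((func.__module__, func.__qualname__, args, sorted(kwargs.items())))
            call_id = hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]
            path = os.path.join(cache_dir, call_id + ".pck")
            # Principle 2: only reuse a result written by the same code version.
            if os.path.exists(path):
                with open(path, "rb") as fh:
                    stored_version, result = pickle.load(fh)
                if stored_version == version:
                    return result
                os.remove(path)  # wrong version: delete and recompute
            result = func(*args, **kwargs)
            with open(path, "wb") as fh:
                pickle.dump((version, result), fh)
            return result
        return wrapper
    return decorator


cache_dir = tempfile.mkdtemp()  # throw-away directory for this example

@toy_cache("0.1", cache_dir)
def f(x, y):
    return x * y

f(1, 2)  # computed and written to cache_dir
f(1, 2)  # same call ID and version: read back from cache_dir
f(2, 2)  # different parameters: new call ID, computed and written
```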

Caching Functions

Caching a simple function f is straightforward:

from cdxcore.subdir import SubDir
cache   = SubDir("!/.cache")
cache.delete_all_content()   # for illustration

@cache.cache("0.1")
def f(x,y):
    return x*y

_ = f(1,2)    # function gets computed and the result cached
_ = f(1,2)    # restore result from cache
_ = f(2,2)    # different parameters: compute and store result

Cache another function g which calls f, and whose version therefore depends on f’s version:

@cache.cache("0.1", dependencies=[f])
def g(x,y):
    return f(x,y)**2

Debugging

When using automated caching, it is important to understand how changes in parameters and in the version of a function affect caching. To this end, cdxcore.subdir.SubDir.cache() supports a tracing mechanism via a cdxcore.subdir.CacheController:

from cdxcore.subdir import SubDir, CacheController
from cdxcore.verbose import Context

ctrl    = CacheController( debug_verbose=Context("all") )
cache   = SubDir("!/.cache", cache_controller=ctrl )
cache.delete_all_content()   # <- delete previous cached files, for this example only

@cache.cache("0.1")
def f(x,y):
    return x*y

_ = f(1,2)    # function gets computed and the result cached
_ = f(1,2)    # restore result from cache
_ = f(2,2)    # different parameters: compute and store result

This prints:

00: cache(f@__main__): function registered for caching into 'C:/Users/hans/AppData/Local/Temp/.cache/'.
00: cache(f@__main__): called 'f@__main__' version 'version 0.1' and wrote result into 'C:/Users/hans/AppData/Local/Temp/.cache/f@__main__ 668a6b111549e288.pck'.
00: cache(f@__main__): read 'f@__main__' version 'version 0.1' from cache 'C:/Users/hans/AppData/Local/Temp/.cache/f@__main__ 668a6b111549e288.pck'.
00: cache(f@__main__): called 'f@__main__' version 'version 0.1' and wrote result into 'C:/Users/hans/AppData/Local/Temp/.cache/f@__main__ b5609542d7da0b04.pck'.

Non-Functional Parameters

A function may have non-functional parameters which do not alter the function’s outcome. Debug flags are an example:

from cdxcore.subdir import SubDir
cache   = SubDir("!/.cache")

@cache.cache("0.1", dependencies=[f], exclude_args='debug')
def g(x,y,debug): # <-- 'debug' is a non-functional parameter
    if debug:
        print(f"g(x={x},y={y})")  
    return f(x,y)**2

You can declare certain types as non-functional for all functions wrapped by cdxcore.subdir.SubDir.cache() when constructing the cdxcore.subdir.CacheController passed to cdxcore.subdir.SubDir:

from cdxcore.subdir import SubDir, CacheController

class Debugger:
    def output( self, message ):
        print(message)

ctrl    = CacheController(exclude_arg_types=[Debugger])   # <- exclude 'Debugger' parameters from hashing
cache   = SubDir("!/.cache", cache_controller=ctrl)

@cache.cache("0.1", dependencies=[f])
def g(x,y,debugger : Debugger): # <-- 'debugger' is a non-functional parameter
    debugger.output(f"g(x={x},y={y})")  
    return f(x,y)**2

Unique IDs and File Naming

The unique call ID of a decorated function is logically generated from its fully qualified name and a unique hash of its functional parameters.

By default, cdxcore.uniquehash.NamedUniqueHash is used to compute unique hashes. Key default behaviours of cdxcore.uniquehash.NamedUniqueHash:

  • cdxcore.uniquehash.NamedUniqueHash hashes objects via their __dict__ or __slots__ members. This can be overridden for a class by implementing __unique_hash__; see cdxcore.uniquehash.NamedUniqueHash.

  • Function members of objects or any members starting with ‘_’ are not hashed unless this behaviour is changed using cdxcore.subdir.CacheController.

  • NumPy arrays and pandas data frames are hashed using their byte representation. That is slow and not recommended. It is better to identify NumPy/pandas inputs via the characteristics that were used to generate them.

Either way, hashes are not particularly human readable. It is often useful to have unique IDs and therefore filenames which carry some context information.

This can be achieved by using label:

from cdxcore.subdir import SubDir, CacheController
from cdxcore.verbose import Context
ctrl    = CacheController( debug_verbose=Context("all") )
cache   = SubDir("!/.cache", cache_controller=ctrl )
cache.delete_all_content()   # for illustration

@cache.cache("0.1")                     # <- no label
def f1(x,y):
    return x*y

@cache.cache("0.1", label="f2({x},{y})") # <- label uses a string to be passed to str.format()
def f2(x,y):
    return x*y

We can also use a function to generate a label. In that case, the parameters of the decorated function, including func_name, are passed to that function:

@cache.cache("0.1", label=lambda x,y: f"h({x},{y})", exclude_args='debug') 
def h(x,y,debug=False):
    if debug:
        print(f"h(x={x},y={y})")  
    return x*y

We obtain:

f1(1,1)
f2(1,1)
h(1,1)        

00: cache(f1@__main__): function registered for caching into 'C:/Users/hans/AppData/Local/Temp/.cache/'.
00: cache(f2@__main__): function registered for caching into 'C:/Users/hans/AppData/Local/Temp/.cache/'.
00: cache(h@__main__): function registered for caching into 'C:/Users/hans/AppData/Local/Temp/.cache/'.
00: cache(f1@__main__): called 'f1@__main__' version 'version 0.1' and wrote result into 'C:/Users/hans/AppData/Local/Temp/.cache/f1@__main__ ef197d80d6a0bbb0.pck'.
00: cache(f2@__main__): called 'f2(1,1)' version 'version 0.1' and wrote result into 'C:/Users/hans/AppData/Local/Temp/.cache/f2(1,1) bdc3cd99157c10f7.pck'.
00: cache(h@__main__): called 'h(1,1)' version 'version 0.1' and wrote result into 'C:/Users/hans/AppData/Local/Temp/.cache/h(1,1) d3fdafc9182070f4.pck'.            

Note that the file names f2(1,1) bdc3cd99157c10f7.pck and h(1,1) d3fdafc9182070f4.pck for the f2 and h calls are now easier to read, as they are composed of the label of the call and a trailing hash. The hash is appended because the label is not assumed to be unique: a hash generated from the label itself and all pertinent arguments is therefore appended to the filename.

If we know how to generate truly unique IDs which are always valid filenames, then we can use uid instead of label:

@cache.cache("0.1", uid=lambda x,y: f"h2({x},{y})", exclude_args='debug') 
def h2(x,y,debug=False):
    if debug:
        print(f"h(x={x},y={y})")  
    return x*y
h2(1,1)

yields:

00: cache(h2@__main__): function registered for caching into 'C:/Users/hans/AppData/Local/Temp/.cache/'.
00: cache(h2@__main__): called 'h2(1,1)' version 'version 0.1' and wrote result into 'C:/Users/hans/AppData/Local/Temp/.cache/h2(1,1).pck'.            

In particular, the filename is now h2(1,1).pck without any hash. If uid is used, the parameters of the function are not hashed. Like label, the parameter uid can be a str.format() string or a callable.
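The difference between label and uid for file naming can be sketched in a few lines. make_filename is hypothetical: it treats a string as a str.format() template and calls a callable with the functional parameters only (the real implementation also passes func_name), and it uses a truncated SHA-256 where cdxcore uses cdxcore.uniquehash.NamedUniqueHash.

```python
import hashlib
from typing import Any, Callable, Dict, Optional, Union

Pattern = Union[str, Callable[..., str]]


def _render(pattern: Pattern, params: Dict[str, Any]) -> str:
    # A string is used as a str.format() template; a callable gets the parameters.
    return pattern.format(**params) if isinstance(pattern, str) else pattern(**params)


def make_filename(name: str, params: Dict[str, Any], ext: str = ".pck", *,
                  label: Optional[Pattern] = None,
                  uid: Optional[Pattern] = None) -> str:
    """Hypothetical: a uid is used verbatim, a label (or the name) gets a trailing hash."""
    if uid is not None and label is not None:
        raise ValueError("specify either uid or label, not both")
    if uid is not None:
        return _render(uid, params) + ext  # trusted to be unique: no hash appended
    text = _render(label, params) if label is not None else name
    # Labels need not be unique, so hash the label together with all parameters.
    key = repr((text, sorted(params.items()))).encode("utf-8")
    return f"{text} {hashlib.sha256(key).hexdigest()[:16]}{ext}"


params = {"x": 1, "y": 1}
print(make_filename("h2", params, uid="h2({x},{y})"))    # h2(1,1).pck
print(make_filename("f2", params, label="f2({x},{y})"))  # 'f2(1,1) ' followed by a hash
```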

Controlling which Parameters to Hash

To specify which parameters are pertinent for identifying a unique ID, use:

  • include_args: list of function arguments to include. If None, all parameters are used as input to the next step.

  • exclude_args: list of function arguments to exclude, if not None.

  • exclude_arg_types: a list of types or names of types to exclude. This is helpful if control flow is managed with dedicated data types. An example of such a type is cdxcore.verbose.Context, which is used to print hierarchical output messages. Types can be excluded globally using a cdxcore.subdir.CacheController when constructing a cdxcore.subdir.SubDir.
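The three filters can be pictured as successive steps applied to the bound call arguments. functional_args is a hypothetical sketch built on inspect.signature(); it applies include_args, then exclude_args, then exclude_arg_types, compares strings against type(arg).__name__ as documented for cdxcore.subdir.CacheController, and assumes that types are matched with isinstance().

```python
import inspect
from typing import Any, Callable, Dict, Iterable, Optional, Union

Names = Union[str, Iterable[str], None]


def _as_set(names: Names) -> Optional[set]:
    # Accept a single name such as 'debug' as well as a list of names.
    if names is None:
        return None
    return {names} if isinstance(names, str) else set(names)


def functional_args(func: Callable, args: tuple, kwargs: dict, *,
                    include_args: Names = None,
                    exclude_args: Names = None,
                    exclude_arg_types: Optional[Iterable[Union[type, str]]] = None) -> Dict[str, Any]:
    """Hypothetical: the arguments that would enter the unique call ID."""
    bound = inspect.signature(func).bind(*args, **kwargs)
    bound.apply_defaults()  # parameters left at their defaults count too
    params = dict(bound.arguments)
    include, exclude = _as_set(include_args), _as_set(exclude_args)
    if include is not None:
        params = {k: v for k, v in params.items() if k in include}
    if exclude is not None:
        params = {k: v for k, v in params.items() if k not in exclude}
    if exclude_arg_types is not None:
        excluded = list(exclude_arg_types)
        types = tuple(t for t in excluded if isinstance(t, type))
        type_names = {t for t in excluded if isinstance(t, str)}
        # Assumption: types are matched with isinstance(), strings with type(v).__name__.
        params = {k: v for k, v in params.items()
                  if not isinstance(v, types) and type(v).__name__ not in type_names}
    return params


def g(x, y, debug=False):
    return x * y

print(functional_args(g, (1, 2), {"debug": True}, exclude_args="debug"))  # {'x': 1, 'y': 2}
```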

Numpy/Pandas

NumPy/pandas data should not be hashed to identify unique call IDs. Instead, use the defining characteristics that were used to generate the data.

For example:

from cdxcore.pretty import PrettyObject
from cdxcore.subdir import SubDir
cache   = SubDir("!/.cache")
cache.delete_all_content()   # for illustration

@cache.cache("0.1")
def load_src( src_def ):
    result = ...   # load the data described by 'src_def'
    return result

# ignore 'data'. It is uniquely identified by 'src_def' -->
@cache.cache("0.1", dependencies=[load_src], exclude_args=['data'])  
def statistics( stats_def, src_def, data ):
    stats = ...    # compute statistics from 'data'
    return stats

src_def = PrettyObject()
src_def.start = "2010-01-01"
src_def.end = "2025-01-01"
src_def.x = 0.1

stats_def = PrettyObject()
stats_def.lambda_ = 0.1      # 'lambda' is a reserved Python keyword
stats_def.window = 100

data  = load_src( src_def )
stats = statistics( stats_def, src_def, data )

While instructive, this setup is not optimal: data is always loaded, even when stats could be restored from the cache (unless we need data further on).

Consider therefore:

@cache.cache("0.1")
def load_src( src_def ):
    result = ...   # load the data described by 'src_def'
    return result

# 'src_def' uniquely identifies the source data -->
@cache.cache("0.1", dependencies=[load_src])  
def statistics_only( stats_def, src_def ):
    data  = load_src( src_def )    # <-- embed call to load_src() here
    stats = ...    # compute statistics from 'data'
    return stats

stats = statistics_only( stats_def, src_def )

Caching Member Functions

You can cache member functions like any other function. Note that cdxcore.version.version() information is inherited by default, i.e. member functions depend on the version of their defining class, and class versions depend on their base classes’ versions:

from cdxcore.subdir import SubDir
from cdxcore.version import version
cache   = SubDir("!/.cache")
cache.delete_all_content()   # for illustration

@version("0.1")
class A(object):
    def __init__(self, x):
        self.x = x

    @cache.cache("0.1")
    def f(self, y):
        return self.x*y

a = A(x=1)
_ = a.f(y=1)   # compute f and store result
_ = a.f(y=1)   # load result back from disk
a.x = 2
_ = a.f(y=1)   # 'a' changed: compute f and store result
b = A(x=2)
_ = b.f(y=1)   # same unique call ID as previous call
               # -> restore result from disk

WARNING:

cdxcore.uniquehash.UniqueHash does not by default process members of objects or dictionaries which start with “_”. This behaviour can be changed using cdxcore.subdir.CacheController. For reasonably complex objects, it is recommended to implement a custom hashing function for your objects:

__unique_hash__( self, uniqueHash : UniqueHash, debug_trace : DebugTrace  )

This function is described at cdxcore.uniquehash.UniqueHash.

Caching Bound Member Functions

Caching bound member functions is technically quite different from caching a function of a class in general, but it is also supported:

from cdxcore.subdir import SubDir, CacheController
from cdxcore.verbose import Context
cache   = SubDir("!/.cache", cache_controller =
                 CacheController(debug_verbose=Context("all")))
cache.delete_all_content()   # for illustration

class A(object):
    def __init__(self,x):
        self.x = x
    def f(self,y):
        return self.x*y

a = A(x=1)
f = cache.cache("0.1", uid=lambda self, y : f"a.f({y})")(a.f)  # <- decorate bound 'f'.
r = f(y=2)

In this case the function f is bound to a. The object is added as self to the function parameter list even though the bound function parameter list does not include self. This, together with the comments on hashing objects above, ensures that (hashed) changes to a will be reflected in the unique call ID for the member function.
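The mechanism rests on standard Python: a bound method carries its instance in __self__ and the underlying function in __func__. The hypothetical helper call_parameters shows how a wrapper could recover self and put it back into the parameter list that gets hashed; it is not cdxcore's code.

```python
import inspect
from typing import Any, Callable, Dict


def call_parameters(func: Callable, *args: Any, **kwargs: Any) -> Dict[str, Any]:
    """Hypothetical: name every parameter of a call, re-inserting self for bound methods."""
    if inspect.ismethod(func):
        # func.__func__ is the plain function, func.__self__ the object it is bound to.
        bound = inspect.signature(func.__func__).bind(func.__self__, *args, **kwargs)
    else:
        bound = inspect.signature(func).bind(*args, **kwargs)
    bound.apply_defaults()
    return dict(bound.arguments)


class A:
    def __init__(self, x):
        self.x = x

    def f(self, y):
        return self.x * y


a = A(x=1)
params = call_parameters(a.f, y=2)
print(sorted(params))  # ['self', 'y']
```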

Managing Caching Across a Project

For project-wide use it is usually convenient to control caching at the level of a project-wide cache root directory. The function cdxcore.subdir.VersionedCacheRoot() is a thin convenience wrapper around a cdxcore.subdir.SubDir with a cdxcore.subdir.CacheController.

The idea is to have a central file, cache.py, which contains the root for caching. We recommend using an environment variable to control the location of this directory outside the code. Here is an example with an environment variable PROJECT_CACHE_DIR:

# file cache.py

from cdxcore.subdir import VersionedCacheRoot
import os

cache_root = VersionedCacheRoot(
                   os.getenv("PROJECT_CACHE_DIR", "!/.cache")
                   )

In a particular project file, say pipeline.py, create a file-local cache directory and use it:

# file pipeline.py

from cache import cache_root

cache_dir = cache_root("pipeline")

@cache_dir.cache("0.1")
def f(x):
    return x+2

@cache_dir.cache("0.1", dependencies=[f])
def g(x):
    return f(x)**2

# ...

In case you have issues with caching, you can use the central root directory to turn on tracing across your project:

from cdxcore.subdir import VersionedCacheRoot
from cdxcore.verbose import Context
import os

cache_root = VersionedCacheRoot(
                   os.getenv("PROJECT_CACHE_DIR", "!/.cache"),
                   debug_verbose=Context.all    # turn full tracing on
                   )
Parameters:
version : str | None, default None

Version of the function.

dependencies : list[type] | None, default None

A list of version dependencies, either by reference or by name. See cdxcore.version.version() for details on name lookup if strings are used.

label : str | Callable | None, default None

Specify a human-readable label for the function call given its parameters.

This label is used to generate the cache file name, and is also printed when tracing hashing operations. Labels are not assumed to be unique, hence a unique hash of the label and the parameters to this function will be appended to generate the actual cache file name.

Use uid instead if label represents valid unique filenames. You cannot specify both uid and label. If neither uid nor label is present, name will be used.

A label can start with a directory, e.g. label=lambda x, y : f"{x}/{y}" is a valid pattern.

Usage:

  • If label is a Callable then label( func_name=name, **parameters ) will be called to generate the actual label.

    The parameter func_name refers to the qualified name of the function. Its value can be overwritten by name, while the parameter name itself can be overwritten using name_of_func_name_arg, see below.

  • If label is a plain string without {} formatting: use this string as-is.

  • If label is a string with {} formatting, then label.format( func_name=name, **parameters ) will be used to generate the actual label.

    The parameter func_name refers to the qualified name of the function. Its value can be overwritten by name, while the parameter name itself can be overwritten using name_of_func_name_arg, see below.

See above for examples.

label cannot be used alongside uid.
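The three label rules above can be sketched in plain Python. resolve_label is a hypothetical helper, not the cdxcore implementation. One assumption is made explicit: the examples in this document use lambdas that do not accept func_name, so the sketch passes a callable only the arguments it declares.

```python
import inspect
from typing import Callable, Union

def resolve_label(label: Union[str, Callable[..., str]],
                  func_name: str,
                  parameters: dict,
                  name_of_func_name_arg: str = "func_name") -> str:
    """Apply the three label rules listed above (hypothetical helper)."""
    if name_of_func_name_arg in parameters:
        raise RuntimeError(f"'{name_of_func_name_arg}' is also a function parameter")
    kwargs = {name_of_func_name_arg: func_name, **parameters}

    if callable(label):
        # Assumption: only pass the arguments the callable declares, since the
        # examples in this document use lambdas without a func_name parameter.
        sig = inspect.signature(label)
        takes_var_kw = any(p.kind is inspect.Parameter.VAR_KEYWORD
                           for p in sig.parameters.values())
        if not takes_var_kw:
            kwargs = {k: v for k, v in kwargs.items() if k in sig.parameters}
        return label(**kwargs)

    if "{" not in label:
        return label                   # plain string: used as-is
    return label.format(**kwargs)      # format string: unused keys are ignored

params = {"x": 1, "y": 2}
print(resolve_label("fixed-name", "mod.f", params))              # fixed-name
print(resolve_label("{func_name}({x},{y})", "mod.f", params))    # mod.f(1,2)
print(resolve_label(lambda x, y: f"{x}/{y}", "mod.f", params))   # 1/2
```

str.format() silently ignores keyword arguments that the format string does not reference, which is why the format-string rule can always receive func_name plus all parameters.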

uid : str | Callable | None, default None

Alternative to label which is assumed to generate a unique cache file name. It has the same semantics as label. When used, parameters to the decorated function are not hashed, as the uid is assumed to be unique already. The string must be a valid file name.

A uid can start with a directory.

Use label if the id is not unique. You cannot specify both uid and label. If neither uid nor label is present, name will be used (as a non-unique label).

name : str | None, default None

Name of this function. It is used on its own if neither label nor uid is specified; otherwise it is passed as the parameter func_name to either the callable or the formatting operator. See above for more details.

If name is not specified, it defaults to __qualname__ combined with the name of the module in which the function is defined.

include_args : list[str] | None, default None

List of arguments to include in generating a unique ID, or None for all.

exclude_args : list[str] | None, default None

List of arguments to exclude from generating a unique ID. Examples of such non-functional arguments are workflow controls (debugging) and i/o elements.

exclude_arg_types : list[type | str] | None, default None

List of parameter types or names of types to exclude from generating a unique ID. Examples of such non-functional arguments are workflow controls (debugging) and i/o elements. Strings are compared to type(arg).__name__.
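The combined effect of include_args, exclude_args and exclude_arg_types can be sketched with inspect.signature(). select_hash_args and Logger are hypothetical names for illustration only. The sketch assumes that types are matched with isinstance() (so subclasses are excluded, too), whereas strings are compared to type(arg).__name__ as documented.

```python
import inspect

def select_hash_args(func, args=(), kwargs=None, *, include_args=None,
                     exclude_args=None, exclude_arg_types=None):
    """Return the arguments of func(*args, **kwargs) which would enter the unique ID.

    Hypothetical helper for illustration; not cdxcore's implementation.
    """
    bound = inspect.signature(func).bind(*args, **(kwargs or {}))
    bound.apply_defaults()
    excluded_types = exclude_arg_types or []

    def excluded_by_type(value):
        for t in excluded_types:
            if isinstance(t, str) and type(value).__name__ == t:
                return True   # strings are compared to type(arg).__name__
            if isinstance(t, type) and isinstance(value, t):
                return True   # assumption: subclasses are excluded, too
        return False

    selected = {}
    for name, value in bound.arguments.items():
        if include_args is not None and name not in include_args:
            continue
        if exclude_args is not None and name in exclude_args:
            continue
        if excluded_by_type(value):
            continue
        selected[name] = value
    return selected

class Logger:                      # stands in for a non-functional argument
    pass

def model(x, y, log=None, debug=False):
    return x * y

print(select_hash_args(model, (2, 3), {"log": Logger(), "debug": True},
                       exclude_args=["debug"], exclude_arg_types=["Logger"]))
# {'x': 2, 'y': 3}
```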

in_sub_dir : str | Callable | None, default None

Allows specifying a sub-directory for the cached files, using the same formatting logic as for label.

version_auto_class : bool, default True

Whether to automatically add version dependencies on base classes or, for member functions, on containing classes. This is the auto_class parameter for cdxcore.version.version().

name_of_func_name_arg : str, default "func_name"

When formatting label or uid, by default "func_name" is used to refer to the current function name. If there is already a parameter func_name for the function, an error will be raised. Use this flag to change the parameter name. Example:

from cdxcore.subdir import SubDir
cache = SubDir("?/temp")

@cache.cache("0.1")
def f( func_name, x ):
    pass

f("test", 1)

Generates a RuntimeError f@__main__: 'func_name' is a reserved keyword and used as formatting parameter name for the function name. Found it also in the function parameter list. Use 'name_of_name_arg' to change the internal parameter name used..

Instead, use:

@cache.cache("0.1", label=lambda new_func_name, func_name, x : f"{new_func_name}(): {func_name} {x}",
                    name_of_func_name_arg="new_func_name")
def f( func_name, x ):
    pass
Returns:
Decorated F: Callable

A decorated F whose __call__ implements the cached call to F.

The decorated function F has a member cache_info of type cdxcore.subdir.CacheInfo which can be used to access information on caching activity.

  • Information available at any time after decoration:

    • F.cache_info.name : qualified name of the function

    • F.cache_info.signature : signature of the function

  • Additional information available during a call to a decorated function F, and thereafter (these properties are not thread-safe):

    • F.cache_info.version : unique version string reflecting all dependencies.

    • F.cache_info.filename : unique filename used for caching logic during the last function call.

    • F.cache_info.label : last label generated, or None.

    • F.cache_info.arguments : arguments parsed to create a unique call ID, or None.

  • Additional information available after a call to F (these properties are not thread-safe):

    • F.cache_info.last_cached : whether the last function call returned a cached object.

The decorated F() has additional function parameters, namely:

  • override_cache_mode : CacheMode | None, default None

    Allows overriding the CacheMode temporarily, in particular you can set it to "off".

  • track_cached_files : cdxcore.subdir.CacheTracker | None, default None

    Allows passing a cdxcore.subdir.CacheTracker object to keep track of all files used (loaded from or saved to). The function cdxcore.subdir.CacheTracker.delete_cache_files() can be used to delete all files involved in caching.

  • return_cache_uid : bool, default False

    If True, then the decorated function will return a tuple uid, result where uid is the unique filename generated for this function call, and where result is the actual result from the function, cached or not. This uid is thread-safe.

    Usage:

    from cdxcore.subdir import SubDir
    cache_dir = SubDir("!/.cache")
    
    @cache_dir.cache()
    def f(x, y):
        return x*y
    
    uid, xy = f( x=1, y=2, return_cache_uid=True )
    

    This pattern is thread-safe when compared to using:

    xy = f( x=1, y=2 )
    uid = f.cache_info.filename
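Why the returned uid is safer than reading f.cache_info.filename can be shown with a toy decorator. toy_cache, last_uid and worker are hypothetical names; the decorator only builds a call ID and does no caching. Every call overwrites the shared last_uid attribute, whereas the tuple returned with return_cache_uid=True always belongs to the call that produced it.

```python
import threading

def toy_cache(func):
    """Hypothetical stand-in for a caching decorator: it only builds a call ID."""
    def wrapper(*args, return_cache_uid=False, **kwargs):
        uid = f"{func.__name__}({args},{sorted(kwargs.items())})"  # toy call ID
        wrapper.last_uid = uid            # shared state, overwritten by every call
        result = func(*args, **kwargs)
        return (uid, result) if return_cache_uid else result
    wrapper.last_uid = None
    return wrapper

@toy_cache
def f(x, y):
    return x * y

results = {}

def worker(x):
    # The returned uid always belongs to this call; reading f.last_uid
    # afterwards could pick up the ID of a call made by another thread.
    uid, xy = f(x=x, y=2, return_cache_uid=True)
    results[x] = (uid, xy)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```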
    
property cache_controller: CacheController

Returns the assigned cdxcore.subdir.CacheController, or None.

property cache_mode: CacheMode

Returns the cdxcore.subdir.CacheMode associated with the underlying cache controller.

create_directory()

Creates the current directory if it doesn’t exist yet. Returns self.

delete(file, raise_on_error=False, *, ext=None)

Deletes file.

This function will quietly fail if file does not exist unless raise_on_error is set to True.

Parameters:
file

filename, or list of filenames

raise_on_error : bool, default False

If False, do not throw KeyError if file does not exist or another error occurs.

ext : str | None, default None

Extension, or list thereof if file is a list.

Use

  • None for the directory default.

  • "" to not use an automatic extension.

  • "*" to use the extension associated with the format of the directory.

delete_all_content(delete_self=False, raise_on_error=False, *, ext=None)

Deletes all valid keys and subdirectories in this sub directory.

Does not delete files with other extensions. Use cdxcore.subdir.SubDir.delete_everything() if the aim is to delete, well, everything.

Parameters:
delete_self: bool

Whether to delete the directory itself as well, or only its contents. If True, the current object will be left in None state.

raise_on_error: bool

False for silent failure

ext : str | None, default None

Extension for keys, or None for the directory’s default. Use "" to match all files regardless of extension.

delete_all_files(raise_on_error=False, *, ext=None)

Deletes all valid keys in this sub directory with the correct extension.

Parameters:
raise_on_error : bool

Set to False to quietly ignore errors.

ext : str | None, default None

Extension to be used:

  • None for the directory default.

  • "" to not use an automatic extension.

  • "*" to use the extension associated with the format of the directory.

delete_everything(keep_directory=True)

Deletes the entire sub directory with all contents.

WARNING: deletes all files and sub-directories, not just those with the present extension. If keep_directory is False, then the directory referred to by this object will also be deleted.

In this case, self will be set to None state.

property existing_path: str

Return current path, including trailing '/'.

existing_path ensures that the directory structure exists (or raises an exception). Use cdxcore.subdir.SubDir.path() if creation on the fly is not desired.

exists(file, *, ext=None)

Checks whether a file exists.

Parameters:
file

Filename, or list of filenames.

ext : str | None, default None

Extension to be used:

  • None for the directory default.

  • "" to not use an automatic extension.

  • "*" to use the extension associated with the format of the directory.

Returns:
Status : bool

If file is a string, returns True or False, else it will return a list of bool values.

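static expand_std_root(name)

Expands name by a standardized root directory.

The first character of name can be either of:

  • "!": the system temp directory, see cdxcore.subdir.SubDir.temp_dir().

  • "~": the user's home directory, see cdxcore.subdir.SubDir.user_dir().

  • ".": the current working directory, see cdxcore.subdir.SubDir.working_dir().

If neither of these matches the first character, name is returned as is.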

This function does not support "?" because "?" used in the constructor represents a new directory every time it is used.

This function returns a string.
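The prefix expansion can be sketched with the standard library calls that the static methods temp_dir(), user_dir() and working_dir() are documented to wrap. expand_std_root_sketch is hypothetical. It also assumes that only a bare prefix or a prefix followed by a path separator counts, so names such as "../a" or ".hidden" are returned unchanged; the actual function may treat these differently.

```python
import os
import tempfile

def expand_std_root_sketch(name: str) -> str:
    """Expand "!", "~" or "." at the start of name (hypothetical helper)."""
    roots = {
        "!": tempfile.gettempdir,                # cf. SubDir.temp_dir()
        "~": lambda: os.path.expanduser("~"),    # cf. SubDir.user_dir()
        ".": os.getcwd,                          # cf. SubDir.working_dir()
    }
    # Assumption: only "X" or "X/..." count as prefixes, so "../a" or ".hidden"
    # are returned unchanged.
    if not name or name[0] not in roots or (len(name) > 1 and name[1] not in "/\\"):
        return name
    return os.path.join(roots[name[0]](), name[2:])

print(expand_std_root_sketch("!/cache"))   # <system temp dir>/cache
print(expand_std_root_sketch("/abs/dir"))  # /abs/dir
```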

property ext: str

Returns the common extension of the files in this directory, including leading '.'. Resolves "*" into the extension associated with the current cdxcore.subdir.Format.

file_size(file, *, ext=None)

Returns the file size of a file.

See comments on os.path.getsize() for system compatibility information.

Parameters:
file : str

Filename, or list of filenames.

ext : str

Extension, or list thereof if file is a list.

  • Use None for the directory default.

  • Use "" for no automatic extension.

Returns:
The size of file, or None if an error occurred.

files(*, ext=None)

Returns a list of files in this subdirectory with the current extension, or the specified extension.

In other words, if the extension is “.pck”, and the files are “file1.pck”, “file2.pck”, “file3.bin” then this function will return [ “file1”, “file2” ]

If ext is:

  • None, then the directory’s default extension will be used.

  • "" then this function will return all files in this directory.

  • "*" then the extension corresponding to the current format will be used.

This function ignores directories. Use cdxcore.subdir.SubDir.sub_dirs() to retrieve those.
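The behaviour described for files() amounts to filtering a directory listing by extension and stripping that extension. list_keys is a hypothetical helper built on os.listdir(), shown here with the “.pck” example from above:

```python
import os
import tempfile

def list_keys(directory: str, ext: str) -> list:
    """Names of files in directory ending in ext, with ext removed.

    ext == "" returns all file names unchanged; sub-directories are skipped.
    Hypothetical helper for illustration; not cdxcore's implementation.
    """
    keys = []
    for entry in sorted(os.listdir(directory)):
        if not os.path.isfile(os.path.join(directory, entry)):
            continue                           # ignore directories
        if ext == "":
            keys.append(entry)
        elif entry.endswith(ext):
            keys.append(entry[: -len(ext)])    # strip the extension
    return keys

with tempfile.TemporaryDirectory() as d:
    for name in ("file1.pck", "file2.pck", "file3.bin"):
        with open(os.path.join(d, name), "w"):
            pass
    os.mkdir(os.path.join(d, "sub.pck"))       # a directory, not a key
    pck_keys = list_keys(d, ".pck")            # ['file1', 'file2']
    all_keys = list_keys(d, "")                # ['file1.pck', 'file2.pck', 'file3.bin']
```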

property fmt: Format

Returns current cdxcore.subdir.Format.

full_file_name(file, *, ext=None)

Returns the fully qualified file name, based on a given unqualified file name (i.e. without path or extension).

Parameters:
file : str

Core file name without path or extension.

ext : str | None, default None

If not None, use this extension rather than cdxcore.subdir.SubDir.ext.

Returns:
Filename : str | None

Fully qualified system file name. If self is None, then this function returns None; if file is None then this function also returns None.
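The ext convention used throughout this class (None for the directory default, "*" for the extension of the current format, "" for none) can be sketched as a small resolution function. resolve_ext and full_file_name_sketch are hypothetical. Adding a leading "." to bare extensions is an assumption based on the ext property including the leading '.'.

```python
import os
from typing import Optional

def resolve_ext(ext: Optional[str], dir_ext: str, fmt_ext: str) -> str:
    """None -> directory default, "*" -> format's extension, "" -> no extension."""
    if ext is None:
        ext = dir_ext          # the directory's own default may itself be "*"
    if ext == "*":
        ext = fmt_ext
    if ext and not ext.startswith("."):
        ext = "." + ext        # assumption: extensions are stored with a leading "."
    return ext

def full_file_name_sketch(path: Optional[str], file: Optional[str],
                          ext: Optional[str] = None, *,
                          dir_ext: str = ".pck", fmt_ext: str = ".pck") -> Optional[str]:
    """Hypothetical counterpart of SubDir.full_file_name()."""
    if path is None or file is None:
        return None            # mirrors the documented None behaviour
    return os.path.join(path, file + resolve_ext(ext, dir_ext, fmt_ext))

print(full_file_name_sketch("cache", "results", dir_ext="*", fmt_ext=".json"))
# cache/results.json on POSIX systems
```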

full_temp_file_name(file=None, *, ext=None, create_directory=False)

Returns a fully qualified unique temporary file name with path and extension.

The file name is generated by applying a unique hash to the current directory, file, the current process and thread IDs, and datetime.datetime.now().

If file is not None it will be used as a label.

This function returns the fully qualified file name. Use cdxcore.subdir.SubDir.temp_file_name() to obtain only the file name.

Parameters:
file : str | None, default None

An optional file. If provided, cdxcore.uniquehash.named_unique_filename48_8() is used to generate the temporary file which means that a portion of file will head the returned temporary name.

If file is None, cdxcore.uniquehash.unique_hash48() is used to generate a 48 character hash.

ext : str | None, default None

Extension to use, or None for the extension of self.

Returns:
Temporary file name : str

The fully qualified file name.
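The naming scheme described above (hash of directory, file, process ID, thread ID and current time) can be sketched with hashlib. temp_name_sketch is hypothetical and uses SHA-256 in place of cdxcore.uniquehash; it only imitates the 48-character shape, with an 8-character hash when file is given.

```python
import datetime
import hashlib
import os
import threading
from typing import Optional

def temp_name_sketch(directory: str, file: Optional[str] = None) -> str:
    """Hash directory, file, process ID, thread ID and the current time.

    Hypothetical helper; it imitates the 48-character / 8-character-hash
    shape described above but is not cdxcore.uniquehash.
    """
    payload = "|".join([
        directory,
        file or "",
        str(os.getpid()),
        str(threading.get_ident()),
        datetime.datetime.now().isoformat(),
    ])
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()  # 64 hex chars
    if file is None:
        return digest[:48]                  # pure 48-character hash
    return f"{file[:39]}_{digest[:8]}"      # readable head + 8-character hash
```

The clock has finite resolution, so two calls from the same thread within one tick would produce the same name in this sketch; mixing in a counter or uuid.uuid4() would remove that risk.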

get_creation_time(file, *, ext=None)

Returns the creation time of a file.

See comments on os.path.getctime() for system compatibility information.

Parameters:
file

Filename, or list of filenames.

ext : str | None, default None

Extension to be used:

  • None for the directory default.

  • "" to not use an automatic extension.

  • "*" to use the extension associated with the format of the directory.

Returns:
Datetime : datetime.datetime

A single datetime if file is a string, otherwise a list of datetimes. Returns None if an error occurred.
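A sketch of the scalar-or-list pattern with os.path.getctime(). Note that, per the Python documentation, getctime() reports the creation time on Windows but the time of the last metadata change on Unix. creation_time_sketch is hypothetical, and returning None per failing entry of a list is an assumption.

```python
import datetime
import os
import tempfile
from typing import List, Union

def creation_time_sketch(file: Union[str, List[str]]):
    """datetime for one file, a list of them for a list; None where lookup fails.

    Hypothetical helper; it returns None per failing entry.
    """
    if not isinstance(file, str):
        return [creation_time_sketch(f) for f in file]
    try:
        return datetime.datetime.fromtimestamp(os.path.getctime(file))
    except OSError:
        return None

fd, path = tempfile.mkstemp()
os.close(fd)
try:
    stamps = creation_time_sketch([path, path + ".missing"])
finally:
    os.remove(path)
```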

get_last_access_time(file, *, ext=None)

Returns the last access time of a file.

See comments on os.path.getatime() for system compatibility information.

Parameters:
file

Filename, or list of filenames.

ext : str | None, default None

Extension to be used:

  • None for the directory default.

  • "" to not use an automatic extension.

  • "*" to use the extension associated with the format of the directory.

Returns:
Datetime : datetime.datetime

A single datetime if file is a string, otherwise a list of datetimes. Returns None if an error occurred.

get_last_modification_time(file, *, ext=None)

Returns the last modification time of a file.

See comments on os.path.getmtime() for system compatibility information.

Parameters:
file

Filename, or list of filenames.

ext : str | None, default None

Extension to be used:

  • None for the directory default.

  • "" to not use an automatic extension.

  • "*" to use the extension associated with the format of the directory.

Returns:
Datetime : datetime.datetime

A single datetime if file is a string, otherwise a list of datetimes. Returns None if an error occurred.

get_version(file, raise_on_error=False, *, ext=None, fmt=None)

Returns a version stored in a file.

This requires that the file has previously been saved with a version. Otherwise this function will have unpredictable results.

Parameters:
file : str

A filename, or a list thereof.

raise_on_error : bool

Whether to raise an exception if accessing an existing file failed (e.g. if it is a directory). By default this function fails silently and returns the default.

delete_wrong_version : bool, default True

If True, and if a wrong version was found, delete file.

ext : str | None, default None

Extension overwrite, or a list thereof if file is a list.

Set to:

  • None to use directory’s default.

  • "*" to use the extension implied by fmt.

  • "" for no extension.

fmt : cdxcore.subdir.Format | None, default None

File format or None to use the directory’s default. Note that fmt cannot be a list even if file is.

Returns:
version : str

The version.

property is_none: bool

Whether this object is None or not. For such a SubDir object no files exist, and writing any file will fail.

is_version(file, version=None, raise_on_error=False, *, ext=None, fmt=None, delete_wrong_version=True)

Tests the version of a file.

Parameters:
file : str

A filename, or a list thereof.

version : str

Specifies the version to compare the file’s version with.

You can use "*" to match any version.

raise_on_error : bool

Whether to raise an exception if accessing an existing file failed (e.g. if it is a directory). By default this function fails silently and returns the default.

delete_wrong_version : bool, default True

If True, and if a wrong version was found, delete file.

ext : str | None, default None

Extension overwrite, or a list thereof if file is a list.

Set to:

  • None to use directory’s default.

  • "*" to use the extension implied by fmt.

  • "" for no extension.

fmt : cdxcore.subdir.Format | None, default None

File format or None to use the directory’s default. Note that fmt cannot be a list even if file is.

Returns:
Status : bool

Returns True only if the file exists, has version information, and its version is equal to version.

items(*, ext=None, raise_on_error=False)

Dictionary-style iterable of filenames and their content.

Usage:

from cdxcore.subdir import SubDir
subdir = SubDir("!")
for file, data in subdir.items():
    print( file, str(data)[:100] )
Parameters:
ext : str | None, default None

Extension or None for the directory’s current extension. Use "" for all file extensions.

Returns:
Iterable

An iterable generator

property path: str

Return current path, including trailing '/'.

Note that the path may not exist yet. If existence is required, consider using cdxcore.subdir.SubDir.existing_path().

path_exists()

Whether the current directory exists.

read(file, default=None, raise_on_error=False, *, version=None, delete_wrong_version=True, ext=None, fmt=None)

Read data from a file if the file exists, or return default.

  • Supports file containing directory information.

  • Supports file (as well as default and ext) being iterable. Examples:

    from cdxcore.subdir import SubDir
    files = ['file1', 'file2']
    sd = SubDir("!/test")
    
    sd.read( files )          # both files are using default None
    sd.read( files, 1 )       # both files are using default '1'
    sd.read( files, [1,2] )   # files use defaults 1 and 2, respectively
    
    sd.read( files, [1] )      # produces error as len(files) != len([1])
    

    Strings are iterable but are treated as single value. Therefore:

    sd.read( files, '12' )      # the default value '12' is used for both files
    sd.read( files, ['1','2'] ) # use defaults '1' and '2', respectively
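    The rules for default can be sketched as a small broadcasting function. broadcast_defaults is a hypothetical helper, not cdxcore code; it treats str and bytes as single values and requires other iterables to match the number of files. The write() examples further below suggest that obj follows the same rule.

```python
from collections.abc import Iterable
from typing import Any, List

def broadcast_defaults(files: List[str], default: Any) -> List[Any]:
    """Pair each file with a default value (hypothetical helper)."""
    # Strings and bytes are iterable but count as a single value.
    if isinstance(default, (str, bytes)) or not isinstance(default, Iterable):
        return [default] * len(files)
    defaults = list(default)
    if len(defaults) != len(files):
        raise ValueError(f"{len(files)} files but {len(defaults)} default values")
    return defaults

files = ["file1", "file2"]
print(broadcast_defaults(files, None))        # [None, None]
print(broadcast_defaults(files, [1, 2]))      # [1, 2]
print(broadcast_defaults(files, "12"))        # ['12', '12']
```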
    
Parameters:
file : str

A file name or a list thereof. file may contain subdirectories.

default

Default value, or default values if file is a list.

raise_on_error : bool, default False

Whether to raise an exception if reading an existing file failed. By default this function fails silently and returns the default.

version : str | None, default None

If not None, specifies the version of the current code base.

In this case, this version will be compared to the version of the file being read. If they do not match, read fails (either by returning default or throwing a cdxcore.version.VersionError exception).

You can specify version "*" to accept any version. Note that this is distinct from using None, which stipulates that the file should not have version information.

delete_wrong_version : bool, default True

If True, and if a wrong version was found, delete the file.

ext : str | None, default None

Extension overwrite, or a list thereof if file is a list.

Use:

  • None to use directory’s default.

  • '*' to use the extension implied by fmt.

  • "" to turn off extension management.

fmt : cdxcore.subdir.Format | None, default None

File cdxcore.subdir.Format or None to use the directory’s default.

Note:

  • fmt cannot be a list even if file is.

  • Unless ext or the SubDir’s extension is '*', changing the format does not automatically change the extension.

Returns:
Content : type | list

For a single file returns the content of the file if successfully read, or default otherwise. If file is a list, this function returns a list of contents.

Raises:
Version error : cdxcore.version.VersionError

If the file’s version did not match the version provided.

Version present : cdxcore.subdir.VersionPresentError

Raised when attempting to read a file which has a version without providing a version to test against.

I/O errors : Exception

Various standard I/O errors are raised as usual.
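The version rules for read() can be illustrated with a toy on-disk layout. Everything here is hypothetical: a pickled (version, payload) pair is not cdxcore's file format, and the two exception classes merely stand in for cdxcore.version.VersionError and cdxcore.subdir.VersionPresentError. The sketch assumes that "*" also accepts files without a version, and it always raises, whereas read() with raise_on_error=False returns default instead.

```python
import io
import pickle
from typing import Any, Optional

class VersionMismatch(Exception):
    """Stand-in for cdxcore.version.VersionError in this sketch."""

class UnexpectedVersion(Exception):
    """Stand-in for cdxcore.subdir.VersionPresentError in this sketch."""

def write_versioned(stream, obj: Any, version: Optional[str] = None) -> None:
    # Hypothetical layout: a pickled (version, payload) pair, not cdxcore's format.
    pickle.dump((version, obj), stream)

def read_versioned(stream, version: Optional[str] = None) -> Any:
    stored_version, obj = pickle.load(stream)
    if version is None:
        if stored_version is not None:     # file has a version, caller gave none
            raise UnexpectedVersion(f"file has version {stored_version!r}")
        return obj
    if version != "*" and stored_version != version:
        raise VersionMismatch(f"expected {version!r}, found {stored_version!r}")
    return obj                             # "*" accepts any stored version

def roundtrip(obj: Any, write_version: Optional[str], read_version: Optional[str]) -> Any:
    buf = io.BytesIO()
    write_versioned(buf, obj, write_version)
    buf.seek(0)
    return read_versioned(buf, read_version)
```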

read_string(file, default=None, raise_on_error=False, *, ext=None)

Reads text from a file. Removes trailing EOLs.

Returns the read string, or a list of strings if file was iterable.

static remove_bad_file_characters(file, by='default')

Replaces invalid characters in a filename using the map by.

See cdxcore.util.fmt_filename() for documentation and further options.

rename(source, target, *, ext=None)

Rename a file.

This function will raise an exception if not successful.

Parameters:
source, target : str

Filenames.

ext : str

Extension.

  • Use None for the directory default.

  • Use "" for no automatic extension.

sub_dirs()

Retrieve a list of all sub directories.

If self does not refer to an existing directory, then this function returns an empty list.

static temp_dir()

Return system temp directory. Short-cut to tempfile.gettempdir().

This function returns a “permanent temporary” directory (i.e. under /tmp/ for Linux or %TEMP% for Windows). Most importantly, it is somewhat persistent: you can expect it to still be there after a reboot.

To cater for the use case of a one-off temporary directory use cdxcore.subdir.SubDir.temp_temp_dir().

This function is called when the ! prefix is used when constructing cdxcore.subdir.SubDir objects.

Returns:
Path : str

This function returns a string which contains a trailing '/'.

temp_file_name(file=None)

Returns a unique temporary file name.

The file name is generated by applying a unique hash to the current directory, file, the current process and thread IDs, and datetime.datetime.now().

If file is not None it will be used as a label.

This function returns just the file name. Use cdxcore.subdir.SubDir.full_temp_file_name() to get a full temporary file name including path and extension.

Parameters:
file : str | None, default None

An optional file. If provided, cdxcore.uniquehash.named_unique_filename48_8() is used to generate the temporary file which means that a portion of file will head the returned temporary name.

If file is None, cdxcore.uniquehash.unique_hash48() is used to generate a 48 character hash.

Returns:
Temporary file name : str

The file name.

static temp_temp_dir()

Returns a temporary temp directory name using tempfile.mkdtemp() which is temporary for the current process and thread, and is not guaranteed to be persisted e.g. when the system is rebooted. Accordingly, this function will return a different directory upon every function call.

This function is called when the ?/ prefix is used when constructing cdxcore.subdir.SubDir objects.

Implementation notice:

In most circumstances, a temporary temp directory is not deleted from a system upon reboot. Do not rely on regular clean ups. It is strongly recommended to clean up after usage, for example using the pattern:

from cdxcore.subdir import SubDir
import shutil

tmp_dir = SubDir.temp_temp_dir()   # create before 'try' so tmp_dir is always defined in 'finally'
try:
    ...
finally:
    shutil.rmtree(tmp_dir)
Returns:
Path : str

This function returns a string which contains a trailing '/'.
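If a one-off directory is only needed inside a single block, the standard library's tempfile.TemporaryDirectory context manager removes it on exit, even when an exception is raised. This is a swapped-in standard-library alternative to the try/finally pattern above, not a SubDir feature; the file name scratch.txt is illustrative only.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "scratch.txt")
    with open(path, "w") as fh:
        fh.write("intermediate result")
    existed_inside = os.path.exists(path)

existed_after = os.path.exists(tmp_dir)   # the directory is gone after the block
```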

static user_dir()

Return the current user's home directory. Short-cut for os.path.expanduser() with parameter '~'.

This function is called when the ~/ prefix is used when constructing cdxcore.subdir.SubDir objects.

Returns:
Path : str

This function returns a string which contains a trailing '/'.

static working_dir()

Return current working directory. Short-cut for os.getcwd().

This function is called when the ./ prefix is used when constructing cdxcore.subdir.SubDir objects.

Returns:
Path : str

This function returns a string which contains a trailing '/'.

write(file, obj, raise_on_error=True, *, version=None, ext=None, fmt=None)

Writes an object to file.

  • Supports file containing directories.

  • Supports file being a list. In this case, if obj is an iterable it is considered the list of values for the elements of file. If obj is not iterable, it will be written into all files from file:

    from cdxcore.subdir import SubDir
    
    keys = ['file1', 'file2']
    sd = SubDir("!/test")
    sd.write( keys, 1 )               # works, writes '1' in both files.
    sd.write( keys, [1,2] )           # works, writes 1 and 2, respectively
    sd.write( keys, "12" )            # works, writes '12' in both files
    sd.write( keys, [1] )             # produces error as len(keys) != len(obj)
    

If the current directory is None, then the function raises an EOFError exception.

Parameters:
file : str

Core filename, or list thereof.

obj

Object to write, or list thereof if file is a list.

raise_on_error : bool, default True

If False, this function will return False upon failure.

version : str | None, default None

If not None, specifies the version of the code which generated obj. This version will be written to the beginning of the file.

ext : str | None, default None

Extension, or list thereof if file is a list.

  • Use None to use directory’s default extension.

  • Use "*" to use the extension implied by fmt.

fmt : cdxcore.subdir.Format | None, default None

File format or None to use the directory’s default. Note that fmt cannot be a list even if file is. Note that unless ext or the SubDir’s extension is ‘*’, changing the format does not automatically change the extension used.

Returns:
Success : bool

Boolean to indicate success if raise_on_error is False.

write_string(file, line, raise_on_error=True, *, ext=None)

Writes a line of text into a file.

  • Supports file containing directories.

  • Supports file being a list. In this case, line can either be the same value for all files or a list, too.

If the current directory is None, then the function raises an EOFError exception.

exception cdxcore.subdir.VersionPresentError

Bases: RuntimeError

Exception raised in case a file was read which had a version, but no test version was provided.

cdxcore.subdir.VersionedCacheRoot(directory, *, ext=None, fmt=None, create_directory=False, **controller_kwargs)

Create a root directory for versioned caching on disk using cdxcore.subdir.SubDir.cache().

Usage:

In a central file, define a root directory for all caching activity:

from cdxcore.subdir import VersionedCacheRoot
vroot = VersionedCacheRoot("!/cache")

Create sub-directories as suitable, for example:

vtest = vroot("test")

Use these for caching:

@vtest.cache("1.0")
def f1( x=1, y=2 ):
    print(x,y)

@vtest.cache("1.0", dependencies=[f1])
def f2( x=1, y=2, z=3 ):
    f1( x,y )
    print(z)
Parameters:
directory : str

Name of the root directory for caching.

The following SubDir short-cuts are supported:

  • "!/dir" creates dir in the temporary directory.

  • "~/dir" creates dir in the home directory.

  • "./dir" creates dir relative to the current directory.

ext : str | None, default None

Extension, which will automatically be appended to file names. The default value depends on fmt; for Format.PICKLE it is “pck”.

fmt : cdxcore.subdir.Format | None, default None

File format; if ext is not specified, the format drives the extension, too. The default None becomes Format.PICKLE.

create_directory : bool, default False

Whether to create the directory upon creation.

controller_kwargs: dict

Parameters passed to cdxcore.subdir.CacheController.

Common parameters used:

  • exclude_arg_types: list of types or names of types to exclude when auto-generating function signatures from function arguments. An example is cdxcore.verbose.Context which is used to print progress messages.

  • max_filename_length: maximum filename length.

  • hash_length: length used for hashes, see cdxcore.uniquehash.UniqueHash.

  • debug_verbose: set to Context.all (after from cdxcore.verbose import Context) to turn on tracing of all caching operations.

Returns:
Root : cdxcore.subdir.SubDir

A root directory suitable for caching.