cdxcore.config#

Tooling for setting up program-wide configuration hierarchies. Aimed at machine learning programs to ensure consistency of code across experimentation.

Overview#

Basic config construction:

from cdxcore.config import Config, Int
config = Config()
config.num_batches = 1000    # object-like assignment of config values
config.network.depth = 3     # on-the-fly hierarchy generation: here `network` becomes a sub-config
config.network.width = 100    
...

def train(config):
    num_batches = config("num_batches", 10, Int>=2, "Number of batches. Must be at least 2")
    ...

Key features#

  • Detect misspelled parameters by checking that all parameters provided via a config by a user have been read.

  • Provide summary of all parameters used, including summary help for what they were for.

  • Nicer object attribute syntax than dictionary notation, in particular for nested configurations.

  • Automatic conversion including simple value validation to ensure user-provided values are within a given range or from a list of options.

Creating Configs#

Set data with both dictionary and member notation:

config = Config()
config['features']           = [ 'time', 'spot' ]   # example dictionary-type assignment
config.scaling               = [ 1., 1000. ]        # example object-type assignment

Reading a Config#

When reading the value for a key from a config, cdxcore.config.Config.__call__() expects a key, a default value, a cast type, and a brief help text. The function first attempts to find key in the provided Config:

  • If key is found, it casts the value provided for key using the cast type and returns.

  • If key is not found, then the default value will be returned (after also being cast using cast).
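The lookup-then-cast behaviour described above can be sketched with a plain dictionary. This is only an illustration of the idea, not the library's implementation; the helper name `read_value` is hypothetical.

```python
def read_value(data, key, default, cast):
    """Hypothetical helper: look up key, fall back to default, then cast."""
    # Use the user-provided value if present, otherwise the default.
    raw = data[key] if key in data else default
    # The cast is applied in both cases.
    return cast(raw)

user_input = {"depth": "3"}
depth = read_value(user_input, "depth", 1, int)       # found: "3" is cast to 3
width = read_value(user_input, "width", 100.0, int)   # not found: default 100.0 is cast to 100
print(depth, width)   # 3 100
```

Casting the default as well means the caller always receives a value of the same type, whether or not the user supplied one.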

Example:

from cdxcore.config import Config
import numpy as np

class Model(object):
    def __init__( self, config ):
        # read top level parameters
        self.features = config("features", [], list, "Features for the agent" )
        self.scaling  = config("scaling", [], np.asarray, "Scaling for the features", help_default="no scaling")

model = Model( config )

Most of the example is self-explanatory, but note that numpy.asarray(), provided as the cast parameter for scaling, means that any values passed by the user will be automatically converted to numpy.ndarray objects.

The help text parameter allows providing information on what variables are read from the config. This information can be displayed using the function cdxcore.config.Config.usage_report(). (There are a number of further parameters to cdxcore.config.Config.__call__() to fine-tune this report, such as the help_default parameter used above.)

In the above example, print( config.usage_report() ) will print:

config['features'] = ['time', 'spot'] # Features for the agent; default: []
config['scaling'] = [   1. 1000.] # Scaling for the features; default: no scaling

Sub-Configs#

You can write and read sub-configurations directly with member notation, without having to explicitly create an entry for the sub-config:

Assume as before:

config = Config()
config['features']           = [ 'time', 'spot' ]   
config.scaling               = [ 1., 1000. ]        

Then create a network sub configuration with member notation on the fly:

config.network.depth         = 10
config.network.width         = 100
config.network.activation    = 'relu'

This is equivalent to:

config.network               = Config()
config.network.depth         = 10
config.network.width         = 100
config.network.activation    = 'relu'

Now use naturally as follows:

from cdxcore.config import Config, Int
import numpy as np

class Network(object):
    def __init__( self, config ):
        self.depth      = config("depth", 1, Int>0, "Depth of the network")
        self.width      = config("width", 1, Int>0, "Width of the network")
        self.activation = config("activation", "selu", str, "Activation function")
        config.done() # see below

class Model(object):
    def __init__( self, config ):
        # read top level parameters
        self.features = config("features", [], list, "Features for the agent" )
        self.scaling  = config("scaling", [], np.asarray, "Scaling for the features", help_default="no scaling")
        self.networks = Network( config.network )
        config.done() # see below

model = Model( config )

Imposing Simple Restrictions on Values#

The cast parameter to cdxcore.config.Config.__call__() is a callable; this allows imposing simple restrictions to any values read from a config. To this end, import the respective type operators:

from cdxcore.config import Int, Float

Implement a one-sided restriction:

# example enforcing simple conditions
self.width = network('width', 100, Int>3, "Width for the network")

Restrictions on both sides of a scalar:

# example enforcing two-sided conditions
self.percentage = network('percentage', 0.5, ( Float >= 0. ) & ( Float <= 1.), "A percentage")

Enforce the value being a member of a list:

# example ensuring a returned type is from a list
self.ntype = network('ntype', 'fastforward', ['fastforward','recurrent','lstm'], "Type of network")

We can allow a returned value to be one of several casting types by using tuples. The most common use case is that None is a valid value, too. For example, assume that the name of the network model should be a string or None. This is implemented as:

# example allowing either None or a string
self.keras_name = network('name', None, (None, str), "Keras name of the network model")

We can combine conditional expressions with the tuple notation:

# example allowing either None or a positive int
self.batch_size = network('batch_size', None, (None, Int>0), "Batch size or None for TensorFlow's default 32", help_cast="Positive integer, or None")
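Because the cast is just a callable, a restriction can be thought of as a function that converts a value and then validates it. The sketch below illustrates that idea with hypothetical helpers (`int_at_least`, `none_or`); it does not reproduce how Int, Float, or the tuple notation are implemented in the library.

```python
def int_at_least(minimum):
    """Hypothetical factory: returns a cast that converts to int and checks a lower bound."""
    def cast(value):
        result = int(value)
        if result < minimum:
            raise ValueError(f"value {result} must be at least {minimum}")
        return result
    return cast

def none_or(cast):
    """Hypothetical factory: accept None unchanged, otherwise apply cast."""
    def wrapped(value):
        return None if value is None else cast(value)
    return wrapped

positive = int_at_least(1)
batch_cast = none_or(positive)

print(positive("5"))      # 5
print(batch_cast(None))   # None

error_message = None
try:
    positive(0)
except ValueError as e:
    error_message = str(e)
print(error_message)      # value 0 must be at least 1
```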

Ensuring that we had no Typos & that all provided Data is meaningful#

A common issue when using dictionary-based code configuration is that we might misspell one of the parameters. Unless this is a mandatory parameter we might not notice that we have not actually changed its value.

To check that all values of a config were read use cdxcore.config.Config.done(). It will alert you if there are keywords or children which have not been read. Most likely, those will be typos. Consider the following example where width is misspelled in our config:

class Network(object):
    def __init__( self, config ):
        # read top level parameters
        self.depth     = config("depth", 1, Int>=1, "Depth of the network")
        self.width     = config("width", 3, Int>=1, "Width of the network")
        self.activation = config("activation", "relu", help="Activation function", help_cast="String with the function name, or function")
        config.done() # <-- test that all members of config were read

config                       = Config()
config.features              = ['time', 'spot']
config.network.depth         = 10
config.network.activation    = 'relu'
config.network.widht         = 100   # (intentional typo)

n = Network(config.network)

Since width was misspelled in setting up the config, a cdxcore.config.NotDoneError exception is raised:

NotDoneError: Error closing Config 'config.network': the following config arguments were not read: widht

Summary of all variables read from this object:
config.network['activation'] = relu # Activation function; default: relu
config.network['depth'] = 10 # Depth of the network; default: 1
config.network['width'] = 3 # Width of the network; default: 3

Note that you can also call cdxcore.config.Config.done() at top level:

class Network(object):
    def __init__( self, config ):
        # read top level parameters
        self.depth     = config("depth", 1, Int>=1, "Depth of the network")
        self.width     = config("width", 3, Int>=1, "Width of the network")
        self.activation = config("activation", "relu", help="Activation function", help_cast="String with the function name, or function")

config                       = Config()
config.features              = ['time', 'spot']
config.network.depth         = 10
config.network.activation    = 'relu'
config.network.widht         = 100   # (intentional typo)

n = Network(config.network)
test_features = config("features", [], list, "Features for my network")
config.done()

produces:

NotDoneError: Error closing Config 'config.network': the following config arguments were not read: widht

Summary of all variables read from this object:
config.network['activation'] = relu # Activation function; default: relu
config.network['depth'] = 10 # Depth of the network; default: 1
config.network['width'] = 3 # Width of the network; default: 3
# 
config['features'] = ['time', 'spot'] # Features for my network; default: []

You can check the status of the use of the config by using the cdxcore.config.Config.not_done property.
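The underlying idea of detecting typos by tracking which keys were read can be sketched with a small dict subclass. This is an illustration under simplifying assumptions (no hierarchy, no casting); the class `TrackedDict` and its methods are hypothetical and not part of cdxcore.

```python
class TrackedDict(dict):
    """Hypothetical dict that remembers which keys have been read."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._read = set()

    def read(self, key, default):
        # Mark the key as read whether or not the user provided it.
        self._read.add(key)
        return self.get(key, default)

    def unread(self):
        # Keys provided by the user that no code has asked for.
        return sorted(k for k in self if k not in self._read)

    def done(self):
        missing = self.unread()
        if missing:
            raise KeyError(f"arguments not read: {', '.join(missing)}")

settings = TrackedDict(depth=10, widht=100)   # 'widht' is an intentional typo
depth = settings.read("depth", 1)
width = settings.read("width", 3)             # falls back to the default 3
print(settings.unread())                      # ['widht']
```

Calling `settings.done()` at this point raises, because the misspelled key was provided but never read.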

Detaching Child Configs#

You can also detach a child config, which allows you to store it for later use without triggering cdxcore.config.Config.done() errors:

def read_config( self, config ):
    ...
    self.config_training = config.training.detach()
    config.done()

The function cdxcore.config.Config.detach() will mark the original child, but not the detached child itself, as ‘done’. Therefore, we need to call cdxcore.config.Config.done() for the detached child once we have finished processing it:

def training(self):
    epochs     = self.config_training("epochs", 100, int, "Epochs for training")
    batch_size = self.config_training("batch_size", None, help="Batch size. Use None for default of 32" )

    self.config_training.done()

Various Copy Operations#

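When making a copy of a config we need to decide on the semantics of the operation. Besides its input data, a cdxcore.config.Config object contains two kinds of usage information: the “done” status, which tracks which parameters have been read, and the consistency recorder, which tracks the default and help values each parameter was read with.

Accordingly, when making a copy of self we need to determine the relationship of the copy with each of these.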

  • cdxcore.config.Config.detach(): use case is deferring usage of a config to a later point.

    • Done status: self is marked as “done”; the copy is used to keep track of usage of the remaining parameters.

    • Consistency: both self and the copy share the same consistency recorder.

  • cdxcore.config.Config.copy(): make an independent copy of the current status of self.

    • Done status: the copy has an independent copy of the “done” status of self.

    • Consistency: the copy has an independent copy of the consistency recorder of self.

  • cdxcore.config.Config.clean_copy(): make an independent copy of self, and reset all usage information.

    • Done status: the copy has an empty “done” status.

    • Consistency: the copy has an empty consistency recorder.

  • cdxcore.config.Config.shallow_copy(): make a shallow copy which shares all future usage tracking with self.

    The copy acts as a view on self. These are the semantics of the copy constructor.

    • Done status: the copy and self share all “done” status; if a parameter is read with one, it is considered “done” by both.

    • Consistency: the copy and self share all consistency handling. If a parameter is read with one with a given default and help, the other must use the same values when accessing the same parameter.
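The four copy semantics listed above differ only in whether the "done" status and the consistency recorder are shared, copied, or reset. The sketch below models that with a hypothetical `Usage` class holding a set of read keys and a dict of recorded defaults. It is a simplified illustration and not how cdxcore.config.Config is implemented.

```python
class Usage:
    """Hypothetical stand-in for a config: data plus two kinds of usage state."""

    def __init__(self, data, done=None, recorder=None):
        self.data = data
        self.done = set() if done is None else done            # keys already read
        self.recorder = {} if recorder is None else recorder   # key -> default used

    def read(self, key, default):
        # Consistency check: a key must always be read with the same default.
        if self.recorder.setdefault(key, default) != default:
            raise ValueError(f"inconsistent default for {key!r}")
        self.done.add(key)
        return self.data.get(key, default)

    def detach(self):
        # Copy of the current "done" status, shared recorder; self becomes fully "done".
        other = Usage(self.data, set(self.done), self.recorder)
        self.done.update(self.data)
        return other

    def copy(self):
        return Usage(self.data, set(self.done), dict(self.recorder))

    def clean_copy(self):
        return Usage(self.data)

    def shallow_copy(self):
        return Usage(self.data, self.done, self.recorder)

base = Usage({"a": 1, "b": 2})
base.read("a", 0)

view = base.shallow_copy()
view.read("b", 0)
print(sorted(base.done))           # ['a', 'b']: the view shares the "done" status

fresh = base.clean_copy()
print(fresh.read("a", 5))          # 1: a different default is fine after a reset

independent = base.copy()
try:
    independent.read("a", 5)       # recorder was copied, so default 0 is remembered
    conflict = False
except ValueError:
    conflict = True
print(conflict)                    # True
```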

Self-Recording All Available Configuration Parameters#

Once your program has run, you can read the summary of all values read, their defaults, and their help texts:

print( config.usage_report( with_cast=True ) )

Prints:

config.network['activation'] = relu # (str) Activation function for the network; default: relu
config.network['depth'] = 10 # (int) Depth for the network; default: 10000
config.network['width'] = 100 # (int>3) Width for the network; default: 100
config.network['percentage'] = 0.5 # (float>=0. and float<=1.) A percentage; default: 0.5
config.network['ntype'] = 'fastforward' # (['fastforward','recurrent','lstm']) Type of network; default: 'fastforward'
config.training['batch_size'] = None # () Batch size. Use None for default of 32; default: None
config.training['epochs'] = 100 # (int) Epochs for training; default: 100
config['features'] = ['time', 'spot'] # (list) Features for the agent; default: []
config['weights'] = [1 2 3] # (asarray) Weights for the agent; default: no initial weights

Unique Hash#

Another common use case is that we wish to cache the result of some complex operation. Assuming that the config describes all relevant parameters, and is therefore a valid ID for the data we wish to cache, we can use cdxcore.config.Config.unique_hash() to obtain a unique hash ID for the given config.

cdxcore.config.Config also implements the custom hashing protocol __unique_hash__ defined by cdxcore.uniquehash.UniqueHash, which means that if a Config is used during a hashing function from cdxcore.uniquehash the config will be hashed correctly.

A fully transparent caching framework which supports code versioning and transparent hashing of function parameters is implemented with cdxcore.subdir.SubDir.cache().
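To illustrate why a hash of the configuration can serve as a cache ID, the following sketch hashes a nested dictionary of parameters using only the standard library. The helper `stable_hash` is hypothetical and assumes JSON-serializable values; it is not how cdxcore.config.Config.unique_hash() or cdxcore.uniquehash compute their hashes.

```python
import hashlib
import json

def stable_hash(params):
    """Hypothetical helper: hash a nested dict of JSON-serializable values."""
    # sort_keys makes the text, and hence the hash, independent of insertion order.
    text = json.dumps(params, sort_keys=True)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

a = stable_hash({"network": {"depth": 3, "width": 100}, "num_batches": 1000})
b = stable_hash({"num_batches": 1000, "network": {"width": 100, "depth": 3}})
c = stable_hash({"num_batches": 1000, "network": {"width": 100, "depth": 4}})
print(a == b)   # True: same parameters, different order
print(a == c)   # False: depth differs
```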

Consistent **kwargs Handling#

The Config class can be used to improve **kwargs handling. Assume we have:

def f(**kwargs):
    a = kwargs.get("difficult_name", 10)
    b = kwargs.get("b", 20)

We run the usual risk of a user misspelling a parameter name, which we would never notice. We can improve upon the above with:

def f(**kwargs):
    kwargs = Config(kwargs)
    a = kwargs("difficult_name", 10)
    b = kwargs("b", 20)
    kwargs.done()

If a user now calls f with, say, f(difficlt_name=5), an error will be raised.

A more advanced pattern is to allow both config and kwargs function parameters. In this case, the user can provide a config, specify its parameters directly, or do both:

def f( config=None, **kwargs):
    config = Config.config_kwargs(config,kwargs)
    a = config("difficult_name", 10, int)
    b = config("b", 20, int)
    config.done()

Any of the following function calls are now valid:

f( Config(difficult_name=11, b=21) )        # use a Config
f( difficult_name=12, b=22 )                # use a kwargs
f( Config(difficult_name=11, b=21), b=22 )  # use both; kwargs overwrite config values
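The precedence rule in the last call (kwargs overwrite config values) can be sketched with plain dictionaries. The helper `merge_config_kwargs` and the explicit `allowed` set are hypothetical simplifications; the library instead tracks which parameters were read.

```python
def merge_config_kwargs(config, kwargs):
    """Hypothetical helper: combine an optional mapping with keyword arguments."""
    merged = dict(config) if config is not None else {}
    merged.update(kwargs)      # keyword arguments overwrite config values
    return merged

def f(config=None, **kwargs):
    params = merge_config_kwargs(config, kwargs)
    allowed = {"difficult_name", "b"}
    unknown = sorted(set(params) - allowed)
    if unknown:
        raise TypeError(f"unknown parameters: {', '.join(unknown)}")
    return params.get("difficult_name", 10), params.get("b", 20)

print(f({"difficult_name": 11, "b": 21}))          # (11, 21)
print(f(difficult_name=12, b=22))                  # (12, 22)
print(f({"difficult_name": 11, "b": 21}, b=22))    # (11, 22)
```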

Dataclasses#

dataclasses rely on default values of any member being “frozen” objects, which most user-defined objects and cdxcore.config.Config objects are not. This limitation applies as well to flax modules. To use non-frozen default values, use the cdxcore.config.Config.as_field() function:

from cdxcore.config import Config, Int
from dataclasses import dataclass

@dataclass
class Data:
    data : Config = Config().as_field()

    def f(self):
        return self.data("x", 1, Int>0, "A positive integer")

d = Data()   # default constructor used.
d.f()
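The mechanism behind this can be sketched with the standard library alone. In the sketch below a plain dict stands in for a Config, and the hypothetical helper `shared_default` wraps it in a `default_factory` that returns the object itself, mirroring the factory shown in the as_field() documentation.

```python
from dataclasses import dataclass, field

def shared_default(obj):
    """Hypothetical helper: a dataclass field whose factory always returns obj itself."""
    return field(default_factory=lambda: obj)

@dataclass
class Data:
    # A plain dict stands in for a Config here.
    # Writing `params: dict = {}` directly would raise ValueError at class creation.
    params: dict = shared_default({"x": 2})

first = Data()
second = Data()
custom = Data(params={"x": 3})

print(first.params["x"])               # 2
print(custom.params["x"])              # 3
print(first.params is second.params)   # True: the factory returns the same object each time
```

One consequence of a factory that returns the same object is that all instances created with the default share that object, so changes made through one instance are visible in the others.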

Import#

from cdxcore.config import Config

Documentation#

Module Attributes

no_default

Value indicating no default is available for a given parameter.

Float

Allows applying basic range conditions to float parameters.

Int

Allows applying basic range conditions to int parameters.

Functions

config_kwargs(config, kwargs[, config_name])

Default implementation for a usage pattern where the user can provide both a cdxcore.config.Config parameter and **kwargs.

to_config(kwargs[, config_name])

Assesses whether a parameter is a cdxcore.config.Config, and otherwise tries to convert it into one.

Classes

Config(*args[, config_name])

A simple Config class for hierarchical dictionary-like configurations but with type checking, detecting misspelled parameters, and simple built-in help.

Exceptions

CastError(key, config_name, exception)

Raised when cdxcore.config.Config.__call__() could not cast a value provided by the user to the specified type.

InconsistencyError(key, config_name, message)

Raised when cdxcore.config.Config.__call__() is used inconsistently between function calls for a given parameter.

NotDoneError(not_done, config_name, message)

Raised when cdxcore.config.Config.done() finds that some config parameters have not been read.

exception cdxcore.config.CastError(key, config_name, exception)[source]#

Bases: RuntimeError

Raised when cdxcore.config.Config.__call__() could not cast a value provided by the user to the specified type.

Parameters:
key : str

Key name of the parameter which failed to cast.

config_name : str

Name of the Config.

exception : Exception

Original exception raised by the cast.

config_name#

Hierarchical name of the config.

key#

Key of the parameter which failed to cast.

class cdxcore.config.Config(*args, config_name=None, **kwargs)[source]#

Bases: OrderedDict

A simple Config class for hierarchical dictionary-like configurations but with type checking, detecting misspelled parameters, and simple built-in help.

See cdxcore.config for an extensive discussion of features.

Parameters:
*args : list

List of Mapping objects used to iteratively create a new config.

If the first element is a Config, and no other parameters are passed, then this object will be a shallow copy of that Config. It then shares all usage recording. See cdxcore.config.Config.shallow_copy().

config_name : str, optional

Name of the configuration used in usage reports. Default is "config".

**kwargs : dict

Additional key/value pairs to initialize the config with, e.g. Config(a=1, b=2).

Attributes:
children

Dictionary of the child configs of self.

config_name

Qualified name of this config.

is_empty

Whether any parameters have been set, at parent level or at any child level.

not_done

Returns a dictionary of keys which were not read yet.

recorder

Returns the “recorder”, a sortedcontainers.SortedDict which contains key, default, cast, help, and all other function parameters for all calls of cdxcore.config.Config.__call__().

Methods

__call__(key[, default, cast, help, ...])

Reads a parameter key from the config subject to casting with cast.

as_dict([mark_done])

Convert self into a dictionary of dictionaries.

as_field()

This function provides support for dataclasses.dataclass fields with Config default values.

clean_copy()

Make a copy of self, and reset it to the original input state from the user.

clear(/)

config_kwargs(config, kwargs[, config_name])

Default implementation for a usage pattern where the user can provide both a cdxcore.config.Config parameter and **kwargs.

copy()

Return a fully independent copy of self.

delete_children(names)

Delete one or several children from self.

detach()

Returns a copy of self, and sets self to "done".

done([include_children, mark_done])

Closes the config and checks that no unread parameters remain.

fromkeys(/, iterable[, value])

Create a new ordered dictionary with keys from iterable and values set to value.

get(*kargs, **kwargs)

Returns cdxcore.config.Config.__call__() (*kargs, **kwargs).

get_default(*kargs, **kwargs)

Returns cdxcore.config.Config.__call__() (*kargs, **kwargs).

get_raw(key[, default])

Reads the raw value for key without any casting, nor marking the element as read, nor recording access to the element.

get_recorded(key)

Returns the cast value previously returned for key.

input_dict([ignore_underscore])

Returns a cdxcore.pretty.PrettyObject of all inputs into this config.

input_report([max_value_len])

Returns a report of all inputs in a readable format.

items(/)

Return a set-like object providing a view on the dict's items.

keys()

Returns the keys for the immediate parameters of this config.

mark_done([include_children])

Mark all members as "done" (having been used).

move_to_end(/, key[, last])

Move an existing element to the end (or beginning if last is false).

pop(/, key[, default])

If the key is not found, return the default if given; otherwise, raise a KeyError.

popitem(/[, last])

Remove and return a (key, value) pair from the dictionary.

record_key(key)

Returns the fully qualified string key for key.

reset()

Reset all usage information.

reset_done()

Reset the internal list of which are "done" (used).

setdefault(/, key[, default])

Insert key with a value of default if key is not in the dictionary.

shallow_copy()

Return a shallow copy of self which shares all usage tracking with self going forward.

to_config(kwargs[, config_name])

Assesses whether a parameter is a cdxcore.config.Config, and otherwise tries to convert it into one.

unique_hash(*[, unique_hash, debug_trace, ...])

Returns a unique hash key for this object - based on its provided inputs and not based on its usage.

update([other])

Overwrite values of self with new values. Accepts the two main formats.

usage_report([with_values, with_help, ...])

Generate a human readable report of all variables read from this config.

usage_reproducer()

Returns a string representation of current usage, calling repr() for each value.

usage_value_dict()

Return a flat sorted dictionary of both "used" and, where not used, "input" values.

used_info(key)

Returns the usage stats for a given key in the form of a tuple (done, record).

values(/)

Return an object providing a view on the dict's values.

__call__(key, default=<cdxcore.config._ID object>, cast=None, help=None, help_default=None, help_cast=None, mark_done=True, record=True)[source]#

Reads a parameter key from the config subject to casting with cast. If key is not found, returns default.

Examples:

config("key")                      # returns the value for 'key' or if not found raises an exception
config("key", 1)                   # returns the value for 'key' or if not found returns 1
config("key", 1, int)              # if 'key' is not found, return 1. If it is found cast the result with int().
config("key", 1, int, "A number")  # also stores an optional help text.
                                   # Call usage_report() after the config has been read to get a full
                                   # summary of all data requested from this config.

Use cdxcore.config.Int and cdxcore.config.Float to ensure a number is within a given range:

config("positive_int", 1, Int>=1, "A positive integer")
config("ranged_int", 1, (Int>=0)&(Int<=10), "An integer between 0 and 10, inclusive")
config("positive_float", 1, Float>0., "A positive float")

Choices are implemented with lists:

config("difficulty", 'easy', ['easy','medium','hard'], "Choose one")

Alternative types are implemented with tuples:

config("difficulty", None, (None, ['easy','medium','hard']), "None or a level of difficulty")
config("level", None, (None, Int>=0), "None or a non-negative level")
Parameters:
key : string

Keyword to read.

default : optional

Default value. Set to cdxcore.config.Config.no_default for mandatory parameters without default. If key then cannot be found, a KeyError is raised.

cast : Callable, optional

If None, any value provided by the user will be acceptable.

If not None, the function will attempt to cast the value provided by the user with cast(). For example, if cast = int, then the function will apply int(x) to the user’s input x.

This function also allows passing the following complex arguments:

  • A list, in which case it is assumed that the key must be from this list. The type of the first element of the list will be used to cast() values to the target type.

  • cdxcore.config.Int and cdxcore.config.Float allow defining constrained integers and floating point numbers, respectively.

  • A tuple of types, in which case any of the types is acceptable. A None here means that the value None is acceptable (it does not mean that any value is acceptable).

  • Any callable to validate a parameter.
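The list rule above (cast to the type of the first element, then check membership) can be sketched as an ordinary callable. The factory `one_of` is hypothetical and only illustrates the described behaviour; it is not the library's implementation.

```python
def one_of(options):
    """Hypothetical factory: cast to the type of the first option, then check membership."""
    target_type = type(options[0])
    def cast(value):
        result = target_type(value)
        if result not in options:
            raise ValueError(f"{result!r} is not one of {options!r}")
        return result
    return cast

level = one_of([1, 2, 3])
print(level("2"))    # 2: the string is cast to int before the membership check
```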

help : str, optional

If provided, adds a help text when self-documentation is used.

help_default : str, optional

If provided, specifies the default value in plain text. If not provided, help_default is equal to the string representation of the default value, if any. Use this for complex default values which are hard to read.

help_cast : str, optional

If provided, specifies a description of the cast type. If not provided, help_cast is set to the string representation of cast, or None if cast is None. Complex casts are supported. Use this for cast types which are hard to read.

mark_done : bool, optional

If True, marks the respective element as read once the function has returned successfully.

record : bool, optional

If True, records consistency usage of the key and validates that previous usage of the key is consistent with the current usage, e.g. that the default values are consistent and that if help was provided it is the same.

Returns:
Parameter value.
Raises:
KeyError:

If key could not be found.

ValueError:

For input errors.

cdxcore.config.InconsistencyError:

If key was previously accessed with different default, help, help_default or help_cast values. For all the help texts empty strings are not compared, i.e. __call__("x", default=1) will succeed even if a previous call was __call__("x", default=1, help="value for x").

Note that cast is not validated.

cdxcore.config.CastError:

If an error occurs casting a provided value.

as_dict(mark_done=True)[source]#

Convert self into a dictionary of dictionaries.

Parameters:
mark_done : bool

If True, then all members of this config will be considered “done” upon return of this function.

Returns:
Dict : dict

Dictionary of dictionaries.

as_field()[source]#

This function provides support for dataclasses.dataclass fields with Config default values.

When adding a field with a non-frozen default value to a @dataclass class, a default_factory has to be provided. The function as_field returns the corresponding dataclasses.Field element by returning simply:

def factory():
    return self
return dataclasses.field( default_factory=factory )

Usage is as follows:

from dataclasses import dataclass
@dataclass 
class A:
    data : Config = Config(x=2).as_field()

a = A() 
print(a.data['x'])  # -> "2"
a = A(data=Config(x=3)) 
print(a.data['x'])  # -> "3"
property children: OrderedDict#

Dictionary of the child configs of self.

clean_copy()[source]#

Make a copy of self, and reset it to the original input state from the user.

As an example, the following allows using different default values for config members of the same name:

base = Config()
_ = base('a', 1)   # read a with default 1

copy = base.clean_copy() # copy has no usage information:
                         # 'a' is not known as used with default 1

_ = base('b', 111) # read 'b' with default 111
_ = copy('b', 222) # read 'b' with default 222 -> ok

_ = copy('a', 2)   # use 'a' with default 2 -> ok

Use cdxcore.config.Config.copy() for making a copy which tracks prior usage information.

See also the summary on various copy operations in cdxcore.config.

static config_kwargs(config, kwargs, config_name='kwargs')[source]#

Default implementation for a usage pattern where the user can provide both a cdxcore.config.Config parameter and ** kwargs.

Example:

def f(config=None, **kwargs):
    config = Config.config_kwargs( config, kwargs )
    ...
    x = config("x", 1, ...)
    config.done() # <-- important to do this here. Remember that config_kwargs() calls 'detach'

and then one can use either of the following:

f(Config(x=1))
f(x=1)

Important: config_kwargs calls cdxcore.config.Config.detach() to obtain a copy of config. This means cdxcore.config.Config.done() must be called explicitly for the returned object even if done() will be called elsewhere for the source config.

Parameters:
config : Config

A Config object or None.

kwargs : Mapping

If config is provided, the function will call cdxcore.config.Config.update() with kwargs.

config_name : str

A declarative name for the config if config is not provided.

Returns:
config : Config

A new config object. Please note that if config was provided, then this is a copy obtained from calling cdxcore.config.Config.detach(), which means that cdxcore.config.Config.done() must be called explicitly for this object to ensure no parameters were misspelled (it is not sufficient if cdxcore.config.Config.done() is called for config).

property config_name: str#

Qualified name of this config.

copy()[source]#

Return a fully independent copy of self.

  • The copy has an independent copy of the “done” status of self.

  • The copy has an independent usage consistency status.

  • self will remain untouched. In particular, in contrast to cdxcore.config.Config.detach() it will not be set to “done”.

As an example, the following allows using different default values for config members of the same name:

base = Config()
_ = base('a', 1)   # read a with default 1

copy = base.copy() # copy will know 'a' as used with default 1
                   # 'b' was not used yet

_ = base('b', 111) # read 'b' with default 111
_ = copy('b', 222) # read 'b' with default 222 -> ok

_ = copy('a', 2)   # use 'a' with default 2 -> will fail

Use cdxcore.config.Config.clean_copy() for making a copy which discards any prior usage information.

See also the summary on various copy operations in cdxcore.config.

delete_children(names)[source]#

Delete one or several children from self.

This function does not delete recorded consistency information (defaults and help recorded from prior uses of cdxcore.config.Config.__call__()).

detach()[source]#

Returns a copy of self, and sets self to “done”.

The purpose of this function is to defer using a config (often a sub-config) to a later point, while maintaining consistency of usage.

  • The copy has the same “done” status as self at the time of calling detach().

  • The copy shares usage consistency checks with self, i.e. if the same parameter is read with different default or help values an error is raised.

  • The function flags self as “done” using cdxcore.config.Config.mark_done().

For example:

class Example(object):

    def __init__( self, config ):
        self.a      = config('a', 1, Int>=0, "'a' value")
        self.later  = config.later.detach()  # detach sub-config
        self._cache = None
        config.done()

    def function(self):
        if self._cache is None:
            self._cache = Cache(self.later)  # deferred use of the self.later config. Cache() calls done() on self.later
        return self._cache

See also the summary on various copy operations in cdxcore.config.

Returns:
copy : Config

A copy of self.

done(include_children=True, mark_done=True)[source]#

Closes the config and checks that no unread parameters remain. This is used to detect typos in configuration files.

Raises a cdxcore.config.NotDoneError if there are unused parameters in self.

Consider this example:

config = Config()
config.a = 1
config.child.b = 2

_ = config.a # read a
child = config.child
config.done()     # error because config.child.b has not been read yet

print( child.b )

This example raises an error because config.child.b was not read. If you wish to process the sub-config config.child later, use cdxcore.config.Config.detach():

config = Config()
config.a = 1
config.child.b = 2

_ = config.a # read a
child = config.child.detach()
config.done()   # no error, even though config.child.b has not been read yet

print( child.b )
child.done()    # need to call done() for the child

By default this function also validates that all child configs were “done”.

Parameters:
include_children: bool

Validate child configs, too. Strongly recommended default.

mark_done:

Upon completion mark this config as ‘done’. This stops it being modified; that also means subsequent calls to done() will be successful.

Raises:
cdxcore.config.NotDoneError

If not all elements were read.

get(*kargs, **kwargs)[source]#

Returns cdxcore.config.Config.__call__() (*kargs, **kwargs).

get_default(*kargs, **kwargs)[source]#

Returns cdxcore.config.Config.__call__() (*kargs, **kwargs).

get_raw(key, default=<cdxcore.config._ID object>)[source]#

Reads the raw value for key without any casting, nor marking the element as read, nor recording access to the element.

Equivalent to using cdxcore.config.Config.__call__() (key, default, mark_done=False, record=False ) which, without default, is in turn equivalent to self[key].

get_recorded(key)[source]#

Returns the cast value previously returned for key.

If the parameter key was provided as part of the input data, this value is returned, subject to casting.

If key was not part of the input data, and a default was provided when the parameter was read with cdxcore.config.Config.__call__(), then return this default value, subject to casting.

Raises:
KeyError:

If the key was not previously read successfully.

input_dict(ignore_underscore=True)[source]#

Returns a cdxcore.pretty.PrettyObject of all inputs into this config.

input_report(max_value_len=100)[source]#

Returns a report of all inputs in a readable format. Assumes that str() converts all values into some readable format.

Parameters:
max_value_len: int

Limits the length of str() for each value to max_value_len characters. Set to None to not limit the length.

Returns:
Report: str

property is_empty: bool#

Whether the config is empty, i.e. no parameters have been set at parent level or at any child level.

keys()[source]#

Returns the keys for the immediate parameters of this config. This call will not return the names of child configs; use cdxcore.config.Config.children for those.

Use cdxcore.config.Config.input_dict() to obtain the full hierarchy of input parameters.

mark_done(include_children=True)[source]#

Mark all members as “done” (having been used).

no_default = <cdxcore.config._ID object>#

property not_done: dict#

Returns a dictionary of keys which were not read yet.

Returns:
not_done: dict

Dictionary of dictionaries: for value parameters, the respective entry is their key and False; for children the key is followed by their not_done dictionary.
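The nested shape described above can be sketched in plain Python. This is an illustration only: `collect_not_done` and the dotted `read_paths` set are hypothetical helpers, and omitting fully read children from the result is an assumption of this sketch rather than documented behaviour of the library.

```python
def collect_not_done(values, read_paths, prefix=""):
    """Return a nested dict of unread keys, mirroring the shape described above."""
    result = {}
    for key, value in values.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            # Child: recurse and keep it only if something below is unread.
            child = collect_not_done(value, read_paths, prefix=path + ".")
            if child:
                result[key] = child
        elif path not in read_paths:
            # Value parameter that was never read.
            result[key] = False
    return result


values = {"a": 1, "child": {"b": 2, "c": 3}}
read_paths = {"a", "child.c"}          # parameters the program has read so far
not_done = collect_not_done(values, read_paths)
print(not_done)   # {'child': {'b': False}}
```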

record_key(key)[source]#

Returns the fully qualified string key for key.

It has the form config1.config['entry'].

property recorder: SortedDict#

Returns the “recorder”, a sortedcontainers.SortedDict which contains key, default, cast, help, and all other function parameters for all calls of cdxcore.config.Config.__call__(). It is used to ensure consistency of parameter calls.

Use for debugging only.

reset()[source]#

Reset all usage information.

Use cdxcore.config.Config.reset_done() to only reset the information whether a key was used, but to keep consistency information on previously used default and/or help values.

reset_done()[source]#

Reset the internal list of which parameters are “done” (used).

Typically “done” means that a parameter has been read using cdxcore.config.Config.__call__().

This function does not reset the consistency recording of previous uses of each key. This ensures consistency of default values between uses of keys. Use cdxcore.config.Config.reset() to reset all “done” and reset all usage records.

See also the summary on various copy operations in cdxcore.config.

shallow_copy()[source]#

Return a shallow copy of self which shares all usage tracking with self going forward.

  • The copy shares the “done” status of self.

  • The copy shares all consistency usage status of self.

  • self will not be flagged as ‘done’
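What "shares all usage tracking" means can be shown with a toy object. The sketch below assumes a hypothetical `TrackedView` class that is not part of cdxcore: the copy holds a reference to the same set of read keys, so reads through either object are visible to both.

```python
class TrackedView:
    """Toy view onto values with a 'read' set that copies can share."""

    def __init__(self, values, read=None):
        self.values = values
        self.read = set() if read is None else read

    def get(self, key):
        self.read.add(key)
        return self.values[key]

    def shallow_copy(self):
        # Share the very same tracking set: reads via either object count for both.
        return TrackedView(self.values, self.read)

    def unread(self):
        return sorted(set(self.values) - self.read)


original = TrackedView({"a": 1, "b": 2})
view = original.shallow_copy()
view.get("a")                 # read through the copy ...
print(original.unread())      # ... is visible on the original: ['b']
```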

static to_config(kwargs, config_name='kwargs')[source]#

Assesses whether a parameter is a cdxcore.config.Config, and otherwise tries to convert it into one. The classic use case is to transform ** kwargs into a cdxcore.config.Config to allow type checking and prevent spelling errors.

Returns:
config: Config

If kwargs is already a cdxcore.config.Config it is returned. Otherwise, create a new cdxcore.config.Config from kwargs named using config_name.

unique_hash(*, unique_hash=None, debug_trace=None, input_only=True, **unique_hash_parameters)[source]#

Returns a unique hash key for this object - based on its provided inputs and not based on its usage.

This function accepts either an existing unique_hash function or the parameters to construct one on the fly via unique_hash_parameters. That means instead of:

from cdxcore.uniquehash import UniqueHash
self.unique_hash( unique_hash=UniqueHash(**p) )

we can directly call:

self.unique_hash( **p )            

The purpose of this function is to allow indexing results of heavy computations which were configured with Config with a simple hash key. A typical application is caching of results based on the relevant user-configuration.

An example for a simplistic cache:

from cdxcore.config import Config
import tempfile
import pickle

def big_function( cache_dir : str, config : Config = None, **kwargs ):
    assert cache_dir[-1] not in ("/", "\\"), cache_dir
    config = Config.config_kwargs( config, kwargs )
    uid    = config.unique_hash(length=8)
    cfile  = f"{cache_dir}/{uid}.pck"

    # attempt to read cache
    try:
        with open(cfile, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        pass

    # do something big...
    result = config("a", 0, int, "Value 'a'") * 1000

    # write cache
    with open(cfile, "wb") as f:
        pickle.dump(result,f)

    return result                

cache_dir  = tempfile.mkdtemp()   # for real applications, use a permanent cache_dir.
_ = big_function( cache_dir = cache_dir, a=1 )
print(_)  

A more sophisticated framework which includes code versioning via cdxcore.version.version() is implemented with cdxcore.subdir.SubDir.cache().

Unique Hash Default Semantics

Please consult the documentation for cdxcore.uniquehash.UniqueHash before using this functionality; in particular note that by default this function ignores config keys or children with leading underscores; set parse_underscore to "protected" or "private" to change this behaviour.

Why is “Usage” not Considered when Computing the Hash (by Default)?

When using Config to configure our environment, then we have not only the user’s input values but also the realized values in the form of defaults for those values the user has not provided. In most cases, these are the majority of values.

By only considering actual input values when computing a hash, we stipulate that defaults are not part of the current unique characteristic of the environment.

That may seem inconsistent: consider a program which reads a parameter activation with default relu. The hash key differs between the case where the user does not provide a value for activation and the case where the user explicitly sets it to relu. The effective activation value in both cases is relu, so why would we not want both to be identified as the same environment configuration?

The following illustrates this dilemma:

def big_function( config ):
    _ = config("activation", "relu", str, "Activation function")
    config.done()

config = Config()
big_function( config )
print( config.unique_hash(length=8) )   # -> 36e9d246

config = Config(activation="relu")
big_function( config )
print( config.unique_hash(length=8) )   # -> d715e29c

Robustness

The main reason for hashing only input values is that (child) configs are commonly read close to where their parameters are used. As a result, config parameters are often read, and their usage registered, only if the respective computation is actually executed. Even the big_function example above shows this issue: the call config("a", 0, int, "Value 'a'") is only executed if no cached result was found.

This can be rectified if it is ensured that all config parameters are read regardless of actual executed code. In this case, set the parameter input_only for unique_hash() to False. Note that when using cdxcore.config.Config.detach() you must make sure to have processed all detached configurations before calling unique_hash().
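The trade-off between the two hashing modes can be made concrete with a small standalone sketch. It uses hashlib and JSON as a stand-in hashing scheme, not cdxcore.uniquehash; `short_hash`, `effective_values` and `DEFAULTS` are hypothetical names introduced only for this illustration.

```python
import hashlib
import json


def short_hash(values, length=8):
    """Hash a flat dict deterministically (sorted keys, JSON encoding)."""
    payload = json.dumps(values, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:length]


DEFAULTS = {"activation": "relu", "depth": 3}   # hypothetical program defaults


def effective_values(user_inputs):
    # Values the program would actually use: defaults overridden by inputs.
    return {**DEFAULTS, **user_inputs}


inputs_a = {}                        # user provides nothing
inputs_b = {"activation": "relu"}    # user spells out the default

# Input-only hashing: the two configurations are treated as different.
same_by_input = short_hash(inputs_a) == short_hash(inputs_b)

# Hashing effective values: both resolve to the same configuration.
same_by_usage = short_hash(effective_values(inputs_a)) == short_hash(effective_values(inputs_b))

print(same_by_input, same_by_usage)
```

The second mode is only reliable if every default is known at hashing time, which corresponds to the requirement above that all parameters have been read before the hash is computed.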

Parameters:
unique_hash_parameters: dict

If unique_hash is None these parameters are passed to cdxcore.uniquehash.UniqueHash.__call__() to obtain the corresponding hashing function.

unique_hash: Callable

A function to return unique hashes, usually generated using cdxcore.uniquehash.UniqueHash.

debug_trace: cdxcore.uniquehash.DebugTrace

Allows tracing of hashing activity for debugging purposes. Two implementations of DebugTrace are currently available:

input_only: bool

Expert use only.

If True (the default) only user-provided inputs are used to compute the unique hash. If False, then the result of cdxcore.config.Config.usage_value_dict() is used to generate the hash. Make sure you read and understand the discussion above on the topic.

Returns:
Unique hash: str

A unique hash of at most the length specified via either unique_hash or unique_hash_parameters.

update(other=None, **kwargs)[source]#

Overwrite values of self with new values. Accepts the following formats:

update( dictionary )
update( config )
update( a=1, b=2 )
update( {'x.a':1 } )  # hierarchical assignment self.x.a = 1

Parameters:
other: dict, Config

Copy all content of other into self.

If other is a config: elements will be clean_copy()ed; other will not be marked as “read”.

If other is a dictionary, then ‘.’ notation can be used for hierarchical assignments.

**kwargs

Allows assigning specific values.
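The ‘.’ notation for hierarchical assignment can be pictured as follows. This is a minimal sketch on plain nested dictionaries; `update_nested` is a hypothetical helper written for this illustration and is not the library's update() implementation.

```python
def update_nested(target, updates):
    """Apply updates to a nested dict; 'x.a' addresses key 'a' in child 'x'."""
    for dotted_key, value in updates.items():
        *parents, leaf = dotted_key.split(".")
        node = target
        for name in parents:
            # Create intermediate children on the fly.
            node = node.setdefault(name, {})
        node[leaf] = value
    return target


settings = {"a": 0, "x": {"b": 2}}
update_nested(settings, {"a": 1, "x.a": 1, "y.z.w": 5})
print(settings)
```

Existing children are extended in place and missing ones are created, which mirrors the on-the-fly hierarchy generation of member notation.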

Returns:
self: Config

usage_report(with_values=True, with_help=True, with_defaults=True, with_cast=False, filter_path=None)[source]#

Generate a human readable report of all variables read from this config.

Parameters:
with_values: bool, optional

Whether to also print values. This can be hard to read if values are complex objects.

with_help: bool, optional

Whether to print help

with_defaults: bool, optional

Whether to print default values

with_cast: bool, optional

Whether to print types

filter_path: str, optional

If provided, will match the beginning of the fully qualified path of all children vs this string. Most useful with filter_path = self.config_name which ensures only children of this (child) config are shown.

Returns:
Report: str

usage_reproducer()[source]#

Returns a string representation of current usage, calling repr() for each value.

usage_value_dict()[source]#

Return a flat sorted dictionary of both “used” and, where not used, “input” values.

A “used” value has either been read from user input or was provided as a default. In both cases, it will have been subject to casting.

This function will raise a RuntimeError in either of the following two cases:

  • A key was marked as “done” (read), but no “value” was recorded at that time. A simple example is when cdxcore.config.Config.detach() was called to create a child config, but that config has not yet been read.

  • A key has not been read yet, but there is a record of a value being returned. An example of this happening is if cdxcore.config.Config.reset_done() is called.

used_info(key)[source]#

Returns the usage stats for a given key in the form of a tuple (done, record).

Here done is a boolean and record is a dictionary of consistency information on the key.

cdxcore.config.Float = <cdxcore.config._CastCond object>#

Allows applying basic range conditions to float parameters.

For example:

timeout = config("timeout", 0.5, Float>=0., "Timeout")

In combination with & we can limit a float to a range:

probability = config("probability", 0.5, (Float>=0.) & (Float <= 1.), "Probability")

exception cdxcore.config.InconsistencyError(key, config_name, message)[source]#

Bases: RuntimeError

Raised when cdxcore.config.Config.__call__() is used inconsistently between function calls for a given parameter.

The Config semantics require that parameters are accessed with consistent default and help values between cdxcore.config.Config.__call__() calls.

For raw access to any parameters, use [].

config_name#

Hierarchical name of the config.

key#

The offending parameter key.

cdxcore.config.Int = <cdxcore.config._CastCond object>#

Allows applying basic range conditions to int parameters.

For example:

num_steps = config("num_steps", 1, Int>0, "Number of steps")

In combination with & we can limit an int to a range:

bus_days_per_year = config("bus_days_per_year", 255, (Int > 0) & (Int < 365), "Business days per year")
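How expressions such as (Int > 0) & (Int < 365) can produce a validating cast may be easier to see in a toy version. The sketch below is an assumption-laden illustration of the operator-overloading technique; `Condition` and `Number` are hypothetical names and do not describe the library's internal _CastCond class.

```python
class Condition:
    """Toy validator built from comparisons, e.g. (Number >= 0) & (Number <= 1)."""

    def __init__(self, cast, checks=()):
        self.cast = cast
        self.checks = tuple(checks)   # pairs of (description, predicate)

    def _with(self, text, predicate):
        return Condition(self.cast, self.checks + ((text, predicate),))

    def __ge__(self, bound):
        return self._with(f">= {bound}", lambda x: x >= bound)

    def __gt__(self, bound):
        return self._with(f"> {bound}", lambda x: x > bound)

    def __le__(self, bound):
        return self._with(f"<= {bound}", lambda x: x <= bound)

    def __lt__(self, bound):
        return self._with(f"< {bound}", lambda x: x < bound)

    def __and__(self, other):
        # Combine the checks of both conditions.
        return Condition(self.cast, self.checks + other.checks)

    def __call__(self, value):
        # Cast first, then validate the cast value against every check.
        value = self.cast(value)
        for text, predicate in self.checks:
            if not predicate(value):
                raise ValueError(f"value {value} violates condition {text}")
        return value


Number = Condition(float)   # hypothetical analogue of Float

probability = (Number >= 0.0) & (Number <= 1.0)
print(probability("0.25"))   # cast to float and validated
```

The parentheses around each comparison are required because & binds more tightly than comparison operators in Python.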

exception cdxcore.config.NotDoneError(not_done, config_name, message)[source]#

Bases: RuntimeError

Raised when cdxcore.config.Config.done() finds that some config parameters have not been read.

The set of those arguments is accessible via cdxcore.config.NotDoneError.not_done.

config_name#

Hierarchical name of the config.

not_done#

The parameter keys which were not read when cdxcore.config.Config.done() was called.

cdxcore.config.config_kwargs(config, kwargs, config_name='kwargs')#

Default implementation for a usage pattern where the user can provide both a cdxcore.config.Config parameter and ** kwargs.

Example:

def f(config, **kwargs):
    config = Config.config_kwargs( config, kwargs )
    ...
    x = config("x", 1, ...)
    config.done() # <-- important to do this here. Remember that config_kwargs() calls 'detach'

and then one can use either of the following:

f(Config(x=1))
f(x=1)

Important: config_kwargs calls cdxcore.config.Config.detach() to obtain a copy of config. This means cdxcore.config.Config.done() must be called explicitly for the returned object even if done() will be called elsewhere for the source config.

Parameters:
config: Config

A Config object or None.

kwargs: Mapping

If config is provided, the function will call cdxcore.config.Config.update() with kwargs.

config_name: str

A declarative name for the config if config is not provided.

Returns:
config: Config

A new config object. Please note that if config was provided, then this is a copy obtained from calling cdxcore.config.Config.detach(), which means that cdxcore.config.Config.done() must be called explicitly for this object to ensure no parameters were misspelled (it is not sufficient if cdxcore.config.Config.done() is called for config).

cdxcore.config.no_default = <cdxcore.config._ID object>#

Value indicating no default is available for a given parameter.

cdxcore.config.to_config(kwargs, config_name='kwargs')#

Assesses whether a parameter is a cdxcore.config.Config, and otherwise tries to convert it into one. The classic use case is to transform ** kwargs into a cdxcore.config.Config to allow type checking and prevent spelling errors.

Returns:
config: Config

If kwargs is already a cdxcore.config.Config it is returned. Otherwise, create a new cdxcore.config.Config from kwargs named using config_name.