cdxcore.util#

Basic utilities for Python such as type management, formatting, some trivial timers.

Import#

import cdxcore.util as util

Documentation#

Module Attributes

DEF_FILE_NAME_MAP

Default map from characters which cannot be used for filenames under either Windows or Linux to valid characters.

Functions

fmt_big_byte_number(byte_cnt[, str_B])

Return a formatted big byte string, e.g. 12.35MB.

fmt_big_number(number)

Return a formatted big number string, e.g. 12.35M instead of all digits.

fmt_date(dt)

Returns string representation for a date of the form "YYYY-MM-DD".

fmt_datetime(dt, *[, sep, ignore_ms, ignore_tz])

Convert datetime.datetime to a string of the form "YYYY-MM-DD HH:MM:SS".

fmt_dict(dct, *[, sort, none, link])

Return a readable representation of a dictionary.

fmt_digits(integer[, sep])

String representation of an integer with 1000 separators: 10000 becomes "10,000".

fmt_filename(filename[, by])

Replaces invalid filename characters such as '\', ':', or '/' by a different character. The returned string is technically a valid file name under both Windows and Linux.

fmt_list(lst, *[, none, link, sort])

Returns a formatted string of a list, its elements separated by commas and (by default) a final 'and'.

fmt_now()

Returns the cdxcore.util.fmt_datetime() applied to datetime.datetime.now()

fmt_seconds(seconds, *[, eps])

Generate format string for seconds, e.g. "23s" for seconds=23, or "1:10" for seconds=70.

fmt_time(dt, *[, sep, ignore_ms])

Converts a time to a string with format "HH:MM:SS".

fmt_timedelta(dt, *[, sep])

Returns string representation for a time delta in the form "DD:HH:MM:SS,MS".

getsizeof(obj)

Approximates the size of an object.

is_atomic(o)

Whether an element is atomic.

is_filename(filename[, by])

Tests whether a filename is indeed a valid filename.

is_float(o)

Checks whether a type is a float which includes numpy floating types

is_function(f)

Checks whether f is a function in an extended sense.

is_jupyter()

Whether we operate in a Jupyter session.

plain(inn, *[, sorted_dicts, native_np, ...])

Converts a python structure into a simple atomic/list/dictionary collection such that it can be read without the specific imports used inside this program.

qualified_name(x[, module])

Return qualified name including module name of some Python element.

types_functions()

Returns a set of all types considered functions

Classes

CRMan()

Carriage Return ("\r") manager.

Timer()

Micro utility to measure passage of time.

TrackTiming()

Simplistic class to track the time it takes to run sequential tasks.

class cdxcore.util.CRMan[source]#

Bases: object

Carriage Return (”\r”) manager.

This class is meant to enable efficient per-line updates using “\r” for text output with a focus on making it work with both Jupyter and the command shell. In particular, Jupyter does not support the ANSI \33[2K ‘clear line’ code. To simulate clearing lines, CRMan keeps track of the length of the current line, and clears it by appending spaces to a message following “\r” accordingly.

This functionality does not quite work across all terminal types which were tested. The main focus for now is to make it work for Jupyter. Any feedback on how to make this more generically operational is welcome.
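The padding idea described above can be sketched in a few lines. This is only an illustration of the technique, not CRMan's actual implementation; the helper name `overwrite_line` is hypothetical.

```python
def overwrite_line(previous: str, new: str) -> str:
    """Return a string that replaces `previous` with `new` on the same line.

    Hypothetical helper: it emulates 'clear line' by padding with spaces,
    for consoles that ignore the ANSI clear-line code.
    """
    # "\r" moves the cursor to the start of the line; ljust() pads `new`
    # with spaces so leftover characters of a longer `previous` are erased.
    return "\r" + new.ljust(len(previous))


line = "message 111111"
for msg in ("message 2222", "message 33"):
    print(overwrite_line(line, msg), end="")
    line = msg
print()
```

The caller has to remember the previously printed text, which is the bookkeeping CRMan is described as doing internally.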

crman = CRMan()
print( crman("\rmessage 111111"), end='' )
print( crman("\rmessage 2222"), end='' )
print( crman("\rmessage 33"), end='' )
print( crman("\rmessage 1\n"), end='' )

prints:

message 1     

While

print( crman("\rmessage 111111"), end='' )
print( crman("\rmessage 2222"), end='' )
print( crman("\rmessage 33"), end='' )
print( crman("\rmessage 1"), end='' )
print( crman("... and more.") )

prints

message 1... and more
Attributes:
current

Return current string.

Methods

__call__(message)

Convert message containing "\r" and "\n" into a printable string which ensures that a "\r" string does not lead to printed artifacts.

reset()

Reset object.

write(text[, end, flush, channel])

Write to a channel.

__call__(message)[source]#

Convert message containing “\r” and “\n” into a printable string which ensures that a “\r” string does not lead to printed artifacts. Afterwards, the object will retain any text not terminated by “\n”.

Parameters:
message: str

message containing “\r” and “\n”.

Returns:
Message: str

Printable string.

property current: str#

Return current string.

This is the string that is currently visible to the user, i.e. everything CRMan has written since the last time a new line was printed.

reset()[source]#

Reset object.

write(text, end='', flush=True, channel=None)[source]#

Write to a channel.

Writes text to channel taking into account any current lines and any “\r” and “\n” contained in text. The end and flush parameters mirror those of print().

Parameters:
text: str

Text to print, containing “\r” and “\n”.

end, flush: optional

end and flush parameters mirror those of print().

channel: Callable

Callable to output the residual text. If None, the default, use print() to write to stdout.

cdxcore.util.DEF_FILE_NAME_MAP = {'*': '@', '/': '_', ':': ';', '<': '(', '>': ')', '?': '!', '\\': '_', '|': '_'}#

Default map from characters which cannot be used for filenames under either Windows or Linux to valid characters.

class cdxcore.util.Timer[source]#

Bases: object

Micro utility to measure passage of time.

Example:

from cdxcore.util import Timer
with Timer() as t:
    # ... do something ...
    print(f"This took {t}.")
Attributes:
fmt_seconds

Seconds elapsed since construction or cdxcore.util.Timer.reset(), formatted using cdxcore.util.fmt_seconds()

hours

Hours passed since construction or cdxcore.util.Timer.reset()

minutes

Minutes passed since construction or cdxcore.util.Timer.reset()

seconds

Seconds elapsed since construction or cdxcore.util.Timer.reset()

Methods

interval_test(interval)

Tests if interval seconds have passed.

reset()

Resets the timer.

property fmt_seconds#

Seconds elapsed since construction or cdxcore.util.Timer.reset(), formatted using cdxcore.util.fmt_seconds()

property hours: float#

Hours passed since construction or cdxcore.util.Timer.reset()

interval_test(interval)[source]#

Tests if interval seconds have passed. If yes, reset timer and return True. Otherwise return False.

Usage:

from cdxcore.util import Timer
n = 1000    # number of iterations (illustrative)
tme = Timer()
for i in range(n):
    if tme.interval_test(2.):
        print(f"\r{i+1}/{n} done. Time taken so far {tme}.", end='', flush=True)
print(f"\rDone. This took {tme}.")
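The test-and-reset behaviour described for interval_test() can be sketched with the standard library alone. The class below is a hypothetical stand-in, not the cdxcore Timer; the injectable `clock` argument is an assumption added here only to make the behaviour easy to check.

```python
import time


class IntervalTimer:
    """Hypothetical minimal timer illustrating a test-and-reset pattern."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock          # callable returning the current time in seconds
        self._start = clock()

    @property
    def seconds(self) -> float:
        """Seconds elapsed since construction or the last reset()."""
        return self._clock() - self._start

    def reset(self) -> None:
        self._start = self._clock()

    def interval_test(self, interval: float) -> bool:
        """Return True and restart the clock once `interval` seconds have passed."""
        if self.seconds >= interval:
            self.reset()
            return True
        return False


# Deterministic demonstration with a fake clock instead of real time.
ticks = iter([0.0, 1.0, 2.5, 2.5, 3.0])
timer = IntervalTimer(clock=lambda: next(ticks))
results = [timer.interval_test(2.0) for _ in range(3)]
print(results)
```

Resetting inside the test keeps the progress message in a loop from being printed more often than once per interval.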
property minutes: float#

Minutes passed since construction or cdxcore.util.Timer.reset()

reset()[source]#

Resets the timer.

property seconds: float#

Seconds elapsed since construction or cdxcore.util.Timer.reset()

class cdxcore.util.TrackTiming[source]#

Bases: object

Simplistic class to track the time it takes to run sequential tasks.

Usage:

from cdxcore.util import TrackTiming
timer = TrackTiming()   # clock starts

# do job 1
timer += "Job 1 done"

# do job 2
timer += "Job 2 done"

print( timer.summary() )
Attributes:
tracked

Returns dictionary of tracked texts

Methods

reset_all()

Reset timer, and clear all tracked items

reset_timer()

Reset the timer to current time

summary([fmat, jn_fmt])

Generate summary string by applying some formatting

track(text, *args, **kwargs)

Track 'text', formatted with 'args' and 'kwargs'

reset_all()[source]#

Reset timer, and clear all tracked items

reset_timer()[source]#

Reset the timer to current time

summary(fmat='%(text)s: %(fmt_seconds)s', jn_fmt=', ')[source]#

Generate summary string by applying some formatting

Parameters:
fmat: str, optional

Format string using %(). Arguments are text, seconds (as int) and fmt_seconds (a string).

Default is "%(text)s: %(fmt_seconds)s".

jn_fmt: str, optional

String to be used between two texts. Default is ", ".

Returns:
Summary: str

The combined summary string
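To illustrate how a %()-style format string and a join string combine, here is a small sketch. The tracked data, the `render` helper, and the simple "Ns" rendering of seconds are all assumptions made for this example; TrackTiming's own storage and formatting may differ.

```python
# Hypothetical tracked data: (text, elapsed seconds) pairs.
tracked = [("Job 1 done", 3), ("Job 2 done", 23)]


def render(tracked, fmat="%(text)s: %(fmt_seconds)s", jn_fmt=", "):
    """Apply `fmat` to each tracked entry and join the results with `jn_fmt`."""
    parts = []
    for text, seconds in tracked:
        fields = dict(
            text=text,
            seconds=int(seconds),
            fmt_seconds=f"{int(seconds)}s",   # placeholder formatting, an assumption
        )
        parts.append(fmat % fields)
    return jn_fmt.join(parts)


summary = render(tracked)
print(summary)
print(render(tracked, fmat="%(text)s took %(seconds)d seconds", jn_fmt="; "))
```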

track(text, *args, **kwargs)[source]#

Track ‘text’, formatted with ‘args’ and ‘kwargs’

property tracked: list#

Returns dictionary of tracked texts

cdxcore.util.fmt_big_byte_number(byte_cnt, str_B=True)[source]#

Return a formatted big byte string, e.g. 12.35MB. Uses 1024 as base for KB.

Use cdxcore.util.fmt_big_number() for converting general numbers using 1000 blocks instead.
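The difference between 1024-based and 1000-based scaling can be seen in a short sketch. The `scale` helper and its unit labels are hypothetical; it only shows the arithmetic and does not reproduce the exact strings either function returns.

```python
def scale(value, base, units=("", "K", "M", "G")):
    """Divide `value` by `base` until it fits the largest applicable unit.

    Hypothetical helper returning (scaled value, unit label).
    """
    value = float(value)
    idx = 0
    while abs(value) >= base and idx < len(units) - 1:
        value /= base
        idx += 1
    return value, units[idx]


decimal = scale(12_350_000, 1000)   # decimal blocks of 1000
binary = scale(12_350_000, 1024)    # binary blocks of 1024
print(decimal, binary)
```

The same count therefore yields a smaller mantissa under the 1024 convention than under the 1000 convention.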

Parameters:
byte_cnt: int

Number of bytes.

str_B: bool

If True, return "GB", "MB" and "KB" units. Moreover, if byte_cnt is less than 10KB, then this will add "bytes", e.g. "1024 bytes".

If False, return "G", "M" and "K" only, and do not add "bytes" to smaller byte_cnt.

Returns:
Text: str

String.

cdxcore.util.fmt_big_number(number)[source]#

Return a formatted big number string, e.g. 12.35M instead of all digits.

Uses decimal system and “B” for billions. Use cdxcore.util.fmt_big_byte_number() for byte sizes i.e. 1024 units.

Parameters:
number: int

Number to format.

Returns:
Text: str

String.

cdxcore.util.fmt_date(dt)[source]#

Returns string representation for a date of the form “YYYY-MM-DD”.

If passed a datetime.datetime, it will format its datetime.datetime.date().

cdxcore.util.fmt_datetime(dt, *, sep=':', ignore_ms=False, ignore_tz=True)[source]#

Convert datetime.datetime to a string of the form “YYYY-MM-DD HH:MM:SS”.

If present, microseconds are added as digits:

YYYY-MM-DD HH:MM:SS,MICROSECONDS

Optionally a time zone is added via:

YYYY-MM-DD HH:MM:SS+HH
YYYY-MM-DD HH:MM:SS+HH:MM

Output is reduced accordingly if dt is a datetime.time or datetime.date.

Parameters:
dt: datetime.datetime, datetime.date, or datetime.time

Input.

sep: str, optional

Separator for hours, minutes, seconds. The default ':' is most appropriate for visualization but is not suitable for filenames.

ignore_ms: bool, optional

Whether to ignore microseconds. Default False.

ignore_tz: bool, optional

Whether to ignore the time zone. Default True.

Returns:
Text: str

String.
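As a rough illustration of the basic "YYYY-MM-DD HH:MM:SS" layout and of why a different sep helps for filenames, consider the sketch below. `format_dt` is a hypothetical helper built on the standard library; it ignores microseconds and time zones and is not the cdxcore implementation.

```python
from datetime import datetime


def format_dt(dt: datetime, sep: str = ":") -> str:
    """Format `dt` as 'YYYY-MM-DD HH<sep>MM<sep>SS' (hypothetical helper)."""
    date_part = dt.strftime("%Y-%m-%d")
    # Zero-pad each time component and join with the chosen separator.
    time_part = sep.join(f"{v:02d}" for v in (dt.hour, dt.minute, dt.second))
    return f"{date_part} {time_part}"


stamp = datetime(2024, 1, 2, 3, 4, 5)
print(format_dt(stamp))
print(format_dt(stamp, sep="-"))   # no ':' characters, so usable in file names
```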

cdxcore.util.fmt_dict(dct, *, sort=False, none='-', link='and')[source]#

Return a readable representation of a dictionary.

This assumes that the elements of the dictionary itself can be formatted well with str().

For a dictionary dict(a=1,b=2,c=3) this function will return "a: 1, b: 2, and c: 3".

Parameters:
dct: dict

The dictionary to format.

sort: bool, optional

Whether to sort the keys. Default is False.

none: str, optional

String to be used if dictionary is empty. Default is "-".

link: str, optional

String to be used to link the last element to the previous string. Default is "and".

Returns:
Text: str

String.

cdxcore.util.fmt_digits(integer, sep=',')[source]#

String representation of an integer with 1000 separators: 10000 becomes “10,000”.

Parameters:
integer: int

The number. The function applies int() to the input, which allows processing a range of inputs (such as strings) but truncates floating point numbers.

sep: str

Separator; "," by default.

Returns:
Text: str

String.
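The effect of calling int() before inserting separators can be reproduced with Python's format mini-language. `with_separators` is a hypothetical helper used only for illustration; it is not necessarily how fmt_digits is written.

```python
def with_separators(value, sep: str = ",") -> str:
    """Format int(value) with a thousands separator (hypothetical helper)."""
    # The ',' format option groups digits by thousands; swap in `sep` afterwards.
    return f"{int(value):,}".replace(",", sep)


print(with_separators(10000))
print(with_separators("1234567", sep="'"))   # strings are accepted via int()
print(with_separators(1234.9))               # the fractional part is dropped
```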

cdxcore.util.fmt_filename(filename, by='default')[source]#

Replaces invalid filename characters such as ‘\’, ‘:’, or ‘/’ by a different character. The returned string is technically a valid file name under both Windows and Linux.

However, that does not prevent the filename from being a reserved name, for example “.” or “..”.

Parameters:
filename: str

Input string.

by: str | Mapping, optional

A dictionary of characters and their replacement. The default value "default" leads to using cdxcore.util.DEF_FILE_NAME_MAP.

Returns:
Text: str

Filename
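A character-by-character replacement of this kind can be sketched with str.translate. The map below is copied from the documented DEF_FILE_NAME_MAP value; the helper names `replace_invalid` and `has_only_valid` are hypothetical and stand in for the behaviour described for fmt_filename and is_filename.

```python
# Copied from the documented value of DEF_FILE_NAME_MAP.
FILE_NAME_MAP = {'*': '@', '/': '_', ':': ';', '<': '(', '>': ')',
                 '?': '!', '\\': '_', '|': '_'}


def replace_invalid(filename: str, mapping=FILE_NAME_MAP) -> str:
    """Replace every character found in `mapping` by its substitute."""
    return filename.translate(str.maketrans(mapping))


def has_only_valid(filename: str, invalid=FILE_NAME_MAP) -> bool:
    """True if `filename` contains none of the characters in `invalid`."""
    return not any(ch in invalid for ch in filename)


cleaned = replace_invalid("a/b:c?.txt")
print(cleaned, has_only_valid(cleaned))
```

As the text notes, such a check says nothing about reserved names like "." or "..".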

cdxcore.util.fmt_list(lst, *, none='-', link='and', sort=False)[source]#

Returns a formatted string of a list, its elements separated by commas and (by default) a final ‘and’.

If the list is [1,2,3] then the function will return "1, 2 and 3".

Parameters:
lst: list

The list() operator is applied to lst, so it will resolve dictionaries and generators.

none: str, optional

String to be used when list is empty. Default is "-".

link: str, optional

String to be used to connect the last item. Default is "and".

sort: bool, optional

Whether to sort the list. Default is False.

Returns:
Text: str

String.
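The comma-plus-link layout can be sketched as follows. `join_with_link` is a hypothetical helper that mirrors the documented parameters and the documented result for [1,2,3]; behaviour for other inputs is an assumption and may differ from fmt_list.

```python
def join_with_link(items, none: str = "-", link: str = "and", sort: bool = False) -> str:
    """Join items with commas and a final link word (hypothetical helper)."""
    items = list(items)              # resolves generators and dictionary keys
    if sort:
        items = sorted(items)
    texts = [str(item) for item in items]
    if not texts:
        return none
    if len(texts) == 1:
        return texts[0]
    return ", ".join(texts[:-1]) + f" {link} " + texts[-1]


print(join_with_link([1, 2, 3]))
print(join_with_link([], none="nothing"))
print(join_with_link(x for x in "cab"))
```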

cdxcore.util.fmt_now()[source]#

Returns the cdxcore.util.fmt_datetime() applied to datetime.datetime.now()

cdxcore.util.fmt_seconds(seconds, *, eps=1e-08)[source]#

Generate format string for seconds, e.g. “23s” for seconds=23, or “1:10” for seconds=70.

Parameters:
seconds: float

Seconds as a float.

eps: float

Anything below eps is considered zero. Default is 1E-8.

Returns:
Seconds: str
cdxcore.util.fmt_time(dt, *, sep=':', ignore_ms=False)[source]#

Converts a time to a string with format “HH:MM:SS”.

Microseconds are added as digits:

HH:MM:SS,MICROSECONDS

If passed a datetime.datetime, then this function will format only its datetime.datetime.time() part.

Time Zones

Note that while datetime.time objects may carry a tzinfo time zone object, the corresponding datetime.time.utcoffset() function returns None if we do not provide a dt parameter, see the tzinfo documentation. That means datetime.time.utcoffset() is only useful if we have a datetime.datetime object at hand. That makes sense as a time zone can change the date as well.

We therefore here do not allow dt to contain a time zone.

Use cdxcore.util.fmt_datetime() for time zone support.

Parameters:
dt: datetime.time

Input.

sep: str, optional

Separator for hours, minutes, seconds. The default ':' is most appropriate for visualization but is not suitable for filenames.

ignore_ms: bool

Whether to ignore microseconds. Default is False.

Returns:
Text: str

String.

cdxcore.util.fmt_timedelta(dt, *, sep='')[source]#

Returns string representation for a time delta in the form “DD:HH:MM:SS,MS”.

Parameters:
dt: datetime.timedelta

Timedelta.

sep

Identifies the three separators: between days and hours (0), between hours, minutes, and seconds (1), and before the microseconds (2):

DD*HH*MM*SS*MS
  0  1  1  2
  • sep can be a string, in which case:
    • If it is an empty string, all separators are ''.

    • A single character will be reused for all separators.

    • If the string has length 2, then the last character is used for '2'.

    • If the string has length 3, then the characters are used accordingly.

  • sep can also be a collection, i.e. a tuple or list. In this case each element is used accordingly.

Returns:
Text: str

String with leading sign. Returns “” if timedelta is 0.
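The rules for sep listed above can be written as a small normalisation step. `expand_sep` is a hypothetical helper; in particular, using the first character for both separators 0 and 1 when the string has length 2 is an assumption, since the text only specifies the last character.

```python
def expand_sep(sep):
    """Expand `sep` into three separators (days/hours, H:M:S, microseconds).

    Hypothetical sketch of the documented rules.
    """
    if isinstance(sep, str):
        if len(sep) == 0:
            return ("", "", "")
        if len(sep) == 1:
            return (sep, sep, sep)
        if len(sep) == 2:
            # Assumption: the first character serves separators 0 and 1.
            return (sep[0], sep[0], sep[1])
        if len(sep) == 3:
            return tuple(sep)
        raise ValueError("a string sep must have at most three characters")
    parts = tuple(sep)               # tuple or list: one element per separator
    if len(parts) != 3:
        raise ValueError("a collection sep must have exactly three elements")
    return parts


print(expand_sep(":"))
print(expand_sep(":,"))
print(expand_sep(["d ", ":", "."]))
```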

cdxcore.util.getsizeof(obj)[source]#

Approximates the size of an object.

In addition to calling sys.getsizeof() this function also iterates embedded containers, numpy arrays, and pandas dataframes.
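Why iterating embedded containers matters can be seen in a standard-library sketch: sys.getsizeof() alone reports only the outer container. `deep_size` is a hypothetical helper covering built-in containers only; numpy and pandas handling is deliberately left out.

```python
import sys


def deep_size(obj, _seen=None) -> int:
    """Approximate size of `obj` including nested built-in containers."""
    seen = set() if _seen is None else _seen
    if id(obj) in seen:              # count shared objects only once
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(deep_size(k, seen) + deep_size(v, seen) for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(deep_size(item, seen) for item in obj)
    return size


nested = {"a": [1, 2, 3], "b": "text"}
print(sys.getsizeof(nested), deep_size(nested))
```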

cdxcore.util.is_atomic(o)[source]#

Whether an element is atomic.

Returns True if o is a string, int, float, datetime.date, bool, or a numpy.generic.

cdxcore.util.is_filename(filename, by='default')[source]#

Tests whether a filename is indeed a valid filename.

Parameters:
filename: str

Supposed filename.

by: str | Collection, optional

A collection of invalid characters. The default value "default" leads to using the keys of cdxcore.util.DEF_FILE_NAME_MAP.

Returns:
Validity: bool

True if filename does not contain any invalid characters contained in by.

cdxcore.util.is_float(o)[source]#

Checks whether a type is a float which includes numpy floating types

cdxcore.util.is_function(f)[source]#

Checks whether f is a function in an extended sense.

Check cdxcore.util.types_functions() for what is tested against. In particular is_function does not test positive for properties.

cdxcore.util.plain(inn, *, sorted_dicts=False, native_np=False, dt_to_str=False)[source]#

Converts a python structure into a simple atomic/list/dictionary collection such that it can be read without the specific imports used inside this program.

For example, objects are converted into dictionaries of their data fields.
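The idea of reducing an object to its data fields can be sketched recursively. `to_plain` is a hypothetical, much simplified stand-in; it handles only built-in types and plain objects with a `__dict__`, and its fallback to str() is an assumption rather than documented behaviour of plain().

```python
def to_plain(obj):
    """Reduce `obj` to atomics, lists, and dicts (hypothetical sketch)."""
    if obj is None or isinstance(obj, (str, int, float, bool)):
        return obj
    if isinstance(obj, dict):
        return {key: to_plain(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple, set)):
        return [to_plain(item) for item in obj]
    if hasattr(obj, "__dict__"):
        # Objects become dictionaries of their data fields.
        return {key: to_plain(value) for key, value in vars(obj).items()}
    return str(obj)                  # assumption: fall back to a string


class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y


converted = to_plain({"origin": Point(0, 0), "path": (Point(1, 2), Point(3, 4))})
print(converted)
```

The result contains no reference to the Point class, so it can be read or serialized without importing it.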

Parameters:
inn

Some object.

sorted_dicts: bool, optional

Use SortedDicts instead of dicts. Since Python 3.7 all dictionaries preserve insertion order anyway.

native_np: bool, optional

Convert numpy to Python natives.

dt_to_str: bool, optional

Convert dates, times, and datetimes to strings.

Returns:
Plain: atomic, list, or dict

The converted structure.

cdxcore.util.qualified_name(x, module=False)[source]#

Return qualified name including module name of some Python element.

For the most part, this function will try to getattr() the __qualname__ and __name__ of x or its type. If all of these fail, an attempt is made to convert type(x) into a string.
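The lookup order described above, first the element itself and then its type, can be sketched as follows. `simple_name` is a hypothetical helper that covers only the getattr() part; it omits module handling and the final string conversion of type(x).

```python
def simple_name(x) -> str:
    """Look up __qualname__ or __name__ on x, then on type(x) (sketch)."""
    for candidate in (x, type(x)):
        for attribute in ("__qualname__", "__name__"):
            name = getattr(candidate, attribute, None)
            if isinstance(name, str):
                return name
    raise RuntimeError(f"no qualified name found for {x!r}")


print(simple_name(len))        # a builtin function carries its own name
print(simple_name(dict.get))   # methods report the class-qualified name
print(simple_name(3))          # instances fall back to the name of their type
```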

Class Properties

When reporting qualified names for a property(), there is a nuance: at class level, a property will be identified by its underlying function name. Once an object is created, though, the property will be identified by the return type of the property:

class A(object):
    @property
    def p(self):
        return 1    # any int value

qualified_name(A.p)    # -> "A.p"
qualified_name(A().p)  # -> "int"
Parameters:
x: any

Some Python element.

module: bool, optional

Whether to also return the containing module if available.

Returns:
qualified name: str

The name, if module is False.

(qualified name, module): tuple

The name and the module name, if module is True. Note that the module name returned might be "" if no module name could be determined.

Raises:
RuntimeError if no qualified name for x or its type could be found.
cdxcore.util.types_functions()[source]#

Returns a set of all types considered functions