flatspin.cmdline
Command-line related utilities
Module Contents
Classes
StoreKeyValue | Information about how to convert command line strings to Python objects.
IndexAction | index or start:stop:step (all ints)
FilterAction | key=<filter> where <filter>=start or start:stop (arbitrary types)
SizeAction | <size> or <sizex>x<sizey>
CropAction | crop window <crop> or <cropx>,<cropy> where <crop> is before or before:after
WindowAction | window size <sizex>x<sizey> [<stepx>,<stepy>]
ProgressBar | Decorate an iterable object, returning an iterator which acts exactly like the original iterable, but prints a dynamically updating progressbar every time a value is requested.
ParallelProgress | Helper class for readable parallel mapping.
Functions
parse_index | index or start:stop:step (all ints)
parse_filter | start or start:stop (arbitrary types)
parse_size | <size> or <sizex>x<sizey>
parse_crop | crop window <crop> or <cropx>,<cropy> where <crop> is before or before:after
parse_window | window size <sizex>x<sizey> [<stepx>,<stepy>]
parse_func | fn(arg1, arg2, ...)
parse_time | start:stop:step (arbitrary types, with local ctx) OR index
func_bin
func_randint
func_randseed
func_glob
func_choice
func_one
func_read_table
eval_param | Try to eval(v), on failure return v
eval_params
main_dataset_argparser | Common argparser for scripts which deal with a dataset
main_dataset_grid_argparser | Common argparser for scripts dealing with grid operations on a dataset
main_dataset | Common main function for scripts which deal with a dataset
Attributes
param_globals
- class flatspin.cmdline.StoreKeyValue(option_strings, dest, nargs=None, const=None, default=None, type=None, choices=None, required=False, help=None, metavar=None)
Bases: argparse.Action
Information about how to convert command line strings to Python objects.
Action objects are used by an ArgumentParser to represent the information needed to parse a single argument from one or more strings from the command line. The keyword arguments to the Action constructor are also all attributes of Action instances.
- Keyword Arguments
option_strings – A list of command-line option strings which should be associated with this action.
dest – The name of the attribute to hold the created object(s).
nargs – The number of command-line arguments that should be consumed. By default, one argument will be consumed and a single value will be produced. Other values include:
N (an integer) consumes N arguments (and produces a list)
'?' consumes zero or one arguments
'*' consumes zero or more arguments (and produces a list)
'+' consumes one or more arguments (and produces a list)
Note that the difference between the default and nargs=1 is that with the default, a single value will be produced, while with nargs=1, a list containing a single value will be produced.
const – The value to be produced if the option is specified and the option uses an action that takes no values.
default – The value to be produced if the option is not specified.
type – A callable that accepts a single string argument, and returns the converted value. The standard Python types str, int, float, and complex are useful examples of such callables. If None, str is used.
choices – A container of values that should be allowed. If not None, after a command-line argument has been converted to the appropriate type, an exception will be raised if it is not a member of this collection.
required – True if the action must always be specified at the command line. This is only meaningful for optional command-line arguments.
help – The help string describing the argument.
metavar – The name to be used for the option's argument with the help string. If None, the 'dest' value will be used as the name.
- __call__(self, parser, namespace, values, option_string=None)
- parse_value(self, value)
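The entry above lists only the `__call__` and `parse_value` hooks. Below is a minimal sketch of one way an `argparse.Action` can collect repeated `key=value` options into a dict. It hands value conversion to an overridable `parse_value`, which is the hook that `FilterAction` overrides. The class name `KeyValueSketch` is hypothetical. Using `ast.literal_eval` for conversion is an assumption, and flatspin's own conversion may differ (compare `eval_param`).

```python
import argparse
import ast


class KeyValueSketch(argparse.Action):
    """Hypothetical stand-in for StoreKeyValue: collects key=value pairs into a dict."""

    def __call__(self, parser, namespace, values, option_string=None):
        # Copy the dict collected so far, so the parser's default is never mutated.
        store = dict(getattr(namespace, self.dest, None) or {})
        # By default a single string arrives; nargs='+' and friends deliver a list.
        items = values if isinstance(values, list) else [values]
        for item in items:
            key, sep, raw = item.partition("=")
            if not sep or not key:
                parser.error(f"{option_string}: expected key=value, got {item!r}")
            store[key] = self.parse_value(raw)
        setattr(namespace, self.dest, store)

    def parse_value(self, value):
        # Assumption: Python literals are converted; anything else stays a string.
        try:
            return ast.literal_eval(value)
        except (ValueError, SyntaxError, TypeError):
            return value


parser = argparse.ArgumentParser()
parser.add_argument("-p", "--param", action=KeyValueSketch, default={},
                    metavar="key=value")
args = parser.parse_args(["-p", "alpha=0.5", "-p", "name=test"])
print(args.param)  # {'alpha': 0.5, 'name': 'test'}
```

Keeping conversion in a separate method means a subclass only needs to override `parse_value` to change how each value is interpreted, without touching the key=value splitting.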
- class flatspin.cmdline.IndexAction(option_strings, dest, nargs=None, const=None, default=None, type=None, choices=None, required=False, help=None, metavar=None)
Bases: argparse.Action
index or start:stop:step (all ints)
- __call__(self, parser, namespace, values, option_string=None)
- class flatspin.cmdline.FilterAction(option_strings, dest, nargs=None, const=None, default=None, type=None, choices=None, required=False, help=None, metavar=None)
Bases: StoreKeyValue
key=<filter> where <filter>=start or start:stop (arbitrary types)
- parse_value(self, value)
- class flatspin.cmdline.SizeAction(option_strings, dest, nargs=None, const=None, default=None, type=None, choices=None, required=False, help=None, metavar=None)
Bases: argparse.Action
<size> or <sizex>x<sizey>
- __call__(self, parser, namespace, values, option_string=None)
- class flatspin.cmdline.CropAction(option_strings, dest, nargs=None, const=None, default=None, type=None, choices=None, required=False, help=None, metavar=None)
Bases: argparse.Action
crop window <crop> or <cropx>,<cropy> where <crop> is before or before:after
- __call__(self, parser, namespace, values, option_string=None)
- class flatspin.cmdline.WindowAction(option_strings, dest, nargs=None, const=None, default=None, type=None, choices=None, required=False, help=None, metavar=None)
Bases: argparse.Action
window size <sizex>x<sizey> [<stepx>,<stepy>]
- __call__(self, parser, namespace, values, option_string=None)
- flatspin.cmdline.parse_index(index_str)
index or start:stop:step (all ints)
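Here is a minimal sketch of the documented grammar. It assumes a bare index becomes an `int` and the colon form becomes a Python `slice`, with empty fields left as `None`. The name `parse_index_sketch` is hypothetical, and the return types are assumptions rather than flatspin's confirmed behavior.

```python
def parse_index_sketch(index_str):
    """Hypothetical: '5' -> 5, '2:10:2' -> slice(2, 10, 2), '::-1' -> slice(None, None, -1)."""
    if ":" not in index_str:
        return int(index_str)
    parts = index_str.split(":")
    if len(parts) > 3:
        raise ValueError(f"expected index or start:stop:step, got {index_str!r}")
    # Empty fields mean "use the default", exactly as in Python's slice syntax.
    return slice(*(int(p) if p else None for p in parts))


data = list(range(10))
print(data[parse_index_sketch("-1")])     # 9
print(data[parse_index_sketch("2:8:3")])  # [2, 5]
```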
- flatspin.cmdline.parse_filter(filter_str)
start or start:stop (arbitrary types)
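"Arbitrary types" suggests each field is converted to a Python value rather than forced to `int`. The sketch below assumes three things: Python literals are converted with `ast.literal_eval`, anything else is kept as a plain string, and the range form is returned as a `slice`. The names `parse_filter_sketch` and `_to_value` are hypothetical.

```python
import ast


def _to_value(text):
    """Convert one field: empty -> None, Python literal -> value, otherwise the raw string."""
    text = text.strip()
    if not text:
        return None
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError, TypeError):
        return text


def parse_filter_sketch(filter_str):
    """Hypothetical: '3' -> 3, '0.1:0.5' -> slice(0.1, 0.5), 'foo' -> 'foo'."""
    parts = filter_str.split(":")
    if len(parts) == 1:
        return _to_value(parts[0])
    if len(parts) == 2:
        return slice(_to_value(parts[0]), _to_value(parts[1]))
    raise ValueError(f"expected start or start:stop, got {filter_str!r}")


print(parse_filter_sketch("0.1:0.5"))  # slice(0.1, 0.5, None)
```

A `slice` is a convenient container for the range form because label-based indexers such as pandas' `DataFrame.loc` accept one directly. This reference does not say whether flatspin uses it that way.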
- flatspin.cmdline.parse_size(size)
<size> or <sizex>x<sizey>
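Here is a minimal sketch of the size grammar. It assumes integer sizes returned as a `(sizex, sizey)` tuple, and that a single number means a square size. The name `parse_size_sketch` is hypothetical.

```python
def parse_size_sketch(size):
    """Hypothetical: '16' -> (16, 16), '8x4' -> (8, 4)."""
    if "x" in size:
        sizex, sizey = size.split("x")  # more than one 'x' raises ValueError here
        return int(sizex), int(sizey)
    n = int(size)
    return n, n  # assumption: a single number means a square size


print(parse_size_sketch("8x4"))  # (8, 4)
```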
- flatspin.cmdline.parse_crop(crop)
crop window <crop> or <cropx>,<cropy> where <crop> is before or before:after
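The crop grammar nests two optional parts: a per-axis split (`<cropx>,<cropy>`) and a per-side split (`before:after`). The sketch below normalizes every form to `((before_x, after_x), (before_y, after_y))`. It assumes that a lone `before` crops the same amount from both sides, which the one-line description does not confirm. `parse_crop_sketch`, `_crop_axis` and `apply_crop` are hypothetical names. `apply_crop` treats rows as the y axis and columns as the x axis.

```python
def _crop_axis(text):
    """'b' -> (b, b) (assumed symmetric), 'b:a' -> (b, a)."""
    if ":" in text:
        before, after = text.split(":")
        return int(before), int(after)
    n = int(text)
    return n, n


def parse_crop_sketch(crop):
    """Hypothetical: returns ((before_x, after_x), (before_y, after_y))."""
    if "," in crop:
        cropx, cropy = crop.split(",")
        return _crop_axis(cropx), _crop_axis(cropy)
    axis = _crop_axis(crop)
    return axis, axis  # a single <crop> applies to both axes


def apply_crop(rows, crop):
    """Crop a list of equal-length rows (y is the row axis, x the column axis)."""
    (bx, ax), (by, ay) = crop
    # len(...) - after (not -after) so that after == 0 keeps everything.
    kept = rows[by:len(rows) - ay]
    return [row[bx:len(row) - ax] for row in kept]


grid = [[10 * r + c for c in range(5)] for r in range(4)]
print(apply_crop(grid, parse_crop_sketch("1:0,1")))  # [[11, 12, 13, 14], [21, 22, 23, 24]]
```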
- flatspin.cmdline.parse_window(window)
window size <sizex>x<sizey> [<stepx>,<stepy>]
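This reference does not say how the optional step reaches `parse_window`. The sketch below accepts either one whitespace-separated string or a list of one or two tokens (as argparse would deliver with `nargs='+'`). Both input forms are assumptions. It also assumes that, without a step, windows tile without overlap (step equal to the window size). The name `parse_window_sketch` is hypothetical.

```python
def parse_window_sketch(tokens):
    """Hypothetical: '4x2' -> ((4, 2), (4, 2)); ['4x2', '1,1'] -> ((4, 2), (1, 1))."""
    if isinstance(tokens, str):
        tokens = tokens.split()
    if not 1 <= len(tokens) <= 2:
        raise ValueError("expected <sizex>x<sizey> [<stepx>,<stepy>]")
    sizex, sizey = (int(v) for v in tokens[0].split("x"))
    if len(tokens) == 2:
        stepx, stepy = (int(v) for v in tokens[1].split(","))
    else:
        # Assumption: without an explicit step, windows tile without overlap.
        stepx, stepy = sizex, sizey
    return (sizex, sizey), (stepx, stepy)


print(parse_window_sketch("3x3 2,1"))  # ((3, 3), (2, 1))
```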
- flatspin.cmdline.parse_func(func_str)
fn(arg1, arg2, …)
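One safe way to split a `fn(arg1, arg2, ...)` string into a name and argument values is to parse it with Python's own `ast` module, accepting only literal arguments. The sketch below does this. The name `parse_func_sketch`, the `(name, args)` return shape, and the literal-only restriction are assumptions. This reference does not say how flatspin parses the string or how it maps the name to a function.

```python
import ast


def parse_func_sketch(func_str):
    """Hypothetical: "fn(1, 'a')" -> ('fn', [1, 'a']); "fn()" -> ('fn', [])."""
    node = ast.parse(func_str.strip(), mode="eval").body
    if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
        raise ValueError(f"expected fn(arg1, arg2, ...), got {func_str!r}")
    if node.keywords:
        raise ValueError("keyword arguments are not part of the fn(arg1, arg2, ...) form")
    # literal_eval also accepts AST nodes; it rejects names, calls and other non-literals.
    args = [ast.literal_eval(arg) for arg in node.args]
    return node.func.id, args


print(parse_func_sketch("fn(1, [2, 3], 'a')"))  # ('fn', [1, [2, 3], 'a'])
```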
- flatspin.cmdline.parse_time(time_str, ctx={})
start:stop:step (arbitrary types, with local ctx) OR index
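"With local ctx" suggests each field is evaluated as an expression that may refer to names supplied by the caller. The sketch below assumes `eval` with builtins disabled and `ctx` as the local namespace. `T` in the example is a hypothetical context name. The sketch takes `ctx=None` instead of the documented `ctx={}` to avoid a shared mutable default. `parse_time_sketch` is a hypothetical name.

```python
def parse_time_sketch(time_str, ctx=None):
    """Hypothetical: '0:T:10' with ctx={'T': 100} -> slice(0, 100, 10); 'T-1' -> 99."""
    names = dict(ctx or {})  # copy so evaluation can never modify the caller's dict

    def evaluate(text):
        # Empty fields mean "use the default", as in Python's slice syntax.
        if not text.strip():
            return None
        # Only names from ctx are visible and builtins are disabled, but this is
        # still eval(): never feed it untrusted input.
        return eval(text, {"__builtins__": {}}, names)

    parts = time_str.split(":")
    if len(parts) == 1:
        return evaluate(parts[0])  # the "index" form
    if len(parts) > 3:
        raise ValueError(f"expected start:stop:step or index, got {time_str!r}")
    return slice(*(evaluate(p) for p in parts))


print(parse_time_sketch("T//2:", {"T": 100}))  # slice(50, None, None)
```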
- flatspin.cmdline.func_bin(values)
- flatspin.cmdline.func_randint(low=sys.maxsize, *args, **kwargs)
- flatspin.cmdline.func_randseed(n)
- flatspin.cmdline.func_glob(pathname)
- flatspin.cmdline.func_choice(arr=[0, 1], *args, **kwargs)
- flatspin.cmdline.func_one(size)
- flatspin.cmdline.func_read_table(table)
- flatspin.cmdline.param_globals
- flatspin.cmdline.eval_param(v, ctx=None)
Try to eval(v), on failure return v
- flatspin.cmdline.eval_params(params, ctx=None)
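The sketch below shows the "try to eval, otherwise keep the raw value" behavior described for `eval_param`, plus a dict-wide variant. Several parts are assumptions: `eval_params` receiving a dict of name to raw value, `ctx` serving as extra local names, and builtins being disabled. This reference does not show how flatspin combines `ctx` with the module-level `param_globals`. Both function names are hypothetical.

```python
def eval_param_sketch(v, ctx=None):
    """Try to eval(v); on any failure, return v unchanged."""
    try:
        return eval(v, {"__builtins__": {}}, dict(ctx or {}))
    except Exception:
        # Non-strings (TypeError), bare words (NameError) and malformed
        # expressions (SyntaxError) all fall through to the original value.
        return v


def eval_params_sketch(params, ctx=None):
    """Assumption: params is a dict of name -> raw value, evaluated one by one."""
    return {key: eval_param_sketch(value, ctx) for key, value in params.items()}


raw = {"alpha": "0.5", "size": "(4, 4)", "label": "hello", "n": "[k] * 2", "seed": 3}
print(eval_params_sketch(raw, ctx={"k": 7}))
# {'alpha': 0.5, 'size': (4, 4), 'label': 'hello', 'n': [7, 7], 'seed': 3}
```

The broad `except Exception` is the point of the "on failure return v" contract: a word like `hello` that is not a valid expression in context simply stays a string.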
- class flatspin.cmdline.ProgressBar(iterable=None, desc=None, total=None, leave=True, file=None, ncols=None, mininterval=0.1, maxinterval=10.0, miniters=None, ascii=None, disable=False, unit='it', unit_scale=False, dynamic_ncols=False, smoothing=0.3, bar_format=None, initial=0, position=None, postfix=None, unit_divisor=1000, write_bytes=None, lock_args=None, nrows=None, colour=None, delay=0, gui=False, **kwargs)
Bases: tqdm.auto.tqdm
Decorate an iterable object, returning an iterator which acts exactly like the original iterable, but prints a dynamically updating progressbar every time a value is requested.
- class flatspin.cmdline.ParallelProgress(progress_bar, *args, **kwargs)
Bases: joblib.Parallel
Helper class for readable parallel mapping.
Read more in the User Guide.
- Parameters
n_jobs (int, default: None) – The maximum number of concurrently running jobs, such as the number of Python worker processes when backend="multiprocessing" or the size of the thread-pool when backend="threading". If -1, all CPUs are used. If 1 is given, no parallel computing code is used at all, which is useful for debugging. For n_jobs below -1, (n_cpus + 1 + n_jobs) are used. Thus for n_jobs = -2, all CPUs but one are used. None is a marker for 'unset' that will be interpreted as n_jobs=1 (sequential execution) unless the call is performed under a parallel_backend context manager that sets another value for n_jobs.
backend (str, ParallelBackendBase instance or None, default: 'loky') –
Specify the parallelization backend implementation. Supported backends are:
"loky" used by default, can induce some communication and memory overhead when exchanging input and output data with the worker Python processes.
"multiprocessing" previous process-based backend based on multiprocessing.Pool. Less robust than loky.
"threading" is a very low-overhead backend but it suffers from the Python Global Interpreter Lock if the called function relies a lot on Python objects. "threading" is mostly useful when the execution bottleneck is a compiled extension that explicitly releases the GIL (for instance a Cython loop wrapped in a "with nogil" block or an expensive call to a library such as NumPy).
Finally, you can register backends by calling register_parallel_backend. This will allow you to implement a backend of your liking.
It is not recommended to hard-code the backend name in a call to Parallel in a library. Instead it is recommended to set soft hints (prefer) or hard constraints (require) so as to make it possible for library users to change the backend from the outside using the parallel_backend context manager.
prefer (str in {'processes', 'threads'} or None, default: None) – Soft hint to choose the default backend if no specific backend was selected with the parallel_backend context manager. The default process-based backend is 'loky' and the default thread-based backend is 'threading'. Ignored if the backend parameter is specified.
require ('sharedmem' or None, default: None) – Hard constraint to select the backend. If set to 'sharedmem', the selected backend will be single-host and thread-based even if the user asked for a non-thread based backend with parallel_backend.
verbose (int, optional) – The verbosity level: if non-zero, progress messages are printed. Above 50, the output is sent to stdout. The frequency of the messages increases with the verbosity level. If it is more than 10, all iterations are reported.
timeout (float, optional) – Timeout limit for each task to complete. If any task takes longer, a TimeOutError will be raised. Only applied when n_jobs != 1.
pre_dispatch ({'all', integer, or expression, as in '3*n_jobs'}) – The number of batches (of tasks) to be pre-dispatched. Default is '2*n_jobs'. When batch_size="auto", this is a reasonable default and the workers should never starve.
batch_size (int or 'auto', default: 'auto') – The number of atomic tasks to dispatch at once to each worker. When individual evaluations are very fast, dispatching calls to workers can be slower than sequential computation because of the overhead. Batching fast computations together can mitigate this. The 'auto' strategy keeps track of the time it takes for a batch to complete, and dynamically adjusts the batch size to keep the time on the order of half a second, using a heuristic. The initial batch size is 1. batch_size="auto" with backend="threading" will dispatch batches of a single task at a time as the threading backend has very little overhead and using larger batch size has not proved to bring any gain in that case.
temp_folder (str, optional) –
Folder to be used by the pool for memmapping large arrays for sharing memory with worker processes. If None, this will try in order:
a folder pointed by the JOBLIB_TEMP_FOLDER environment variable,
/dev/shm if the folder exists and is writable: this is a RAM disk filesystem available by default on modern Linux distributions,
the default system temporary folder that can be overridden with TMP, TMPDIR or TEMP environment variables, typically /tmp under Unix operating systems.
Only active when backend="loky" or "multiprocessing".
max_nbytes (int, str, or None, optional, 1M by default) – Threshold on the size of arrays passed to the workers that triggers automated memory mapping in temp_folder. Can be an int in Bytes, or a human-readable string, e.g., '1M' for 1 megabyte. Use None to disable memmapping of large arrays. Only active when backend="loky" or "multiprocessing".
mmap_mode ({None, 'r+', 'r', 'w+', 'c'}, default: 'r') – Memmapping mode for numpy arrays passed to workers. None will disable memmapping, other modes defined in the numpy.memmap doc: https://numpy.org/doc/stable/reference/generated/numpy.memmap.html Also, see ‘max_nbytes’ parameter documentation for more details.
Notes
This object uses workers to compute in parallel the application of a function to many different arguments. The main functionality it brings in addition to using the raw multiprocessing or concurrent.futures API are (see examples for details):
More readable code, in particular since it avoids constructing a list of arguments.
- Easier debugging:
informative tracebacks even when the error happens on the client side
using 'n_jobs=1' makes it possible to turn off parallel computing for debugging without changing the codepath
early capture of pickling errors
An optional progress meter.
Interruption of multiprocess jobs with 'Ctrl-C'
Flexible pickling control for the communication to and from the worker processes.
Ability to use shared memory efficiently with worker processes for large numpy-based datastructures.
Examples
A simple example:
>>> from math import sqrt
>>> from joblib import Parallel, delayed
>>> Parallel(n_jobs=1)(delayed(sqrt)(i**2) for i in range(10))
[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
Reshaping the output when the function has several return values:
>>> from math import modf
>>> from joblib import Parallel, delayed
>>> r = Parallel(n_jobs=1)(delayed(modf)(i/2.) for i in range(10))
>>> res, i = zip(*r)
>>> res
(0.0, 0.5, 0.0, 0.5, 0.0, 0.5, 0.0, 0.5, 0.0, 0.5)
>>> i
(0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 4.0)
The progress meter: the higher the value of verbose, the more messages:
>>> from time import sleep
>>> from joblib import Parallel, delayed
>>> r = Parallel(n_jobs=2, verbose=10)(delayed(sleep)(.2) for _ in range(10))
[Parallel(n_jobs=2)]: Done 1 tasks | elapsed: 0.6s
[Parallel(n_jobs=2)]: Done 4 tasks | elapsed: 0.8s
[Parallel(n_jobs=2)]: Done 10 out of 10 | elapsed: 1.4s finished
Traceback example. Note how the line of the error is indicated, as well as the values of the parameters passed to the function that triggered the exception, even though the traceback happens in the child process:
>>> from heapq import nlargest
>>> from joblib import Parallel, delayed
>>> Parallel(n_jobs=2)(delayed(nlargest)(2, n) for n in (range(4), 'abcde', 3))
#...
---------------------------------------------------------------------------
Sub-process traceback:
---------------------------------------------------------------------------
TypeError                                          Mon Nov 12 11:37:46 2012
PID: 12934                                    Python 2.7.3: /usr/bin/python
...........................................................................
/usr/lib/python2.7/heapq.pyc in nlargest(n=2, iterable=3, key=None)
    419         if n >= size:
    420             return sorted(iterable, key=key, reverse=True)[:n]
    421
    422     # When key is none, use simpler decoration
    423     if key is None:
--> 424         it = izip(iterable, count(0,-1))                    # decorate
    425         result = _nlargest(n, it)
    426         return map(itemgetter(0), result)                   # undecorate
    427
    428     # General case, slowest method
TypeError: izip argument #1 must support iteration
___________________________________________________________________________
Using pre_dispatch in a producer/consumer situation, where the data is generated on the fly. Note how the producer is first called 3 times before the parallel loop is initiated, and then called to generate new data on the fly:
>>> from math import sqrt
>>> from joblib import Parallel, delayed
>>> def producer():
...     for i in range(6):
...         print('Produced %s' % i)
...         yield i
>>> out = Parallel(n_jobs=2, verbose=100, pre_dispatch='1.5*n_jobs')(
...                delayed(sqrt)(i) for i in producer())
Produced 0
Produced 1
Produced 2
[Parallel(n_jobs=2)]: Done 1 jobs | elapsed: 0.0s
Produced 3
[Parallel(n_jobs=2)]: Done 2 jobs | elapsed: 0.0s
Produced 4
[Parallel(n_jobs=2)]: Done 3 jobs | elapsed: 0.0s
Produced 5
[Parallel(n_jobs=2)]: Done 4 jobs | elapsed: 0.0s
[Parallel(n_jobs=2)]: Done 6 out of 6 | elapsed: 0.0s remaining: 0.0s
[Parallel(n_jobs=2)]: Done 6 out of 6 | elapsed: 0.0s finished
- print_progress(self)
Display the process of the parallel execution only a fraction of time, controlled by self.verbose.
- flatspin.cmdline.main_dataset_argparser(description, output_required=False)
Common argparser for scripts which deal with a dataset
- flatspin.cmdline.main_dataset_grid_argparser(description, output_required=False)
Common argparser for scripts dealing with grid operations on a dataset
- flatspin.cmdline.main_dataset(args)
Common main function for scripts which deal with a dataset