divik.core module

Reusable utilities used for building the divik library

divik.core.Centroids

alias of ndarray

divik.core.Data

alias of ndarray

class divik.core.DivikResult(clustering: Union[divik.cluster.GAPSearch, divik.cluster.DunnSearch], feature_selector: divik.feature_selection.StatSelectorMixin, merged: ndarray, subregions: List[Optional[DivikResult]])[source]

Result of DiviK clustering

Attributes
clustering

Alias for field number 0

feature_selector

Alias for field number 1

merged

Alias for field number 2

subregions

Alias for field number 3

Methods

count(value, /)

Return number of occurrences of value.

index(value[, start, stop])

Return first index of value.

property clustering

Fitted automated clustering estimator

count(value, /)

Return number of occurrences of value.

property feature_selector

Fitted feature selector

index(value, start=0, stop=sys.maxsize, /)

Return first index of value.

Raises ValueError if the value is not present.

property merged

Recursively merged clustering labels

property subregions

DivikResults for all obtained subregions
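
Because the result is a named tuple whose subregions field holds further results, it forms a tree. The sketch below uses a hypothetical stand-in (ResultSketch, with placeholder strings instead of fitted estimators) to show access by name and by index, plus a recursive walk. Treating a None entry as a region that was not split further is an assumption based only on the Optional in the signature.

```python
from typing import Any, List, NamedTuple, Optional


class ResultSketch(NamedTuple):
    """Hypothetical stand-in mirroring the four documented fields."""
    clustering: Any
    feature_selector: Any
    merged: Any
    subregions: List[Optional["ResultSketch"]]


def count_leaves(result: Optional[ResultSketch]) -> int:
    """Count regions without a nested result (assumed to be marked by None)."""
    if result is None:
        return 1
    return sum(count_leaves(sub) for sub in result.subregions)


# Placeholder values only; a real result would hold fitted objects and arrays.
inner = ResultSketch("clustering-inner", "selector-inner", [0, 1, 1], [None, None])
root = ResultSketch("clustering-root", "selector-root", [0, 1, 1, 2], [inner, None])

# Fields are reachable both by name and by position (field numbers 0 to 3).
assert root.clustering == root[0]
assert root.subregions is root[3]

n_leaves = count_leaves(root)
```

The same recursive pattern applies to any per-region traversal, such as collecting the fitted estimator at each level.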

divik.core.IntLabels

alias of ndarray

class divik.core.Subsets(n_splits=10, random_state=42)[source]

Scatter dataset to disjoint random subsets and combine them back

Parameters
n_splits : int, default 10

Number of subsets that will be generated.

random_state : int, default 42

Random state to use for seeding the random number generator.

Examples

>>> from divik.core import Subsets
>>> subsets = Subsets(n_splits=10, random_state=42)
>>> X_list = subsets.scatter(X)
>>> len(X_list)
10
>>> # do some computations on each subset
>>> y = subsets.combine(y_list)
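
To make the scatter/combine round trip concrete, here is a minimal standard-library sketch of the idea. SubsetsSketch is a hypothetical analogue written for illustration; how the library actually partitions rows and restores their order is not stated here, so the shuffling and dealing strategy below is an assumption.

```python
import random


class SubsetsSketch:
    """Hypothetical analogue of the scatter/combine idea; not divik's code."""

    def __init__(self, n_splits=10, random_state=42):
        self.n_splits = n_splits
        self.random_state = random_state

    def scatter(self, X):
        # Shuffle row indices reproducibly, then deal them into n_splits groups.
        order = list(range(len(X)))
        random.Random(self.random_state).shuffle(order)
        self._groups = [order[i::self.n_splits] for i in range(self.n_splits)]
        return [[X[i] for i in group] for group in self._groups]

    def combine(self, X_list):
        # Put every per-subset value back at its original row position.
        total = sum(len(group) for group in self._groups)
        combined = [None] * total
        for group, values in zip(self._groups, X_list):
            for i, value in zip(group, values):
                combined[i] = value
        return combined


X = list(range(25))
subsets = SubsetsSketch(n_splits=10, random_state=42)
X_list = subsets.scatter(X)

# Some per-subset computation; here every value is simply doubled.
y_list = [[x * 2 for x in part] for part in X_list]
y = subsets.combine(y_list)
```

In this sketch combine relies on the index groups remembered by the preceding scatter call, which is why the two methods are used on the same instance.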

Methods

combine

scatter

combine(X_list)[source]

scatter(X)[source]

divik.core.build(klass, **kwargs)[source]

Build instance of klass using matching kwargs
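
One way to read "matching kwargs" is that keyword arguments the constructor does not accept are silently dropped. The sketch below illustrates that reading with inspect.signature; build_sketch and Example are hypothetical names, and the library may decide which kwargs match in a different way.

```python
import inspect


def build_sketch(klass, **kwargs):
    """Instantiate klass with only the kwargs its constructor names (assumed behavior)."""
    accepted = inspect.signature(klass).parameters
    matching = {key: value for key, value in kwargs.items() if key in accepted}
    return klass(**matching)


class Example:  # hypothetical class used only for this illustration
    def __init__(self, n_clusters, max_iter=100):
        self.n_clusters = n_clusters
        self.max_iter = max_iter


# 'verbose' is not a constructor parameter of Example, so it is ignored.
obj = build_sketch(Example, n_clusters=3, verbose=True)
```

This pattern is convenient when one shared set of options is passed to several classes that each understand only part of it.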

divik.core.cached_fit(cls)[source]

Decorate a sklearn-compatible estimator to cache the fitting result

It is a wrapper over joblib.Memory.cache, that supports runtime cache path definition.

Set path definition through gin config with cache_path.path identifier.

divik.core.configurable(name_or_fn=None, module=None, allowlist=None, denylist=None)[source]

Decorator to make a function or class configurable.

This decorator registers the decorated function/class as configurable, which allows its parameters to be supplied from the global configuration (i.e., set through bind_parameter or parse_config). The decorated function is associated with a name in the global configuration, which by default is simply the name of the function or class, but can be specified explicitly to avoid naming collisions or improve clarity.

If some parameters should not be configurable, they can be specified in denylist. If only a restricted set of parameters should be configurable, they can be specified in allowlist.

The decorator can be used without any parameters as follows:

    @config.configurable
    def some_configurable_function(param1, param2='a default value'):
        ...

In this case, the function is associated with the name ‘some_configurable_function’ in the global configuration, and both param1 and param2 are configurable.

The decorator can be supplied with parameters to specify the configurable name or supply an allowlist/denylist:

    @config.configurable('explicit_configurable_name', allowlist='param2')
    def some_configurable_function(param1, param2='a default value'):
        ...

In this case, the configurable is associated with the name ‘explicit_configurable_name’ in the global configuration, and only param2 is configurable.

Classes can be decorated as well, in which case parameters of their constructors are made configurable:

    @config.configurable
    class SomeClass:
        def __init__(self, param1, param2='a default value'):
            ...

In this case, the name of the configurable is ‘SomeClass’, and both param1 and param2 are configurable.

Args:

name_or_fn: A name for this configurable, or a function to decorate (in which case the name will be taken from that function). If not set, defaults to the name of the function/class that is being made configurable. If a name is provided, it may also include module components to be used for disambiguation (these will be appended to any components explicitly specified by module).

module: The module to associate with the configurable, to help handle naming collisions. By default, the module of the function or class being made configurable will be used (if no module is specified as part of the name).

allowlist: An allowlisted set of kwargs that should be configurable. All other kwargs will not be configurable. Only one of allowlist or denylist should be specified.

denylist: A denylisted set of kwargs that should not be configurable. All other kwargs will be configurable. Only one of allowlist or denylist should be specified.

Returns:

When used with no parameters (or with a function/class supplied as the first parameter), it returns the decorated function or class. When used with parameters, it returns a function that can be applied to decorate the target function or class.

divik.core.context_if(condition, context, *args, **kwargs)[source]

Create context with given params only if the condition is True
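
A conditional context of this kind can be sketched with contextlib.nullcontext from the standard library. The names context_if_sketch and recording are hypothetical, and yielding None when the condition is false is an assumption of this sketch rather than documented behavior.

```python
from contextlib import contextmanager, nullcontext


def context_if_sketch(condition, context, *args, **kwargs):
    """Enter context(*args, **kwargs) only when condition is true."""
    if condition:
        return context(*args, **kwargs)
    # nullcontext() does nothing on enter/exit and yields None.
    return nullcontext()


events = []


@contextmanager
def recording(name):  # hypothetical context manager for the demo
    events.append(f"enter {name}")
    yield name
    events.append(f"exit {name}")


with context_if_sketch(True, recording, "a") as entered:
    pass

with context_if_sketch(False, recording, "b") as skipped:
    pass
```

Only the first block enters the real context, so events records a single enter/exit pair.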

divik.core.dump_gin_args(destination)[source]

Dump gin-config effective configuration

If you have gin extras installed, you can call dump_gin_args to save the effective gin configuration to a file.

divik.core.get_n_jobs(n_jobs)[source]

Determine the actual number of possible jobs
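
The sketch below shows one plausible resolution rule, assuming the joblib-style convention in which -1 means all available cores and requests are capped at the CPU count. Both the convention and the name get_n_jobs_sketch are assumptions; the function's exact rules are not stated in this reference.

```python
import os


def get_n_jobs_sketch(n_jobs):
    """Resolve a requested job count to a usable one (assumed joblib-style rules)."""
    n_cpu = os.cpu_count() or 1
    if n_jobs is None:
        return 1
    if n_jobs <= 0:
        # -1 maps to all cores, -2 to all but one, and so on.
        n_jobs = n_cpu + 1 + n_jobs
    # Never report fewer than one job or more than the available cores.
    return max(1, min(n_jobs, n_cpu))


all_cores = get_n_jobs_sketch(-1)
single = get_n_jobs_sketch(1)
```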

divik.core.maybe_pool(processes=None, *args, **kwargs)[source]

Create multiprocessing.Pool if multiple CPUs are allowed

Examples

>>> from divik.core import maybe_pool
>>> with maybe_pool(processes=1) as pool:
...     # Runs sequentially
...     pool.map(id, range(10000))
>>> with maybe_pool(processes=-1) as pool:
...     # Runs with all cores
...     pool.map(id, range(10000))

divik.core.normalize_rows(data)[source]

Translate and scale each row to zero mean and unit vector length

Return type

ndarray
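
The transformation can be illustrated on plain lists: subtract each row's mean, then divide by the Euclidean length of the centered row. This is a standard-library sketch with a hypothetical name (normalize_rows_sketch); the library function operates on ndarrays, and its handling of constant rows is not covered here.

```python
import math


def normalize_rows_sketch(data):
    """Center each row at zero mean and scale it to unit Euclidean length."""
    normalized = []
    for row in data:
        mean = sum(row) / len(row)
        centered = [value - mean for value in row]
        length = math.sqrt(sum(value * value for value in centered))
        # A constant row has zero length after centering and would raise
        # ZeroDivisionError here; this sketch does not handle that case.
        normalized.append([value / length for value in centered])
    return normalized


rows = normalize_rows_sketch([[1.0, 2.0, 3.0], [10.0, 0.0, 5.0]])
```

After this step the dot product of two rows equals their Pearson correlation, which is a common reason for normalizing rows this way.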

divik.core.parse_args()[source]

Parse gin config files and parameter overrides from command line

divik.core.seed(seed_=0)[source]

Context manager that creates a seeded scope.
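
A seeded scope can be sketched as a context manager that saves the generator state, seeds it, and restores the saved state on exit. The sketch below uses the standard-library random module under the hypothetical name seed_sketch; which random number generator the library seeds is not stated here, so this is an analogy rather than the library's implementation.

```python
import random
from contextlib import contextmanager


@contextmanager
def seed_sketch(seed_=0):
    """Seed the stdlib generator inside the block, then restore its prior state."""
    state = random.getstate()
    random.seed(seed_)
    try:
        yield
    finally:
        random.setstate(state)


with seed_sketch(0):
    first = random.random()

with seed_sketch(0):
    second = random.random()

# Both blocks start from the same seed, so they draw the same value.
assert first == second
```

Restoring the state on exit keeps the seeded block from changing the random sequence seen by the surrounding code.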

divik.core.seeded(wrapped_requires_seed=False)[source]

Create seeded scope for function call.

Parameters
wrapped_requires_seed: bool, optional, default: False

If true, passes the seed parameter to the inner function.

divik.core.share(array)[source]

Share a numpy array between multiprocessing.Pool processes

divik.core.visualize(label, xy, shape=None)[source]

Create an RGB map of labels at the given coordinates

Modules

divik.core.gin_sklearn_configurables

Mark scikit-learn classes as configurable

divik.core.io

Reusable utilities for data and model I/O