decodanda package

Submodules

decodanda.classes module

class Decodanda(data: list | dict, conditions: dict, classifier: any = 'svc', neural_attr: str = 'raster', trial_attr: str = 'trial', squeeze_trials: bool = False, min_data_per_condition: int = 2, min_trials_per_condition: int = 2, min_activations_per_cell: int = 1, min_time_separation: float | None = None, time_attr: str | None = None, trial_chunk: int | None = None, exclude_silent: bool = False, verbose: bool = False, zscore: bool = False, fault_tolerance: bool = False, debug: bool = False, **kwargs)

Bases: object

Main class that implements the decoding pipelines with built-in best practices.

It works by separating the input data into all possible conditions (defined as specific combinations of variable values) and sampling data points from these conditions according to the specific decoding problem.

Parameters:
  • data – A dictionary, or a list of dictionaries, each containing (1) the neural data, (2) a set of variables that we want to decode from the neural data, and (3) a trial number. See the Data Structure section for more details. If a list is passed, the analyses will be performed on the pseudo-population built by pooling all the data sets in the list.

  • conditions – A dictionary that specifies which values for which variables of data we want to decode. See the Data Structure section for more details.

  • classifier – The classifier used for all decoding analyses. Default: sklearn.svm.LinearSVC.

  • neural_attr – The key under which the neural features are stored in the data dictionary.

  • trial_attr – The key under which the trial numbers are stored in the data dictionary. Each trial is treated as an independent sample during cross-validation. If None, trials are defined as consecutive bouts of data in time where all the variables have a constant value.

  • squeeze_trials – If True, all population vectors corresponding to the same trial number for the same condition will be squeezed into a single average activity vector.

  • min_data_per_condition – The minimum number of data points per condition (a specific combination of values of all variables in the conditions dictionary) that a data set needs to have to be included in the analysis.

  • min_trials_per_condition – The minimum number of unique trial numbers per condition (a specific combination of values of all variables in the conditions dictionary) that a data set needs to have to be included in the analysis.

  • min_activations_per_cell – The minimum number of non-zero bins that single neurons / features need to have to be included in the analysis.

  • min_time_separation – The minimum time difference, computed using time_attr, between data assigned to different trial numbers. This prevents signals with long autocorrelations (e.g., calcium activity) from spilling over between training and testing trials.

  • time_attr – Name of the session field/attribute containing the time vector (one value per sample/bin), used to compute min_time_separation.

  • trial_chunk – Only used when trial_attr=None. The maximum number of consecutive data points within the same bout. Bouts longer than trial_chunk data points are split into different trials.

  • exclude_silent – If True, all silent population vectors (only zeros) are excluded from the analysis.

  • verbose – If True, most operations and analysis results are logged in standard output.

  • zscore – If True, neural features are z-scored before being separated into conditions.

  • fault_tolerance – If True, the constructor issues a warning instead of raising an error if no data set passes the inclusion criteria specified by min_data_per_condition and min_trials_per_condition.

  • debug – If True, operations are logged in extensive detail. Intended for development only.
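When trial_attr=None, trials are derived from the recording itself: consecutive bouts where all variables keep a constant value, optionally capped at trial_chunk data points. The sketch below illustrates that rule in plain Python. `bouts_to_trials` is a hypothetical helper, not part of the Decodanda API, and the library's own trial numbering may differ.

```python
def bouts_to_trials(labels, trial_chunk=None):
    """Assign a trial number to each time bin (hypothetical helper).

    labels: one tuple of variable values per time bin.
    trial_chunk: optional maximum bout length; longer bouts are split.
    """
    trials = []
    trial = 0
    run_length = 0  # number of bins already assigned to the current trial
    for t, current in enumerate(labels):
        if t > 0:
            changed = current != labels[t - 1]  # a variable changed value
            full = trial_chunk is not None and run_length >= trial_chunk
            if changed or full:
                trial += 1
                run_length = 0
        trials.append(trial)
        run_length += 1
    return trials
```

For example, three bins labelled ('A', 'left') followed by two bins labelled ('B', 'left') give trials [0, 0, 0, 1, 1]; with trial_chunk=2 they give [0, 0, 1, 2, 2].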

Data structure

Decodanda works with datasets organized into Python dictionaries. For N recorded neurons and T trials (or time bins), the data dictionary must contain:

  1. a TxN array, under the raster key

    This is the set of features used for decoding. Values can be continuous (e.g., calcium fluorescence) or discrete (e.g., spike counts).

  2. a Tx1 array specifying a trial number

    This array will define the subdivisions for cross validation: trials (or time bins) that share the same `trial` value will always go together in either training or testing samples.

  3. a Tx1 array for each variable we want to decode

    Each value will be used as a label for the raster feature. Make sure these arrays are synchronized with the raster array.

Say we have a data set with N=50 neurons and T=800 time bins divided into 80 trials, where two experimental variables, stimulus and action, are specified. A properly formatted data set would look like this:

>>> data = {
>>>     'raster': [[0, 1, ..., 0], ..., [0, 2, ..., 1]],     # <800x50 array>, neural activations
>>>     'stimulus': ['A', 'A', 'B', ..., 'B'],               # <800x1 array>, values of the stimulus variable
>>>     'action': ['left', 'left', 'right', ..., 'left'],   # <800x1 array>, values of the action variable
>>>     'trial':  [1, 1, 1, ..., 2, 2, 2, ..., 80, 80, 80],  # <800x1 array>, trial number, 80 unique numbers
>>> }
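To test a pipeline, it can help to build a small data set with exactly this layout and check that every per-bin array is aligned with the raster. The helpers below (`make_toy_data`, `check_data`) are hypothetical and use plain lists; real rasters are typically NumPy arrays, and the package's own generate_synthetic_data, used in the examples below, is the more direct option.

```python
import random

def make_toy_data(n_neurons=50, n_trials=80, bins_per_trial=10, seed=0):
    """Build a data dictionary with the layout described above (hypothetical)."""
    rng = random.Random(seed)
    data = {'raster': [], 'stimulus': [], 'action': [], 'trial': []}
    for trial in range(1, n_trials + 1):
        # in this toy example, both variables are constant within a trial
        stimulus = rng.choice(['A', 'B'])
        action = rng.choice(['left', 'right'])
        for _ in range(bins_per_trial):
            data['raster'].append([rng.randint(0, 2) for _ in range(n_neurons)])
            data['stimulus'].append(stimulus)
            data['action'].append(action)
            data['trial'].append(trial)
    return data

def check_data(data, neural_attr='raster'):
    """Check that every per-bin array is aligned with the raster rows."""
    n_bins = len(data[neural_attr])
    n_neurons = len(data[neural_attr][0])
    for key, values in data.items():
        if len(values) != n_bins:
            raise ValueError(f"'{key}' has {len(values)} entries, expected {n_bins}")
    if any(len(row) != n_neurons for row in data[neural_attr]):
        raise ValueError('raster rows have different lengths')
    return n_bins, n_neurons  # (T, N)
```

With the default arguments, check_data(make_toy_data()) returns (800, 50), matching the T=800, N=50 example above.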

The conditions dictionary specifies which variables (out of all the keys in the data dictionary) and which of their values (out of all possible values of each variable) we want to decode.

It has to be in the form {key: [value1, value2]}:

>>> conditions = {
>>>     'stimulus': ['A', 'B'],
>>>     'action': ['left', 'right']
>>> }

If more than one variable is specified, Decodanda will balance all conditions during each decoding analysis to disentangle the variables and avoid confounding correlations.
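Balancing requires enumerating every combination of the specified values (four conditions for two binary variables). A minimal sketch, assuming the binary notation described in the method notes further down, where the i-th digit is the position of the i-th variable's value in its list, following the dictionary order. `condition_keys` is a hypothetical helper.

```python
from itertools import product

def condition_keys(conditions):
    """Map each combination of variable values to a binary key (hypothetical).

    The i-th digit is the position of the i-th variable's value in its list,
    following the order of the conditions dictionary.
    """
    variables = list(conditions)
    keys = {}
    for idx in product(*(range(len(conditions[v])) for v in variables)):
        key = ''.join(str(i) for i in idx)
        keys[key] = {v: conditions[v][i] for v, i in zip(variables, idx)}
    return keys
```

For the conditions dictionary above, this yields the keys '00', '01', '10' and '11', with '10' standing for stimulus = B, action = left.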

Examples

Using the data set defined above:

>>> from decodanda import Decodanda
>>>
>>> dec = Decodanda(
>>>         data=data,
>>>         conditions=conditions,
>>>         verbose=True)
>>>
[Decodanda]     building conditioned rasters for session 0
            (stimulus = A, action = left):      Selected 150 time bin out of 800, divided into 15 trials
            (stimulus = A, action = right):     Selected 210 time bin out of 800, divided into 21 trials
            (stimulus = B, action = left):      Selected 210 time bin out of 800, divided into 21 trials
            (stimulus = B, action = right):     Selected 230 time bin out of 800, divided into 23 trials

The constructor divides the data into conditions using the stimulus and action values and stores them in the self.conditioned_rasters object. This condition structure is the basis for all the balanced decoding analyses.
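The grouping step can be pictured as follows: each time bin is assigned to the condition given by its variable values, bins with values outside the conditions dictionary are dropped, and the number of bins and unique trials is counted per condition, as in the log above. `split_into_conditions` and `summarize_conditions` are hypothetical helpers and do not reproduce the internal structure of self.conditioned_rasters.

```python
from collections import defaultdict

def split_into_conditions(data, conditions, trial_attr='trial'):
    """Group time-bin indices by condition; bins outside `conditions` are dropped."""
    variables = list(conditions)
    groups = defaultdict(list)
    for t in range(len(data[trial_attr])):
        values = tuple(data[v][t] for v in variables)
        if all(value in conditions[v] for v, value in zip(variables, values)):
            groups[values].append(t)
    return dict(groups)

def summarize_conditions(groups, data, trial_attr='trial'):
    """Return {condition: (number of bins, number of unique trials)}."""
    return {cond: (len(idx), len({data[trial_attr][t] for t in idx}))
            for cond, idx in groups.items()}

data = {
    'stimulus': ['A', 'A', 'B', 'B', 'B'],
    'action': ['left', 'left', 'right', 'none', 'right'],  # 'none' is not a condition
    'trial': [1, 1, 2, 3, 3],
}
conditions = {'stimulus': ['A', 'B'], 'action': ['left', 'right']}
groups = split_into_conditions(data, conditions)
print(summarize_conditions(groups, data))
# {('A', 'left'): (2, 1), ('B', 'right'): (2, 2)}
```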

CCGP(resamplings=5, nshuffles: int = 25, ndata: int | None = None, max_semantic_dist: int = 1, plot: bool = False, ax: Axes | None = None, **kwargs)

Main function that performs the cross-condition generalization performance analysis (CCGP, Bernardi et al. 2020, Cell) for the variables specified through the conditions dictionary.

It returns a single CCGP value per variable, which represents the average over all cross-condition train-test splits. This function uses split_rule='OneOut' by default.

It also returns an array of null-model values for each variable to test the significance of the corresponding CCGP result. The employed geometrical null model keeps variables decodable but randomly displaces conditions in the neural activity space, hence breaking any coding parallelism and generalizability. See Bernardi et al. 2020 and Boyle, Posani et al. 2023 for more details.

Parameters:
  • resamplings – The number of iterations for each decoding analysis. The returned performance value is the average over these resamplings.

  • nshuffles – The number of null-model iterations for the CCGP analysis.

  • ndata – The number of data points (population vectors) sampled for training and for testing for each condition.

  • max_semantic_dist – The maximum semantic distance (number of variables that change value) between conditions in the held-out pair used to test the classifier.

  • plot – if True, a visualization of the decoding results is shown.

  • ax – if specified and plot=True, the results will be displayed in the specified axis instead of a new figure.

Returns:

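  • performance – a dictionary containing, for each variable, the mean of the performance values over all cross-condition training-testing splits.

  • null – a dictionary containing, for each variable, a list of null-model values for the generalization performance.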
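A common way to turn a performance value and its null distribution into a significance estimate is an empirical (rank-based) p-value, or a z-score against the null mean and standard deviation. The sketch below shows both; it is a generic convention, not necessarily the test Decodanda applies internally, and the helper names are hypothetical.

```python
from statistics import mean, stdev

def empirical_p_value(performance, null_values):
    """One-sided p-value: (k + 1) / (n + 1), where k is the number of
    null values at least as large as the observed performance."""
    k = sum(1 for v in null_values if v >= performance)
    return (k + 1) / (len(null_values) + 1)

def null_z_score(performance, null_values):
    """Distance of the observed performance from the null mean, in null SDs."""
    return (performance - mean(null_values)) / stdev(null_values)
```

With the dictionaries returned by CCGP, these can be applied per variable, e.g. {var: empirical_p_value(performance[var], null[var]) for var in performance}. With nshuffles null values, the smallest attainable empirical p-value is 1 / (nshuffles + 1), about 0.038 for the default of 25.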

Note

For each variable, this function trains the self._classifier to decode the given variable in a sub-set of conditions, and tests it on the held-out set.

By default, the split of training and testing conditions keeps the semantic distance between held-out conditions at 1 (max_semantic_dist=1, see the CCGP_dichotomy function).

For example, if the data set has two variables, stimulus \(\in\) {-1, 1} and action \(\in\) {-1, 1}, then to compute CCGP for stimulus this function will train the classifier on

(stimulus = -1, action = -1) vs. (stimulus = 1, action = -1)

and test it on

(stimulus = -1, action = 1) vs. (stimulus = 1, action = 1),

and vice versa. Note that action is kept fixed within the training and testing conditions.

Example

>>> data = generate_synthetic_data(keyA='stimulus', keyB='action')
>>> dec = Decodanda(data=data, conditions={'stimulus': [-1, 1], 'action': [-1, 1]})
>>> perfs, null = dec.CCGP(nshuffles=10)
>>> perfs
{'stimulus': 0.81, 'action': 0.79}  # each value is the mean over 2 cross-condition train-test splits
>>> null
{'stimulus': [0.51, ..., 0.46], 'action': [0.48, ..., 0.55]}  # null model means, 10 values each

CCGP_dichotomy(dichotomy: str | list, resamplings: int = 3, ndata: int | None = None, max_semantic_dist: int = 1, split_rule='OneOut', shuffled: bool = False, **kwargs)

Function that performs the cross-condition generalization performance analysis (CCGP, Bernardi et al. 2020, Cell) for a given variable, specified through its corresponding dichotomy. This function tests how well a given coding strategy for the given variable generalizes when the other variables are changed.

Parameters:
  • dichotomy (str or list) – The dichotomy corresponding to the variable to be tested, expressed in a double-list binary format, e.g. [['10', '11'], ['01', '00']], or as a variable name.

  • resamplings – The number of iterations for each decoding analysis. The returned performance value is the average over these resamplings.

  • ndata – The number of data points (population vectors) sampled for training and for testing for each condition.

  • max_semantic_dist – The maximum semantic distance (number of variables that change value) between conditions in the held-out pair used to test the classifier.

  • split_rule – The way conditions are split into training and testing sets: 'OneOut' (default), the name of a variable, or a dichotomy in the double-list binary format. If 'OneOut' is used, one pair of conditions is held out and the rest is used to train the classifier; if a variable is specified, CCGP is computed specifically across that variable, balancing any third (or further) variables during sampling.

  • shuffled – If True, the data is sampled according to the geometrical null model for CCGP, which keeps variables decodable but breaks generalization. See Bernardi et al. 2020 and Boyle, Posani et al. 2023.

Returns:

performances

Return type:

list of performance values for each cross-condition training-testing split.

Note

This function trains the self._classifier to decode the given variable in a sub-set of conditions, and tests it on the held-out set.

The split of training and testing conditions is decided by the max_semantic_dist parameter: if set to 1, only pairs of conditions that have all variables in common except the specified one are held out to test the classifier.

For example, if the data set has two variables, stimulus \(\in\) {-1, 1} and action \(\in\) {-1, 1}, then to compute CCGP for stimulus with max_semantic_dist=1 this function will train the classifier on

(stimulus = -1, action = -1) vs. (stimulus = 1, action = -1)

and test it on

(stimulus = -1, action = 1) vs. (stimulus = 1, action = 1).

Note that action is kept fixed within the training and testing conditions.

If instead we use max_semantic_dist=2, all possible combinations are used, including training on

(stimulus = -1, action = -1) vs. (stimulus = 1, action = 1)

and testing on

(stimulus = -1, action = 1) vs. (stimulus = 1, action = -1)
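The enumeration of held-out pairs for the 'OneOut' rule can be sketched with the binary notation introduced below: every pair made of one condition from each class whose semantic distance is at most max_semantic_dist is held out for testing, and the remaining conditions are used for training. `ccgp_splits` is a hypothetical helper written from the description above; the library may order or filter splits differently.

```python
from itertools import product

def semantic_distance(key_a, key_b):
    """Number of variables whose value differs between two binary keys."""
    return sum(a != b for a, b in zip(key_a, key_b))

def ccgp_splits(dichotomy, max_semantic_dist=1):
    """Enumerate (train, test) splits for the 'OneOut' rule (hypothetical).

    Each test set holds out one condition per class whose semantic distance
    is at most `max_semantic_dist`; all other conditions are used to train.
    """
    class_a, class_b = dichotomy
    splits = []
    for held_a, held_b in product(class_a, class_b):
        if semantic_distance(held_a, held_b) <= max_semantic_dist:
            train = ([c for c in class_a if c != held_a],
                     [c for c in class_b if c != held_b])
            test = ([held_a], [held_b])
            splits.append((train, test))
    return splits
```

For stimulus = [['00', '01'], ['10', '11']], max_semantic_dist=1 yields the two splits described above, and max_semantic_dist=2 adds the two diagonal ones, such as training on '00' vs. '11' and testing on '01' vs. '10'.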

dichotomy can be passed as a string or as a list.

If a string is passed, it has to be a name of one of the variables specified in the conditions dictionary.

If a list is passed, it needs to contain two lists in the shape [[…], […]]. Each sub-list contains the conditions, in binary notation, that define one of the two decoded classes.

For example, if the data set has two variables stimulus \(\in\) {-1, 1} and action \(\in\) {-1, 1}, the condition stimulus=-1 & action=-1 corresponds to the binary notation '00', the condition stimulus=+1 & action=-1 corresponds to '10', and so on.

Therefore, if stimulus is the first variable in the conditions dictionary, its corresponding dichotomy is

>>> stimulus = [['00', '01'], ['10', '11']]
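Going from a variable name to its dichotomy amounts to grouping the binary keys by the digit of that variable. A minimal sketch for two-valued variables, using a hypothetical helper `variable_dichotomy`:

```python
from itertools import product

def variable_dichotomy(conditions, variable):
    """Dichotomy, in binary notation, of a two-valued variable (hypothetical)."""
    variables = list(conditions)
    position = variables.index(variable)
    # all binary keys; the first digit refers to the first variable in the dict
    keys = [''.join(str(i) for i in idx)
            for idx in product(*(range(len(conditions[v])) for v in variables))]
    return [[k for k in keys if k[position] == '0'],
            [k for k in keys if k[position] == '1']]
```

For {'stimulus': [-1, 1], 'action': [-1, 1]}, 'stimulus' gives [['00', '01'], ['10', '11']] and 'action' gives [['00', '10'], ['01', '11']].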

Example

>>> data = generate_synthetic_data(keyA='stimulus', keyB='action')
>>> dec = Decodanda(data=data, conditions={'stimulus': [-1, 1], 'action': [-1, 1]})
>>> perfs = dec.CCGP_dichotomy('stimulus')
>>> perfs
[0.82, 0.87] # 2 values

CCGP_with_nullmodel(dichotomy: str | list, resamplings: int = 5, nshuffles: int = 25, ndata: int | None = None, max_semantic_dist: int = 1, split_rule='OneOut', return_combinations: bool = False, **kwargs)

Function that performs the cross-condition generalization performance analysis (CCGP, Bernardi et al. 2020, Cell) for a given variable, specified through its corresponding dichotomy.

This function tests how well a given coding strategy for the given variable generalizes when the other variables are changed, and compares the resulting values with a geometrical null model that keeps variables decodable but randomly displaces conditions in the neural activity space, hence breaking any coding parallelism and generalizability. See Bernardi et al. 2020 and Boyle, Posani et al. 2023 for more details.

Parameters:
  • dichotomy (str or list) – The dichotomy corresponding to the variable to be tested, expressed in a double-list binary format, e.g. [['10', '11'], ['01', '00']], or as a variable name.

  • resamplings – The number of iterations for each decoding analysis. The returned performance value is the average over these resamplings.

  • nshuffles – The number of null-model iterations for the CCGP analysis.

  • ndata – The number of data points (population vectors) sampled for training and for testing for each condition.

  • max_semantic_dist – The maximum semantic distance (number of variables that change value) between conditions in the held-out pair used to test the classifier.

  • split_rule – The way conditions are split into training and testing sets: 'OneOut' (default), the name of a variable, or a dichotomy in the double-list binary format. If 'OneOut' is used, one pair of conditions is held out and the rest is used to train the classifier; if a variable is specified, CCGP is computed specifically across that variable, balancing any third (or further) variables during sampling.

  • return_combinations – If True, returns all the individual performances for the cross-condition train-test splits; otherwise returns the average over splits.

Returns:

  • ccgp – the mean of the performance values over all cross-condition training-testing splits (or the list of individual values, if return_combinations=True).

  • null – a list of null-model values for the mean CCGP.

Note

This function trains the self._classifier to decode the given variable in a sub-set of conditions, and tests it on the held-out set.

The split of training and testing conditions is decided by the max_semantic_dist parameter: if set to 1, only pairs of conditions that have all variables in common except the specified one are held out to test the classifier.

For example, if the data set has two variables, stimulus \(\in\) {-1, 1} and action \(\in\) {-1, 1}, then to compute CCGP for stimulus with max_semantic_dist=1 this function will train the classifier on

(stimulus = -1, action = -1) vs. (stimulus = 1, action = -1)

and test it on

(stimulus = -1, action = 1) vs. (stimulus = 1, action = 1).

Note that action is kept fixed within the training and testing conditions.

If instead we use max_semantic_dist=2, all possible combinations are used, including training on

(stimulus = -1, action = -1) vs. (stimulus = 1, action = 1)

and testing on

(stimulus = -1, action = 1) vs. (stimulus = 1, action = -1)

dichotomy can be passed as a string or as a list.

If a string is passed, it has to be a name of one of the variables specified in the conditions dictionary.

If a list is passed, it needs to contain two lists in the shape [[…], […]]. Each sub-list contains the conditions, in binary notation, that define one of the two decoded classes.

For example, if the data set has two variables stimulus \(\in\) {-1, 1} and action \(\in\) {-1, 1}, the condition stimulus=-1 & action=-1 corresponds to the binary notation '00', the condition stimulus=+1 & action=-1 corresponds to '10', and so on.

Therefore, if stimulus is the first variable in the conditions dictionary, its corresponding dichotomy is

>>> stimulus = [['00', '01'], ['10', '11']]

Example

>>> data = generate_synthetic_data(keyA='stimulus', keyB='action')
>>> dec = Decodanda(data=data, conditions={'stimulus': [-1, 1], 'action': [-1, 1]})
>>> perf, null = dec.CCGP_with_nullmodel('stimulus', nshuffles=10)
>>> perf
0.85
>>> null
[0.44, 0.48, ..., 0.54] # 10 values

CVI(training_fraction: float = 0.75, cross_validations: int = 10, nshuffles: int = 10, ndata: int | None = None, return_splits: bool = False, signed=False)

PS(nshuffles: int = 25, max_semantic_dist: int = 1, method: str = 'pearson', plot: bool = False, ax: Axes | None = None, **kwargs)

PS_with_nullmodel(dichotomy: str | list, nshuffles: int = 25, max_semantic_dist: int = 1, method: str = 'pearson', return_combinations: bool = False, **kwargs)

all_dichotomies(balanced=True, semantic_names=False)

balanced_resample(condition_names=False, ndata=None, z_score=None, min_ar=0)
Parameters:
  • condition_names – if True, verbose names for conditions are used; otherwise a binary notation is used. Default: False.

  • ndata – optional, number of resampled activity vectors per condition. If not specified, the maximum number of activity vectors across all conditions is used.

  • z_score – if True, the resampled rasters are z-scored with respect to all the conditions.

  • min_ar – neurons whose activity rate (fraction of bins with non-zero activity) is below the min_ar threshold will be excluded from the sampled data.

Returns:

balanced resampled rasters
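The balancing idea behind balanced_resample can be illustrated by drawing the same number of samples from every condition. In this sketch, `balanced_resample_sketch` is hypothetical, operates on bin indices rather than rasters, and samples with replacement, which is an assumption about the library; z-scoring and the min_ar filter are left out.

```python
import random

def balanced_resample_sketch(groups, ndata=None, seed=None):
    """Draw `ndata` bin indices from every condition (hypothetical sketch).

    groups: {condition key: list of bin indices}.
    ndata: samples per condition; defaults to the size of the largest
    condition. Draws are made with replacement, so conditions with fewer
    bins than `ndata` can still contribute `ndata` samples.
    """
    rng = random.Random(seed)
    if ndata is None:
        ndata = max(len(indices) for indices in groups.values())
    return {key: [rng.choice(indices) for _ in range(ndata)]
            for key, indices in groups.items()}
```

Every condition then contributes exactly ndata samples, so no class of a dichotomy is dominated by an over-represented condition.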

decode(training_fraction: float, cross_validations: int = 10, nshuffles: int = 10, ndata: int | None = None, subsample: int | None = 0, parallel: bool = False, non_semantic: bool = False, return_CV: bool = False, testing_trials: list | None = None, plot: bool = False, ax: Axes | None = None, plot_all: bool = False, **kwargs)

Main function to decode the variables specified in the conditions dictionary.

It returns a single decoding value per variable which represents the average over the cross-validation folds.

It also returns an array of null-model values for each variable to test the significance of the corresponding decoding result.

Notes

Each decoding analysis is performed by first re-sampling an equal number of data points from each condition (combination of variable values), so as to ensure that possible confounds due to correlated conditions are balanced out.

Before sampling, each condition is individually divided into training and testing bins by using the self.trial array specified in the data structure when constructing the Decodanda object.
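Splitting each condition by trial, rather than by individual bin, is what keeps samples from the same trial out of both training and testing. A sketch of that step with a hypothetical `split_condition_by_trial` helper; how Decodanda rounds the number of training trials is not specified here, so the rounding below is an assumption.

```python
import random

def split_condition_by_trial(bin_indices, trials, training_fraction=0.75, seed=None):
    """Split one condition's bins into training and testing sets by trial number.

    bin_indices: indices of the time bins belonging to the condition.
    trials: trial number of every time bin in the session.
    All bins of a trial land on the same side; at least one trial is kept
    on each side when the condition has two or more trials.
    """
    rng = random.Random(seed)
    unique_trials = sorted({trials[t] for t in bin_indices})
    rng.shuffle(unique_trials)
    n_train = round(training_fraction * len(unique_trials))
    n_train = max(1, min(len(unique_trials) - 1, n_train))
    train_trials = set(unique_trials[:n_train])
    train = [t for t in bin_indices if trials[t] in train_trials]
    test = [t for t in bin_indices if trials[t] not in train_trials]
    return train, test
```

With four trials of two bins each and training_fraction=0.75, three trials (six bins) go to training and one trial (two bins) to testing.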

To generate the null model values, the relationship between the neural data and the decoded variable is randomly shuffled. Each null model value corresponds to the average across cross_validations iterations after a single data shuffle.

If non_semantic=True, dichotomies that do not correspond to variables will also be decoded. Note that, in the case of 2 variables, there is only one non-semantic dichotomy (corresponding to grouping together conditions that have the same XOR value in the binary notation: [['10', '01'], ['11', '00']]). However, the number of non-semantic dichotomies grows exponentially with the number of conditions, so use with caution if more than two variables are specified in the conditions dictionary.
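The growth can be made concrete by counting the balanced dichotomies of 2^n conditions for n binary variables: choose half of the conditions for one class, then divide by 2 because swapping the two classes gives the same dichotomy. Of these, n are semantic. `n_balanced_dichotomies` is a hypothetical helper.

```python
from math import comb

def n_balanced_dichotomies(n_variables):
    """Number of ways to split 2**n_variables conditions into two equal,
    unordered halves (hence the division by 2)."""
    m = 2 ** n_variables
    return comb(m, m // 2) // 2
```

This gives 3 dichotomies for 2 variables (2 semantic, 1 XOR), 35 for 3 variables, and 6435 for 4 variables.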

Parameters:
  • training_fraction – the fraction of trials used for training in each cross-validation fold.

  • cross_validations – the number of cross-validations.

  • nshuffles – the number of null-model iterations of the decoding procedure.

  • ndata – the number of data points (population vectors) sampled for training and for testing for each condition.

  • subsample – if >0, a random subsample of neurons of size=subsample will be used at each cross-validation.

  • parallel – if True, each cross-validation is performed by a dedicated thread (experimental, use with caution).

  • return_CV – if True, individual cross-validation values are returned in a list. Otherwise, the average performance over the cross-validation folds is returned.

  • testing_trials – if specified, data sampled from the specified trial numbers will be used for testing, and the remaining ones for training.

  • non_semantic – if True, non-semantic dichotomies (i.e., dichotomies that do not correspond to a variable) will also be decoded.

  • plot – if True, a visualization of the decoding results is shown.

  • ax – if specified and plot=True, the results will be displayed in the specified axis instead of a new figure.

  • plot_all – if True, a more in-depth visualization of the decoding results and of the decoded data is shown.

Returns:

  • perfs – a dictionary containing the decoding performances for all variables in the form of {var_name_1: performance1, var_name_2: performance2, ...}

  • null – a dictionary containing an array of null model decoding performance for each variable in the form {var_name_1: [...], var_name_2: [...], ...}.

See also

Decodanda.decode_with_nullmodel

The method used for each decoding analysis.

Example

>>> from decodanda import Decodanda, generate_synthetic_data
>>> data = generate_synthetic_data(keyA='stimulus', keyB='action')
>>> dec = Decodanda(data=data, conditions={'stimulus': [-1, 1], 'action': [-1, 1]})
>>> perfs, null = dec.decode(training_fraction=0.75, cross_validations=10, nshuffles=20)
>>> perfs
{'stimulus': 0.88, 'action': 0.85}  # mean over 10 cross-validation folds
>>> null
{'stimulus': [0.51, ..., 0.46], 'action': [0.48, ..., 0.55]}  # null model means, 20 values each

decode_dichotomy(dichotomy: str | list, training_fraction: float, cross_validations: int = 10, ndata: int | None = None, shuffled: bool = False, parallel: bool = False, testing_trials: list | None = None, dic_key: str | None = None, subsample: float | None = 0, **kwargs) → ndarray

Function that performs cross-validated decoding of a specific dichotomy. Decoding is performed by sampling a balanced number of data points from each condition in each class of the dichotomy, so as to ensure that only the desired variable is analyzed by balancing confounds. Before sampling, each condition is individually divided into training and testing bins by using the self.trial array specified in the data structure when constructing the Decodanda object.

Parameters:
  • dichotomy (str or list) – The dichotomy to be decoded, expressed in a double-list binary format, e.g. [['10', '11'], ['01', '00']], or as a variable name.

  • training_fraction – the fraction of trials used for training in each cross-validation fold.

  • cross_validations – the number of cross-validations.

  • ndata – the number of data points (population vectors) sampled for training and for testing for each condition.

  • shuffled – if True, population vectors for each condition are sampled randomly, according to a null model for decoding performance.

  • parallel – if True, each cross-validation is performed by a dedicated thread (experimental, use with caution).

  • testing_trials – if specified, data sampled from the specified trial numbers will be used for testing, and the remaining ones for training.

  • dic_key – if specified, weights of the decoding analysis will be saved in self.decoding_weights using dic_key as the dictionary key.

  • subsample – if >0, a random subsample of neurons of size=subsample will be used at each cross-validation.

Returns:

performances

Return type:

list of decoding performance values for each cross-validation.

Note

dichotomy can be passed as a string or as a list. If a string is passed, it has to be a name of one of the variables specified in the conditions dictionary.

If a list is passed, it needs to contain two lists in the shape [[…], […]]. Each sub-list contains the conditions, in binary notation, that define one of the two decoded classes.

For example, if the data set has two variables stimulus \(\in\) {-1, 1} and action \(\in\) {-1, 1}, the condition stimulus=-1 & action=-1 corresponds to the binary notation '00', the condition stimulus=+1 & action=-1 corresponds to '10', and so on. Therefore, the notation:

>>> dic = 'stimulus'

is equivalent to

>>> dic = [['00', '01'], ['10', '11']]

and

>>> dic = 'action'

is equivalent to

>>> dic = [['00', '10'], ['01', '11']]

However, not all dichotomies have names (are semantic). For example, the dichotomy

>>> [['01','10'], ['00', '11']]

can only be defined using the binary notation.
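To see which dichotomies are semantic, one can enumerate all balanced splits of the binary keys and check whether a single digit (variable) is constant within each class. A hypothetical sketch, assuming binary variables:

```python
from itertools import combinations, product

def balanced_dichotomies(n_variables):
    """All balanced dichotomies of 2**n_variables binary keys, each listed once."""
    keys = [''.join(bits) for bits in product('01', repeat=n_variables)]
    dichotomies = []
    for side in combinations(keys, len(keys) // 2):
        if keys[0] in side:  # keep one orientation of each split
            rest = [k for k in keys if k not in side]
            dichotomies.append([list(side), rest])
    return dichotomies

def is_semantic(dichotomy):
    """True if some digit (variable) is constant within each class."""
    class_a, class_b = dichotomy
    return any(len({k[i] for k in class_a}) == 1 and len({k[i] for k in class_b}) == 1
               for i in range(len(class_a[0])))
```

For two variables it returns the stimulus, action and XOR dichotomies, and flags only the XOR split [['00', '11'], ['01', '10']] as non-semantic.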

Note that this function gives you the flexibility to use sub-sets of conditions, for example

>>> dic = [['10'], ['01']]

will decode stimulus=1 & action=-1 vs. stimulus=-1 & action=1.

Example

>>> data = generate_synthetic_data(keyA='stimulus', keyB='action')
>>> dec = Decodanda(data=data, conditions={'stimulus': [-1, 1], 'action': [-1, 1]})
>>> perfs = dec.decode_dichotomy('stimulus', training_fraction=0.75, cross_validations=10)
>>> perfs
[0.82, 0.87, 0.75, ..., 0.77] # 10 values

decode_multiclass(classes, training_fraction: float, cross_validations: int = 10, ndata: int | None = None, subsample: int | None = 0, shuffled: bool | None = False)

Multiclass decoding of a single variable.

Parameters:
  • classes (str or list) – If str, interpreted as the name of a variable in self.conditions, and the class structure is obtained via self._balanced_classes(classes). If list, it should be a list of lists of condition keys, e.g. [['00', '01'], ['10', '11'], ['20', '21']].

  • training_fraction (float) – Fraction of trials used for training in each cross-validation fold.

  • cross_validations (int) – Number of cross-validation iterations.

  • ndata (int, optional) – Number of data points sampled per condition for training and testing. If None, defaults are chosen as in decode_dichotomy.

  • subsample (int, optional) – If >0, a random subset of neurons of size=subsample is used.

  • shuffled (bool, optional) – If True, use the geometric null model implemented by self._shuffle_conditioned_arrays before decoding, and restore the original ordering afterwards.

Returns:

performance – Array of decoding performance values for each cross-validation.

Return type:

np.ndarray

decode_multiclass_with_nullmodel(variable: str | list, training_fraction: float, cross_validations: int = 10, nshuffles: int = 10, ndata: int | None = None, return_CV: bool = False, plot: bool = False, dic_key: str | None = None, subsample: int | None = 0, **kwargs)

decode_with_nullmodel(dichotomy: str | list, training_fraction: float, cross_validations: int = 10, nshuffles: int = 10, ndata: int | None = None, parallel: bool = False, return_CV: bool = False, testing_trials: list | None = None, plot: bool = False, dic_key: str | None = None, subsample: int | None = 0, **kwargs) → Tuple[list | ndarray, ndarray]

Function that performs cross-validated decoding of a specific dichotomy and compares the resulting values with a null model where the relationship between the neural data and the two sides of the dichotomy is shuffled.

Decoding is performed by sampling a balanced number of data points from each condition in each class of the dichotomy, so as to ensure that only the desired variable is analyzed by balancing confounds.

Before sampling, each condition is individually divided into training and testing bins by using the self.trial array specified in the data structure when constructing the Decodanda object.

Parameters:
  • dichotomy (str or list) – The dichotomy to be decoded, expressed in a double-list binary format, e.g. [['10', '11'], ['01', '00']], or as a variable name.

  • training_fraction – the fraction of trials used for training in each cross-validation fold.

  • cross_validations – the number of cross-validations.

  • nshuffles – the number of null-model iterations of the decoding procedure.

  • ndata – the number of data points (population vectors) sampled for training and for testing for each condition.

  • parallel – if True, each cross-validation is performed by a dedicated thread (experimental, use with caution).

  • return_CV – if True, individual cross-validation values are returned in a list. Otherwise, the average performance over the cross-validation folds is returned.

  • testing_trials – if specified, data sampled from the specified trial numbers will be used for testing, and the remaining ones for training.

  • plot – if True, a visualization of the decoding results is shown.

  • dic_key – if specified, weights of the decoding analysis will be saved in self.decoding_weights using dic_key as the dictionary key.

  • subsample – if >0, a random subsample of neurons of size=subsample will be used at each cross-validation.

Returns:

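performances – the mean decoding performance over the cross-validation folds, or the list of individual fold performances if return_CV=True.

null_performances – an array of null-model decoding performances, one for each of the nshuffles iterations.

Return type:

tuple (list or ndarray, ndarray)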

See also

Decodanda.decode_dichotomy

The method used for each decoding iteration.

Note

dichotomy can be passed as a string or as a list. If a string is passed, it has to be a name of one of the variables specified in the conditions dictionary.

If a list is passed, it needs to contain two lists in the shape [[…], […]]. Each sub-list contains the conditions, in binary notation, that define one of the two decoded classes.

For example, if the data set has two variables stimulus \(\in\) {-1, 1} and action \(\in\) {-1, 1}, the condition stimulus=-1 & action=-1 corresponds to the binary notation '00', the condition stimulus=+1 & action=-1 corresponds to '10', and so on. Therefore, the notation:

>>> dic = 'stimulus'

is equivalent to

>>> dic = [['00', '01'], ['10', '11']]

and

>>> dic = 'action'

is equivalent to

>>> dic = [['00', '10'], ['01', '11']]

However, not all dichotomies have names (are semantic). For example, the dichotomy

>>> [['01','10'], ['00', '11']]

can only be defined using the binary notation.

Note that this function gives you the flexibility to use sub-sets of conditions, for example

>>> dic = [['10'], ['01']]

will decode stimulus=1 & action=-1 vs. stimulus=-1 & action=1.

Example

>>> data = generate_synthetic_data(keyA='stimulus', keyB='action')
>>> dec = Decodanda(data=data, conditions={'stimulus': [-1, 1], 'action': [-1, 1]})
>>> perf, null = dec.decode_with_nullmodel('stimulus', training_fraction=0.75, cross_validations=10, nshuffles=20)
>>> perf
0.88
>>> null
[0.51, 0.54, 0.48, ..., 0.46] # 20 values

parallelism_score_dichotomy(dichotomy: str | list, max_semantic_dist: int = 1, shuffled: bool = False, method: str = 'pearson', return_combinations: bool = False)

semantic_score_geometry(training_fraction: float = 0.75, cross_validations: int = 10, nshuffles: int = 10, ndata: int | None = None, visualize=True)

This function performs a balanced decoding analysis for each possible dichotomy and plots the results sorted by a semantic score that measures how close each dichotomy is to any of the specified variables. A semantic dichotomy has semantic_score = 1; the XOR dichotomy has semantic_score = 0.

Parameters:
  • training_fraction – the fraction of trials used for training in each cross-validation fold.

  • cross_validations – the number of cross-validations.

  • nshuffles – the number of null-model iterations of the decoding procedure.

  • ndata – the number of data points (population vectors) sampled for training and for testing for each condition.

  • visualize – if True, the decoding results are shown in a figure.

Returns:

  • dichotomies_data – Two lists, one containing all the dichotomies in binary notation and one containing the corresponding semantic score.

  • decoding_data – Two dictionaries, one containing the decoding performances for all dichotomies and one containing all the corresponding lists of null model performances.

  • CCGP_data – Two dictionaries, one containing the CCGP values for all dichotomies and one containing all the corresponding lists of null model values.
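As an illustration of what a semantic score can measure, the sketch below scores a binary-notation dichotomy by how strongly its two classes differ in any single variable. The formula is an assumption chosen to reproduce the two anchor points stated above (1 for a semantic dichotomy, 0 for XOR); `semantic_score_sketch` is hypothetical and may differ from decodanda's own semantic_score.

```python
def semantic_score_sketch(dichotomy):
    """Hypothetical semantic score of a dichotomy given in binary notation.

    For each digit (variable), compare the fraction of words with that digit
    equal to '1' in the two classes; the score is the largest absolute
    difference across variables. Illustrative assumption only.
    """
    side_a, side_b = dichotomy
    best = 0.0
    for pos in range(len(side_a[0])):
        frac_a = sum(w[pos] == "1" for w in side_a) / len(side_a)
        frac_b = sum(w[pos] == "1" for w in side_b) / len(side_b)
        best = max(best, abs(frac_a - frac_b))
    return best


print(semantic_score_sketch([["00", "01"], ["10", "11"]]))  # 1.0 (stimulus)
print(semantic_score_sketch([["01", "10"], ["00", "11"]]))  # 0.0 (XOR)
```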

shattering_dimensionality(training_fraction: float = 0.75, cross_validations: int = 10, nshuffles: int = 10, ndata: int | None = None, subsample: int | None = 0, p_threshold: float = 0.01, visualize: bool = True, semantic_names: dict | None = None, **kwargs)

This function computes shattering dimensionality as defined in Bernardi et al. 2020, i.e., as the number of balanced dichotomies that a linear decoder can classify above chance levels.

Parameters:
  • training_fraction – the fraction of trials used for training in each cross-validation fold.

  • cross_validations – the number of cross-validations.

  • nshuffles – the number of null-model iterations of the decoding procedure.

  • ndata – the number of data points (population vectors) sampled for training and for testing for each condition.

  • subsample – if >0, a random subsample of neurons of size=subsample will be used at each cross-validation.

  • p_threshold – p-value threshold (with p-values computed from the z-score relative to the null model) below which a performance is considered statistically significant.

  • visualize – if True, the decoding results are shown in a figure.

Returns:

  • shattering_dim – shattering dimensionality

  • perfs – dictionary of decoding performance per dichotomy

  • null – dictionary of lists of null model values per dichotomy
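The counting step behind a threshold-based shattering dimensionality can be sketched as follows, given perfs and null dictionaries like those returned above. The p-value is assumed to be one-sided and derived from the z-score of each performance relative to its null distribution under a normal approximation; `count_significant_dichotomies` is a hypothetical helper and the numbers are made up.

```python
import math
import statistics


def count_significant_dichotomies(perfs, null, p_threshold=0.01):
    """Hypothetical helper: count dichotomies whose performance beats the null model.

    perfs: dict mapping dichotomy -> decoding performance
    null:  dict mapping dichotomy -> list of null-model performances
    Assumes a one-sided p-value from the z-score of the observed performance
    relative to the null values (normal approximation), and null lists with
    non-zero standard deviation.
    """
    n_significant = 0
    for key, perf in perfs.items():
        mu = statistics.mean(null[key])
        sigma = statistics.stdev(null[key])
        z = (perf - mu) / sigma
        p = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail probability of N(0, 1)
        if p < p_threshold:
            n_significant += 1
    return n_significant


# Made-up numbers, for illustration only.
perfs = {"stimulus": 0.90, "action": 0.85, "xor": 0.52}
null = {key: [0.48, 0.50, 0.52, 0.49, 0.51] for key in perfs}
print(count_significant_dichotomies(perfs, null))  # 2
```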

shattering_generalization(nshuffles: int = 10, ndata: int | None = None, p_threshold: float = 0.01, visualize: bool = True, semantic_names: dict | None = None, max_semantic_dist=99, **kwargs)

This function computes shattering generalization, defined as the number of balanced dichotomies that have an above-chance CCGP.

Parameters:
  • nshuffles – the number of null-model iterations of the decoding procedure.

  • ndata – the number of data points (population vectors) sampled for training and for testing for each condition.

  • p_threshold – p-value threshold (with p-values computed from the z-score relative to the null model) below which a performance is considered statistically significant.

  • visualize – if True, the decoding results are shown in a figure.

Returns:

  • shattering_gen – shattering generalization

  • perfs – dictionary of decoding performance per dichotomy

  • null – dictionary of lists of null model values per dichotomy

split_resample(fraction=0.5, condition_names=False, ndata=None, z_score=None, min_ar=0)

Parameters:
  • fraction – the fraction of trials from which the first data set (rasters_A) is sampled. The remaining fraction (1 - fraction) is used to sample the second data set (rasters_B).

  • condition_names – if True, verbose names are used for conditions; otherwise binary notation is used. Default: False.

  • ndata – optional, number of resampled activity vectors per condition. If not specified, the maximum number of activity vectors across all conditions is used.

  • z_score – if True, the resampled rasters are z-scored with respect to all the conditions.

  • min_ar – minimum activity rate (fraction of bins with non-zero activity); neurons below this threshold are excluded from the sampled data.

Return type:

rasters_A, rasters_B - dictionaries with resampled data for all conditions from different trials
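The trial-level split described above can be sketched with NumPy: whole trials, rather than individual time bins, are assigned to one of the two halves, so rasters_A and rasters_B never share a trial. This is a simplified, hypothetical helper (`split_resample_sketch`) that assumes each condition's raster and trial ids are stored in two dictionaries keyed by condition, and it omits the z_score and min_ar options.

```python
import numpy as np


def split_resample_sketch(rasters, trials, fraction=0.5, ndata=None, seed=None):
    """Hypothetical helper: resample two data sets per condition from disjoint trials.

    rasters: dict mapping condition -> (T, n_neurons) array of activity vectors
    trials:  dict mapping condition -> (T,) array of trial ids
    Assumes every condition has at least two distinct trials.
    """
    rng = np.random.default_rng(seed)
    if ndata is None:
        # Documented default: the largest number of vectors in any condition.
        ndata = max(raster.shape[0] for raster in rasters.values())
    rasters_A, rasters_B = {}, {}
    for cond, raster in rasters.items():
        trial_ids = rng.permutation(np.unique(trials[cond]))
        # Keep at least one trial on each side of the split.
        n_A = min(max(int(round(fraction * len(trial_ids))), 1), len(trial_ids) - 1)
        in_A = np.isin(trials[cond], trial_ids[:n_A])
        # Sample with replacement so that ndata may exceed the available bins.
        idx_A = rng.choice(np.flatnonzero(in_A), size=ndata, replace=True)
        idx_B = rng.choice(np.flatnonzero(~in_A), size=ndata, replace=True)
        rasters_A[cond] = raster[idx_A]
        rasters_B[cond] = raster[idx_B]
    return rasters_A, rasters_B
```

Splitting by trial rather than by bin is what prevents temporally correlated bins from the same trial from leaking between the two data sets.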

visualize_PCA(**kwargs)
balance_decodandas(ds)
check_requirements_two_conditions(sessions, conditions_1, conditions_2, **decodanda_params)
check_session_requirements(session, conditions, **decodanda_params)
decoding_analysis(data, conditions, decodanda_params, analysis_params, parallel=False, plot=False, ax=None)

Function that performs a balanced decoding analysis of the data set passed in the data argument, using the variables and values specified in the conditions dictionary.

This function is a shortcut for building a Decodanda object with decodanda_params as arguments and calling the Decodanda.decode function with analysis_params as arguments.

Notes

This function is equivalent to

>>> Decodanda(data, conditions, **decodanda_params).decode(**analysis_params)

Parameters:
  • data – The data set used by the Decodanda object.

  • conditions – The conditions dictionary for the Decodanda object.

  • decodanda_params – A dictionary specifying the values for the Decodanda constructor parameters.

  • analysis_params – A dictionary specifying the values for the Decodanda.decode function parameters.

  • parallel – [Experimental] if True, null model iterations are performed on separate threads.

  • plot – If True, the decoding results are shown in a figure.

  • ax – If specified and plot=True, the results are shown in the specified axis.

Return type:

performances, null

decodanda.imports module

decodanda.in_time module

CCGP_at_time(data, conditions, time_attr, time, dt, decodanda_params, decoding_params)
CCGP_in_time(data, conditions, time_attr, time_window, decodanda_params, decoding_params, time_boundaries, plot=False, time_key='Time')
Parameters:
  • data – the dataset to be decoded, in the same format as in the Decodanda constructor.

  • conditions – the variables with values to be decoded, in the same format as in the Decodanda constructor.

  • time_attr – the variable that defines time relative to the zero offset.

  • decodanda_params – dictionary of parameters for the Decodanda constructor.

  • decoding_params – dictionary of parameters for the Decodanda.decode() function.

  • time_boundaries – List [min, max]: only trials with data points spanning the whole time interval will be considered for the decoding analysis.

Returns:

performances, null

decode_in_time(data, conditions, time_attr, time_window, decodanda_params, decoding_params, time_boundaries, plot=False, time_key='Time', verbose=False)
Parameters:
  • data – the dataset to be decoded, in the same format as in the Decodanda constructor.

  • conditions – the variables with values to be decoded, in the same format as in the Decodanda constructor.

  • time_attr – the variable that defines time relative to the zero offset.

  • decodanda_params – dictionary of parameters for the Decodanda constructor.

  • decoding_params – dictionary of parameters for the Decodanda.decode() function.

  • time_boundaries – List [min, max]: only trials with data points spanning the whole time interval will be considered for the decoding analysis.

Returns:

performances, null
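A time-resolved analysis first restricts the data to trials that span time_boundaries and then decodes one time window at a time. The sketch below shows only that selection step, assuming consecutive non-overlapping windows of length time_window; the helpers `trials_spanning` and `window_masks` are hypothetical, and the windowing actually used by decode_in_time may differ.

```python
import numpy as np


def trials_spanning(trial_ids, time, time_boundaries):
    """Hypothetical helper: ids of trials whose data cover [min, max] entirely."""
    t_min, t_max = time_boundaries
    keep = [trial for trial in np.unique(trial_ids)
            if time[trial_ids == trial].min() <= t_min
            and time[trial_ids == trial].max() >= t_max]
    return np.array(keep)


def window_masks(time, time_boundaries, time_window):
    """Hypothetical helper: (start, mask) for consecutive non-overlapping windows."""
    t_min, t_max = time_boundaries
    for start in np.arange(t_min, t_max, time_window):
        yield start, (time >= start) & (time < start + time_window)


trial_ids = np.array([0] * 10 + [1] * 5)
time = np.concatenate([np.arange(-5, 5), np.arange(0, 5)])
valid = np.isin(trial_ids, trials_spanning(trial_ids, time, [-2, 2]))
for start, mask in window_masks(time, [-2, 2], 2):
    # In a real analysis, the bins selected by valid & mask would be decoded here.
    print(start, int((valid & mask).sum()))
```

Trial 1 only covers times 0 to 4, so it is excluded for time_boundaries [-2, 2] even in windows where it has data.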

decoding_at_time(data, conditions, time_attr, time, dt, decodanda_params, decoding_params)

decodanda.utilities module

class CrossValidator(classifier, conditioned_rasters, conditioned_trial_index, dic, training_fraction, ndata, subset, semantic_vectors, dic_key, z_score)

Bases: object

one_cv_step(dic, training_fraction, ndata, testing_trials=None)
test(testing_raster_A, testing_raster_B, label_A, label_B)
train(training_raster_A, training_raster_B, label_A, label_B)
class DictSession(dictionary)

Bases: object

Wraps a dictionary in a session-like object whose keys can be accessed as attributes (via getattr).
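Accessing dictionary keys through getattr can be sketched in a few lines. `AttrSession` is a hypothetical class, not decodanda's DictSession; it forwards failed attribute lookups to the wrapped dictionary, so session.raster returns dictionary['raster'].

```python
class AttrSession:
    """Hypothetical stand-in for DictSession: dictionary keys become attributes."""

    def __init__(self, dictionary):
        self._dictionary = dict(dictionary)

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, e.g. session.raster.
        try:
            return self.__dict__["_dictionary"][name]
        except KeyError:
            raise AttributeError(name) from None


session = AttrSession({"raster": [[0, 1], [1, 0]], "trial": [1, 2]})
print(session.raster, getattr(session, "trial"))
```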

class FakeSession(n_neurons, ndata, persistence_letter=0.97, persistence_number=0.97, persistence_color=0.97, noise_amplitude=0.5, coding_fraction=0.1, rotate=False, symplex=False)

Bases: object

class Logger(filename=None)

Bases: object

initialize(filename=None)
log(string)
log_stats(key, data, test, stat_name, stat_val, p)
log_stats_nullmodel(key, data, null, p)
log_stats_ttest_1s(key, data, null, t, p)
annotate_ttest_p(dataA, dataB, x1, x2, ax, pairplot=False, force=False, p=-1, h='max')
block_diagonal(arrs)
box_comparison_two(A, B, labelA, labelB, quantity, force=False, swarm=False, violin=False, box=False, paired=False, bar=False, p=None, ax=None)
chunk_shuffle_index(T, chunk_size)
compute_dic_key(dic)
contiguous_chunking(mask, max_chunk_size=None)
corr_dissimilarity(X)
cosine(x, y)
cossim(x, y)
cosyne_dissimilarity(X)
count_pval(x, null)
delete_silent_bins(array)
destroy_time_correlations(array)
distance_from_plane(p1, p2, p3, point)
draw_pair_plot(data1, data2, x1, x2, ax, swarm=False)
enforce_min_time_separation(trial_vector, min_time, time_vector, weight='n_trials')
Select a subset of trial segments (contiguous runs with constant trial id) such that:
  • any two kept segments are separated by at least min_time in time_vector

  • the selection maximizes an objective (weight)

Invalid bins (never selectable):
  • trial_vector == -1 (float or int)

  • NaN trial_vector (float only)

Parameters:
  • trial_vector ((T,) array-like) – Trial id per time bin.

  • min_time (float) – Minimum required gap between the end of one kept segment and the start of the next. If <= 0, a mask of all valid bins is returned.

  • time_vector ((T,) array-like) – Time coordinate per bin.

  • weight ({“n_trials”, “n_bins”, “duration”}) –

    Objective to maximize:
    • “n_trials”: keep as many segments as possible (default)

    • “n_bins”: keep as many bins as possible

    • “duration”: keep as much time coverage as possible (end-start per segment)

Returns:

keep_mask – True on bins belonging to kept segments.

Return type:

(T,) bool ndarray
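The selection problem described above is an instance of weighted interval scheduling: each trial segment is an interval with a weight, and two intervals are compatible when the gap between them is at least min_time. The sketch below solves it with the classic dynamic program over segments sorted by end time; `min_separation_mask` is a hypothetical reimplementation written for illustration, not necessarily the algorithm used by enforce_min_time_separation.

```python
import bisect

import numpy as np


def min_separation_mask(trial_vector, min_time, time_vector, weight="n_trials"):
    """Hypothetical sketch: keep trial segments separated by at least min_time."""
    trials = np.asarray(trial_vector)
    times = np.asarray(time_vector, dtype=float)
    valid = trials != -1
    if np.issubdtype(trials.dtype, np.floating):
        valid &= ~np.isnan(trials)  # NaN trial ids are never selectable
    if min_time <= 0:
        return valid

    # Segments: maximal runs of valid bins sharing one trial id, as (first, last).
    segments, i, T = [], 0, len(trials)
    while i < T:
        if not valid[i]:
            i += 1
            continue
        j = i
        while j + 1 < T and valid[j + 1] and trials[j + 1] == trials[i]:
            j += 1
        segments.append((i, j))
        i = j + 1

    def seg_weight(first, last):
        if weight == "n_trials":
            return 1.0
        if weight == "n_bins":
            return float(last - first + 1)
        if weight == "duration":
            return float(times[last] - times[first])
        raise ValueError(f"unknown weight: {weight!r}")

    # Sort by end time; ends[k] is the end time of the (k+1)-th segment.
    segments.sort(key=lambda seg: times[seg[1]])
    ends = [times[last] for _, last in segments]

    def n_compatible(k):
        # How many of the first k-1 segments end at least min_time before segment k starts.
        return bisect.bisect_right(ends, times[segments[k - 1][0]] - min_time, 0, k - 1)

    # best[k]: optimal total weight using only the first k segments.
    best = [0.0] * (len(segments) + 1)
    for k in range(1, len(segments) + 1):
        best[k] = max(best[k - 1], seg_weight(*segments[k - 1]) + best[n_compatible(k)])

    # Backtrack to recover one optimal selection.
    keep = np.zeros(T, dtype=bool)
    k = len(segments)
    while k > 0:
        first, last = segments[k - 1]
        p = n_compatible(k)
        if seg_weight(first, last) + best[p] >= best[k - 1]:
            keep[first:last + 1] = True
            k = p
        else:
            k -= 1
    return keep
```

The dynamic program runs in O(n log n) for n segments, since each compatibility lookup is a binary search over the sorted end times.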

equalize_ax(ax)

Make the axes of a 3D plot have equal scale so that spheres appear as spheres, cubes as cubes, etc. This is one possible workaround for Matplotlib’s ax.set_aspect('equal') and ax.axis('equal') not working for 3D plots. Adapted from an answer by karlo on Stack Exchange.

Input:

ax: a matplotlib axis, e.g., as output from plt.gca().
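The usual form of this workaround sets all three axis limits to a common half-range around each axis midpoint. The sketch below assumes a Matplotlib 3D axis (Axes3D), which provides the get_xlim3d/set_xlim3d family of methods; `equalize_3d_axes` is a hypothetical name, and the actual equalize_ax may differ in detail.

```python
import numpy as np


def equalize_3d_axes(ax):
    """Hypothetical sketch: give a 3D axis equal scale on x, y and z.

    Uses only get_*lim3d / set_*lim3d, which Matplotlib's Axes3D provides.
    """
    limits = np.array([ax.get_xlim3d(), ax.get_ylim3d(), ax.get_zlim3d()])
    centers = limits.mean(axis=1)
    # Half of the largest span among the three axes.
    radius = 0.5 * np.max(np.abs(limits[:, 1] - limits[:, 0]))
    ax.set_xlim3d(centers[0] - radius, centers[0] + radius)
    ax.set_ylim3d(centers[1] - radius, centers[1] + radius)
    ax.set_zlim3d(centers[2] - radius, centers[2] + radius)
```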

generate_binary_words(n)
generate_dichotomies(n)
generate_synthetic_data(n_neurons=50, n_trials=50, timebins_per_trial=5, keyA='stimulus', keyB='action', rateA=0.1, rateB=0.1, corrAB=0, scale=1, meanfr=0.1, mixing_factor=0.0, mixed_term=0.0)
generate_synthetic_data_intime(n_neurons=50, min_time=-10, max_time=10, signal=0.2, ntrials=10)
generate_words(conditions)

Parameters:
  • conditions – dict[var_name -> dict[value_name -> predicate]]

Returns:

words – np.ndarray of shape (n_combinations, n_variables), where each column is an integer code for that variable’s value. The order of variables is list(conditions.keys()).
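Since each column holds the code of a value, the words can be produced as a Cartesian product of value indices. This sketch assumes the integer code of a value is its position in the inner dictionary (in insertion order) and ignores the predicates, which are only needed when matching data points to conditions; `generate_words_sketch` is hypothetical.

```python
from itertools import product

import numpy as np


def generate_words_sketch(conditions):
    """Hypothetical helper: enumerate all value combinations as integer codes.

    conditions: dict[var_name -> dict[value_name -> predicate]]
    Column j holds the index of the value of variable j, in dictionary order.
    """
    sizes = [len(values) for values in conditions.values()]
    return np.array(list(product(*(range(n) for n in sizes))), dtype=int)


conditions = {
    "stimulus": {"left": lambda d: d["stimulus"] == -1, "right": lambda d: d["stimulus"] == 1},
    "action": {"lick": lambda d: d["action"] == 1, "no_lick": lambda d: d["action"] == -1},
}
print(generate_words_sketch(conditions).tolist())  # [[0, 0], [0, 1], [1, 0], [1, 1]]
```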

hamming(x, y)
hamming_distance(x, y)
histogram_comparison(Adata, Bdata, labelA, labelB, quantity, bins=None, ax=None)
interplanar_distance(centroids)
log_dichotomy(dec, dic, ndata, s='Decoding')
mahalanobis_dissimilarity(d)
metric_dissimilarity(X)
non_contiguous_mask(trials, chunks)
p_to_ast(p)
p_to_text(p)
plot_confusion_matrix(cm, labels=None, normalize: bool = False, ax=None, cmap='viridis', fontsize=10)
print_stats(data, name)
sample_from_rasters(rasters, ndata, mode='sample')
sample_training_testing_from_rasters(rasters, ndata, training_fraction, trials, mode='sample', testing_trials=None, randomstate=None, debug=False)
semantic_score(dic)
string_digits(x)
training_test_block_masks(T, training_fraction, trials, randomstate=None, debug=False, testing_trials=None)
visualize_data_vs_null(data, null, value, ax=None)
visualize_raster(raster, ax='auto', offset=0, order=None, colors=None)
visualize_synthetic_data(session)
z_pval(x, null)

decodanda.visualize module

corr_scatter(x_data, y_data, xlabel, ylabel, ax=None, data_labels=None, corr=None, annotate=True, **kwargs)
corrfunc(x, y, ax=None, **kws)

Plot the correlation coefficient in the top-left corner of a plot.

line_with_shade(x, y, errfunc=<function nanstd>, ax=None, axis=0, label='', color='k', alpha=0.1, **kwargs)
plot_perfs_null_model(perfs, perfs_nullmodel, marker='o', ylabel='Decoding performance', ax=None, shownull=False, chance=0.5, setup=True, ptype='z', annotate=True, ylow=0.27, yhigh=1.02, **kwargs)
plot_perfs_null_model_single(data, null, x=0, marker='d', ax=None, shownull=False, color='b', ptype='zscore')
setup_decoding_axis(ax, labels, ylow=0.4, yhigh=1.0, null=0.5)
smooth_hist(data, ax, bins, stairs=False, label=None, color='k')
visualize_PCA(dec, dim=3, ndata=None, savename=None, title='', data=None, null=None, names=None, axs=None, alpha=None, z_score=False, mean=False, draw_hd2_lines=True)
visualize_decodanda_MDS(dec, dim=3, savename=None, title='', data=None, null=None, names=None, axs=None)
visualize_decoding(dec, dic, perfs, null, ndata=100, training_fraction=0.5, testing_trials=None)
visualize_raster(raster, ax='auto', offset=0, order=None, colors=None, contrast=1.0)
visualize_session(session, neural_key='raster', other_keys='all', contrast=1.0)

Module contents