Analysis

Analyse mixer ensembles to extract static insights and train predict-time models for dynamic insights.

class analysis.AccStats(deps=('ICP',))[source]

Computes accuracy stats and a confusion matrix for the validation dataset

analyze(info, **kwargs)[source]

This method should be called once during the analysis phase, or not called at all. It computes any information that the block may either output to the model analysis object, or use at inference time when .explain() is called (in this case, make sure all needed objects are added to the runtime analyzer so that .explain() can access them).

Parameters:
  • info (Dict[str, object]) – Dictionary where any new information or objects are added. The next analysis block will use the output of the previous block as a starting point.

  • kwargs – Dictionary with named variables from either the core analysis or the rest of the prediction pipeline.

Return type:

Dict[str, object]

class analysis.BaseAnalysisBlock(deps=())[source]

Class to be inherited by any analysis/explainer block.

analyze(info, **kwargs)[source]

This method should be called once during the analysis phase, or not called at all. It computes any information that the block may either output to the model analysis object, or use at inference time when .explain() is called (in this case, make sure all needed objects are added to the runtime analyzer so that .explain() can access them).

Parameters:
  • info (Dict[str, object]) – Dictionary where any new information or objects are added. The next analysis block will use the output of the previous block as a starting point.

  • kwargs – Dictionary with named variables from either the core analysis or the rest of the prediction pipeline.

Return type:

Dict[str, object]

explain(row_insights, global_insights, **kwargs)[source]

This method should be called once during the explaining phase at inference time, or not called at all. Additional explanations can be at an instance level (row-wise) or global. For the former, return a data frame with any new insights. For the latter, a dictionary is required.

Parameters:
  • row_insights (DataFrame) – dataframe with previously computed row-level explanations.

  • global_insights (Dict[str, object]) – dict() with any explanations that concern all predicted instances or the model itself.

Return type:

Tuple[DataFrame, Dict[str, object]]

Returns:

  • row_insights: modified input dataframe with any new row insights added here.

  • global_insights: dict() with any explanations that concern all predicted instances or the model itself.
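
For orientation, here is a minimal sketch of a custom block that follows the analyze()/explain() contract above. The block itself is hypothetical, it does not import the real base class (in practice it would inherit from analysis.BaseAnalysisBlock), and the kwarg names it reads ('normal_predictions', 'analysis') are illustrative assumptions:

    import pandas as pd
    from typing import Dict, Tuple


    class PredictionRangeFlag:
        """Hypothetical custom block. In practice it would inherit from
        analysis.BaseAnalysisBlock; a plain class with the same analyze()/explain()
        signatures keeps the sketch self-contained."""

        def __init__(self, deps: tuple = ()):
            self.deps = deps  # names of blocks that must run before this one

        def analyze(self, info: Dict[str, object], **kwargs) -> Dict[str, object]:
            # Store whatever .explain() will need later in the shared dictionary.
            preds = kwargs.get('normal_predictions')  # assumed kwarg name, for illustration
            if preds is not None:
                info['pred_min'] = float(pd.Series(preds['prediction']).min())
                info['pred_max'] = float(pd.Series(preds['prediction']).max())
            return info

        def explain(self,
                    row_insights: pd.DataFrame,
                    global_insights: Dict[str, object],
                    **kwargs) -> Tuple[pd.DataFrame, Dict[str, object]]:
            runtime = kwargs.get('analysis', {})  # assumed to hold what analyze() stored
            lo, hi = runtime.get('pred_min'), runtime.get('pred_max')
            if lo is not None and hi is not None and 'prediction' in row_insights:
                row_insights['out_of_analysis_range'] = ~row_insights['prediction'].between(lo, hi)
            global_insights['range_seen_in_analysis'] = (lo, hi)
            return row_insights, global_insights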

class analysis.ConfStats(deps=('ICP',), ece_bins=10)[source]

Computes confidence-related statistics on the held-out validation dataset.

TODO: regression & forecasting tasks
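
The ece_bins parameter suggests an expected calibration error (ECE) computation; a generic textbook sketch of ECE binning (not necessarily this block's exact implementation) looks like this:

    import numpy as np


    def expected_calibration_error(confidences: np.ndarray,
                                   correct: np.ndarray,
                                   n_bins: int = 10) -> float:
        """Generic ECE: bin predictions by confidence, then compare mean confidence
        with empirical accuracy inside each bin, weighted by the bin's share of rows."""
        edges = np.linspace(0.0, 1.0, n_bins + 1)
        ece = 0.0
        for lo, hi in zip(edges[:-1], edges[1:]):
            mask = (confidences > lo) & (confidences <= hi)
            if mask.any():
                ece += mask.mean() * abs(confidences[mask].mean() - correct[mask].mean())
        return float(ece)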

analyze(info, **kwargs)[source]

This method should be called once during the analysis phase, or not called at all. It computes any information that the block may either output to the model analysis object, or use at inference time when .explain() is called (in this case, make sure all needed objects are added to the runtime analyzer so that .explain() can access them).

Parameters:
  • info (Dict[str, object]) – Dictionary where any new information or objects are added. The next analysis block will use the output of the previous block as a starting point.

  • kwargs – Dictionary with named variables from either the core analysis or the rest of the prediction pipeline.

Return type:

Dict[str, object]

class analysis.ICP(confidence_normalizer=False, fixed_significance=None, deps=())[source]

Confidence estimation block; uses inductive conformal predictors (ICPs) so that confidence estimation remains model-agnostic.
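
To illustrate the general idea behind inductive (split) conformal prediction rather than this block's internals: nonconformity scores on a held-out calibration set yield a quantile that turns point predictions into intervals with roughly the requested coverage. A regression-flavoured sketch:

    import numpy as np


    def icp_interval(calib_preds: np.ndarray,
                     calib_targets: np.ndarray,
                     new_preds: np.ndarray,
                     significance: float = 0.1):
        """Split/inductive conformal regression with absolute-error nonconformity.
        Returns (lower, upper) bounds covering the true value with probability
        ~1 - significance under exchangeability. Illustrative only."""
        scores = np.abs(calib_targets - calib_preds)            # nonconformity scores
        n = len(scores)
        k = int(np.ceil((n + 1) * (1.0 - significance)))        # conservative rank
        q = np.sort(scores)[min(k, n) - 1]                      # calibration quantile
        return new_preds - q, new_preds + q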

analyze(info, **kwargs)[source]

This method should be called once during the analysis phase, or not called at all. It computes any information that the block may either output to the model analysis object, or use at inference time when .explain() is called (in this case, make sure all needed objects are added to the runtime analyzer so that .explain() can access them).

Parameters:
  • info (Dict[str, object]) – Dictionary where any new information or objects are added. The next analysis block will use the output of the previous block as a starting point.

  • kwargs – Dictionary with named variables from either the core analysis or the rest of the prediction pipeline.

Return type:

Dict[str, object]

explain(row_insights, global_insights, **kwargs)[source]

This method should be called once during the explaining phase at inference time, or not called at all. Additional explanations can be at an instance level (row-wise) or global. For the former, return a data frame with any new insights. For the latter, a dictionary is required.

Parameters:
  • row_insights (DataFrame) – dataframe with previously computed row-level explanations.

  • global_insights (Dict[str, object]) – dict() with any explanations that concern all predicted instances or the model itself.

Return type:

Tuple[DataFrame, Dict[str, object]]

Returns:

  • row_insights: modified input dataframe with any new row insights added here.

  • global_insights: dict() with any explanations that concern all predicted instances or the model itself.

class analysis.PermutationFeatureImportance(disable_column_importance=False, row_limit=1000, col_limit=10, deps=('AccStats',))[source]

Analysis block that estimates column importances via permutation.

Roughly speaking, the procedure (see the code sketch below):
  • iterates over all input columns

  • if the input column is optional, shuffles its values and then generates predictions for the input data

  • compares the resulting accuracy with the accuracy obtained on unshuffled data

  • normalizes each accuracy difference with respect to the original accuracy (clipping at zero if negative)

  • reports the normalized differences as estimated column importance scores

Note that, crucially, this method does not refit the predictor at any point.

Parameters:
  • row_limit – Set to 0 to use the entire validation dataset.

  • col_limit – Set to 0 to consider all possible columns.

Reference:

https://scikit-learn.org/stable/modules/permutation_importance.html

https://compstat-lmu.github.io/iml_methods_limitations/pfi.html
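
A rough, model-agnostic sketch of this procedure; predict_fn and accuracy_fn are placeholders for whatever predictor and metric are in use:

    import numpy as np
    import pandas as pd


    def permutation_importance(df: pd.DataFrame,
                               target: str,
                               predict_fn,       # placeholder: DataFrame -> predictions
                               accuracy_fn,      # placeholder: (y_true, y_pred) -> float
                               columns=None,
                               seed: int = 0) -> dict:
        """Shuffle one column at a time and measure the accuracy drop, without refitting."""
        rng = np.random.default_rng(seed)
        y_true = df[target]
        base_acc = accuracy_fn(y_true, predict_fn(df))
        importances = {}
        for col in (columns or [c for c in df.columns if c != target]):
            shuffled = df.copy()
            shuffled[col] = rng.permutation(shuffled[col].values)
            drop = base_acc - accuracy_fn(y_true, predict_fn(shuffled))
            # normalize by the original accuracy and clip negative differences at zero
            importances[col] = max(0.0, drop) / base_acc if base_acc else 0.0
        return importances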

analyze(info, **kwargs)[source]

This method should be called once during the analysis phase, or not called at all. It computes any information that the block may either output to the model analysis object, or use at inference time when .explain() is called (in this case, make sure all needed objects are added to the runtime analyzer so that .explain() can access them).

Parameters:
  • info (Dict[str, object]) – Dictionary where any new information or objects are added. The next analysis block will use the output of the previous block as a starting point.

  • kwargs – Dictionary with named variables from either the core analysis or the rest of the prediction pipeline.

Return type:

Dict[str, object]

class analysis.TempScaler(deps=())[source]

Original reference (MIT Licensed): https://github.com/gpleiss/temperature_scaling

NB: the output of the neural network should be the classification logits, NOT the softmax (or log softmax)!

TODO

analyze(info, **kwargs)[source]

Tune and set the temperature of a neural model by optimizing NLL on validation-set logits.

Return type:

Dict[str, object]

explain(row_insights, global_insights, **kwargs)[source]

Perform temperature scaling on logits

Return type:

Tuple[DataFrame, Dict[str, object]]
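
For context, a minimal PyTorch sketch of temperature scaling itself: a single scalar T is fit by minimizing NLL on validation logits and then applied at inference. This illustrates the generic technique from the reference above, not this block's exact code:

    import torch


    def fit_temperature(val_logits: torch.Tensor, val_labels: torch.Tensor) -> float:
        """Learn a single temperature T > 0 by minimizing NLL on validation logits.
        val_logits must be raw classification logits, not (log-)softmax outputs."""
        log_t = torch.zeros(1, requires_grad=True)           # optimize log(T) so that T stays positive
        optimizer = torch.optim.LBFGS([log_t], lr=0.1, max_iter=50)
        nll = torch.nn.CrossEntropyLoss()

        def closure():
            optimizer.zero_grad()
            loss = nll(val_logits / log_t.exp(), val_labels)
            loss.backward()
            return loss

        optimizer.step(closure)
        return float(log_t.exp())


    def scale_probs(logits: torch.Tensor, temperature: float) -> torch.Tensor:
        # Softmax of temperature-scaled logits gives the calibrated class probabilities.
        return torch.softmax(logits / temperature, dim=-1)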

analysis.explain(data, encoded_data, predictions, target_name, target_dtype, problem_definition, stat_analysis, pred_args, runtime_analysis, explainer_blocks=[], ts_analysis={})[source]

This procedure runs at the end of every normal .predict() call. Its goal is to generate prediction insights, potentially using information generated at the model analysis stage (e.g. confidence estimation).

As in analysis(), any user-specified analysis blocks (see class BaseAnalysisBlock) are also called here.

Returns:

row_insights: a DataFrame containing predictions and all generated insights at a row-level.
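
Conceptually, the explainer blocks are chained as sketched below, using only the explain() signature documented above; the helper function and its kwargs are illustrative, not the actual implementation:

    from typing import Dict, List, Tuple
    import pandas as pd


    def run_explainer_blocks(predictions: pd.DataFrame,
                             explainer_blocks: List[object],
                             runtime_analysis: Dict[str, object],
                             **context) -> Tuple[pd.DataFrame, Dict[str, object]]:
        """Sketch: each block receives the insights produced so far and may extend
        them row-wise and/or globally."""
        row_insights = predictions.copy()        # start from the raw predictions
        global_insights: Dict[str, object] = {}
        for block in explainer_blocks:
            row_insights, global_insights = block.explain(
                row_insights,
                global_insights,
                analysis=runtime_analysis,       # objects stored by the blocks' analyze() calls
                **context,                       # remaining pipeline context (illustrative)
            )
        return row_insights, global_insights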

analysis.model_analyzer(predictor, data, train_data, stats_info, target, pdef, dtype_dict, accuracy_functions, ts_analysis, analysis_blocks=[])[source]

Analyses the model on a validation subset to evaluate accuracy, estimate feature importance, and generate a calibration model for estimating confidence in future predictions.

Additionally, any user-specified analysis blocks (see class BaseAnalysisBlock) are also called here.

Return type:

Tuple[ModelAnalysis, Dict[str, object]]

Returns:

runtime_analyzer: a dictionary populated sequentially with the data generated by each block's .analyze() call. It is stored in the predictor itself and is used when calling the .explain() method of all analysis blocks while generating predictions.

model_analysis: ModelAnalysis object that contains core analysis metrics, not necessarily needed when predicting.
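
The sequential population of runtime_analyzer can be pictured roughly as follows, using only the analyze() contract documented above (the helper name is illustrative):

    from typing import Dict, List


    def run_analysis_blocks(analysis_blocks: List[object], **context) -> Dict[str, object]:
        """Sketch: each block's analyze() receives the dictionary built so far and
        returns it with its own additions, so later blocks, and later .explain()
        calls, can build on earlier results."""
        runtime_analyzer: Dict[str, object] = {}
        for block in analysis_blocks:
            runtime_analyzer = block.analyze(runtime_analyzer, **context)
        return runtime_analyzer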