Ensemble

Ensembles combine mixers to generate predictions.

class ensemble.BaseEnsemble(target, mixers, data, fit=True)[source]

Base class for all ensembles.

Ensembles wrap sets of Lightwood mixers, with the objective of generating better predictions based on the output of each mixer.

There are two important methods for any ensemble to work:
  1. __init__() should prepare all mixers and internal ensemble logic.

  2. __call__() applies any aggregation rules to generate final predictions based on the output of each mixer.
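The two methods above can be illustrated with a minimal, library-free sketch. Everything in it is hypothetical: `ToyMedianEnsemble`, the `Mixer` alias, and the toy mixers are stand-ins, not Lightwood classes, and the element-wise median is just one possible aggregation rule.

```python
from statistics import median
from typing import Callable, List, Sequence

# Hypothetical stand-in for a mixer: any callable mapping inputs to predictions.
Mixer = Callable[[Sequence[float]], List[float]]


class ToyMedianEnsemble:
    """Illustrative only; not part of Lightwood."""

    def __init__(self, mixers: List[Mixer]) -> None:
        # Step 1: prepare the mixers and any internal ensemble state.
        if not mixers:
            raise ValueError("at least one mixer is required")
        self.mixers = mixers

    def __call__(self, rows: Sequence[float]) -> List[float]:
        # Step 2: collect every mixer's output, then aggregate per row.
        outputs = [mixer(rows) for mixer in self.mixers]
        return [median(per_row) for per_row in zip(*outputs)]


def scale_by(factor: float) -> Mixer:
    """Build a toy mixer that multiplies each input by a constant."""
    return lambda rows: [factor * x for x in rows]


ensemble = ToyMedianEnsemble([scale_by(2.0), scale_by(3.0), scale_by(4.0)])
predictions = ensemble([1.0, 2.0])
```

Here the three toy mixers predict `[2.0, 4.0]`, `[3.0, 6.0]`, and `[4.0, 8.0]`, so the aggregated result is the per-row median of those outputs.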

Class Attributes:
  • mixers: List of mixers the ensemble will use.

  • supports_proba: For classification tasks, whether the ensemble supports yielding per-class scores rather than only returning the predicted label.

NOTE: this ensemble is not functional. Do not use it when generating custom JsonAI objects, as the learning process will fail.

class ensemble.BestOf(target, mixers, data, accuracy_functions, args, ts_analysis=None, fit=True)[source]

This ensemble acts as a mixer selector. After evaluating accuracy for all internal mixers with the validation data, it sets the best mixer as the underlying model.
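As a rough sketch of the selection step, assuming each mixer is a plain callable and that a higher score is better, the idea could look like the following. The helper `select_best_mixer`, the `exact_match` accuracy function, and the toy mixers are hypothetical names for illustration, not the Lightwood implementation.

```python
from typing import Callable, List, Sequence

Mixer = Callable[[Sequence[int]], List[int]]
AccuracyFn = Callable[[Sequence[int], Sequence[int]], float]


def exact_match(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions equal to the target (hypothetical metric)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def select_best_mixer(
    mixers: List[Mixer],
    accuracy_fn: AccuracyFn,
    val_inputs: Sequence[int],
    val_targets: Sequence[int],
) -> int:
    """Return the index of the mixer with the highest validation score."""
    scores = [accuracy_fn(val_targets, mixer(val_inputs)) for mixer in mixers]
    return max(range(len(mixers)), key=scores.__getitem__)


def always_zero(xs: Sequence[int]) -> List[int]:
    return [0 for _ in xs]


def parity(xs: Sequence[int]) -> List[int]:
    return [x % 2 for x in xs]


val_inputs = [1, 2, 3, 4]
val_targets = [1, 0, 1, 0]
best_index = select_best_mixer([always_zero, parity], exact_match, val_inputs, val_targets)
```

In this toy setup `always_zero` matches half of the validation targets and `parity` matches all of them, so index 1 is selected.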

class ensemble.Embedder(target, mixers, data)[source]

This ensemble acts as a simple embedder that bypasses all mixers. When called, it will return the encoded representation of the data stored in (or generated by) an EncodedDs object.

class ensemble.IdentityEnsemble(target, mixers, data, args)[source]

This ensemble performs no aggregation. The user can define an “active mixer”, and calling the ensemble will call that mixer.

Ideal for use cases with single mixers where (potentially expensive) evaluation runs are done internally, as in BestOf.

class ensemble.MeanEnsemble(target, mixers, data, dtype_dict, fit=True, **kwargs)[source]

When called, this ensemble will return the mean prediction from the entire list of underlying mixers.

NOTE: can only be used in regression tasks.

class ensemble.ModeEnsemble(target, mixers, data, dtype_dict, accuracy_functions, args, ts_analysis=None, fit=True, **kwargs)[source]

When called, this ensemble will return the mode prediction from the entire list of underlying mixers.

If there are multiple modes, the mode whose voting mixers have the highest score will be returned.

NOTE: can only be used in categorical tasks.
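The voting rule can be sketched for a single row as below. This assumes that a tie between modes is broken by the mean score of the mixers that voted for each mode; the text above only says "highest score", so the averaging is an assumption, and `pick_mode` and the mixer names are hypothetical.

```python
from collections import Counter
from statistics import mean
from typing import Dict


def pick_mode(votes: Dict[str, str], mixer_scores: Dict[str, float]) -> str:
    """Return the most voted label; break ties by mean score of the voters.

    votes maps mixer name -> predicted label for one row.
    mixer_scores maps mixer name -> validation score (higher is better).
    """
    counts = Counter(votes.values())
    top_count = max(counts.values())
    modes = [label for label, n in counts.items() if n == top_count]
    if len(modes) == 1:
        # Clear winner: no tie-breaking needed.
        return modes[0]

    def bag_score(label: str) -> float:
        # Assumption: a mode's score is the mean score of its voting mixers.
        return mean(mixer_scores[m] for m, vote in votes.items() if vote == label)

    return max(modes, key=bag_score)


scores = {"a": 0.9, "b": 0.6, "c": 0.7, "d": 0.8}
tied = pick_mode({"a": "cat", "b": "dog", "c": "cat", "d": "dog"}, scores)
majority = pick_mode({"a": "dog", "b": "dog", "c": "cat"}, scores)
```

In the tied case, the mixers voting "cat" have scores 0.9 and 0.7 while those voting "dog" have 0.6 and 0.8, so "cat" has the higher mean and wins.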

class ensemble.StackedEnsemble(target, mixers, data, dtype_dict, args, fit=True, **kwargs)[source]

This ensemble will learn an optimal weight vector via Stochastic Gradient Descent on the validation dataset and the respective mixer predictions.

Starting weights for the vector are uniformly set.
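To make the weight-learning idea concrete, here is a small pure-Python sketch of stochastic gradient descent over a linear combination of mixer predictions, starting from uniform weights. The squared-error loss, the learning rate, the epoch count, and the function name `fit_stacking_weights` are all assumptions for illustration; the source does not specify them.

```python
import random
from typing import List, Sequence


def fit_stacking_weights(
    mixer_preds: Sequence[Sequence[float]],
    targets: Sequence[float],
    lr: float = 0.05,
    epochs: int = 300,
    seed: int = 0,
) -> List[float]:
    """Learn one weight per mixer by SGD on squared error (assumed loss).

    mixer_preds[i][j] is the prediction of mixer j for validation row i.
    """
    n_mixers = len(mixer_preds[0])
    weights = [1.0 / n_mixers] * n_mixers  # uniform starting weights
    rng = random.Random(seed)
    order = list(range(len(targets)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit validation rows in random order
        for i in order:
            row = mixer_preds[i]
            error = sum(w * p for w, p in zip(weights, row)) - targets[i]
            # Gradient of error**2 with respect to weight j is 2 * error * row[j].
            weights = [w - lr * 2.0 * error * p for w, p in zip(weights, row)]
    return weights


# Toy validation data whose targets are exactly 0.8 * mixer_0 + 0.2 * mixer_1.
val_preds = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
val_targets = [0.8 * a + 0.2 * b for a, b in val_preds]
learned = fit_stacking_weights(val_preds, val_targets)
```

On this noise-free toy data the learned weights move from the uniform start of 0.5 each toward the generating values of 0.8 and 0.2.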

Note: this ensemble is still in an experimental phase. Some features on the roadmap are:
  • support for handling faulty mixers

  • support for custom initial vector weights

  • early stopping

  • arbitrarily complex secondary model

class ensemble.TsStackedEnsemble(target, mixers, data, dtype_dict, ts_analysis, args, fit=True, **kwargs)[source]

Thin wrapper for StackedEnsemble that enables forecasting support.

class ensemble.WeightedMeanEnsemble(target, mixers, data, args, dtype_dict, accuracy_functions, ts_analysis=None, fit=True, **kwargs)[source]

This ensemble determines a weight vector to return a weighted mean of the underlying mixers.

More specifically, each mixer is evaluated on the validation dataset and assigned an accuracy score, using the accuracy function fixed at the JsonAI level.

Afterwards, all the scores are softmaxed to obtain the final weights.

NOTE: this ensemble only supports regression tasks.
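The scoring-then-softmax procedure can be sketched in a few lines of plain Python. The function names and the example scores are hypothetical, and the sketch assumes that higher accuracy scores are better.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Turn per-mixer accuracy scores into weights that sum to 1."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_mean(predictions: Sequence[float], weights: Sequence[float]) -> float:
    """Combine one prediction per mixer into a single value."""
    return sum(w * p for w, p in zip(weights, predictions))


# Two mixers with equal validation scores get equal weights.
equal_weights = softmax([0.9, 0.9])
equal_prediction = weighted_mean([10.0, 20.0], equal_weights)

# A mixer with a higher score gets a larger share of the final prediction.
skewed_weights = softmax([1.0, 0.0])
skewed_prediction = weighted_mean([10.0, 20.0], skewed_weights)
```

Because softmax weights are always positive, even a poorly scoring mixer keeps some influence on the result; the better mixer simply pulls the weighted mean closer to its own prediction.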