Mixers

Mixers learn to map encoded feature representations to predictions; they are the core of Lightwood's AutoML.

class mixer.ARIMAMixer(stop_after, target, dtype_dict, horizon, ts_analysis, model_path='statsforecast.StatsForecastAutoARIMA', auto_size=True, sp=None, use_stl=False, start_p=2, d=None, start_q=2, max_p=5, max_d=2, max_q=5, start_P=1, D=None, start_Q=1, max_P=2, max_D=1, max_Q=2, max_order=5, seasonal=True, stationary=False, information_criterion=None, alpha=0.05, test='kpss', seasonal_test=None, stepwise=True, n_jobs=None, start_params=None, trend=None, method=None, maxiter=50, offset_test_args=None, seasonal_test_args=None, suppress_warnings=True, error_action='warn', trace=False, random=False, random_state=None, n_fits=None, out_of_sample_size=0, scoring='mse', scoring_args=None, with_intercept=True, update_pdq=True, time_varying_regression=False, enforce_stationarity=True, enforce_invertibility=True, simple_differencing=False, measurement_error=False, mle_regression=True, hamilton_representation=False, concentrate_scale=False)[source]

Wrapper for SkTime’s AutoARIMA interface.

Parameters:
  • stop_after (float) – time budget in seconds

  • target (str) – column containing target time series

  • dtype_dict (Dict[str, str]) – data types for each dataset column

  • horizon (int) – forecast length

  • ts_analysis (Dict) – lightwood-produced stats about input time series

  • auto_size (bool) – whether to filter out old data points if the training split is bigger than a certain threshold (defined by the dataset sampling frequency). Enabled by default to avoid long training times on big datasets.

  • use_stl (bool) – Whether to use de-trenders and de-seasonalizers fitted in the timeseries analysis phase.

For the rest of the parameters, please refer to SkTime’s documentation.

class mixer.BaseMixer(stop_after)[source]

Base class for all mixers.

Mixers are the backbone of all Lightwood machine learning models. They intake encoded feature representations for every column, and are tasked with learning to fulfill the predictive requirements stated in a problem definition.

There are two important methods for any mixer to work:
  1. fit() contains all logic to train the mixer with the training data that has been encoded by all the (already trained) Lightwood encoders for any given task.

  2. __call__() is executed to generate predictions once the mixer has been trained using fit().

An additional partial_fit() method is used to update any mixer that has already been trained.
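
To make this contract concrete, below is a minimal sketch of a custom mixer. It follows the interface described above; the MeanMixer name is hypothetical, and exact import paths may vary across Lightwood versions.

```python
import pandas as pd

from lightwood.mixer import BaseMixer
from lightwood.data.encoded_ds import EncodedDs  # assumed import path


class MeanMixer(BaseMixer):
    """Hypothetical mixer that always predicts the training mean of the target."""
    stable = True

    def __init__(self, stop_after: float, target: str):
        super().__init__(stop_after)
        self.target = target
        self.supports_proba = False

    def fit(self, train_data: EncodedDs, dev_data: EncodedDs) -> None:
        # EncodedDs keeps the original dataframe around in `data_frame`
        self.mean = train_data.data_frame[self.target].mean()

    def partial_fit(self, train_data: EncodedDs, dev_data: EncodedDs, args=None) -> None:
        # Naive "update": refit from scratch on the new data
        self.fit(train_data, dev_data)

    def __call__(self, ds: EncodedDs, args=None) -> pd.DataFrame:
        # Mixers return a dataframe with a `prediction` column
        return pd.DataFrame({'prediction': [self.mean] * len(ds)})
```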

Class Attributes:
  • stable – if set to True, this mixer should always work. Any mixer with stable=False can be expected to fail under some circumstances.

  • fit_data_len – length of the training data.

  • supports_proba – for classification tasks, whether the mixer supports yielding per-class scores rather than only returning the predicted label.

  • trains_once – if True, the mixer is trained once during learn(), using all available input data (train and dev splits for training, test split for validation). Otherwise, it first trains with the train split (using dev for validation), and optionally (depending on the problem definition's fit_on_all and the mixer-wise fit_on_dev arguments) a second time after post-training analysis via partial_fit(), with the train and dev splits as the training subset and the test split for validation. This should only be set to True for mixers that don't require post-training analysis, as otherwise actual validation data would be treated as a held-out portion, which is a mistake.

Parameters:

stop_after (float) – Time budget (in seconds) to train this mixer.

fit(train_data, dev_data)[source]

Fits/trains a mixer with training data.

Parameters:
  • train_data (EncodedDs) – encoded representations of the training data subset.

  • dev_data (EncodedDs) – encoded representations of the “dev” data subset. This can be used as an internal validation subset (e.g. it is used for early stopping in the default Neural mixer).

Return type:

None

partial_fit(train_data, dev_data, adjust_args=None)[source]

Partially fits/trains a mixer with new training data. This somewhat experimental method aims to update pre-existing Lightwood predictors.

Parameters:
  • train_data (EncodedDs) – encoded representations of the new training data subset.

  • dev_data (EncodedDs) – encoded representations of the new “dev” data subset. As in fit(), this can be used as an internal validation subset.

  • adjust_args (Optional[dict]) – optional arguments to customize the finetuning process.

Return type:

None
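
In practice, end users rarely call partial_fit() directly; it is typically reached through a trained predictor's adjust() method. A hedged sketch (the predictor variable and file name are hypothetical):

```python
import pandas as pd

# `predictor` is a trained Lightwood predictor (see the high-level API);
# adjust() routes the new data into each mixer's partial_fit().
new_df = pd.read_csv('new_observations.csv')  # hypothetical file
predictor.adjust(new_df)
```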

class mixer.ETSMixer(stop_after, target, dtype_dict, horizon, ts_analysis, model_path='ets.AutoETS', auto_size=True, sp=None, use_stl=True, error='add', trend=None, damped_trend=False, seasonal=None, initialization_method='estimated', initial_level=None, initial_trend=None, initial_seasonal=None, bounds=None, start_params=None, maxiter=1000, auto=False, information_criterion='aic', allow_multiplicative_trend=False, restrict=True, additive_only=False, ignore_inf_ic=True, n_jobs=None, random_state=None)[source]

Wrapper for SkTime’s AutoETS interface.

Parameters:
  • stop_after (float) – time budget in seconds

  • target (str) – column containing target time series

  • dtype_dict (Dict[str, str]) – data types for each dataset column

  • horizon (int) – forecast length

  • ts_analysis (Dict) – lightwood-produced stats about input time series

  • auto_size (bool) – whether to filter out old data points if the training split is bigger than a certain threshold (defined by the dataset sampling frequency). Enabled by default to avoid long training times on big datasets.

  • use_stl (bool) – Whether to use de-trenders and de-seasonalizers fitted in the timeseries analysis phase.

For the rest of the parameters, please refer to SkTime’s documentation.

class mixer.Neural(stop_after, target, dtype_dict, target_encoder, net, fit_on_dev, search_hyperparameters, n_epochs=None, lr=None)[source]

The Neural mixer trains a fully connected dense network that maps the concatenated encoded outputs of all dataset features to a prediction of the encoded target.

Parameters:
  • stop_after (float) – How long the total fitting process should take

  • target (str) – Name of the target column

  • dtype_dict (Dict[str, str]) – Data type dictionary

  • target_encoder (BaseEncoder) – Reference to the encoder used for the target

  • net (str) – The network type to use (DefaultNet or ArNet)

  • fit_on_dev (bool) – If we should fit on the dev dataset

  • search_hyperparameters (bool) – If the network should run a more thorough hyperparameter search (currently disabled)

  • n_epochs (Optional[int]) – number of epochs to train the network for. Supersedes all other early-stopping criteria if specified.

  • lr (Optional[float]) – learning rate for the network. By default, it is automatically selected based on an initial search process.
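
For illustration, here is a hedged sketch of selecting the Neural mixer with a fixed epoch count through Lightwood's JSON-AI workflow. The exact JSON-AI layout (e.g. the model['args']['submodels'] structure) may differ across versions, and 'my_data.csv' / 'price' are hypothetical.

```python
import pandas as pd
from lightwood.api.high_level import (
    json_ai_from_problem, code_from_json_ai, predictor_from_code
)
from lightwood.api.types import ProblemDefinition

df = pd.read_csv('my_data.csv')  # hypothetical dataset
json_ai = json_ai_from_problem(
    df, ProblemDefinition.from_dict({'target': 'price', 'time_aim': 120})
)

# Swap the default submodels for a single Neural mixer with a fixed epoch count
json_ai.model['args']['submodels'] = [{
    'module': 'Neural',
    'args': {'fit_on_dev': True, 'search_hyperparameters': False, 'n_epochs': 100}
}]

predictor = predictor_from_code(code_from_json_ai(json_ai))
predictor.learn(df)
```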

fit(train_data, dev_data)[source]

Fits/trains a mixer with training data.

Parameters:
  • train_data (EncodedDs) – encoded representations of the training data subset.

  • dev_data (EncodedDs) – encoded representations of the “dev” data subset. This can be used as an internal validation subset (e.g. it is used for early stopping in the default Neural mixer).

Return type:

None

partial_fit(train_data, dev_data, args=None)[source]

Augments the mixer’s fit with new data; the number of epochs is based on how many epochs the original fitting took.

Parameters:
  • train_data (EncodedDs) – The network is fit/trained on this

  • dev_data (EncodedDs) – Data used for early stopping and hyperparameter determination

Return type:

None

class mixer.NeuralTs(stop_after, target, dtype_dict, timeseries_settings, target_encoder, net, fit_on_dev, search_hyperparameters, ts_analysis, n_epochs=None, use_stl=False)[source]

Subclassed Neural mixer used for time series forecasting scenarios.

Parameters:
  • stop_after (float) – How long the total fitting process should take

  • target (str) – Name of the target column

  • dtype_dict (Dict[str, str]) – Data type dictionary

  • timeseries_settings (TimeseriesSettings) – TimeseriesSettings object for time-series tasks, refer to its documentation for available settings.

  • target_encoder (BaseEncoder) – Reference to the encoder used for the target

  • net (str) – The network type to use (DefaultNet or ArNet)

  • fit_on_dev (bool) – If we should fit on the dev dataset

  • search_hyperparameters (bool) – If the network should run a more thorough hyperparameter search (currently disabled)

  • n_epochs (Optional[int]) – number of epochs to train the network for. Supersedes all other early-stopping criteria if specified.

fit(train_data, dev_data)[source]

Fits/trains a mixer with training data.

Parameters:
  • train_data (EncodedDs) – encoded representations of the training data subset.

  • dev_data (EncodedDs) – encoded representations of the “dev” data subset. This can be used as an internal validation subset (e.g. it is used for early stopping in the default Neural mixer).

Return type:

None

class mixer.ProphetMixer(stop_after, target, dtype_dict, horizon, ts_analysis, auto_size=True, use_stl=False, add_seasonality=None, add_country_holidays=None, growth='linear', growth_floor=0, growth_cap=None, changepoints=None, n_changepoints=25, changepoint_range=0.8, yearly_seasonality='auto', weekly_seasonality='auto', daily_seasonality='auto', holidays=None, seasonality_mode='additive', seasonality_prior_scale=10.0, holidays_prior_scale=10.0, changepoint_prior_scale=0.05, mcmc_samples=0, alpha=0.05, uncertainty_samples=1000)[source]

Wrapper for SkTime’s Prophet interface.

Parameters:
  • stop_after (float) – time budget in seconds

  • target (str) – column containing target time series

  • dtype_dict (Dict[str, str]) – data types for each dataset column

  • horizon (int) – forecast length

  • ts_analysis (Dict) – lightwood-produced stats about input time series

  • auto_size (bool) – whether to filter out old data points if the training split is bigger than a certain threshold (defined by the dataset sampling frequency). Enabled by default to avoid long training times on big datasets.

  • use_stl (bool) – Whether to use de-trenders and de-seasonalizers fitted in the timeseries analysis phase.

For the rest of the parameters, please refer to SkTime’s documentation.

class mixer.RandomForest(stop_after, target, dtype_dict, fit_on_dev, target_encoder, use_optuna=False)[source]

The RandomForest mixer supports both regression and classification tasks. It inherits from sklearn.ensemble.RandomForestRegressor and sklearn.ensemble.RandomForestClassifier. (https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html) (https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html)

Parameters:
  • stop_after (float) – time budget in seconds.

  • target (str) – name of the target column that the mixer will learn to predict.

  • dtype_dict (Dict[str, str]) – dictionary with dtypes of all columns in the data.

  • fit_on_dev (bool) – whether to fit on the dev dataset.

  • use_optuna (bool) – whether to activate the automated hyperparameter search (optuna-based). Note that setting this flag to True does not guarantee the search will run, rather, the speed criteria will be checked first (i.e., if a single iteration is too slow with respect to the time budget, the search will not take place).

fit(train_data, dev_data)[source]

Fits the RandomForest model.

Parameters:
  • train_data (EncodedDs) – encoded features for training dataset

  • dev_data (EncodedDs) – encoded features for dev dataset

Return type:

None

partial_fit(train_data, dev_data, args=None)[source]

The RandomForest mixer does not support updates. If the model does not exist, a new one will be created and fitted.

Parameters:
  • train_data (EncodedDs) – encoded features for (new) training dataset

  • dev_data (EncodedDs) – encoded features for (new) dev dataset

Return type:

None

class mixer.Regression(stop_after, target_encoder, dtype_dict, target)[source]

The Regression mixer inherits from scikit-learn’s Ridge class (https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Ridge.html)

This class performs ridge regression (L2-regularized least squares) under the hood; this means it fits a set of coefficients (w_1, w_2, …, w_N) for an N-length feature vector that minimize the difference between the predicted target value and the observed true value, plus a penalty on the magnitude of the coefficients.

This mixer intakes featurized (encoded) data to predict the target. It is deployed when the target data type is considered numerical (integer or float).

Parameters:
  • stop_after (float) – Maximum amount of seconds it should fit for, currently ignored

  • target_encoder (BaseEncoder) – The encoder which will be used to decode the target

  • dtype_dict (dict) – A map of feature names and their data types

  • target (str) – Name of the target column
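
Conceptually, the mixer behaves like the following sketch (synthetic data; the real mixer operates on concatenated encoder outputs rather than raw arrays):

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))   # stand-in for concatenated encodings
y = X @ rng.normal(size=8) + rng.normal(scale=0.1, size=100)

# Ridge minimizes ||y - Xw||^2 + alpha * ||w||^2 over the coefficients w
model = Ridge().fit(X, y)
print(model.coef_)              # the fitted (w_1, ..., w_N)
```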

fit(train_data, dev_data)[source]

Fits Ridge model on input feature data to provide predictions.

Parameters:
  • train_data (EncodedDs) – The regression is fit on this

  • dev_data (EncodedDs) – This just gets concatenated to the train_data

Return type:

None

partial_fit(train_data, dev_data, args=None)[source]

Fits the linear regression on new data; this refits the model entirely rather than updating it.

Parameters:
  • train_data (EncodedDs) – Regression is fit on this

  • dev_data (EncodedDs) – This just gets concatenated to the train_data

Return type:

None

class mixer.SkTime(stop_after, target, dtype_dict, horizon, ts_analysis, model_path=None, model_kwargs=None, auto_size=True, sp=None, hyperparam_search=True, use_stl=False)[source]

This mixer is a wrapper around the popular time series library sktime. It exhibits different behavior compared to other forecasting mixers, as it predicts based on indices in a forecasting horizon that is defined with respect to the last seen data point at training time.

Due to this, the mixer tries to “fit_on_all” so that the latest point in the validation split marks the boundary between the training data and where forecasts will start. In practice, you need to specify how much time has passed since that timestamp for correct forecasts. By default, it is assumed that predictions are for the very next timestamp post-training: if the last seen timestamp is t, the forecast covers t+1 through t+horizon.

If the task has groups (i.e. ‘TimeseriesSettings.group_by’ is not empty), the mixer will spawn one forecaster object for each group observed at training time, plus an additional default forecaster fit with all data.

There is an optuna-based automatic hyperparameter search. For now, it considers selecting the forecaster type based on the global SMAPE error across all groups.

Parameters:
  • stop_after (float) – time budget in seconds.

  • target (str) – column to forecast.

  • dtype_dict (Dict[str, str]) – dtypes of all columns in the data.

  • horizon (int) – length of forecasted horizon.

  • sp (Optional[int]) – seasonality period to enforce (instead of automatic inference done at the ts_analysis module)

  • ts_analysis (Dict) – dictionary with miscellaneous time series info, as generated by ‘lightwood.data.timeseries_analyzer’.

  • model_path (Optional[str]) – sktime forecaster to use as the underlying model(s). Should be a string with format ‘$module.$class’, where ‘$module’ is a module inside sktime.forecasting. Default is ‘arima.AutoARIMA’. See the sketch after this parameter list.

  • model_kwargs (Optional[dict]) – specifies additional parameters to pass to the model if model_path is provided.

  • hyperparam_search (bool) – whether to perform hyperparameter tuning.

  • auto_size (bool) – whether to filter out old data points if the training split is bigger than a certain threshold (defined by the dataset sampling frequency). Enabled by default to avoid long training times on big datasets.

  • use_stl (bool) – Whether to use de-trenders and de-seasonalizers fitted in the timeseries analysis phase.
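
As a hedged sketch, a JSON-AI submodel entry that pins the underlying forecaster might look as follows, following the same JSON-AI pattern sketched earlier for the Neural mixer (exact layout may vary by version):

```python
# Hypothetical submodel entry for a forecasting task:
sktime_submodel = {
    'module': 'SkTime',
    'args': {
        'model_path': 'theta.ThetaForecaster',    # resolved as sktime.forecasting.theta.ThetaForecaster
        'model_kwargs': {'deseasonalize': True},  # forwarded to the forecaster
        'hyperparam_search': False,               # model is fixed, so skip the optuna search
    },
}
```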

fit(train_data, dev_data)[source]

Fits a set of sktime forecasters. The number of models depends on how many groups are observed at training time.

Forecaster type can be specified by providing the model_path argument in __init__(). It can also be determined by hyperparameter optimization based on dev data validation error.

Return type:

None

partial_fit(train_data, dev_data, args=None)[source]

Note: sktime asks for “specification of the time points for which forecasts are requested”, and this mixer complies by assuming forecasts will start immediately after the last observed value.

Because of this, ProblemDefinition.fit_on_all is set to True so that partial_fit uses both dev and test splits to fit the models.

Due to how lightwood implements the update procedure, expected inputs for this method are:

Parameters:
  • dev_data (EncodedDs) – original test split (used to validate and select model if ensemble is BestOf).

  • train_data (EncodedDs) – concatenated original train and dev splits.

Return type:

None

class mixer.TabTransformerMixer(stop_after, target, dtype_dict, target_encoder, fit_on_dev, search_hyperparameters, train_args=None)[source]

This mixer trains a TabTransformer network (FT variant), using concatenated encoder outputs for each dataset feature as input, to predict the encoded target column representation as output.

Training logic is based on the Neural mixer, please refer to it for more details on each input parameter.

fit(train_data, dev_data)[source]

Fits the mixer; skips the usual partial_fit call at the end.

Return type:

None

class mixer.Unit(stop_after, target_encoder)[source]

The “Unit” mixer serves as a simple wrapper around a target encoder, essentially borrowing the encoder’s functionality for predictions. In other words, it simply arg-maxes the output of the encoder.

Used with encoders that already fine-tune on the targets (namely, pre-trained text ML models).

Parameters:
  • target_encoder (BaseEncoder) – An instance of a Lightwood BaseEncoder. This encoder is used to decode predictions.

  • stop_after (float) – Time budget (in seconds) to train this mixer.
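
A conceptual sketch of the prediction step (the tensor values are made up):

```python
import torch

# The target encoder already yields per-class scores, so the Unit mixer
# just takes the argmax and lets the encoder decode it back to a label.
encoded = torch.tensor([[0.1, 0.7, 0.2],
                        [0.5, 0.3, 0.2]])  # hypothetical encoder output
predicted_idx = torch.argmax(encoded, dim=1)  # tensor([1, 0])
```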

fit(train_data, dev_data)[source]

Fits/trains a mixer with training data.

Parameters:
  • train_data (EncodedDs) – encoded representations of the training data subset.

  • dev_data (EncodedDs) – encoded representations of the “dev” data subset. This can be used as an internal validation subset (e.g. it is used for early stopping in the default Neural mixer).

Return type:

None

partial_fit(train_data, dev_data, args=None)[source]

Partially fits/trains a mixer with new training data. This somewhat experimental method aims to update pre-existing Lightwood predictors.

Parameters:
  • train_data (EncodedDs) – encoded representations of the new training data subset.

  • dev_data (EncodedDs) – encoded representations of the new “dev” data subset. As in fit(), this can be used as an internal validation subset.

  • args (Optional[dict]) – optional arguments to customize the finetuning process.

Return type:

None

class mixer.XGBoostArrayMixer(stop_after, target, dtype_dict, input_cols, fit_on_dev, target_encoder, ts_analysis, use_stl, tss)[source]

XGBoost-based model, intended for use in forecasting tasks.

Parameters:

stop_after (float) – Time budget (in seconds) to train this mixer.

fit(train_data, dev_data)[source]

Fits/trains a mixer with training data.

Parameters:
  • train_data (EncodedDs) – encoded representations of the training data subset.

  • dev_data (EncodedDs) – encoded representations of the “dev” data subset. This can be used as an internal validation subset (e.g. it is used for early stopping in the default Neural mixer).

Return type:

None

partial_fit(train_data, dev_data, args=None)[source]

Partially fits/trains a mixer with new training data. This somewhat experimental method aims to update pre-existing Lightwood predictors.

Parameters:
  • train_data (EncodedDs) – encoded representations of the new training data subset.

  • dev_data (EncodedDs) – encoded representations of the new “dev” data subset. As in fit(), this can be used as an internal validation subset.

  • args (Optional[dict]) – optional arguments to customize the finetuning process.

Return type:

None

class mixer.XGBoostMixer(stop_after, target, dtype_dict, input_cols, fit_on_dev, use_optuna, target_encoder)[source]

Gradient boosting mixer with an XGBoost backbone.

This mixer is a good all-rounder, due to the generally great performance of tree-based ML algorithms for supervised learning tasks with tabular data. If you want more information regarding the techniques that set XGBoost apart from other gradient boosters, please refer to its technical paper: “XGBoost: A Scalable Tree Boosting System” (2016).

We can think of this mixer as a wrapper around the XGBoost Python package. There are a few caveats the user may want to be aware of:
  • If you seek GPU utilization, XGBoost must be compiled from source instead of being installed through pip.

  • Integer, float, and quantity dtypes are treated as regression tasks with reg:squarederror loss. All other supported dtypes are cast to a multiclass task with multi:softmax loss.

  • A partial fit can be performed with the dev data split as part of fit, if specified with the fit_on_dev argument.

There are a couple of things in the backlog that will hopefully be added soon:
  • An automatic optuna-based hyperparameter search. This procedure triggers when a single iteration of XGBoost is deemed fast enough (given the time budget).

  • Support for “unknown class” as a possible answer for multiclass tasks.
Parameters:
  • stop_after (float) – time budget in seconds.

  • target (str) – name of the target column that the mixer will learn to predict.

  • dtype_dict (Dict[str, str]) – dictionary with dtypes of all columns in the data.

  • input_cols (List[str]) – list of column names.

  • fit_on_dev (bool) – whether to perform a partial_fit() at the end of fit() using the dev data split.

  • use_optuna (bool) – whether to activate the automated hyperparameter search (optuna-based). Note that setting this flag to True does not guarantee the search will run, rather, the speed criteria will be checked first (i.e., if a single iteration is too slow with respect to the time budget, the search will not take place).
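
To illustrate the dtype-to-objective mapping described above, here is a minimal sketch using the XGBoost scikit-learn API directly (synthetic data; not the mixer's actual code path):

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.random((200, 5))

# Numerical targets (integer/float/quantity dtypes) -> regression, squared error
reg = xgb.XGBRegressor(objective='reg:squarederror', n_estimators=50)
reg.fit(X, X @ np.arange(5))

# Other supported dtypes (e.g. categorical) -> multiclass, softmax
clf = xgb.XGBClassifier(objective='multi:softmax', n_estimators=50)
clf.fit(X, rng.integers(0, 3, size=200))
```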

fit(train_data, dev_data)[source]

Fits the XGBoost model.

Parameters:
  • train_data (EncodedDs) – encoded features for training dataset

  • dev_data (EncodedDs) – encoded features for dev dataset

Return type:

None

partial_fit(train_data, dev_data, args=None)[source]

Partially fits/trains a mixer with new training data. This somewhat experimental method aims to update pre-existing Lightwood predictors.

Parameters:
  • train_data (EncodedDs) – encoded representations of the new training data subset.

  • dev_data (EncodedDs) – encoded representations of the new “dev” data subset. As in fit(), this can be used as an internal validation subset.

  • args (Optional[dict]) – optional arguments to customize the finetuning process.

Return type:

None

supports_proba: bool
