Mixers
Mixers learn to map encoded feature representations to predictions; they are the core of Lightwood's AutoML.
- class mixer.ARIMAMixer(stop_after, target, dtype_dict, horizon, ts_analysis, model_path='statsforecast.StatsForecastAutoARIMA', auto_size=True, sp=None, use_stl=False, start_p=2, d=None, start_q=2, max_p=5, max_d=2, max_q=5, start_P=1, D=None, start_Q=1, max_P=2, max_D=1, max_Q=2, max_order=5, seasonal=True, stationary=False, information_criterion=None, alpha=0.05, test='kpss', seasonal_test=None, stepwise=True, n_jobs=None, start_params=None, trend=None, method=None, maxiter=50, offset_test_args=None, seasonal_test_args=None, suppress_warnings=True, error_action='warn', trace=False, random=False, random_state=None, n_fits=None, out_of_sample_size=0, scoring='mse', scoring_args=None, with_intercept=True, update_pdq=True, time_varying_regression=False, enforce_stationarity=True, enforce_invertibility=True, simple_differencing=False, measurement_error=False, mle_regression=True, hamilton_representation=False, concentrate_scale=False)[source]
Wrapper for SkTime’s AutoARIMA interface.
- Parameters:
  - stop_after (float) – time budget in seconds
  - target (str) – column containing target time series
  - dtype_dict (Dict[str, str]) – data types for each dataset column
  - horizon (int) – forecast length
  - ts_analysis (Dict) – lightwood-produced stats about input time series
  - auto_size (bool) – whether to filter out old data points if the training split is bigger than a certain threshold (defined by the dataset sampling frequency). Enabled by default to avoid long training times in big datasets.
  - use_stl (bool) – whether to use de-trenders and de-seasonalizers fitted in the timeseries analysis phase.
For the rest of the parameters, please refer to SkTime’s documentation.
- class mixer.BaseMixer(stop_after)[source]
Base class for all mixers.
Mixers are the backbone of all Lightwood machine learning models. They intake encoded feature representations for every column, and are tasked with learning to fulfill the predictive requirements stated in a problem definition.
- There are two important methods for any mixer to work:
  - fit() contains all logic to train the mixer with the training data that has been encoded by all the (already trained) Lightwood encoders for any given task.
  - __call__() is executed to generate predictions once the mixer has been trained using fit().
  An additional partial_fit() method is used to update any mixer that has already been trained.
Class Attributes:
  - stable: If set to True, this mixer should always work. Any mixer with stable=False can be expected to fail under some circumstances.
  - fit_data_len: Length of the training data.
  - supports_proba: For classification tasks, whether the mixer supports yielding per-class scores rather than only returning the predicted label.
  - trains_once: If True, the mixer is trained once during learn(), using all available input data (train and dev splits for training, test split for validation). Otherwise, it trains once with the train split (dev split for validation), and optionally (depending on the problem definition's fit_on_all and the mixer-wise fit_on_dev arguments) a second time after post-training analysis via partial_fit(), with the train and dev splits as the training subset and the test split as validation. This should only be set to True for mixers that don't require post-training analysis, as otherwise actual validation data would be treated as a held-out portion, which is a mistake.
- Parameters:
  - stop_after (float) – Time budget (in seconds) to train this mixer.
- partial_fit(train_data, dev_data, adjust_args=None)[source]
Partially fits/trains a mixer with new training data. This is a somewhat experimental method, and it aims at updating pre-existing Lightwood predictors.
- Parameters:
  - train_data (EncodedDs) – encoded representations of the new training data subset.
  - dev_data (EncodedDs) – encoded representations of the new "dev" data subset. As in fit(), this can be used as an internal validation subset.
  - adjust_args (Optional[dict]) – optional arguments to customize the finetuning process.
- Return type:
None
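To make the interface concrete, here is a minimal sketch of a custom mixer. MeanMixer is a hypothetical toy (it always predicts the training mean of the target), and the exact signatures (EncodedDs inputs, PredictionArguments, a 'prediction' output column) follow the conventions described above but should be checked against your Lightwood version.

```python
import pandas as pd
from lightwood.mixer import BaseMixer
from lightwood.data.encoded_ds import EncodedDs
from lightwood.api.types import PredictionArguments


class MeanMixer(BaseMixer):
    """Toy mixer: always predicts the training mean of the target."""
    stable = True

    def __init__(self, stop_after: float, target: str):
        super().__init__(stop_after)
        self.target = target
        self.supports_proba = False

    def fit(self, train_data: EncodedDs, dev_data: EncodedDs) -> None:
        # EncodedDs keeps the original data frame around; only the target column is needed here
        self.mean = train_data.data_frame[self.target].mean()

    def partial_fit(self, train_data: EncodedDs, dev_data: EncodedDs, args=None) -> None:
        # naive update strategy: refit on the new data subset
        self.fit(train_data, dev_data)

    def __call__(self, ds: EncodedDs, args: PredictionArguments) -> pd.DataFrame:
        # mixers return a data frame with a 'prediction' column, one row per input row
        return pd.DataFrame({'prediction': [self.mean] * len(ds.data_frame)})
```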
- class mixer.ETSMixer(stop_after, target, dtype_dict, horizon, ts_analysis, model_path='ets.AutoETS', auto_size=True, sp=None, use_stl=True, error='add', trend=None, damped_trend=False, seasonal=None, initialization_method='estimated', initial_level=None, initial_trend=None, initial_seasonal=None, bounds=None, start_params=None, maxiter=1000, auto=False, information_criterion='aic', allow_multiplicative_trend=False, restrict=True, additive_only=False, ignore_inf_ic=True, n_jobs=None, random_state=None)[source]
Wrapper for SkTime’s AutoETS interface.
- Parameters:
  - stop_after (float) – time budget in seconds
  - target (str) – column containing target time series
  - dtype_dict (Dict[str, str]) – data types for each dataset column
  - horizon (int) – forecast length
  - ts_analysis (Dict) – lightwood-produced stats about input time series
  - auto_size (bool) – whether to filter out old data points if the training split is bigger than a certain threshold (defined by the dataset sampling frequency). Enabled by default to avoid long training times in big datasets.
  - use_stl (bool) – whether to use de-trenders and de-seasonalizers fitted in the timeseries analysis phase.
For the rest of the parameters, please refer to SkTime’s documentation.
- class mixer.Neural(stop_after, target, dtype_dict, target_encoder, net, fit_on_dev, search_hyperparameters, n_epochs=None, lr=None)[source]
The Neural mixer trains a fully connected dense network from the concatenated encoded outputs of each of the features in the dataset to predict the encoded target output.
- Parameters:
  - stop_after (float) – How long the total fitting process should take
  - target (str) – Name of the target column
  - dtype_dict (Dict[str, str]) – Data type dictionary
  - target_encoder (BaseEncoder) – Reference to the encoder used for the target
  - net (str) – The network type to use (DefaultNet or ArNet)
  - fit_on_dev (bool) – Whether to fit on the dev dataset
  - search_hyperparameters (bool) – Whether the network should run a more thorough hyperparameter search (currently disabled)
  - n_epochs (Optional[int]) – number of epochs that the network will be trained for. Supersedes all other early stopping criteria if specified.
  - lr (Optional[float]) – learning rate for the network. By default, it is automatically selected based on an initial search process.
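In practice, mixers are usually selected through a JsonAI specification rather than instantiated by hand. The snippet below sketches that workflow with Lightwood's high-level API; the submodel schema (model['args']['submodels']) and the 'label' target column are assumptions that may vary across versions and datasets.

```python
import pandas as pd
from lightwood.api.high_level import (
    json_ai_from_problem, code_from_json_ai, predictor_from_code
)
from lightwood.api.types import ProblemDefinition

df = pd.read_csv('my_dataset.csv')  # hypothetical dataset with a 'label' target column
json_ai = json_ai_from_problem(df, ProblemDefinition.from_dict({'target': 'label'}))

# restrict the predictor to a single Neural mixer and cap its training at 10 epochs;
# any arguments left unspecified are auto-filled by Lightwood
json_ai.model['args']['submodels'] = [
    {'module': 'Neural', 'args': {'n_epochs': 10, 'search_hyperparameters': False}}
]

predictor = predictor_from_code(code_from_json_ai(json_ai))
predictor.learn(df)
predictions = predictor.predict(df)
```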
- class mixer.NeuralTs(stop_after, target, dtype_dict, timeseries_settings, target_encoder, net, fit_on_dev, search_hyperparameters, ts_analysis, n_epochs=None, use_stl=False)[source]
Subclassed Neural mixer used for time series forecasting scenarios.
- Parameters:
  - stop_after (float) – How long the total fitting process should take
  - target (str) – Name of the target column
  - dtype_dict (Dict[str, str]) – Data type dictionary
  - timeseries_settings (TimeseriesSettings) – TimeseriesSettings object for time series tasks; refer to its documentation for available settings.
  - target_encoder (BaseEncoder) – Reference to the encoder used for the target
  - net (str) – The network type to use (DefaultNet or ArNet)
  - fit_on_dev (bool) – Whether to fit on the dev dataset
  - search_hyperparameters (bool) – Whether the network should run a more thorough hyperparameter search (currently disabled)
  - n_epochs (Optional[int]) – number of epochs that the network will be trained for. Supersedes all other early stopping criteria if specified.
- class mixer.ProphetMixer(stop_after, target, dtype_dict, horizon, ts_analysis, auto_size=True, use_stl=False, add_seasonality=None, add_country_holidays=None, growth='linear', growth_floor=0, growth_cap=None, changepoints=None, n_changepoints=25, changepoint_range=0.8, yearly_seasonality='auto', weekly_seasonality='auto', daily_seasonality='auto', holidays=None, seasonality_mode='additive', seasonality_prior_scale=10.0, holidays_prior_scale=10.0, changepoint_prior_scale=0.05, mcmc_samples=0, alpha=0.05, uncertainty_samples=1000)[source]
Wrapper for SkTime’s Prophet interface.
- Parameters:
  - stop_after (float) – time budget in seconds
  - target (str) – column containing target time series
  - dtype_dict (Dict[str, str]) – data types for each dataset column
  - horizon (int) – forecast length
  - ts_analysis (Dict) – lightwood-produced stats about input time series
  - auto_size (bool) – whether to filter out old data points if the training split is bigger than a certain threshold (defined by the dataset sampling frequency). Enabled by default to avoid long training times in big datasets.
  - use_stl (bool) – whether to use de-trenders and de-seasonalizers fitted in the timeseries analysis phase.
For the rest of the parameters, please refer to SkTime’s documentation.
- class mixer.RandomForest(stop_after, target, dtype_dict, fit_on_dev, target_encoder, use_optuna=False)[source]
The RandomForest mixer supports both regression and classification tasks. It inherits from sklearn.ensemble.RandomForestRegressor and sklearn.ensemble.RandomForestClassifier. (https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html) (https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html)
- Parameters:
  - stop_after (float) – time budget in seconds.
  - target (str) – name of the target column that the mixer will learn to predict.
  - dtype_dict (Dict[str, str]) – dictionary with dtypes of all columns in the data.
  - fit_on_dev (bool) – whether to fit on the dev dataset.
  - use_optuna (bool) – whether to activate the automated hyperparameter search (optuna-based). Note that setting this flag to True does not guarantee the search will run; rather, the speed criteria will be checked first (i.e., if a single iteration is too slow with respect to the time budget, the search will not take place).
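For reference, the underlying scikit-learn estimators can be used directly as follows. This is plain scikit-learn usage with made-up data, not the mixer's internal code:

```python
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

X = [[0.0], [1.0], [2.0], [3.0]]

# numerical targets map to RandomForestRegressor
reg = RandomForestRegressor(n_estimators=50).fit(X, [0.1, 1.1, 1.9, 3.2])
print(reg.predict([[1.5]]))

# categorical targets map to RandomForestClassifier, which supports per-class scores
clf = RandomForestClassifier(n_estimators=50).fit(X, ['a', 'a', 'b', 'b'])
print(clf.predict_proba([[1.5]]))
```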
- class mixer.Regression(stop_after, target_encoder, dtype_dict, target)[source]
The Regression mixer inherits from scikit-learn’s Ridge class (https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Ridge.html).
Under the hood, this class performs L2-regularized least-squares (ridge) regression: it fits a set of coefficients (w_1, w_2, …, w_N) for an N-length feature vector that minimizes the squared difference between the predicted target values and the observed true values, plus a penalty on the coefficient magnitudes.
This mixer intakes featurized (encoded) data to predict the target. It deploys if the target data type is considered numerical/integer. A short sketch of the underlying objective follows the parameter list below.
- Parameters:
  - stop_after (float) – Maximum amount of seconds it should fit for, currently ignored
  - target_encoder (BaseEncoder) – The encoder which will be used to decode the target
  - dtype_dict (dict) – A map of feature names and their data types
  - target (str) – Name of the target column
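As noted above, here is a minimal sketch of the objective the mixer delegates to, using scikit-learn's Ridge directly on synthetic stand-in data (the mixer itself operates on encoded feature representations):

```python
import numpy as np
from sklearn.linear_model import Ridge

# Ridge fits w to minimize ||Xw - y||^2 + alpha * ||w||^2
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))             # stand-in for concatenated encoded features
true_w = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ true_w + 0.1 * rng.standard_normal(100)

model = Ridge(alpha=1.0).fit(X, y)
print(model.coef_)          # fitted coefficients (w_1, ..., w_N)
print(model.predict(X[:3]))
```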
- class mixer.SkTime(stop_after, target, dtype_dict, horizon, ts_analysis, model_path=None, model_kwargs=None, auto_size=True, sp=None, hyperparam_search=True, use_stl=False)[source]
This mixer is a wrapper around the popular time series library sktime. It exhibits different behavior compared to other forecasting mixers, as it predicts based on indices in a forecasting horizon that is defined with respect to the last seen data point at training time.
Because of this, the mixer tries to “fit_on_all” so that the latest point in the validation split marks the boundary between the training data and where forecasts will start. In practice, you need to specify how much time has passed since that timestamp for forecasts to be correct. By default, it is assumed that predictions are for the very next timestamp post-training (see the sktime sketch at the end of this entry).
If the task has groups (i.e. ‘TimeseriesSettings.group_by’ is not empty), the mixer will spawn one forecaster object per each different group observed at training time, plus an additional default forecaster fit with all data.
There is an optuna-based automatic hyperparameter search. For now, it considers selecting the forecaster type based on the global SMAPE error across all groups.
- Parameters:
  - stop_after (float) – time budget in seconds.
  - target (str) – column to forecast.
  - dtype_dict (Dict[str, str]) – dtypes of all columns in the data.
  - horizon (int) – length of forecasted horizon.
  - sp (Optional[int]) – seasonality period to enforce (instead of the automatic inference done in the ts_analysis module)
  - ts_analysis (Dict) – dictionary with miscellaneous time series info, as generated by 'lightwood.data.timeseries_analyzer'.
  - model_path (Optional[str]) – sktime forecaster to use as underlying model(s). Should be a string with format '$module.$class', where '$module' is inside sktime.forecasting. Default is 'arima.AutoARIMA'.
  - model_kwargs (Optional[dict]) – specifies additional parameters to pass to the model if model_path is provided.
  - hyperparam_search (bool) – bool that indicates whether to perform the hyperparameter tuning or not.
  - auto_size (bool) – whether to filter out old data points if the training split is bigger than a certain threshold (defined by the dataset sampling frequency). Enabled by default to avoid long training times in big datasets.
  - use_stl (bool) – whether to use de-trenders and de-seasonalizers fitted in the timeseries analysis phase.
- fit(train_data, dev_data)[source]
Fits a set of sktime forecasters. The number of models depends on how many groups are observed at training time.
Forecaster type can be specified by providing the model_path argument in __init__(). It can also be determined by hyperparameter optimization based on dev data validation error.
- Return type:
None
- partial_fit(train_data, dev_data, args=None)[source]
Note: sktime asks for “specification of the time points for which forecasts are requested”, and this mixer complies by assuming forecasts will start immediately after the last observed value.
Because of this, ProblemDefinition.fit_on_all is set to True so that partial_fit uses both dev and test splits to fit the models.
Due to how Lightwood implements the update procedure, this method expects the dev and test splits as its inputs.
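To make the relative-horizon behavior concrete, here is a plain sktime sketch (synthetic data, not Lightwood code) showing how forecasts are requested as offsets from the last point seen at training time:

```python
import numpy as np
import pandas as pd
from sktime.forecasting.arima import AutoARIMA
from sktime.forecasting.base import ForecastingHorizon

# synthetic series: noisy sine wave
y = pd.Series(np.sin(np.arange(120) / 6) + 0.1 * np.random.randn(120))

forecaster = AutoARIMA(suppress_warnings=True)
forecaster.fit(y)

# relative indices: 1 means "the step right after the last observed training point"
fh = ForecastingHorizon([1, 2, 3], is_relative=True)
print(forecaster.predict(fh))
```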
- class mixer.TabTransformerMixer(stop_after, target, dtype_dict, target_encoder, fit_on_dev, search_hyperparameters, train_args=None)[source]
This mixer trains a TabTransformer network (FT variant), using concatenated encoder outputs for each dataset feature as input, to predict the encoded target column representation as output.
Training logic is based on the Neural mixer, please refer to it for more details on each input parameter.
- class mixer.Unit(stop_after, target_encoder)[source]
The “Unit” mixer serves as a simple wrapper around a target encoder, essentially borrowing the encoder’s functionality for predictions. In other words, it simply arg-maxes the output of the encoder.
Used with encoders that already fine-tune on the targets (namely, pre-trained text ML models).
- Parameters:
  - target_encoder (BaseEncoder) – An instance of a Lightwood BaseEncoder. This encoder is used to decode predictions.
  - stop_after (float) – Time budget (in seconds) to train this mixer.
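A loose conceptual sketch of what "arg-maxing the encoder output" means; the class labels and scores below are hypothetical, and the actual mixer delegates decoding to the encoder itself:

```python
import torch

# hypothetical encoder output for a batch of 2 rows scored over 3 target classes
encoded_output = torch.tensor([[0.1, 0.7, 0.2],
                               [0.8, 0.1, 0.1]])

labels = ['negative', 'neutral', 'positive']  # hypothetical target classes
predictions = [labels[i] for i in torch.argmax(encoded_output, dim=1)]
print(predictions)  # ['neutral', 'negative']
```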
- class mixer.XGBoostArrayMixer(stop_after, target, dtype_dict, input_cols, fit_on_dev, target_encoder, ts_analysis, use_stl, tss)[source]
XGBoost-based model, intended for usage in forecasting tasks.
- Parameters:
  - stop_after (float) – Time budget (in seconds) to train this mixer.
- class mixer.XGBoostMixer(stop_after, target, dtype_dict, input_cols, fit_on_dev, use_optuna, target_encoder)[source]
- Parameters:
  - stop_after (float) – time budget in seconds.
  - target (str) – name of the target column that the mixer will learn to predict.
  - dtype_dict (Dict[str, str]) – dictionary with dtypes of all columns in the data.
  - input_cols (List[str]) – list of column names.
  - fit_on_dev (bool) – whether to perform a partial_fit() at the end of fit() using the dev data split.
  - use_optuna (bool) – whether to activate the automated hyperparameter search (optuna-based). Note that setting this flag to True does not guarantee the search will run; rather, the speed criteria will be checked first (i.e., if a single iteration is too slow with respect to the time budget, the search will not take place).
- partial_fit(train_data, dev_data, args=None)[source]
Partially fits/trains a mixer with new training data. This is a somewhat experimental method, and it aims at updating pre-existing Lightwood predictors.
- Parameters:
  - train_data (EncodedDs) – encoded representations of the new training data subset.
  - dev_data (EncodedDs) – encoded representations of the new “dev” data subset.
  - args (Optional[dict]) – optional arguments to customize the finetuning process.
- Return type:
None
- supports_proba: bool
Gradient boosting mixer with an XGBoost backbone.
This mixer is a good all-rounder, due to the generally great performance of tree-based ML algorithms for supervised learning tasks with tabular data. If you want more information regarding the techniques that set XGBoost apart from other gradient boosters, please refer to the technical paper “XGBoost: A Scalable Tree Boosting System” (2016).
- We can basically think of this mixer as a wrapper around the XGBoost Python package. There are a few caveats the user may want to be aware of:
  - If you seek GPU utilization, XGBoost must be compiled from source instead of being installed through pip.
  - Integer, float, and quantity dtypes are treated as regression tasks with reg:squarederror loss. All other supported dtypes are cast as multiclass tasks with multi:softmax loss (see the sketch after this list).
  - A partial fit can be performed with the dev data split as part of fit, if specified with the fit_on_dev argument.
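As referenced above, a short sketch of the two objectives using XGBoost's scikit-learn API on synthetic data (illustrative, not the mixer's code):

```python
import numpy as np
from xgboost import XGBClassifier, XGBRegressor

X = np.random.randn(64, 4)

# numerical targets -> regression with squared-error loss
reg = XGBRegressor(objective='reg:squarederror', n_estimators=10)
reg.fit(X, X @ np.array([1.0, 0.5, -1.0, 2.0]))

# categorical targets -> multiclass classification with softmax loss
clf = XGBClassifier(objective='multi:softmax', n_estimators=10)
clf.fit(X, np.random.randint(0, 3, size=64))

print(reg.predict(X[:2]), clf.predict(X[:2]))
```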
- There are a couple of things in the backlog that will hopefully be added soon:
  - An automatic optuna-based hyperparameter search. This procedure triggers when a single iteration of XGBoost is deemed fast enough (given the time budget).
  - Support for “unknown class” as a possible answer for multiclass tasks.