Introduction
In this tutorial, we will go through an example of updating a preexisting model. This is useful when you come across additional data that you want the model to take into account, without having to train it from scratch.
The main abstraction that Lightwood offers for this is the BaseMixer.partial_fit() method. To call it, you need to pass new training data and a held-out dev subset for internal mixer usage (e.g. early stopping). If you are using an aggregate ensemble, you will likely want to do this for every single mixer; the convenient PredictorInterface.adjust() method does this automatically for you.
Initial model training
First, let’s train a Lightwood predictor on the concrete strength dataset:
[1]:
from lightwood.api.high_level import ProblemDefinition, json_ai_from_problem, predictor_from_json_ai
import pandas as pd
INFO:lightwood-2649:No torchvision detected, image helpers not supported.
INFO:lightwood-2649:No torchvision/pillow detected, image encoder not supported
[2]:
# Load data
df = pd.read_csv('https://raw.githubusercontent.com/mindsdb/lightwood/staging/tests/data/concrete_strength.csv')
df = df.sample(frac=1, random_state=1)
train_df = df[:int(0.1*len(df))]
update_df = df[int(0.1*len(df)):int(0.8*len(df))]
test_df = df[int(0.8*len(df)):]
print(f'Train dataframe shape: {train_df.shape}')
print(f'Update dataframe shape: {update_df.shape}')
print(f'Test dataframe shape: {test_df.shape}')
Train dataframe shape: (103, 10)
Update dataframe shape: (721, 10)
Test dataframe shape: (206, 10)
Note that we have three different data splits.
We will use the training split for the initial model training. As you can see, it is only 10% of the total data we have. The update split will be used as training data to adjust/update our model. Finally, the held-out test set will give us a rough idea of the impact our updating procedure has on the model’s predictive capabilities.
[3]:
# Define predictive task and predictor
target = 'concrete_strength'
pdef = ProblemDefinition.from_dict({'target': target, 'time_aim': 200})
jai = json_ai_from_problem(df, pdef)
# We will keep the architecture simple: a single neural mixer, and a `BestOf` ensemble:
jai.model = {
    "module": "BestOf",
    "args": {
        "args": "$pred_args",
        "accuracy_functions": "$accuracy_functions",
        "submodels": [{
            "module": "Neural",
            "args": {
                "fit_on_dev": False,
                "stop_after": "$problem_definition.seconds_per_mixer",
                "search_hyperparameters": False,
            }
        }]
    }
}
# Build and train the predictor
predictor = predictor_from_json_ai(jai)
predictor.learn(train_df)
INFO:type_infer-2649:Analyzing a sample of 979
INFO:type_infer-2649:from a total population of 1030, this is equivalent to 95.0% of your data.
INFO:type_infer-2649:Using 3 processes to deduct types.
INFO:type_infer-2649:Infering type for: cement
INFO:type_infer-2649:Infering type for: slag
INFO:type_infer-2649:Column slag has data type float
INFO:type_infer-2649:Column cement has data type float
INFO:type_infer-2649:Infering type for: water
INFO:type_infer-2649:Infering type for: flyAsh
INFO:type_infer-2649:Column water has data type float
INFO:type_infer-2649:Column flyAsh has data type float
INFO:type_infer-2649:Infering type for: superPlasticizer
INFO:type_infer-2649:Infering type for: coarseAggregate
INFO:type_infer-2649:Infering type for: id
INFO:type_infer-2649:Column coarseAggregate has data type float
INFO:type_infer-2649:Column superPlasticizer has data type float
INFO:type_infer-2649:Column id has data type integer
INFO:type_infer-2649:Infering type for: fineAggregate
INFO:type_infer-2649:Infering type for: age
INFO:type_infer-2649:Infering type for: concrete_strength
INFO:type_infer-2649:Column age has data type integer
INFO:type_infer-2649:Column fineAggregate has data type float
INFO:type_infer-2649:Column concrete_strength has data type float
INFO:dataprep_ml-2649:Starting statistical analysis
INFO:dataprep_ml-2649:Finished statistical analysis
INFO:dataprep_ml-2649:[Learn phase 1/8] - Statistical analysis
INFO:dataprep_ml-2649:Starting statistical analysis
INFO:dataprep_ml-2649:Finished statistical analysis
DEBUG:lightwood-2649: `analyze_data` runtime: 0.02 seconds
INFO:dataprep_ml-2649:[Learn phase 2/8] - Data preprocessing
INFO:dataprep_ml-2649:Cleaning the data
DEBUG:lightwood-2649: `preprocess` runtime: 0.01 seconds
INFO:dataprep_ml-2649:[Learn phase 3/8] - Data splitting
INFO:dataprep_ml-2649:Splitting the data into train/test
DEBUG:lightwood-2649: `split` runtime: 0.0 seconds
INFO:dataprep_ml-2649:[Learn phase 4/8] - Preparing encoders
DEBUG:dataprep_ml-2649:Preparing sequentially...
DEBUG:dataprep_ml-2649:Preparing encoder for id...
DEBUG:dataprep_ml-2649:Preparing encoder for cement...
DEBUG:dataprep_ml-2649:Preparing encoder for slag...
DEBUG:dataprep_ml-2649:Preparing encoder for flyAsh...
DEBUG:dataprep_ml-2649:Preparing encoder for water...
DEBUG:dataprep_ml-2649:Preparing encoder for superPlasticizer...
DEBUG:dataprep_ml-2649:Preparing encoder for coarseAggregate...
DEBUG:dataprep_ml-2649:Preparing encoder for fineAggregate...
DEBUG:dataprep_ml-2649:Preparing encoder for age...
DEBUG:lightwood-2649: `prepare` runtime: 0.01 seconds
INFO:dataprep_ml-2649:[Learn phase 5/8] - Feature generation
INFO:dataprep_ml-2649:Featurizing the data
DEBUG:lightwood-2649: `featurize` runtime: 0.06 seconds
INFO:dataprep_ml-2649:[Learn phase 6/8] - Mixer training
INFO:dataprep_ml-2649:Training the mixers
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/torch/amp/grad_scaler.py:131: UserWarning: torch.cuda.amp.GradScaler is enabled, but CUDA is not available. Disabling.
warnings.warn(
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/pytorch_ranger/ranger.py:172: UserWarning: This overload of addcmul_ is deprecated:
addcmul_(Number value, Tensor tensor1, Tensor tensor2)
Consider using one of the following signatures instead:
addcmul_(Tensor tensor1, Tensor tensor2, *, Number value) (Triggered internally at ../torch/csrc/utils/python_arg_parser.cpp:1578.)
exp_avg_sq.mul_(beta2).addcmul_(1 - beta2, grad, grad)
INFO:lightwood-2649:Loss of 39.99637508392334 with learning rate 0.0001
INFO:lightwood-2649:Loss of 21.826460361480713 with learning rate 0.0005
INFO:lightwood-2649:Loss of 15.12899512052536 with learning rate 0.001
INFO:lightwood-2649:Loss of 15.062753021717072 with learning rate 0.002
INFO:lightwood-2649:Loss of 26.490495562553406 with learning rate 0.003
INFO:lightwood-2649:Loss of 33.6572003364563 with learning rate 0.005
INFO:lightwood-2649:Loss of 303.60721158981323 with learning rate 0.01
INFO:lightwood-2649:Loss of nan with learning rate 0.05
INFO:lightwood-2649:Found learning rate of: 0.002
INFO:lightwood-2649:Loss @ epoch 1: 0.11838734149932861
INFO:lightwood-2649:Loss @ epoch 2: 0.4641949534416199
INFO:lightwood-2649:Loss @ epoch 3: 0.3976145386695862
INFO:lightwood-2649:Loss @ epoch 4: 0.3706841468811035
INFO:lightwood-2649:Loss @ epoch 5: 0.2367912232875824
INFO:lightwood-2649:Loss @ epoch 6: 0.22560915350914001
INFO:lightwood-2649:Loss @ epoch 7: 0.12089195847511292
DEBUG:lightwood-2649: `fit_mixer` runtime: 0.53 seconds
INFO:dataprep_ml-2649:Ensembling the mixer
INFO:lightwood-2649:Mixer: Neural got accuracy: 0.238
INFO:lightwood-2649:Picked best mixer: Neural
DEBUG:lightwood-2649: `fit` runtime: 0.54 seconds
INFO:dataprep_ml-2649:[Learn phase 7/8] - Ensemble analysis
INFO:dataprep_ml-2649:Analyzing the ensemble of mixers
INFO:lightwood-2649:The block ICP is now running its analyze() method
INFO:lightwood-2649:The block ConfStats is now running its analyze() method
INFO:lightwood-2649:The block AccStats is now running its analyze() method
INFO:lightwood-2649:The block PermutationFeatureImportance is now running its analyze() method
INFO:lightwood-2649:[PFI] Using a random sample (1000 rows out of 10).
INFO:lightwood-2649:[PFI] Set to consider first 10 columns out of 9: ['id', 'cement', 'slag', 'flyAsh', 'water', 'superPlasticizer', 'coarseAggregate', 'fineAggregate', 'age'].
DEBUG:lightwood-2649: `analyze_ensemble` runtime: 0.15 seconds
INFO:dataprep_ml-2649:[Learn phase 8/8] - Adjustment on validation requested
INFO:dataprep_ml-2649:Updating the mixers
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/torch/amp/grad_scaler.py:131: UserWarning: torch.cuda.amp.GradScaler is enabled, but CUDA is not available. Disabling.
warnings.warn(
INFO:lightwood-2649:Loss @ epoch 1: 0.1678172747294108
DEBUG:lightwood-2649: `adjust` runtime: 0.03 seconds
DEBUG:lightwood-2649: `learn` runtime: 0.83 seconds
[4]:
# Get predictions for the held-out test set
predictions = predictor.predict(test_df)
predictions
INFO:dataprep_ml-2649:[Predict phase 1/4] - Data preprocessing
INFO:dataprep_ml-2649:Cleaning the data
DEBUG:lightwood-2649: `preprocess` runtime: 0.01 seconds
INFO:dataprep_ml-2649:[Predict phase 2/4] - Feature generation
INFO:dataprep_ml-2649:Featurizing the data
DEBUG:lightwood-2649: `featurize` runtime: 0.03 seconds
INFO:dataprep_ml-2649:[Predict phase 3/4] - Calling ensemble
DEBUG:lightwood-2649: `_timed_call` runtime: 0.03 seconds
INFO:dataprep_ml-2649:[Predict phase 4/4] - Analyzing output
INFO:lightwood-2649:The block ICP is now running its explain() method
INFO:lightwood-2649:The block ConfStats is now running its explain() method
INFO:lightwood-2649:ConfStats.explain() has not been implemented, no modifications will be done to the data insights.
INFO:lightwood-2649:The block AccStats is now running its explain() method
INFO:lightwood-2649:AccStats.explain() has not been implemented, no modifications will be done to the data insights.
INFO:lightwood-2649:The block PermutationFeatureImportance is now running its explain() method
INFO:lightwood-2649:PermutationFeatureImportance.explain() has not been implemented, no modifications will be done to the data insights.
DEBUG:lightwood-2649: `explain` runtime: 0.05 seconds
DEBUG:lightwood-2649: `predict` runtime: 0.13 seconds
[4]:
| | original_index | prediction | confidence | lower | upper |
|---|---|---|---|---|---|
| 0 | 0 | 40.909630 | 0.9991 | 0.000000 | 87.398161 |
| 1 | 1 | 19.146822 | 0.9991 | 0.000000 | 65.635353 |
| 2 | 2 | 22.482294 | 0.9991 | 0.000000 | 68.970825 |
| 3 | 3 | 19.593765 | 0.9991 | 0.000000 | 66.082296 |
| 4 | 4 | 31.724537 | 0.9991 | 0.000000 | 78.213068 |
| ... | ... | ... | ... | ... | ... |
| 201 | 201 | 50.553104 | 0.9991 | 4.064574 | 97.041635 |
| 202 | 202 | 48.580425 | 0.9991 | 2.091895 | 95.068956 |
| 203 | 203 | 30.114187 | 0.9991 | 0.000000 | 76.602718 |
| 204 | 204 | 25.676003 | 0.9991 | 0.000000 | 72.164533 |
| 205 | 205 | 41.231636 | 0.9991 | 0.000000 | 87.720167 |
206 rows × 5 columns
Updating the predictor
For this, we have two options:
BaseMixer.partial_fit()
Updates a single mixer. You need to pass the new data wrapped in EncodedDs objects.
Arguments:
* train_data: EncodedDs
* dev_data: EncodedDs
* adjust_args: Optional[dict] - contains any arguments the mixer needs when adjusting to the new data.
If the mixer does not need a dev_data partition, pass a dummy:
dev_data = EncodedDs(predictor.encoders, pd.DataFrame(), predictor.target)
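To make the manual route concrete, here is a minimal sketch (not run in this notebook) of updating each mixer in our trained ensemble directly. It assumes EncodedDs can be imported from lightwood.data and that the trained ensemble exposes its mixers as predictor.ensemble.mixers; both are assumptions about the generated predictor, and the adjust() call described next is the more convenient path:
from lightwood.data import EncodedDs

# Hold out a small slice of the new data as a dev split for early stopping
cutoff = int(0.9 * len(update_df))
train_ds = EncodedDs(predictor.encoders, update_df[:cutoff], predictor.target)
dev_ds = EncodedDs(predictor.encoders, update_df[cutoff:], predictor.target)

# Update every mixer in the ensemble with the wrapped data
for mixer in predictor.ensemble.mixers:
    mixer.partial_fit(train_ds, dev_ds)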
PredictorInterface.adjust()
Updates all mixers inside the predictor by calling their respective partial_fit() methods. Any adjust_args will be transparently passed along as well.
Arguments:
* new_data: pd.DataFrame
* old_data: Optional[pd.DataFrame]
* adjust_args: Optional[dict]
Let’s adjust our predictor:
[5]:
predictor.adjust(update_df, train_df) # data to adjust and original data
INFO:dataprep_ml-2649:Cleaning the data
DEBUG:lightwood-2649: `preprocess` runtime: 0.02 seconds
INFO:dataprep_ml-2649:Cleaning the data
DEBUG:lightwood-2649: `preprocess` runtime: 0.01 seconds
INFO:dataprep_ml-2649:Updating the mixers
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/torch/amp/grad_scaler.py:131: UserWarning: torch.cuda.amp.GradScaler is enabled, but CUDA is not available. Disabling.
warnings.warn(
INFO:lightwood-2649:Loss @ epoch 1: 0.10915952424208324
DEBUG:lightwood-2649: `adjust` runtime: 0.11 seconds
[6]:
new_predictions = predictor.predict(test_df)
new_predictions
INFO:dataprep_ml-2649:[Predict phase 1/4] - Data preprocessing
INFO:dataprep_ml-2649:Cleaning the data
DEBUG:lightwood-2649: `preprocess` runtime: 0.01 seconds
INFO:dataprep_ml-2649:[Predict phase 2/4] - Feature generation
INFO:dataprep_ml-2649:Featurizing the data
DEBUG:lightwood-2649: `featurize` runtime: 0.03 seconds
INFO:dataprep_ml-2649:[Predict phase 3/4] - Calling ensemble
DEBUG:lightwood-2649: `_timed_call` runtime: 0.03 seconds
INFO:dataprep_ml-2649:[Predict phase 4/4] - Analyzing output
INFO:lightwood-2649:The block ICP is now running its explain() method
INFO:lightwood-2649:The block ConfStats is now running its explain() method
INFO:lightwood-2649:ConfStats.explain() has not been implemented, no modifications will be done to the data insights.
INFO:lightwood-2649:The block AccStats is now running its explain() method
INFO:lightwood-2649:AccStats.explain() has not been implemented, no modifications will be done to the data insights.
INFO:lightwood-2649:The block PermutationFeatureImportance is now running its explain() method
INFO:lightwood-2649:PermutationFeatureImportance.explain() has not been implemented, no modifications will be done to the data insights.
DEBUG:lightwood-2649: `explain` runtime: 0.05 seconds
DEBUG:lightwood-2649: `predict` runtime: 0.13 seconds
[6]:
| | original_index | prediction | confidence | lower | upper |
|---|---|---|---|---|---|
| 0 | 0 | 43.645542 | 0.9991 | 0.000000 | 90.134073 |
| 1 | 1 | 26.964903 | 0.9991 | 0.000000 | 73.453434 |
| 2 | 2 | 24.151918 | 0.9991 | 0.000000 | 70.640449 |
| 3 | 3 | 20.815800 | 0.9991 | 0.000000 | 67.304330 |
| 4 | 4 | 34.987530 | 0.9991 | 0.000000 | 81.476060 |
| ... | ... | ... | ... | ... | ... |
| 201 | 201 | 52.630058 | 0.9991 | 6.141528 | 99.118589 |
| 202 | 202 | 39.175228 | 0.9991 | 0.000000 | 85.663759 |
| 203 | 203 | 33.047440 | 0.9991 | 0.000000 | 79.535970 |
| 204 | 204 | 28.659138 | 0.9991 | 0.000000 | 75.147668 |
| 205 | 205 | 34.264580 | 0.9991 | 0.000000 | 80.753111 |
206 rows × 5 columns
Nice! Our predictor was updated, and the new predictions look good. Let’s compare the old and new accuracies to complete the experiment:
[7]:
from sklearn.metrics import r2_score
import numpy as np
old_acc = r2_score(test_df['concrete_strength'], predictions['prediction'])
new_acc = r2_score(test_df['concrete_strength'], new_predictions['prediction'])
print(f'Old Accuracy: {round(old_acc, 3)}\nNew Accuracy: {round(new_acc, 3)}')
Old Accuracy: 0.233
New Accuracy: 0.428
Conclusion
We have gone through a simple example of how Lightwood predictors can leverage newly acquired data to improve their predictions. The interface for doing so is fairly simple, requiring only some new data and a single call to adjust().
You can further customize the update logic by overriding the partial_fit() method of your mixers.
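As a final pointer, the sketch below shows what such a customization could look like. The ConservativeNeural class is purely hypothetical (it is not part of Lightwood); the sketch assumes Neural is importable from lightwood.mixer and that partial_fit() receives the EncodedDs train/dev splits described earlier:
from typing import Optional

from lightwood.data import EncodedDs
from lightwood.mixer import Neural


class ConservativeNeural(Neural):
    """Hypothetical mixer that adjusts on a subsample of the new data only."""

    def partial_fit(self, train_data: EncodedDs, dev_data: EncodedDs,
                    adjust_args: Optional[dict] = None) -> None:
        # Illustrative choice: train on half of the new rows, so a single noisy
        # batch cannot drag the weights too far from the original fit.
        subsample = train_data.data_frame.sample(frac=0.5, random_state=0)
        train_subset = EncodedDs(train_data.encoders, subsample, train_data.target)
        super().partial_fit(train_subset, dev_data, adjust_args)
A custom mixer like this can then be plugged into a predictor through Lightwood's support for custom modules in JsonAI; see Lightwood's documentation on custom mixers for how to register one.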