`CREATE PREDICTOR` Statement¶

Description¶

The CREATE PREDICTOR statement creates and trains a new ML model.

Syntax¶

Here is the syntax:

CREATE PREDICTOR mindsdb.[predictor_name]
FROM [integration_name]
    (SELECT [column_name, ...] FROM [table_name])
PREDICT [target_column]
[USING [parameter_key]=['parameter_value']];

On execution, we get:

Query OK, 0 rows affected (x.xxx sec)

Where:

Expressions	Description
`[predictor_name]`	Name of the model to be created.
`[integration_name]`	Name of the integration created using the `CREATE DATABASE` statement or file upload.
`(SELECT [column_name, ...] FROM [table_name])`	`SELECT` statement for selecting data to be used for training and validation.
`PREDICT [target_column]`	`target_column` is the column to be predicted.

Checking Model Status

After you run the CREATE PREDICTOR statement, you can check the status of the training process by querying the mindsdb.predictors table.

SELECT *
FROM mindsdb.predictors
WHERE name='[predictor_name]';

On execution, we get:

+------------------------+-----------------------------------+----------------------------------------+-----------------------+-------------+---------------+-----+-----------------+----------------+
|name                    |status                             |accuracy                                |predict                |update_status|mindsdb_version|error|select_data_query|training_options|
+------------------------+-----------------------------------+----------------------------------------+-----------------------+-------------+---------------+-----+-----------------+----------------+
|predictor_name          |generating or training or complete |number depending on the accuracy metric |column_to_be_predicted |up_to_date   |22.7.5.0       |     |                 |                |
+------------------------+-----------------------------------+----------------------------------------+-----------------------+-------------+---------------+-----+-----------------+----------------+

Example¶

This example shows how to create and train a machine learning model called home_rentals_model and predict the rental prices for real estate properties inside the dataset.

CREATE PREDICTOR mindsdb.home_rentals_model
FROM db_integration 
    (SELECT * FROM house_rentals_data)
PREDICT rental_price;

On execution, we get:

Query OK, 0 rows affected (x.xxx sec)

To check the predictor status, query the mindsdb.predictors table.

SELECT *
FROM mindsdb.predictors
WHERE name='home_rentals_model';

On execution, we get:

+--------------------+----------+--------------------+--------------+---------------+-----------------+-------+-------------------+------------------+
| name               | status   | accuracy           | predict      | update_status | mindsdb_version | error | select_data_query | training_options |
+--------------------+----------+--------------------+--------------+---------------+-----------------+-------+-------------------+------------------+
| home_rentals_model | complete | 0.9991920992432087 | rental_price | up_to_date    | 22.5.1.0        | NULL  |                   |                  |
+--------------------+----------+--------------------+--------------+---------------+-----------------+-------+-------------------+------------------+

`CREATE PREDICTOR` with the `USING` Statement¶

Description¶

In MindsDB, the underlying AutoML models are based on the Lightwood engine by default. This library generates models automatically based on the data and declarative problem definition. But the default configuration can be overridden using the USING statement that provides an option to configure specific parameters of the training process.

In the upcoming version of MindsDB, it will be possible to choose another ML framework. Please note that the Lightwood engine is used by default.

Syntax¶

Here is the syntax:

CREATE PREDICTOR mindsdb.[predictor_name]
FROM [integration_name]
    (SELECT [column_name, ...] FROM [table_name])
PREDICT [target_column]
USING [parameter_key] = ['parameter_value'];

On execution, we get:

Query OK, 0 rows affected (x.xxx sec)

`encoders` Key¶

It grants access to configure how each column is encoded. By default, the AutoML engine tries to get the best match for the data.

...
USING encoders.[column_name].module='value';

To learn more about encoders and their options, visit the Lightwood documentation page on encoders.

`model` Key¶

It allows you to specify the type of machine learning algorithm to learn from the encoder data.

...
USING model.args={"key": value};

Module options:

Module	Description
BaseMixer	Base class for all mixers.
LightGBM	This mixer configures and uses LightGBM for regression or classification tasks depending on the problem definition.
LightGBMArray	This mixer consists of several LightGBM mixers in regression mode aimed at time series forecasting tasks.
NHitsMixer	Wrapper around an MQN-HITS deep learning model.
Neural	The Neural mixer trains a fully connected dense network from concatenated encoded outputs of each feature in the dataset to predict the encoded output.
ProphetMixer	This mixer is a wrapper around the popular time series library sktime.
Regression	The Regression mixer inherits from scikit-learn’s Ridge class.
SkTime	This mixer is a wrapper around the popular time series library sktime.
Unit	Special mixer that passes along whatever prediction is made by the target encoder without modifications. It is used for single-column predictive scenarios that may involve complex and/or expensive encoders (e.g. free-form text classification with transformers).

To learn more about all the model options, visit the Lightwood documentation page.

Other Keys Supported by Lightwood in JsonAI¶

The most common use cases of configuring predictors use encoders and model keys explained above. To see all the available keys, check out the Lightwood documentation page on JsonAI.

Example¶

Here we use the home_rentals dataset and specify particular encoders for some columns and a LightGBM model.

CREATE PREDICTOR mindsdb.home_rentals_model
FROM db_integration
    (SELECT * FROM home_rentals)
PREDICT rental_price
USING
    encoders.location.module='CategoricalAutoEncoder',
    encoders.rental_price.module = 'NumericEncoder',
    encoders.rental_price.args.positive_domain = 'True',
    model.args={"submodels":[
                    {"module": "LightGBM",
                     "args": {
                          "stop_after": 12,
                          "fit_on_dev": true
                          }
                    }
                ]};

On execution, we get:

Query OK, 0 rows affected (x.xxx sec)

`CREATE PREDICTOR` From File¶

To create a predictor from a file, you should first upload a file to MindsDB. Follow this guide to see how to do that.

Description¶

This statement is used to create and train a model from a file or a database table.

Syntax¶

Here is the syntax:

CREATE PREDICTOR mindsdb.[predictor_name]
FROM files
    (SELECT * FROM [file_name])
PREDICT target_column;

On execution, we get:

Query OK, 0 rows affected (x.xxx sec)

Where:

Name	Description
`[predictor_name]`	Name of the model to be created.
`[file_name]`	Name of the file uploaded via the MindsDB editor.
`(SELECT * FROM [file_name])`	`SELECT` statement for selecting the data to be used for training and validation.
`target_column`	`target_column` is the column to be predicted.

Example¶

Here we uploaded the home_rentals dataset as a file.

CREATE PREDICTOR mindsdb.home_rentals_model
FROM files
    (SELECT * from home_rentals)
PREDICT rental_price;

On execution, we get:

Query OK, 0 rows affected (x.xxx sec)

`CREATE PREDICTOR` For Time Series Models¶

Description¶

To train a time series model, MindsDB provides additional statements.

Syntax¶

Here is the syntax:

CREATE PREDICTOR mindsdb.[predictor_name]
FROM [integration_name]
    (SELECT [sequential_column], [partition_column], [other_column], [target_column]
     FROM [table_name])
PREDICT [target_column]

ORDER BY [sequential_column]
GROUP BY [partition_column]

WINDOW [int]
HORIZON [int];

On execution, we get:

Query OK, 0 rows affected (x.xxx sec)

Where:

Expressions	Description
`ORDER BY [sequential_column]`	The column by which time series is ordered. It can be a date or anything that defines the sequence of events.
`GROUP BY [partition_column]`	It is optional. The column by which rows that make a partition are grouped. For example, if you want to forecast the inventory for all items in the store, you can partition the data by `product_id`, so each distinct `product_id` has its own time series.
`WINDOW [int]`	The number of rows to look back at when making a prediction. It comes after the rows are ordered by the column defined in `ORDER BY` and split into groups by the column(s) defined in `GROUP BY`. This could be interpreted as "Always use the previous 10 rows".
`HORIZON [int]`	It is optional. The number of future predictions (it is 1 by default).

Getting a Prediction from a Time Series Model

Due to the nature of time series forecasting, you need to use the JOIN statement to get results.

Example¶

Here is an example:

CREATE PREDICTOR mindsdb.inventory_model
FROM db_integration
    (SELECT * FROM inventory)
PREDICT units_in_inventory
ORDER BY date
GROUP BY product_id
WINDOW 20
HORIZON 7;

On execution, we get:

Query OK, 0 rows affected (x.xxx sec)

Now, to get the results, we use the JOIN statement.

SELECT im.product_id, im.date, im.units_in_inventory AS predicted_units_in_inventory
FROM db_integration.inventory AS i
JOIN mindsdb.inventory_model AS im
WHERE i.date > LATEST
LIMIT 10;

The data source table (db_integration.inventory) and the predictor table (mindsdb.inventory_model) are joined to let us fetch the predictions for future dates.

CREATE PREDICTOR Statement¶

Description¶

Syntax¶

Example¶

CREATE PREDICTOR with the USING Statement¶

Description¶

Syntax¶

encoders Key¶

model Key¶

Other Keys Supported by Lightwood in JsonAI¶

Example¶

CREATE PREDICTOR From File¶

Description¶

Syntax¶

Example¶

CREATE PREDICTOR For Time Series Models¶

Description¶

Syntax¶

Example¶

`CREATE PREDICTOR` Statement¶

`CREATE PREDICTOR` with the `USING` Statement¶

`encoders` Key¶

`model` Key¶

`CREATE PREDICTOR` From File¶

`CREATE PREDICTOR` For Time Series Models¶