{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Tutorial - Time series forecasting\n", "\n", "## Introduction\n", "\n", "Time series are a ubiquitous type of data across many kinds of processes. Producing forecasts for them can be highly valuable in domains like retail or industrial manufacturing, among many others.\n", "\n", "Lightwood supports time series forecasting (both univariate and multivariate inputs), handling many of the pain points commonly associated with setting up a manual time series predictive pipeline.\n", "\n", "In this tutorial, we will train a Lightwood predictor and analyze its forecasts for the task of predicting monthly sunspot counts.\n", "\n", "## Load data\n", "\n", "Let's begin by loading the dataset and looking at it:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "execution": { "iopub.execute_input": "2024-05-07T17:15:14.342783Z", "iopub.status.busy": "2024-05-07T17:15:14.342325Z", "iopub.status.idle": "2024-05-07T17:15:14.884075Z", "shell.execute_reply": "2024-05-07T17:15:14.883399Z" } }, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table>\n", "  <thead>\n", "    <tr><th></th><th>Month</th><th>Sunspots</th></tr>\n", "  </thead>\n", "  <tbody>\n", "    <tr><th>0</th><td>1749-01</td><td>58.0</td></tr>\n", "    <tr><th>1</th><td>1749-02</td><td>62.6</td></tr>\n", "    <tr><th>2</th><td>1749-03</td><td>70.0</td></tr>\n", "    <tr><th>3</th><td>1749-04</td><td>55.7</td></tr>\n", "    <tr><th>4</th><td>1749-05</td><td>85.0</td></tr>\n", "    <tr><th>...</th><td>...</td><td>...</td></tr>\n", "    <tr><th>2815</th><td>1983-08</td><td>71.8</td></tr>\n", "    <tr><th>2816</th><td>1983-09</td><td>50.3</td></tr>\n", "    <tr><th>2817</th><td>1983-10</td><td>55.8</td></tr>\n", "    <tr><th>2818</th><td>1983-11</td><td>33.3</td></tr>\n", "    <tr><th>2819</th><td>1983-12</td><td>33.4</td></tr>\n", "  </tbody>\n", "</table>\n", "<p>2820 rows × 2 columns</p>\n", "</div>
" ], "text/plain": [ " Month Sunspots\n", "0 1749-01 58.0\n", "1 1749-02 62.6\n", "2 1749-03 70.0\n", "3 1749-04 55.7\n", "4 1749-05 85.0\n", "... ... ...\n", "2815 1983-08 71.8\n", "2816 1983-09 50.3\n", "2817 1983-10 55.8\n", "2818 1983-11 33.3\n", "2819 1983-12 33.4\n", "\n", "[2820 rows x 2 columns]" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "\n", "df = pd.read_csv(\"https://raw.githubusercontent.com/mindsdb/benchmarks/main/benchmarks/datasets/monthly_sunspots/data.csv\")\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is a very simple dataset. The 'Month' column specifies the month in which each measurement was taken, and the 'Sunspots' column holds the quantity we are interested in forecasting. As such, we can characterize this as a univariate time series problem.\n", "\n", "## Define the predictive task\n", "\n", "We will use Lightwood's high-level methods to state what we want to predict. 
As this is a time series task (because we want to leverage the notion of time to predict), we need to specify a set of arguments that will activate Lightwood's time series pipeline:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "execution": { "iopub.execute_input": "2024-05-07T17:15:14.919664Z", "iopub.status.busy": "2024-05-07T17:15:14.919201Z", "iopub.status.idle": "2024-05-07T17:15:17.382878Z", "shell.execute_reply": "2024-05-07T17:15:17.382217Z" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:lightwood-2879:No torchvision detected, image helpers not supported.\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:lightwood-2879:No torchvision/pillow detected, image encoder not supported\u001b[0m\n" ] } ], "source": [ "from lightwood.api.high_level import ProblemDefinition" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "execution": { "iopub.execute_input": "2024-05-07T17:15:17.385947Z", "iopub.status.busy": "2024-05-07T17:15:17.385553Z", "iopub.status.idle": "2024-05-07T17:15:17.389110Z", "shell.execute_reply": "2024-05-07T17:15:17.388504Z" } }, "outputs": [], "source": [ "tss = {'horizon': 6, # the predictor will learn to forecast the counts for the next six months (6 data points at monthly intervals -> 6 months)\n", " 'order_by': 'Month', # which column is used to order the entire dataset\n", " 'window': 12 # how many past values to consider when emitting predictions\n", " }\n", "\n", "pdef = ProblemDefinition.from_dict({'target': 'Sunspots', # specify the column to forecast\n", " 'timeseries_settings': tss # pass along all time series specific parameters\n", " })" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, let's do a very simple train-test split, leaving 10% of the data to check the forecasts that our predictor will produce:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "execution": { 
"iopub.execute_input": "2024-05-07T17:15:17.391428Z", "iopub.status.busy": "2024-05-07T17:15:17.391229Z", "iopub.status.idle": "2024-05-07T17:15:17.395522Z", "shell.execute_reply": "2024-05-07T17:15:17.394859Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(2538, 2) (282, 2)\n" ] } ], "source": [ "cutoff = int(len(df)*0.9)\n", "\n", "train = df[:cutoff]\n", "test = df[cutoff:]\n", "\n", "print(train.shape, test.shape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Generate the predictor object\n", "\n", "Now, we can generate code for a machine learning model by using our problem definition and the data:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "execution": { "iopub.execute_input": "2024-05-07T17:15:17.397741Z", "iopub.status.busy": "2024-05-07T17:15:17.397547Z", "iopub.status.idle": "2024-05-07T17:15:21.486719Z", "shell.execute_reply": "2024-05-07T17:15:21.486032Z" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:type_infer-2879:Analyzing a sample of 2467\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:type_infer-2879:from a total population of 2820, this is equivalent to 87.5% of your data.\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:type_infer-2879:Infering type for: Month\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:type_infer-2879:Column Month has data type date\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:type_infer-2879:Infering type for: Sunspots\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:type_infer-2879:Column Sunspots has data type float\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:dataprep_ml-2879:Starting statistical analysis\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", 
"text": [ "/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/dataprep_ml/cleaners.py:163: UserWarning: The argument 'infer_datetime_format' is deprecated and will be removed in a future version. A strict version of it is now the default, see https://pandas.pydata.org/pdeps/0004-consistent-to-datetime-parsing.html. You can safely remove this argument.\n", " result = pd.to_datetime(element,\n", "\u001b[32mINFO:dataprep_ml-2879:Finished statistical analysis\u001b[0m\n" ] } ], "source": [ "from lightwood.api.high_level import (\n", " json_ai_from_problem,\n", " code_from_json_ai,\n", " predictor_from_code\n", ")\n", "\n", "json_ai = json_ai_from_problem(df, problem_definition=pdef)\n", "code = code_from_json_ai(json_ai)\n", "predictor = predictor_from_code(code)\n", "\n", "# uncomment this to see the generated code:\n", "# print(code)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Train\n", "\n", "Okay, everything is ready now for our predictor to learn based on the training data we will provide.\n", "\n", "Internally, lightwood cleans and reshapes the data, featurizes measurements and timestamps, and comes up with a handful of different models that will be evaluated to keep the one that produces the best forecasts.\n", "\n", "Let's train the predictor. 
This should take a couple of minutes, at most:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "execution": { "iopub.execute_input": "2024-05-07T17:15:21.490176Z", "iopub.status.busy": "2024-05-07T17:15:21.489500Z", "iopub.status.idle": "2024-05-07T17:15:23.396695Z", "shell.execute_reply": "2024-05-07T17:15:23.396028Z" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:dataprep_ml-2879:[Learn phase 1/8] - Statistical analysis\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:dataprep_ml-2879:Starting statistical analysis\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/dataprep_ml/cleaners.py:163: UserWarning: The argument 'infer_datetime_format' is deprecated and will be removed in a future version. A strict version of it is now the default, see https://pandas.pydata.org/pdeps/0004-consistent-to-datetime-parsing.html. You can safely remove this argument.\n", " result = pd.to_datetime(element,\n", "\u001b[32mINFO:dataprep_ml-2879:Finished statistical analysis\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[37mDEBUG:lightwood-2879: `analyze_data` runtime: 0.05 seconds\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:dataprep_ml-2879:[Learn phase 2/8] - Data preprocessing\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:dataprep_ml-2879:Cleaning the data\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/dataprep_ml/cleaners.py:163: UserWarning: The argument 'infer_datetime_format' is deprecated and will be removed in a future version. A strict version of it is now the default, see https://pandas.pydata.org/pdeps/0004-consistent-to-datetime-parsing.html. 
You can safely remove this argument.\n", " result = pd.to_datetime(element,\n", "\u001b[32mINFO:dataprep_ml-2879:Transforming timeseries data\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[37mDEBUG:lightwood-2879: `preprocess` runtime: 0.09 seconds\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:dataprep_ml-2879:[Learn phase 3/8] - Data splitting\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:dataprep_ml-2879:Splitting the data into train/test\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[37mDEBUG:lightwood-2879: `split` runtime: 0.0 seconds\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:dataprep_ml-2879:[Learn phase 4/8] - Preparing encoders\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[37mDEBUG:dataprep_ml-2879:Preparing sequentially...\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[37mDEBUG:lightwood-2879: `prepare` runtime: 0.05 seconds\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:dataprep_ml-2879:[Learn phase 5/8] - Feature generation\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:dataprep_ml-2879:Featurizing the data\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[37mDEBUG:lightwood-2879: `featurize` runtime: 0.05 seconds\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:dataprep_ml-2879:[Learn phase 6/8] - Mixer training\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:dataprep_ml-2879:Training the mixers\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[33mWARNING:lightwood-2879:XGBoost running on CPU\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ 
"\u001b[33mWARNING:lightwood-2879:XGBoost running on CPU\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[33mWARNING:lightwood-2879:XGBoost running on CPU\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[33mWARNING:lightwood-2879:XGBoost running on CPU\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[33mWARNING:lightwood-2879:XGBoost running on CPU\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[33mWARNING:lightwood-2879:XGBoost running on CPU\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/torch/amp/grad_scaler.py:131: UserWarning: torch.cuda.amp.GradScaler is enabled, but CUDA is not available. Disabling.\n", " warnings.warn(\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[17:15:21] WARNING: ../src/learner.cc:339: No visible GPU is found, setting `gpu_id` to -1\n", "[17:15:21] WARNING: ../src/learner.cc:339: No visible GPU is found, setting `gpu_id` to -1\n", "[17:15:21] WARNING: ../src/learner.cc:339: No visible GPU is found, setting `gpu_id` to -1\n", "[17:15:21] WARNING: ../src/learner.cc:339: No visible GPU is found, setting `gpu_id` to -1\n", "[17:15:21] WARNING: ../src/learner.cc:339: No visible GPU is found, setting `gpu_id` to -1\n", "[17:15:21] WARNING: ../src/learner.cc:339: No visible GPU is found, setting `gpu_id` to -1\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/pytorch_ranger/ranger.py:172: UserWarning: This overload of addcmul_ is deprecated:\n", "\taddcmul_(Number value, Tensor tensor1, Tensor tensor2)\n", "Consider using one of the following signatures instead:\n", "\taddcmul_(Tensor tensor1, Tensor tensor2, *, Number value) (Triggered internally at ../torch/csrc/utils/python_arg_parser.cpp:1578.)\n", " 
exp_avg_sq.mul_(beta2).addcmul_(1 - beta2, grad, grad)\n", "\u001b[32mINFO:lightwood-2879:Loss of 9.051180630922318 with learning rate 0.0001\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:lightwood-2879:Loss of 9.014871209859848 with learning rate 0.0005\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:lightwood-2879:Loss of 8.969509482383728 with learning rate 0.001\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:lightwood-2879:Loss of 8.879052013158798 with learning rate 0.002\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:lightwood-2879:Loss of 8.788950502872467 with learning rate 0.003\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:lightwood-2879:Loss of 8.611965209245682 with learning rate 0.005\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:lightwood-2879:Loss of 8.195775926113129 with learning rate 0.01\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:lightwood-2879:Loss of 6.255893141031265 with learning rate 0.05\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:lightwood-2879:Found learning rate of: 0.05\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:lightwood-2879:Loss @ epoch 1: 0.5818348675966263\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:lightwood-2879:Loss @ epoch 2: 0.4797109067440033\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:lightwood-2879:Loss @ epoch 3: 0.48386093974113464\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:lightwood-2879:Loss @ epoch 4: 0.49511992931365967\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ 
"\u001b[32mINFO:lightwood-2879:Loss @ epoch 5: 0.39475560188293457\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:lightwood-2879:Loss @ epoch 6: 0.39592696726322174\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:lightwood-2879:Loss @ epoch 7: 0.3622782379388809\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:lightwood-2879:Loss @ epoch 8: 0.38170479238033295\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:lightwood-2879:Loss @ epoch 9: 0.5138543993234634\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:lightwood-2879:Loss @ epoch 10: 0.6360723078250885\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:lightwood-2879:Loss @ epoch 1: 0.29868809472430835\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:lightwood-2879:Loss @ epoch 2: 0.30318967591632495\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[37mDEBUG:lightwood-2879: `fit_mixer` runtime: 0.86 seconds\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:lightwood-2879:Started fitting LGBM models for array prediction\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:lightwood-2879:Started fitting XGBoost model\u001b[0m\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[0]\tvalidation_0-rmse:42.76798\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:lightwood-2879:A single GBM iteration takes 0.1 seconds\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:lightwood-2879:Training XGBoost with 57023 iterations given 7127.987008333206 seconds constraint\u001b[0m\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[0]\tvalidation_0-rmse:42.76798\n" ] }, 
{ "name": "stdout", "output_type": "stream", "text": [ "[1]\tvalidation_0-rmse:31.72661\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[2]\tvalidation_0-rmse:24.49596\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[3]\tvalidation_0-rmse:20.38592\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[4]\tvalidation_0-rmse:18.09356\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[5]\tvalidation_0-rmse:16.88080\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[6]\tvalidation_0-rmse:16.21734\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[7]\tvalidation_0-rmse:15.95640\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[8]\tvalidation_0-rmse:15.80745\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[9]\tvalidation_0-rmse:15.76428\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[10]\tvalidation_0-rmse:15.89176\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[11]\tvalidation_0-rmse:15.89176\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[12]\tvalidation_0-rmse:15.87901\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[13]\tvalidation_0-rmse:15.87505\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[14]\tvalidation_0-rmse:16.06330\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:lightwood-2879:Started fitting XGBoost model\u001b[0m\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[0]\tvalidation_0-rmse:42.95930\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:lightwood-2879:A single GBM iteration takes 0.1 seconds\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:lightwood-2879:Training XGBoost with 57023 iterations given 7127.988311052322 seconds constraint\u001b[0m\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[0]\tvalidation_0-rmse:42.95930\n" ] 
}, { "name": "stdout", "output_type": "stream", "text": [ "[1]\tvalidation_0-rmse:32.27936\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[2]\tvalidation_0-rmse:25.47815\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[3]\tvalidation_0-rmse:21.37610\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[4]\tvalidation_0-rmse:19.25243\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[5]\tvalidation_0-rmse:18.03199\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[6]\tvalidation_0-rmse:17.67706\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[7]\tvalidation_0-rmse:17.57516\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[8]\tvalidation_0-rmse:17.51227\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[9]\tvalidation_0-rmse:17.51216\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[10]\tvalidation_0-rmse:17.55192\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[11]\tvalidation_0-rmse:17.56609\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[12]\tvalidation_0-rmse:17.71702\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[13]\tvalidation_0-rmse:17.75939\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[14]\tvalidation_0-rmse:17.84796\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:lightwood-2879:Started fitting XGBoost model\u001b[0m\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[0]\tvalidation_0-rmse:43.14000\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:lightwood-2879:A single GBM iteration takes 0.1 seconds\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:lightwood-2879:Training XGBoost with 57023 iterations given 7127.988361358643 seconds constraint\u001b[0m\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ 
"[0]\tvalidation_0-rmse:43.14000\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[1]\tvalidation_0-rmse:32.50446\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[2]\tvalidation_0-rmse:25.73040\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[3]\tvalidation_0-rmse:22.16599\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[4]\tvalidation_0-rmse:20.28726\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[5]\tvalidation_0-rmse:19.46406\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[6]\tvalidation_0-rmse:19.07306\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[7]\tvalidation_0-rmse:19.00714\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[8]\tvalidation_0-rmse:19.13990\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[9]\tvalidation_0-rmse:19.12589\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[10]\tvalidation_0-rmse:19.34977\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[11]\tvalidation_0-rmse:19.43217\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:lightwood-2879:Started fitting XGBoost model\u001b[0m\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[0]\tvalidation_0-rmse:44.19079\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:lightwood-2879:A single GBM iteration takes 0.1 seconds\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:lightwood-2879:Training XGBoost with 57023 iterations given 7127.988099575043 seconds constraint\u001b[0m\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[0]\tvalidation_0-rmse:44.19079\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[1]\tvalidation_0-rmse:34.13289\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[2]\tvalidation_0-rmse:27.40621\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ 
"[3]\tvalidation_0-rmse:23.82532\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[4]\tvalidation_0-rmse:22.03399\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[5]\tvalidation_0-rmse:21.07010\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[6]\tvalidation_0-rmse:20.74813\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[7]\tvalidation_0-rmse:20.81255\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[8]\tvalidation_0-rmse:20.69303\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[9]\tvalidation_0-rmse:20.71044\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[10]\tvalidation_0-rmse:20.79641\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[11]\tvalidation_0-rmse:20.78759\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[12]\tvalidation_0-rmse:20.83998\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[13]\tvalidation_0-rmse:20.77980\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:lightwood-2879:Started fitting XGBoost model\u001b[0m\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[0]\tvalidation_0-rmse:44.24747\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:lightwood-2879:A single GBM iteration takes 0.1 seconds\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:lightwood-2879:Training XGBoost with 57023 iterations given 7127.988680124283 seconds constraint\u001b[0m\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[0]\tvalidation_0-rmse:44.24747\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[1]\tvalidation_0-rmse:34.37446\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[2]\tvalidation_0-rmse:27.88767\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[3]\tvalidation_0-rmse:24.63817\n" ] }, { "name": "stdout", "output_type": "stream", "text": 
[ "[4]\tvalidation_0-rmse:22.84209\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[5]\tvalidation_0-rmse:22.35045\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[6]\tvalidation_0-rmse:22.11300\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[7]\tvalidation_0-rmse:22.16132\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[8]\tvalidation_0-rmse:22.21348\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[9]\tvalidation_0-rmse:22.10747" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[10]\tvalidation_0-rmse:22.20352\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[11]\tvalidation_0-rmse:22.25761\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[12]\tvalidation_0-rmse:22.25308\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[13]\tvalidation_0-rmse:22.31415\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:lightwood-2879:Started fitting XGBoost model\u001b[0m\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[0]\tvalidation_0-rmse:44.48913\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:lightwood-2879:A single GBM iteration takes 0.1 seconds\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:lightwood-2879:Training XGBoost with 57023 iterations given 7127.988005399704 seconds constraint\u001b[0m\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[0]\tvalidation_0-rmse:44.48913\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[1]\tvalidation_0-rmse:34.69001\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[2]\tvalidation_0-rmse:28.87323\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[3]\tvalidation_0-rmse:25.32567\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ 
"[4]\tvalidation_0-rmse:23.09943\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[5]\tvalidation_0-rmse:22.12203\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[6]\tvalidation_0-rmse:21.71523\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[7]\tvalidation_0-rmse:21.70934\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[8]\tvalidation_0-rmse:21.74380\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[9]\tvalidation_0-rmse:21.61157\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[10]\tvalidation_0-rmse:21.73507\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[11]\tvalidation_0-rmse:21.84587\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[12]\tvalidation_0-rmse:21.78099\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[13]\tvalidation_0-rmse:21.68890\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[14]\tvalidation_0-rmse:21.70025\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[37mDEBUG:lightwood-2879: `fit_mixer` runtime: 0.49 seconds\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:dataprep_ml-2879:Ensembling the mixer\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:lightwood-2879:Mixer: NeuralTs got accuracy: 0.875\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[33mWARNING:lightwood-2879:This model does not output probability estimates\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:lightwood-2879:Mixer: XGBoostArrayMixer got accuracy: 0.869\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:lightwood-2879:Picked best mixer: NeuralTs\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[37mDEBUG:lightwood-2879: `fit` runtime: 1.4 seconds\u001b[0m\n" ] }, { "name": "stderr", 
"output_type": "stream", "text": [ "\u001b[32mINFO:dataprep_ml-2879:[Learn phase 7/8] - Ensemble analysis\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:dataprep_ml-2879:Analyzing the ensemble of mixers\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:lightwood-2879:The block ICP is now running its analyze() method\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:lightwood-2879:The block ConfStats is now running its analyze() method\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:lightwood-2879:The block AccStats is now running its analyze() method\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:lightwood-2879:The block PermutationFeatureImportance is now running its analyze() method\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[33mWARNING:lightwood-2879:Block 'PermutationFeatureImportance' does not support time series nor text encoding, skipping...\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[37mDEBUG:lightwood-2879: `analyze_ensemble` runtime: 0.16 seconds\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:dataprep_ml-2879:[Learn phase 8/8] - Adjustment on validation requested\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:dataprep_ml-2879:Updating the mixers\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/torch/amp/grad_scaler.py:131: UserWarning: torch.cuda.amp.GradScaler is enabled, but CUDA is not available. 
Disabling.\n", " warnings.warn(\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:lightwood-2879:Loss @ epoch 1: 0.29626286526521045\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:lightwood-2879:Loss @ epoch 2: 0.2954987535874049\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:lightwood-2879:Updating array of LGBM models...\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:lightwood-2879:XGBoost mixer does not have a `partial_fit` implementation\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:lightwood-2879:XGBoost mixer does not have a `partial_fit` implementation\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:lightwood-2879:XGBoost mixer does not have a `partial_fit` implementation\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:lightwood-2879:XGBoost mixer does not have a `partial_fit` implementation\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:lightwood-2879:XGBoost mixer does not have a `partial_fit` implementation\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:lightwood-2879:XGBoost mixer does not have a `partial_fit` implementation\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[37mDEBUG:lightwood-2879: `adjust` runtime: 0.09 seconds\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[37mDEBUG:lightwood-2879: `learn` runtime: 1.9 seconds\u001b[0m\n" ] } ], "source": [ "predictor.learn(train)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Predict\n", "\n", "Once the predictor has trained, we can use it to generate 6-month forecasts for each of the test set data points:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { 
"execution": { "iopub.execute_input": "2024-05-07T17:15:23.400033Z", "iopub.status.busy": "2024-05-07T17:15:23.399633Z", "iopub.status.idle": "2024-05-07T17:15:23.626063Z", "shell.execute_reply": "2024-05-07T17:15:23.625426Z" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:dataprep_ml-2879:[Predict phase 1/4] - Data preprocessing\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/tmp/dcd5321def984b95f7ef126337f7cb3dd9c39250d1d319a417151021214801152.py:587: SettingWithCopyWarning: \n", "A value is trying to be set on a copy of a slice from a DataFrame.\n", "Try using .loc[row_indexer,col_indexer] = value instead\n", "\n", "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", " data[col] = [None] * len(data)\n", "\u001b[32mINFO:dataprep_ml-2879:Cleaning the data\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/dataprep_ml/cleaners.py:163: UserWarning: The argument 'infer_datetime_format' is deprecated and will be removed in a future version. A strict version of it is now the default, see https://pandas.pydata.org/pdeps/0004-consistent-to-datetime-parsing.html. 
You can safely remove this argument.\n", " result = pd.to_datetime(element,\n", "\u001b[32mINFO:dataprep_ml-2879:Transforming timeseries data\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[37mDEBUG:lightwood-2879: `preprocess` runtime: 0.02 seconds\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:dataprep_ml-2879:[Predict phase 2/4] - Feature generation\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:dataprep_ml-2879:Featurizing the data\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[37mDEBUG:lightwood-2879: `featurize` runtime: 0.01 seconds\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:dataprep_ml-2879:[Predict phase 3/4] - Calling ensemble\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[37mDEBUG:lightwood-2879: `_timed_call` runtime: 0.09 seconds\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:dataprep_ml-2879:[Predict phase 4/4] - Analyzing output\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:lightwood-2879:The block ICP is now running its explain() method\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:lightwood-2879:The block ConfStats is now running its explain() method\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:lightwood-2879:ConfStats.explain() has not been implemented, no modifications will be done to the data insights.\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:lightwood-2879:The block AccStats is now running its explain() method\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:lightwood-2879:AccStats.explain() has not been implemented, no modifications will be done to the data 
insights.\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:lightwood-2879:The block PermutationFeatureImportance is now running its explain() method\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mINFO:lightwood-2879:PermutationFeatureImportance.explain() has not been implemented, no modifications will be done to the data insights.\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[37mDEBUG:lightwood-2879: `explain` runtime: 0.09 seconds\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[37mDEBUG:lightwood-2879: `predict` runtime: 0.22 seconds\u001b[0m\n" ] } ], "source": [ "forecasts = predictor.predict(test)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's check how a single row might look:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "execution": { "iopub.execute_input": "2024-05-07T17:15:23.628682Z", "iopub.status.busy": "2024-05-07T17:15:23.628360Z", "iopub.status.idle": "2024-05-07T17:15:23.639399Z", "shell.execute_reply": "2024-05-07T17:15:23.638764Z" } }, "outputs": [ { "data": { "text/html": [ "
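<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\">\n",
"      <th></th>\n",
"      <th>original_index</th>\n",
"      <th>prediction</th>\n",
"      <th>order_Month</th>\n",
"      <th>confidence</th>\n",
"      <th>lower</th>\n",
"      <th>upper</th>\n",
"      <th>anomaly</th>\n",
"      <th>prediction_sum</th>\n",
"      <th>lower_sum</th>\n",
"      <th>upper_sum</th>\n",
"      <th>confidence_mean</th>\n",
"    </tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr>\n",
"      <th>10</th>\n",
"      <td>10</td>\n",
"      <td>[50.51358374451768, 53.78975402923053, 51.0303...</td>\n",
"      <td>[-273628800.0, -270950400.0, -268272000.0, -26...</td>\n",
"      <td>[0.79, 0.02, 0.9991, 0.9991, 0.9991, 0.9991]</td>\n",
"      <td>[30.14139795091352, 32.97332333016865, 0.0, 0....</td>\n",
"      <td>[70.88576953812185, 74.60618472829242, 137.289...</td>\n",
"      <td>False</td>\n",
"      <td>294.494088</td>\n",
"      <td>0.0</td>\n",
"      <td>209.075865</td>\n",
"      <td>0.801067</td>\n",
"    </tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>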
" ], "text/plain": [ " original_index prediction \\\n", "10 10 [50.51358374451768, 53.78975402923053, 51.0303... \n", "\n", " order_Month \\\n", "10 [-273628800.0, -270950400.0, -268272000.0, -26... \n", "\n", " confidence \\\n", "10 [0.79, 0.02, 0.9991, 0.9991, 0.9991, 0.9991] \n", "\n", " lower \\\n", "10 [30.14139795091352, 32.97332333016865, 0.0, 0.... \n", "\n", " upper anomaly \\\n", "10 [70.88576953812185, 74.60618472829242, 137.289... False \n", "\n", " prediction_sum lower_sum upper_sum confidence_mean \n", "10 294.494088 0.0 209.075865 0.801067 " ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "forecasts.iloc[[10]]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You'll note that the point `prediction` has associated `lower` and `upper` bounds that are a function of the estimated `confidence` the model has on its own output. Apart from this, `order_Month` yields the timestamps of each prediction, the `anomaly` tag will let you know if the observed value falls outside of the predicted region. \n", "\n", "\n", "## Visualizing a forecast\n", "\n", "Okay, time series are much easier to appreciate through plots. Let's make one:\n", "\n", "NOTE: We will use `matplotlib` to generate a simple plot of these forecasts. If you want to run this notebook locally, you will need to `pip install matplotlib` for the following code to work." 
] }, { "cell_type": "code", "execution_count": 9, "metadata": { "execution": { "iopub.execute_input": "2024-05-07T17:15:23.641944Z", "iopub.status.busy": "2024-05-07T17:15:23.641597Z", "iopub.status.idle": "2024-05-07T17:15:24.035334Z", "shell.execute_reply": "2024-05-07T17:15:24.034722Z" } }, "outputs": [], "source": [ "import matplotlib.pyplot as plt" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "execution": { "iopub.execute_input": "2024-05-07T17:15:24.038299Z", "iopub.status.busy": "2024-05-07T17:15:24.038005Z", "iopub.status.idle": "2024-05-07T17:15:24.208442Z", "shell.execute_reply": "2024-05-07T17:15:24.207775Z" } }, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "plt.figure(figsize=(12, 8))\n", "plt.plot([None for _ in range(forecasts.shape[0])] + forecasts.iloc[-1]['prediction'], color='purple', label='point prediction')\n", "plt.plot([None for _ in range(forecasts.shape[0])] + forecasts.iloc[-1]['lower'], color='grey')\n", "plt.plot([None for _ in range(forecasts.shape[0])] + forecasts.iloc[-1]['upper'], color='grey')\n", "plt.xlabel('timestep')\n", "plt.ylabel('# sunspots')\n", "plt.title(\"Forecasted amount of sunspots for the next semester\")\n", "plt.legend()\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Conclusion\n", "\n", "In this tutorial, we have gone through how you can train a machine learning model with Lightwood to produce forecasts for a univariate time series task.\n", "\n", "There are additional parameters to further customize your timeseries settings and/or prediction insights, so be sure to check the rest of the documentation." ] } ], "metadata": { "interpreter": { "hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6" }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.19" } }, "nbformat": 4, "nbformat_minor": 4 }