Skip to content

Predicting Customer Churn with MindsDB

Introduction

In this tutorial, we'll create and train a machine learning model, or as we call it, an AI Table or a predictor. By querying the model, we'll predict the probability of churn for new customers of a telecom company.

Make sure you have access to a working MindsDB installation, either locally or at MindsDB Cloud.

If you want to learn how to set up your account at MindsDB Cloud, follow this guide. Another way is to set up MindsDB locally using Docker or Python.

Let's get started.

Data Setup

Connecting the Data

There are a couple of ways you can get the data to follow through with this tutorial.

You can connect to a demo database that we've prepared for you. It contains the data used throughout this tutorial (the example_db.demo_data.customer_churn table).

CREATE DATABASE example_db
WITH ENGINE = "postgres",
PARAMETERS = {
    "user": "demo_user",
    "password": "demo_password",
    "host": "3.220.66.106",
    "port": "5432",
    "database": "demo"
};

Now you can run queries directly on the demo database. Let's preview the data that we'll use to train our predictor.

SELECT * 
FROM example_db.demo_data.customer_churn
LIMIT 10;

You can download the CSV data file here and upload it via MindsDB SQL Editor.

Follow this guide to find out how to upload a file to MindsDB.

Now you can run queries directly on the file as if it were a table. Let's preview the data that we'll use to train our predictor.

SELECT *
FROM files.churn
LIMIT 10;

Pay Attention to the Queries

From now on, we'll use the files.churn file as a table. Make sure you replace it with example_db.demo_data.customer_churn if you connect the data as a database.

Understanding the Data

We use the customer churn dataset, where each row is one customer, to predict whether the customer is going to stop using the company products.

Below is the sample data stored in the files.churn table.

+----------+------+-------------+-------+----------+------+------------+----------------+---------------+--------------+------------+----------------+-----------+-----------+---------------+--------------+----------------+-------------------------+--------------+------------+-----+
|customerID|gender|SeniorCitizen|Partner|Dependents|tenure|PhoneService|MultipleLines   |InternetService|OnlineSecurity|OnlineBackup|DeviceProtection|TechSupport|StreamingTV|StreamingMovies|Contract      |PaperlessBilling|PaymentMethod            |MonthlyCharges|TotalCharges|Churn|
+----------+------+-------------+-------+----------+------+------------+----------------+---------------+--------------+------------+----------------+-----------+-----------+---------------+--------------+----------------+-------------------------+--------------+------------+-----+
|7590-VHVEG|Female|0            |Yes    |No        |1     |No          |No phone service|DSL            |No            |Yes         |No              |No         |No         |No             |Month-to-month|Yes             |Electronic check         |29.85         |29.85       |No   |
|5575-GNVDE|Male  |0            |No     |No        |34    |Yes         |No              |DSL            |Yes           |No          |Yes             |No         |No         |No             |One year      |No              |Mailed check             |56.95         |1889.5      |No   |
|3668-QPYBK|Male  |0            |No     |No        |2     |Yes         |No              |DSL            |Yes           |Yes         |No              |No         |No         |No             |Month-to-month|Yes             |Mailed check             |53.85         |108.15      |Yes  |
|7795-CFOCW|Male  |0            |No     |No        |45    |No          |No phone service|DSL            |Yes           |No          |Yes             |Yes        |No         |No             |One year      |No              |Bank transfer (automatic)|42.3          |1840.75     |No   |
|9237-HQITU|Female|0            |No     |No        |2     |Yes         |No              |Fiber optic    |No            |No          |No              |No         |No         |No             |Month-to-month|Yes             |Electronic check         |70.7          |151.65      |Yes  |
+----------+------+-------------+-------+----------+------+------------+----------------+---------------+--------------+------------+----------------+-----------+-----------+---------------+--------------+----------------+-------------------------+--------------+------------+-----+

Where:

Column Description Data Type Usage
CustomerId The identification number of a customer. character varying Feature
Gender The gender of a customer. character varying Feature
SeniorCitizen It indicates whether the customer is a senior citizen (1) or not (0). integer Feature
Partner It indicates whether the customer has a partner (Yes) or not (No). character varying Feature
Dependents It indicates whether the customer has dependents (Yes) or not (No). character varying Feature
Tenure Number of months the customer has been staying with the company. integer Feature
PhoneService It indicates whether the customer has a phone service (Yes) or not (No). character varying Feature
MultipleLines It indicates whether the customer has multiple lines (Yes) or not (No, No phone service). character varying Feature
InternetService Customer’s internet service provider (DSL, Fiber optic, No). character varying Feature
OnlineSecurity It indicates whether the customer has online security (Yes) or not (No, No internet service). character varying Feature
OnlineBackup It indicates whether the customer has online backup (Yes) or not (No, No internet service). character varying Feature
DeviceProtection It indicates whether the customer has device protection (Yes) or not (No, No internet service). character varying Feature
TechSupport It indicates whether the customer has tech support (Yes) or not (No, No internet service). character varying Feature
StreamingTv It indicates whether the customer has streaming TV (Yes) or not (No, No internet service). character varying Feature
StreamingMovies It indicates whether the customer has streaming movies (Yes) or not (No, No internet service). character varying Feature
Contract The contract term of the customer (Month-to-month, One year, Two year). character varying Feature
PaperlessBilling It indicates whether the customer has paperless billinig (Yes) or not (No). character varying Feature
PaymentMethod Customer’s payment method (Electronic check, Mailed check, Bank transfer (automatic), Credit card (automatic)). character varying Feature
MonthlyCharges The monthly charge amount. money Feature
TotalCharges The total amount charged to the customer. money Feature
Churn It indicates whether the customer churned (Yes) or not (No). character varying Label

Labels and Features

A label is a column whose values will be predicted (the y variable in simple linear regression).
A feature is a column used to train the model (the x variable in simple linear regression).

Training a Predictor

Let's create and train the machine learning model. For that, we use the CREATE PREDICTOR statement and specify the input columns used to train FROM (features) and what we want to PREDICT (labels).

CREATE PREDICTOR mindsdb.customer_churn_predictor
FROM files
  (SELECT * FROM churn)
PREDICT Churn;

We use all of the columns as features, except for the Churn column, whose values will be predicted.

Status of a Predictor

A predictor may take a couple of minutes for the training to complete. You can monitor the status of the predictor by using this SQL command:

SELECT status
FROM mindsdb.predictors
WHERE name='customer_churn_predictor';

If we run it right after creating a predictor, we get this output:

+------------+
| status     |
+------------+
| generating |
+------------+

A bit later, this is the output:

+----------+
| status   |
+----------+
| training |
+----------+

And at last, this should be the output:

+----------+
| status   |
+----------+
| complete |
+----------+

Now, if the status of our predictor says complete, we can start making predictions!

Making Predictions

Making a Single Prediction

You can make predictions by querying the predictor as if it were a table. The SELECT statement lets you make predictions for the label based on the chosen features.

SELECT Churn, Churn_confidence, Churn_explain
FROM mindsdb.customer_churn_predictor
WHERE SeniorCitizen=0 
AND Partner='Yes' 
AND Dependents='No' 
AND tenure=1 
AND PhoneService='No' 
AND MultipleLines='No phone service' 
AND InternetService='DSL';

On execution, we get:

+-------+---------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Churn | Churn_confidence    | Churn_explain                                                                                                                                                    |
+-------+---------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Yes   | 0.7752808988764045  | {"predicted_value": "Yes", "confidence": 0.7752808988764045, "anomaly": null, "truth": null, "probability_class_No": 0.4756, "probability_class_Yes": 0.5244}    |
+-------+---------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------+
To get more accurate predictions, we should provide as much data as possible in the WHERE clause. Let's run another query.

SELECT Churn, Churn_confidence, Churn_explain
FROM mindsdb.customer_churn_predictor
WHERE SeniorCitizen=0 
AND Partner='Yes' 
AND Dependents='No' 
AND tenure=1 
AND PhoneService='No' 
AND MultipleLines='No phone service' 
AND InternetService='DSL'
AND Contract='Month-to-month'
AND MonthlyCharges=29.85
AND TotalCharges=29.85
AND OnlineBackup='Yes'
AND OnlineSecurity='No' 
AND DeviceProtection='No' 
AND TechSupport='No' 
AND StreamingTV='No' 
AND StreamingMovies='No' 
AND PaperlessBilling='Yes' 
AND PaymentMethod='Electronic check';

On execution, we get:

+-------+---------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Churn | Churn_confidence    | Churn_explain                                                                                                                                                    |
+-------+---------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Yes   | 0.8202247191011236  | {"predicted_value": "Yes", "confidence": 0.8202247191011236, "anomaly": null, "truth": null, "probability_class_No": 0.4098, "probability_class_Yes": 0.5902}    |
+-------+---------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------+

MindsDB predicted the probability of this customer churning with confidence of around 82%. The previous query predicted it with confidence of around 79%. So providing more data improved the confidence level of predictions.

Making Batch Predictions

Also, you can make bulk predictions by joining a data table with your predictor using JOIN.

SELECT t.customerID, t.Contract, t.MonthlyCharges, m.Churn 
FROM files.churn AS t 
JOIN mindsdb.customer_churn_predictor AS m
LIMIT 100;

On execution, we get:

+----------------+-------------------+------------------+---------+
| customerID     | Contract          | MonthlyCharges   | Churn   |
+----------------+-------------------+------------------+---------+
| 7590-VHVEG     | Month-to-month    | 29.85            | Yes     |
| 5575-GNVDE     | One year          | 56.95            | No      |
| 3668-QPYBK     | Month-to-month    | 53.85            | Yes     |
| 7795-CFOCW     | One year          | 42.3             | No      |
| 9237-HQITU     | Month-to-month    | 70.7             | Yes     |
+----------------+-------------------+------------------+---------+

What's Next?

Have fun while trying it out yourself!

If this tutorial was helpful, please give us a GitHub star here.