Time Series Data Forecasting with TDengine

Shuduo Sang
Shuduo Sang
/
Share on LinkedIn

Time series data forecasting is of particular interest to users in many industries who want more insight from their existing data. And with the AI era fast approaching, new technologies such as machine learning and deep learning algorithms are becoming essential elements of modern data analytics. As one of the major use cases of a time series database is predicting trends, this article discusses how to use machine learning and deep learning algorithms to forecast the future with the data stored in a TDengine cluster.

In this article, an example is given of time series data forecasting based on existing data stored in TDengine. We will generate sample data for an electricity consumption scenario and then demonstrate how to use TDengine and a few Python libraries to forecast how the data will change over the next year.

Introduction

In this example, the user is an electricity provider that collects electricity consumption data from smart meters each day and stores it in a TDengine cluster. The goal is to predict how consumption will grow over the next year so that appropriate actions, such as the deployment of additional equipment, can be taken in advance to support this growth.

The sample data assumes that electricity consumption increases by a certain percentage each year as the economy grows, and reflects a city in the northern hemisphere, where more electricity is generally used in the summer months.

To view the source code for this example, see the GitHub repository.

Procedure

This procedure describes how to run the sample code:

  1. Deploy TDengine on your local machine and ensure that the TDengine server is running.
    Refer to the documentation for detailed instructions.
  2. Clone the source code for this demo to your local machine.
    git clone https://github.com/sangshuduo/td-forecasting
  3. Install the required Python packages. Note that Python 3.6 or later is required.
    python3 -m pip install -r requirements.txt
  4. Run the mockdata.py file to generate sample data.
    python3 mockdata.py
  5. Run the forecast.py file to forecast next year’s data.
    python3 forecast.py
    Note: If an error occurs during image generation, install the libxcb-xinerama0 package.

The following figure is displayed.

An increase is shown from 2015 to 2023, and a continued increase is forecasted from 2023 to 2024. Two methods are used to obtain the forecasted data, and the results are similar.

In the figure, the blue line indicates the sample data for years 2015 to 2023. The orange line is a prediction of the 2023-2024 data based on a linear regression model. The green line is a prediction of the same data based on a gradient boosting model.

How it works

Sample data generation

The mockdata.py file generates sample data for an electricity consumption scenario. In this scenario, it is assumed that consumption grows each year and is higher in the summer.

...
def insert_rec_per_month(conn, db_name, table_name, year, month):
    increment = (year - 2014) * 1.1
    base = int(10 * increment)
    if month < 10 and month > 5:
        factor = 10
    else:
        factor = 8
    for day in range(1, monthrange(year, month)[1] + 1):
        num = base * randint(5, factor) + randint(0, factor)
        sql = f"INSERT INTO {db_name}.{table_name} VALUES ('{year}-{month}-{day} 00:00:00.000', {num})"
        try:
            conn.execute(sql)
        except Exception as e:
            print(f"command: {sql}")
            print(e)
 ...

Time series data forecasting

The forecast.py file forecasts next year’s data based on the sample data generated. First, the necessary modules are imported:

import argparse

import lightgbm as lgb
import matplotlib.pyplot as plt
import mlforecast
import pandas as pd
from mlforecast.target_transforms import Differences
from sklearn.linear_model import LinearRegression
from sqlalchemy import create_engine, text
...

These modules are described as follows:

  • lightgbm is a Python module that supports the LightGBM algorithm, a gradient-boosting framework that uses tree-based learning algorithms.
  • Matplotlib is one of the most popular Python modules for visualization.
  • mlforecast is a framework to perform time series forecasting using machine learning models.
  • pandas is the most popular module to support data manipulation.
  • scikit-learn (sklearn) is a module that supports popular data science/machine learning algorithms.
  • SQLAlchemy is a Python SQL toolkit and object relational mapper that gives application developers the full power and flexibility of SQL.

Next, a connection to TDengine is established:

...
    engine = create_engine("taos://root:taosdata@localhost:6030/power")
    conn = engine.connect()
    print("Connected to the TDengine ...")
    df = pd.read_sql(
        text("select _wstart as ds, avg(num) as y from power.meters interval(1w)"), conn
    )
    conn.close()
...

This connection makes use of the TDengine Python connector together with SQLAlchemy to establish the connection and pandas to query data frames. We use the AVG() function and INTERVAL(1w) clause to query weekly averages from the TDengine cluster.

Now that we have generated our sample data and connected to TDengine, we can forecast future data:

...
    df.insert(0, column="unique_id", value="unique_id")

    print("Forecasting ...")
    forecast = mlforecast.MLForecast(
        models=[LinearRegression(), lgb.LGBMRegressor()],
        freq="W",
        lags=[52],
        target_transforms=[Differences([52])],
    )
    forecast.fit(df)

    predicts = forecast.predict(52)

    pd.concat([df, predicts]).set_index("ds").plot(figsize=(12, 8))
...

This uses the mlforecast module to implement forecasting through linear regression and through gradient boosting (LightGBM), and then combines the results into a single graph for comparison.

Finally, we display the results on screen or as an image file:

...
    if args.dump:
        plt.savefig(args.dump)
    else:
        plt.show()

You can run forecast.py with the --dump <filename> parameter to save your results to disk.

Conclusion

This article demonstrates a basic example of time series data forecasting in TDengine. With a short Python script, you can easily and quickly predict electricity consumption from historical data stored in TDengine.

  • Shuduo Sang
    Shuduo Sang

    Shuduo Sang is an open-source software veteran at TDengine, working on magnifying TDengine capabilities to connect easily with more third-party software for IoT and DevOps users. He and his team develop and maintain the multiple client libraries and tools for TDengine such as taosBenchmark and taosAdapter. Before joining TDengine, he worked on open-source and Linux at Canonical and Intel.