Inside TDgpt

Jeff Tao

March 26, 2025 / Engineering

Numerous algorithms have been proposed to perform time-series forecasting, anomaly detection, imputation, and classification, with varying technical characteristics suited for different scenarios.

Typically, these analysis algorithms are packaged as toolkits in high-level programming languages (such as Python or R) and are widely distributed and used through open-source channels. This model helps software developers integrate complex analysis algorithms into their systems and greatly lowers the barrier to using advanced algorithms.

Database system developers have also attempted to integrate data analysis algorithm models directly into database systems. By building machine learning libraries (e.g., Spark’s MLlib), they aim to leverage mature analytical techniques to enhance the advanced data analysis capabilities of databases or analytical computing engines.

The rapid development of artificial intelligence (AI) has brought new opportunities to time-series data analysis. Efficiently applying AI capabilities to this field also presents new possibilities for databases. To this end, TDengine has introduced TDgpt, an intelligent agent for time-series analytics. With TDgpt, you can use statistical analysis algorithms, machine learning models, deep learning models, foundational models for time-series data, and large language models via SQL statements. TDgpt exposes the analytical capabilities of these algorithms and models through SQL and applies them to your time-series data using new windows and functions.

Technical Features

TDgpt is an external advanced analytics system that integrates seamlessly with TDengine’s main process taosd. It allows time-series analysis services to be embedded directly into TDengine’s query execution flow.

TDgpt is a stateless platform for time-series analytics that includes the classic StatsModels library of statistical analysis models as well as embedded frameworks such as Torch and Keras for machine and deep learning. In addition, it can directly invoke TDengine’s proprietary foundation model TDtsfm through request forwarding and adaptation. As an analytics platform, TDgpt will also support integration with third-party time-series models in the future. By modifying just a single parameter (algo), you will be able to access cutting-edge time-series model services.

TDgpt is an open system to which you can easily add your own algorithms for forecasting, anomaly detection, imputation, and classification. Once added, the new algorithms can be used simply by changing the corresponding parameters in the SQL statement, with no need to modify a single line of application code.

System Architecture

TDgpt is composed of one or more stateless analysis nodes, called AI nodes (anodes). These anodes can be deployed as needed across the TDengine cluster in appropriate hardware environments (for example, on compute nodes equipped with GPUs) depending on the requirements of the algorithms being used.

TDgpt provides a unified interface and invocation method for different types of analysis algorithms. Based on user-specified parameters, it calls advanced algorithm packages and other analytical tools, then returns the results to TDengine’s main process (taosd) in a predefined format.

TDgpt consists of four main components:

Built-in analytics libraries: Includes libraries such as statsmodels, pyculiarity, and pmdarima, offering ready-to-use models for forecasting and anomaly detection.
Built-in machine learning libraries: Includes libraries like Torch, Keras, and Scikit-learn to run pre-trained machine and deep learning models within TDgpt’s process space. The training process can be managed using end-to-end open-source ML frameworks such as Merlion or Kats, and trained models can be deployed by uploading them to a designated TDgpt directory.
Request adapter for general-purpose LLMs: Converts time-series forecasting requests into prompts for general-purpose LLMs such as Llama in a MaaS manner. (Note: This functionality is not open source.)
Adapter for locally deployed time-series models: Sends requests directly to models like Time-MoE and TDtsfm that are specifically designed for time-series data. Compared to general-purpose LLMs, these models do not require prompt engineering, are lighter-weight, and are easier to deploy locally with lower hardware requirements. In addition, the adapter can also connect to cloud-based time-series MaaS systems such as TimeGPT, enabling localized analysis powered by cloud-hosted models.

During query execution, the vnode in TDengine forwards any elements involving advanced time-series data analytics directly to the anode. Once the analysis is completed, the results are assembled and embedded back into the query execution process.

Advanced Analytics Services

The services provided by TDgpt are described as follows:

Anomaly detection: This service is provided via a new anomaly window that has been introduced into TDengine. An anomaly window is a special type of event window, defined by the anomaly detection algorithm as a time window during which an anomaly is occurring. This window differs from an event window in that the algorithm determines when it opens and closes instead of expressions input by the user. The query operations supported by other windows are also supported for anomaly windows.
Time-series forecasting: The FORECAST function invokes a specified (or default) forecasting algorithm to predict future time-series data based on input historical data.
Data imputation: To be released in July 2025
Time-series classification: To be released in July 2025

Custom Algorithms

TDgpt is an extensible platform for advanced time-series data analytics. You can follow the steps described in this document to develop your own analytics algorithms and add them to the platform. Your applications can then use SQL statements to invoke these algorithms. It is not necessary to make updates to your applications.

Custom algorithms must be developed in Python. The anode adds algorithms dynamically. When the anode is started, it scans specified directories for files that meet its requirements and adds those files to the platform. To add an algorithm to your TDgpt, perform the following steps:

Develop an analytics algorithm according to the TDgpt requirements.
Place the source code files in the appropriate directory and restart the anode.
Refresh the algorithm cache table.

You can then use your new algorithm in SQL statements.

Algorithm Evaluation

TDengine Enterprise includes a tool that evaluates the effectiveness of different algorithms and models. You can use this tool on any algorithm or model in TDgpt, including built-in and custom forecasting and anomaly detection algorithms and models. The tool uses quantitative metrics to evaluate the accuracy and performance of each algorithm with a given dataset in TDengine.

Model Management

Trained models for machine learning frameworks such as Torch, TensorFlow, and Keras must be placed in the designated directory on the anode. The anode automatically detects and loads models from this directory.

TDengine Enterprise includes a model manager that integrates seamlessly with open-source end-to-end ML frameworks for time-series data such as Merlion and Kats.

Jeff Tao
With over three decades of hands-on experience in software development, Jeff has had the privilege of spearheading numerous ventures and initiatives in the tech realm. His passion for open source, technology, and innovation has been the driving force behind his journey.

As one of the core developers of TDengine, he is deeply committed to pushing the boundaries of time series data platforms. His mission is crystal clear: to architect a high performance, scalable solution in this space and make it accessible, valuable and affordable for everyone, from individual developers and startups to industry giants.