Accelerating Time-Series Application Development with TDengine

Chait Diwadkar

October 9, 2024 / Engineering

TDengine is a comprehensive time-series platform designed specifically for modern IoT and industrial applications, which ingest data at frequencies that vary by application and often with very high cardinality.

While performance, scalability, and efficient storage with lossless compression are of course important in any platform, features that let developers focus on business requirements are equally important. Several features make TDengine a comprehensive time-series platform that allows developers to focus on the needs and wants of their stakeholders rather than spending time developing low-level features.

In this article, we will not discuss performance and scalability, since those have been covered elsewhere, notably in the TSBS performance benchmarks that TDengine has published. A separate blog post discusses the results and also links to the full report for the IoT Benchmark.

Stages in Time-Series Application Development

There are five stages in the development of a general time-series application. By “general time-series application”, we mean an application that ingests time-series data, perhaps performs ETL on it, stores it in a time-series database, supports analysis through a ubiquitous query language, provides access to the data through APIs in various languages, and finally integrates with visualization and analytics platforms.

The stages of developing a general time-series application are as follows:

1. Understanding the data and configuring the schema
2. Data Ingestion
   1. Configuring data sources
   2. Transforming the data, if needed
3. Pre-computing data automatically for analysis and visualization
4. Sharing data with stakeholders, including applications, following the least-privilege principle
5. Analyzing and visualizing data
Rapid Prototyping of a Time-Series Application

In this article, we will show you how TDengine’s comprehensive time-series platform simplifies each stage with features that allow developers to focus on the business problem they are trying to address rather than spend time developing low-level code. In particular, rapid prototyping is very simple with a stack that uses just TDengine, Python or NodeJS, and Grafana for real-time visualization. Note that since TDengine uses standard SQL as its query language, there is no need to learn a proprietary query language.

Novel concepts in TDengine

Two novel concepts are part and parcel of TDengine and are important to keep in mind.

One table per data collection point (DCP)
TDengine was designed from scratch for low-latency ingestion and query performance, even with high cardinality. With this in mind, the data for each DCP is stored in its own table.
Supertables and subtables
A supertable is a template for a type of data collection point with a shared schema for metrics and tags for all DCPs of that type. As an example, inlet and outlet pumps for a chemical reactor could have one supertable. For a portfolio of combined-cycle power plants, gas turbines would have their own supertable. The heat recovery steam generators would have their own supertable and the steam turbines would also have their own supertable. Asset hierarchies and other contextual information would be preserved through tags. In the renewable energy space, in a solar plant, solar panels would have their own supertables as would inverters and transformers.

You can read more about the above concepts in our documentation.
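As a rough sketch of how the two concepts fit together, the following Python helpers build the SQL for one supertable per type of DCP and one subtable per device. The `pumps` supertable, its columns, and its tags are hypothetical names chosen for illustration; the helpers only assemble `CREATE STABLE ... TAGS` and `CREATE TABLE ... USING ... TAGS` statements and do not connect to a database.

```python
def create_stable_sql(name, metrics, tags):
    """Build a CREATE STABLE statement: timestamp, metric columns, then tags."""
    columns = ", ".join(["ts TIMESTAMP"] + [f"{col} {typ}" for col, typ in metrics.items()])
    tag_columns = ", ".join(f"{col} {typ}" for col, typ in tags.items())
    return f"CREATE STABLE IF NOT EXISTS {name} ({columns}) TAGS ({tag_columns})"


def create_subtable_sql(table, stable, tag_values):
    """Build the statement that creates one table for one data collection point.

    String tag values are quoted as-is; this sketch assumes they contain no quotes.
    """
    values = ", ".join(f"'{v}'" if isinstance(v, str) else str(v) for v in tag_values)
    return f"CREATE TABLE IF NOT EXISTS {table} USING {stable} TAGS ({values})"


# Hypothetical example: inlet and outlet pumps of one reactor share a supertable.
print(create_stable_sql(
    "pumps",
    {"flow_rate": "FLOAT", "pressure": "FLOAT"},
    {"reactor": "VARCHAR(32)", "role": "VARCHAR(16)"},
))
print(create_subtable_sql("pump_r101_inlet", "pumps", ["R-101", "inlet"]))
print(create_subtable_sql("pump_r101_outlet", "pumps", ["R-101", "outlet"]))
```

The tags carry the context (which reactor, which role), so the asset hierarchy survives even though each pump writes to its own table.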

Platform and Stages

The following diagram illustrates the stages mentioned in the previous section and how the TDengine platform fits into those stages.

Understanding the data and configuring the schema

Time-series data is generated by various devices at various frequencies, and increasingly this data is centralized in an MQTT or OPC-UA broker for dissemination. Many PLCs now have OPC-UA brokers built into them.

TDengine has built-in connectors for several software platforms and industrial protocols. In this article, we will look at MQTT in particular, since TDengine can automatically generate schema from an MQTT message. TDengine can also automatically generate schema in the case where OPC-UA is being used or if data is being ingested from an Aveva PI system.

Here are some best practices for schema creation:

Define the data collection points (DCP). A DCP could be a component of a larger system.
Each type of DCP should have its own supertable.
Depending on the number of metrics, you may need to split a DCP across more than one supertable, but this is unlikely unless a single DCP produces thousands of metrics.
Define and use tags for static data (e.g., serial numbers, locations) to ensure that context, such as asset hierarchies, can be preserved.
It is preferable to create separate databases for data that varies in frequency or for data that needs different retention policies. For example, create a separate database with separate supertables for kHz data versus Hz data.
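The last practice can be sketched as follows. The helper builds one `CREATE DATABASE` statement per class of data, using the `KEEP` option (retention in days) and the `PRECISION` option (timestamp resolution). The database names and retention values are hypothetical and would need to reflect your own requirements.

```python
def create_database_sql(name, keep_days, precision="ms"):
    """Build a CREATE DATABASE statement with a retention period and time precision."""
    if precision not in ("ms", "us", "ns"):
        raise ValueError("precision must be 'ms', 'us' or 'ns'")
    return f"CREATE DATABASE IF NOT EXISTS {name} KEEP {keep_days} PRECISION '{precision}'"


# Hypothetical split: short-lived high-frequency data vs. long-lived process data.
statements = [
    create_database_sql("vibration_khz", keep_days=30, precision="us"),
    create_database_sql("process_hz", keep_days=3650, precision="ms"),
]
for statement in statements:
    print(statement)
```

Keeping the two classes of data in separate databases means each can age out on its own schedule without affecting the other.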

Data Ingestion

In TDengine there are several ways to ingest data. As mentioned above, if you have data sources that use MQTT, OPC-UA, OPC-DA, Aveva PI or relational databases, TDengine has built-in connectors for them.

If you have proprietary data sources, TDengine provides APIs in several languages that you can use to ingest data. TDengine also supports the InfluxDB line protocol.
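For readers unfamiliar with the InfluxDB line protocol, here is a minimal sketch of how an application might format one record before sending it. The measurement, tag, and field names are hypothetical, the timestamp is an arbitrary nanosecond value, and only the basic escaping rules are handled; how the line is then submitted depends on the connector you use and is not shown.

```python
TAG_SPECIALS = ", ="        # characters escaped in tag keys, tag values, field keys
MEASUREMENT_SPECIALS = ", "  # characters escaped in the measurement name


def _escape(text, specials):
    """Backslash-escape each special character in text."""
    for ch in specials:
        text = text.replace(ch, "\\" + ch)
    return text


def _field_value(value):
    """Render a field value: booleans, integers (i suffix), floats, quoted strings."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    text = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{text}"'


def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Format one record as: measurement,tag=value field=value timestamp."""
    tag_part = "".join(
        f",{_escape(key, TAG_SPECIALS)}={_escape(str(val), TAG_SPECIALS)}"
        for key, val in sorted(tags.items())
    )
    field_part = ",".join(
        f"{_escape(key, TAG_SPECIALS)}={_field_value(val)}" for key, val in fields.items()
    )
    return f"{_escape(measurement, MEASUREMENT_SPECIALS)}{tag_part} {field_part} {timestamp_ns}"


print(to_line_protocol(
    "pumps",
    {"reactor": "R 101", "role": "inlet"},
    {"flow_rate": 12.5, "running": True, "starts": 3},
    1700000000000000000,
))
```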

Configuring Data Sources

TDengine provides intuitive user interfaces to configure data sources.

Transforming data (if necessary)

In the MQTT data source configuration, TDengine provides the ability to extract, filter and transform data.

Very complex JSON can be parsed as well. Here is an example:

{
    "deduplicationId": "eee81502-9fda-49c8-833a-a2f1503ae006",
    "time": "2024-03-15T16:43:24.459694+00:00",
    "deviceInfo": {
        "tenantId": "52f14cd4-c6f1-4fbd-8f87-4025e1d49242",
        "tenantName": "ChirpStack",
        "applicationId": "6efbc976-b30d-4979-a12e-b2cb3a400d48",
        "applicationName": "realDemo",
        "deviceProfileId": "b7291f78-c44f-44f9-a85a-79755b2fdb77",
        "deviceProfileName": "generic device",
        "deviceName": "VegaAir-0-8c",
        "devEui": "e8e8b7000040a08c",
        "deviceClassEnabled": "CLASS_A",
        "tags": {
            "decoder": "VegaAir",
            "tagPath": "",
            "tagProvider": "demo"
        }
    },
    "devAddr": "0076c34a",
    "adr": true,
    "dr": 3,
    "fCnt": 8400,
    "fPort": 1,
    "confirmed": false,
    "data": "AgA/zzjWLU8A6rM=",
    "rxInfo": [
        {
            "gatewayId": "00800000a000e239",
            "uplinkId": 25195,
            "nsTime": "2024-03-15T16:43:24.128133100+00:00",
            "rssi": -18,
            "snr": 11.5,
            "channel": 6,
            "rfChain": 1,
            "location": {},
            "context": "XAnvdA==",
            "metadata": {
                "region_common_name": "US915",
                "region_config_id": "us915_0"
            },
            "crcStatus": "CRC_OK"
        }
    ],
    "txInfo": {
        "frequency": 903500000,
        "modulation": {
            "lora": {
                "bandwidth": 125000,
                "spreadingFactor": 7,
                "codeRate": "CR_4_5"
            }
        }
    }
}

To parse out only a subset of the fields (deduplicationId, time, tenantId, decoder, devAddr, uplinkId, region_common_name, frequency, and bandwidth), you can use a parse expression as follows:

deduplicationId,time,$.deviceInfo.tenantId,$.deviceInfo.tags.decoder,devAddr,$.rxInfo[0].uplinkId=uplinkId_alias,$.rxInfo[0].metadata.region_common_name=region_common_name,$.txInfo.frequency,$.txInfo.modulation.lora.bandwidth
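To make the effect of the expression concrete, the sketch below emulates it in plain Python against a trimmed copy of the message. This is not how TDengine implements parsing. It assumes that `path=alias` renames the output column and that a path without an alias is named after its last segment, which is our assumption for illustration only.

```python
import json
import re

# Trimmed copy of the sample message, keeping only the fields being extracted.
MESSAGE = json.loads("""
{
    "deduplicationId": "eee81502-9fda-49c8-833a-a2f1503ae006",
    "time": "2024-03-15T16:43:24.459694+00:00",
    "deviceInfo": {
        "tenantId": "52f14cd4-c6f1-4fbd-8f87-4025e1d49242",
        "tags": {"decoder": "VegaAir"}
    },
    "devAddr": "0076c34a",
    "rxInfo": [
        {"uplinkId": 25195, "metadata": {"region_common_name": "US915"}}
    ],
    "txInfo": {
        "frequency": 903500000,
        "modulation": {"lora": {"bandwidth": 125000}}
    }
}
""")

EXPRESSION = (
    "deduplicationId,time,$.deviceInfo.tenantId,$.deviceInfo.tags.decoder,devAddr,"
    "$.rxInfo[0].uplinkId=uplinkId_alias,"
    "$.rxInfo[0].metadata.region_common_name=region_common_name,"
    "$.txInfo.frequency,$.txInfo.modulation.lora.bandwidth"
)


def resolve(doc, path):
    """Follow a path such as '$.rxInfo[0].uplinkId' through nested JSON."""
    if path.startswith("$."):
        path = path[2:]
    node = doc
    # Each match is either an object key or a numeric array index in brackets.
    for key, index in re.findall(r"([^.\[\]]+)|\[(\d+)\]", path):
        node = node[int(index)] if index else node[key]
    return node


def extract(doc, expression):
    """Return {column_name: value} for every comma-separated item in the expression."""
    row = {}
    for item in expression.split(","):
        path, _, alias = item.partition("=")
        name = alias or path.split(".")[-1]  # assumed naming rule, see lead-in
        row[name] = resolve(doc, path)
    return row


for column, value in extract(MESSAGE, EXPRESSION).items():
    print(column, "=", value)
```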

In the OPC-UA data source configuration UI, you can perform numerical transformations in a configuration file in CSV format.

Of course, if you are using any of the TDengine APIs to ingest data, you can perform any extraction and transformation in the application before writing to TDengine.

By having off-the-shelf connectors and automating schema generation, TDengine allows developers to focus on the problems they are trying to solve.

Pre-computing data for analysis and visualization

TDengine provides stream processing functionality that allows data to be pre-computed. This reduces query latency and also reduces the number of ad hoc queries that consume system resources.

One way to use streams is simply to perform aggregation. Streams are more powerful than that, however: they support a wide variety of SQL functions, so very complex queries can be used to pre-compute data for your business needs. In TDengine, a stream always writes results to a separate supertable. You can then point your dashboards or analytics tools to the results supertable instead of the raw or high-resolution data.
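A simple aggregation stream might be assembled as in the sketch below. The stream, supertable, and column names are hypothetical, and the statement follows the `CREATE STREAM ... INTO ... AS SELECT` form documented for TDengine 3.x at the time of writing; check the documentation for your version, since stream options and syntax can differ between releases.

```python
def create_stream_sql(stream, target, source, aggregates, interval):
    """Build a stream that writes per-table windowed aggregates into a results supertable."""
    select_list = ", ".join(["_wstart"] + list(aggregates))
    return (
        f"CREATE STREAM IF NOT EXISTS {stream} INTO {target} AS "
        f"SELECT {select_list} FROM {source} "
        f"PARTITION BY tbname INTERVAL({interval})"
    )


# Hypothetical example: one-minute averages and maxima for every pump.
print(create_stream_sql(
    stream="pump_stats_1m_stream",
    target="pump_stats_1m",
    source="pumps",
    aggregates=["AVG(flow_rate) AS avg_flow", "MAX(pressure) AS max_pressure"],
    interval="1m",
))
```

A dashboard would then query `pump_stats_1m` rather than the raw `pumps` data.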

To learn how stream processing is used and see a real-world application, check out our tutorial.

For simpler aggregation queries on large datasets, TDengine also provides time-range small materialized aggregates (TSMA). These perform specified aggregate functions over fixed time windows and store the pre-computed results. Queries can then retrieve the pre-computed results for better query performance.

To learn more about TSMA, see our documentation.
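The idea behind pre-computed window aggregates can be illustrated in a few lines of Python. This is a conceptual sketch only, not TDengine's implementation: it stores count, sum, min, and max per fixed window, then answers an average query from those stored values without touching the raw samples again.

```python
def precompute_windows(samples, window_ms):
    """Aggregate (timestamp_ms, value) samples into fixed windows keyed by window start."""
    windows = {}
    for ts_ms, value in samples:
        start = ts_ms - ts_ms % window_ms
        agg = windows.setdefault(
            start, {"count": 0, "sum": 0.0, "min": value, "max": value}
        )
        agg["count"] += 1
        agg["sum"] += value
        agg["min"] = min(agg["min"], value)
        agg["max"] = max(agg["max"], value)
    return windows


def average(windows, start_ms, end_ms):
    """Average over [start_ms, end_ms) using only the pre-computed windows."""
    selected = [agg for start, agg in windows.items() if start_ms <= start < end_ms]
    count = sum(agg["count"] for agg in selected)
    return sum(agg["sum"] for agg in selected) / count if count else None


samples = [(0, 1.0), (500, 3.0), (1000, 5.0), (1999, 7.0), (2000, 9.0)]
windows = precompute_windows(samples, window_ms=1000)
print(average(windows, 0, 2000))
```

Because sums and counts combine across windows, a query over a long range only has to read one small record per window rather than every raw sample.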

With stream processing and TSMA, developers can exploit the efficient compute and storage engine of TDengine to pre-process and aggregate data, giving their stakeholders near-real-time information with very low latency. Again, this allows developers to focus on solving business issues and providing critical operational information to their stakeholders without having to worry about how to efficiently process large datasets.

Sharing data

TDengine supports role-based access control (RBAC), and there is no limit on the number of users that you can create. RBAC supports granularity at the supertable level.
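Following the least-privilege principle, a read-only account for a single supertable might be set up as sketched below. The user, database, and supertable names are hypothetical, and the `CREATE USER` and `GRANT` statement forms are given as we understand them from the TDengine SQL reference; table-level privileges may depend on your edition, so verify against the documentation before relying on this.

```python
def read_only_user_sql(user, password, database, supertable):
    """Return the statements that create a user who can only read one supertable."""
    return [
        f"CREATE USER {user} PASS '{password}'",
        f"GRANT READ ON {database}.{supertable} TO {user}",
    ]


# Hypothetical example: a dashboard account limited to pre-computed pump statistics.
for statement in read_only_user_sql("dashboard_ro", "change-me", "plant", "pump_stats_1m"):
    print(statement)
```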

To share data at a very granular level, you can use the publish/subscribe functionality provided by TDengine, which is simply called subscriptions in TDengine.

To use subscriptions, you create a topic in TDengine using SQL. A variety of functions are supported. Since topics are created using SQL, a high level of granularity can be achieved. The topic can be shared with anyone simply by sending them an email through TDengine. Of course, the topic then needs to be consumed by an application using TDengine APIs.
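Because a topic is defined by a query, its granularity comes from the columns and rows that the query selects. The sketch below builds such a `CREATE TOPIC ... AS SELECT` statement; the topic name, columns, and threshold are hypothetical, and the consumer side, which uses the TDengine client APIs, is not shown.

```python
def create_topic_sql(topic, columns, source, condition=None):
    """Build a topic that exposes only the chosen columns and, optionally, matching rows."""
    sql = f"CREATE TOPIC IF NOT EXISTS {topic} AS SELECT {', '.join(columns)} FROM {source}"
    if condition:
        sql += f" WHERE {condition}"
    return sql


# Hypothetical example: share only high-pressure readings, not the full dataset.
print(create_topic_sql(
    "high_pressure_readings",
    ["ts", "pressure", "reactor"],
    "plant.pumps",
    "pressure > 8.0",
))
```

A subscriber to this topic never sees flow rates or readings below the threshold, which is one way to apply least privilege to shared data.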

In the video referenced in the previous section, you can see how an alerting application is built using subscriptions and streams.

Developers can take advantage of streams and subscriptions to build very powerful applications that provide critical notifications to their stakeholders.

Analyzing and visualizing data

Lastly, and perhaps most importantly, being able to analyze and visualize data easily is critical to any enterprise for any kind of data and certainly for relatively high-frequency time-series data.

TDengine uses SQL as its query language, supports many functions and also has extensions for time-series such as partitioned queries using various types of windows. You can read more about the latter in our documentation.
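As a small illustration of these extensions, the helper below builds a query that partitions a supertable by table and averages a metric over fixed time windows, using the `PARTITION BY tbname` and `INTERVAL` clauses and the `_wstart` pseudocolumn. The `pumps` supertable and `flow_rate` column are hypothetical.

```python
def windowed_avg_sql(metric, source, interval, start, end):
    """Build a per-table, fixed-window average over a time range."""
    return (
        f"SELECT _wstart, tbname, AVG({metric}) AS avg_{metric} FROM {source} "
        f"WHERE ts >= '{start}' AND ts < '{end}' "
        f"PARTITION BY tbname INTERVAL({interval})"
    )


# Hypothetical example: ten-minute average flow per pump for one day.
print(windowed_avg_sql("flow_rate", "pumps", "10m", "2024-03-15 00:00:00", "2024-03-16 00:00:00"))
```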

TDengine has ODBC and JDBC drivers as well as REST APIs, which allow easy integration with tools that support these methods of connecting to databases.
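For a quick prototype, the REST API can be called with nothing but the Python standard library. The sketch below assumes a local server listening on port 6041 with the default `root`/`taosdata` credentials, and a response containing `code`, `column_meta`, and `data` fields; adjust the host, port, and credentials to your deployment. Defining the functions does not contact a server.

```python
import base64
import json
import urllib.request


def query_rest(sql, host="localhost", port=6041, user="root", password="taosdata"):
    """POST a SQL statement to the REST endpoint and return the decoded JSON response."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        f"http://{host}:{port}/rest/sql",
        data=sql.encode("utf-8"),
        headers={"Authorization": f"Basic {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def to_records(result):
    """Turn a REST response into a list of {column_name: value} dictionaries."""
    if result.get("code") != 0:
        raise RuntimeError(result.get("desc", "query failed"))
    names = [meta[0] for meta in result["column_meta"]]
    return [dict(zip(names, row)) for row in result["data"]]


# Hypothetical response, shaped like the fields assumed above.
sample = {
    "code": 0,
    "column_meta": [["ts", "TIMESTAMP", 8], ["flow_rate", "FLOAT", 4]],
    "data": [["2024-03-15T16:43:24.459Z", 12.5]],
    "rows": 1,
}
print(to_records(sample))
```

The resulting list of dictionaries is a convenient intermediate form; for example, it can be passed directly to `pandas.DataFrame` if pandas is available.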

For rapid prototyping, one of the easiest tools to use is Grafana, since TDengine has a plugin for Grafana that allows very easy visualization of time-series data. By using Grafana, you can also test out your dashboards and show them to your stakeholders quickly and get quick feedback.

For a demonstration of how to connect Grafana and set up dashboards using a TDengine backend, refer to this tutorial.

For analytics, TDengine supports tools such as Power BI using just the TDengine ODBC driver. You can see how to connect Power BI to TDengine and start analyzing data in this tutorial.

If you are using machine learning and want to test out TDengine being part of your pipeline, TDengine can fetch data from the database directly into dataframes. So it’s very easy to replace an existing data source with TDengine.

TDengine allows developers and analysts to focus on analytics and visualization, not only by allowing easy access to the data but also by moving the compute out of the analytics tool into the database engine.

Conclusion

As a comprehensive time-series platform, TDengine makes it easy for full-stack developers, or application development teams, to rapidly prototype time-series applications. It does this by providing comprehensive tools at every stage of the development of a time-series application, which allows developers to focus on solving the business issues that are critical to the efficiency and growth of the business.

With connectors for ubiquitous industrial protocols like MQTT and OPC-UA to set up data ingestion quickly, stream processing and subscriptions to automate aggregation and alerting, and integrations for quick visualization and analytics, developers can get a prototype running in just a few person-hours. They can then gather feedback quickly, iterate on the prototype, and help stakeholders define more accurate requirements for a full-fledged application.

Chait Diwadkar previously worked as Director of Solutions Engineering at TDengine.