TDengine Concepts: Data Model

Jeff Tao

June 8, 2023 / Engineering

Although TDengine is a time-series database (TSDB), it uses a data model similar to those with which you may be familiar from relational databases. However, the TDengine model is purpose-built to take advantage of the characteristics of time-series data and avoid the scalability issues often encountered by other TSDBs.

Overview

In your TDengine deployment, you create one or more databases. The database defines the storage policy for the data it contains. This storage policy includes parameters such as block size and retention time. Within each database, you create tables to store your data. Each table consists of a set of rows and columns. Each row represents a data record, and each column represents an attribute.

In TDengine, the first column and primary key of every table is a timestamp. The subsequent columns include one or more metrics and one or more tags.

A metric is a measurement taken by a data collection point at a certain time. For example, a temperature reading taken by a device is considered a metric.
A tag is a static property of a data collection point. For example, the group ID of a device is considered a tag.
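
The relationships described above can be sketched in a few lines of plain Python. This is only an illustrative model, not TDengine's API or storage engine: the class names, the `keep` field, and the `expire` method are hypothetical, and the sample values are made up. It shows a database that carries a retention policy, a table whose rows are keyed by timestamp, and tags kept as static properties of the table rather than as per-row values.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class Table:
    """One table: static tags plus rows keyed by timestamp."""
    name: str
    metric_names: tuple                       # e.g. ("temperature",)
    tags: dict                                # static properties, e.g. {"group_id": 2}
    rows: dict = field(default_factory=dict)  # timestamp -> tuple of metric values

    def insert(self, ts, *metrics):
        if len(metrics) != len(self.metric_names):
            raise ValueError("metric count does not match schema")
        self.rows[ts] = metrics  # the timestamp acts as the primary key in this sketch


@dataclass
class Database:
    """A database carries the storage policy for the tables it contains."""
    name: str
    keep: timedelta                           # retention time (hypothetical field name)
    tables: dict = field(default_factory=dict)

    def create_table(self, name, metric_names, tags):
        table = Table(name, tuple(metric_names), dict(tags))
        self.tables[name] = table
        return table

    def expire(self, now):
        """Drop rows older than the retention time."""
        cutoff = now - self.keep
        for table in self.tables.values():
            table.rows = {ts: m for ts, m in table.rows.items() if ts >= cutoff}


db = Database("demo", keep=timedelta(days=365))
t = db.create_table("device_1", ["temperature"], {"group_id": 2})
now = datetime(2023, 6, 8, tzinfo=timezone.utc)
t.insert(now - timedelta(days=400), 20.5)  # older than the retention time
t.insert(now, 21.0)
db.expire(now)
```

After `expire` runs, only the reading taken at `now` remains, while the tag `group_id` is untouched because it belongs to the table rather than to any row.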

One Table per Data Collection Point

In TDengine, you create one table for each data collection point (DCP). A DCP is a hardware or software object that collects metrics at preset intervals or when triggered by events. One DCP can collect multiple metrics provided that they are collected at the same time and have the same timestamp. One physical device can have multiple DCPs that collect metrics at different intervals.

Creating one table for each DCP improves performance for time-series data for the following reasons:

Because only one DCP writes to each table, writes do not contend for locks.
Because each DCP typically writes data in the order that the data is generated, writes can be implemented as append operations.
Because the data from each DCP is stored contiguously, querying data over a specific time period for a device does not involve random read operations.
Because the data for consecutive timestamps for a single DCP is typically similar, high compression rates can be achieved.
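
The reasons listed above can be illustrated with a small in-memory sketch. It is not how TDengine stores data internally; the `DcpTable` class and its methods are hypothetical. It assumes in-order writes from a single DCP, so a write is a plain append, a time-range query is one contiguous slice found with two binary searches, and consecutive timestamps differ by a constant step, which is the kind of regularity a compressor can exploit.

```python
from bisect import bisect_left, bisect_right


class DcpTable:
    """Rows for one data collection point, kept in timestamp order."""

    def __init__(self):
        self.timestamps = []  # stays sorted because the DCP writes in time order
        self.values = []

    def append(self, ts, value):
        # In-order arrival means a write is a plain append: no search, no reshuffle.
        if self.timestamps and ts <= self.timestamps[-1]:
            raise ValueError("this sketch only handles in-order writes")
        self.timestamps.append(ts)
        self.values.append(value)

    def query(self, start, end):
        # Two binary searches locate one contiguous slice covering [start, end].
        lo = bisect_left(self.timestamps, start)
        hi = bisect_right(self.timestamps, end)
        return list(zip(self.timestamps[lo:hi], self.values[lo:hi]))

    def timestamp_deltas(self):
        # Differences between consecutive timestamps; small, repetitive
        # deltas are cheap to encode.
        return [b - a for a, b in zip(self.timestamps, self.timestamps[1:])]


table = DcpTable()
for i in range(10):
    table.append(1_000 + 10 * i, 20.0 + 0.1 * i)  # one reading every 10 time units

window = table.query(1_020, 1_050)
deltas = table.timestamp_deltas()
```

Here `window` holds the four readings between timestamps 1020 and 1050, read from one contiguous slice, and every entry in `deltas` is 10.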

However, this model requires the creation of a large number of tables. In an environment with 1 million DCPs, your TDengine database would contain 1 million tables.

Supertables

To manage this large number of tables, TDengine uses a unique concept known as the supertable. A supertable is a template for a type of DCP that defines a shared schema of metrics and tags for all DCPs of that type. You create one supertable for each type of DCP that uses the same schema. Within the supertable, you then create one table for each DCP. The tables within a supertable are known as subtables.

Each DCP writes data to its own subtable, which inherits the schema of the supertable. The schema of an individual subtable cannot be changed on its own; schema changes are made to the supertable and apply to all of its subtables.

When you query data in TDengine, you can query the data for a specific DCP by querying its subtable or query the aggregate data of multiple DCPs of a certain type by querying their supertable.
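
As a rough sketch of how this looks in practice, the following Python helpers build the SQL statements for one supertable and a few subtables. The statements assume TDengine's `CREATE STABLE ... TAGS (...)` and `CREATE TABLE ... USING ... TAGS (...)` syntax; the table names, column names, types, and tag values are illustrative assumptions, and the helper functions themselves are hypothetical. Check the SQL reference for your TDengine version before relying on the exact syntax.

```python
def create_stable_sql(stable, metrics, tags):
    """Build a CREATE STABLE statement; the first column is the timestamp."""
    columns = ", ".join(["ts TIMESTAMP"] + [f"{name} {typ}" for name, typ in metrics])
    tag_defs = ", ".join(f"{name} {typ}" for name, typ in tags)
    return f"CREATE STABLE IF NOT EXISTS {stable} ({columns}) TAGS ({tag_defs})"


def sql_literal(value):
    """Render a tag value; this sketch does no escaping, so it rejects quotes."""
    if isinstance(value, str):
        if "'" in value:
            raise ValueError("quotes are not supported in this sketch")
        return f"'{value}'"
    return str(value)


def create_subtable_sql(subtable, stable, tag_values):
    """Build a CREATE TABLE ... USING statement for one DCP."""
    values = ", ".join(sql_literal(v) for v in tag_values)
    return f"CREATE TABLE IF NOT EXISTS {subtable} USING {stable} TAGS ({values})"


# One supertable for a (hypothetical) type of DCP: smart meters.
stable_sql = create_stable_sql(
    "meters",
    metrics=[("current", "FLOAT"), ("voltage", "INT")],
    tags=[("location", "BINARY(64)"), ("group_id", "INT")],
)

# One subtable per DCP, each with its own tag values.
subtable_sqls = [
    create_subtable_sql(f"d{1001 + i}", "meters", ("California.SanFrancisco", 2))
    for i in range(3)
]

# Query a single DCP through its subtable ...
one_dcp_query = "SELECT ts, voltage FROM d1001 WHERE ts >= NOW - 1h"
# ... or aggregate across all DCPs of this type through the supertable.
all_dcps_query = "SELECT AVG(voltage) FROM meters WHERE group_id = 2"
```

The two query strings mirror the two access paths: the first reads one DCP's subtable, and the second aggregates over every subtable whose tag matches the filter.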

An example of the supertable and subtable structure is shown in the following figure.

Metadata

Metadata storage is distributed among nodes in a TDengine cluster instead of being centralized on a single node. When an application wants to aggregate the data from multiple tables, TDengine sends the filtering conditions to all nodes simultaneously. Each node then works in parallel to find the requested tables, aggregate the data, and finally send the results back to the query node or driver where the merge operation is performed.

With this distributed design, TDengine can guarantee the latency of tag filtering operations provided that system resources are sufficient, and metadata does not become a bottleneck. As the number of tables in a deployment increases, TDengine can simply allocate more resources and create more nodes to ensure the scalability of the system.
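
The scatter-gather flow described in this section can be sketched as follows. This is a toy model under stated assumptions, not TDengine's implementation: the `NODES` layout and the `node_partial` and `cluster_avg` functions are hypothetical, threads stand in for cluster nodes, and the data is made up. Each node filters its own tables by tag and returns a partial result, and the caller merges the partials.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical cluster: each node holds tag metadata and data for its own tables.
NODES = [
    {"d1": ({"group_id": 1}, [10.0, 12.0]), "d2": ({"group_id": 2}, [50.0])},
    {"d3": ({"group_id": 1}, [14.0]), "d4": ({"group_id": 2}, [70.0, 90.0])},
]


def node_partial(node, tag_filter):
    """Runs on one node: filter local tables by tag, return a partial (sum, count)."""
    total, count = 0.0, 0
    for tags, values in node.values():
        if all(tags.get(key) == want for key, want in tag_filter.items()):
            total += sum(values)
            count += len(values)
    return total, count


def cluster_avg(nodes, tag_filter):
    """Send the filter to every node in parallel, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        partials = list(pool.map(lambda n: node_partial(n, tag_filter), nodes))
    total = sum(p[0] for p in partials)
    count = sum(p[1] for p in partials)
    return total / count if count else None


avg_group_1 = cluster_avg(NODES, {"group_id": 1})
avg_group_2 = cluster_avg(NODES, {"group_id": 2})
```

Because each node returns a sum and a count rather than its own average, the merge step produces the correct overall average even when nodes hold different numbers of rows.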

Benefits

The TDengine data model offers the following benefits:

High ingestion performance because data is typically written as append operations.
High query performance because the data for each DCP is stored contiguously in time order.
High compression performance because similar data is stored contiguously.
Easier setup and usage because the optimal model for time-series data is built into the system.
