Scalable and Efficient Storage for Industrial Data

As new technologies like the Industrial IoT are implemented across enterprises, more data is being collected than ever before. The size of modern industrial datasets demands a data historian that can store them in a scalable and efficient manner. This article discusses two key ways in which TDengine meets the needs of industrial data storage.

Scalability

TDengine is a cloud-native platform that was designed from the ground up to be elastic and scalable. It features a fully distributed architecture in which tasks are divided among the nodes in a cluster.

The figure shows a TDengine cluster in which each node is virtualized into components for compute, storage, and metadata management. Because its compute and storage resources are separate, you can scale out incrementally to meet new business requirements rather than in large, expensive blocks. In addition, faults that occur on compute nodes do not affect storage, and vice versa.

TDengine also provides enhanced scalability by resolving the issue of high cardinality. The metadata for each data collection point is stored on the storage node instead of in a centralized location, so read and write requests can be sent directly to the appropriate node. Requests for aggregation across data collection points are sent to the corresponding storage nodes; these nodes send their results to the compute node, which performs the aggregation operation. In this way, the performance of TDengine does not deteriorate even as the cardinality of a dataset increases.
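The aggregation flow described above follows a scatter-gather pattern, which can be sketched in a few lines of Python. This is only an illustration of the idea, not TDengine's implementation: the StorageNode class, the merge_average function, and the device names are all hypothetical. Each storage node reduces its own data to a small partial result, and the compute step merges those partial results.

```python
from dataclasses import dataclass, field


@dataclass
class StorageNode:
    """Hypothetical storage node holding readings keyed by collection point."""
    readings: dict[str, list[float]] = field(default_factory=dict)

    def partial_sum_count(self) -> tuple[float, int]:
        # Each storage node aggregates only the data it holds locally.
        values = [v for series in self.readings.values() for v in series]
        return sum(values), len(values)


def merge_average(partials: list[tuple[float, int]]) -> float:
    """Compute-node step: combine per-node (sum, count) pairs into one mean."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    if count == 0:
        raise ValueError("no data to aggregate")
    return total / count


# Two hypothetical storage nodes, each owning a disjoint set of devices.
nodes = [
    StorageNode({"d1001": [10.0, 12.0], "d1002": [11.0]}),
    StorageNode({"d1003": [9.0, 8.0, 10.0]}),
]
partials = [node.partial_sum_count() for node in nodes]
cluster_average = merge_average(partials)
```

Only one (sum, count) pair per node crosses the network in this sketch, so the cost of the merge step depends on the number of nodes rather than on the number of data collection points.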

Efficient Storage

The TDengine industrial data platform is centered around a high-performance time series database. Testing with the Time Series Benchmark Suite (TSBS) shows that this database reads, writes, and stores IoT data more efficiently than competing products.

The figures show that TDengine has much lower data retrieval latency than InfluxDB and TimescaleDB and that, for larger datasets, it uses only a fraction of the disk space thanks to its efficient compression mechanism. To learn more, see IoT Performance Comparison: InfluxDB and TimescaleDB vs. TDengine.

This high performance is possible because TDengine uses a unique data model that is optimized for time series data, which is exactly the type of data generated by the sensors and data collectors in your plant or factory. In this data model, one table is created for every data collection point. This makes data ingestion more efficient by eliminating locks and implementing writes as append operations. It also speeds up data retrieval because the data from each device is stored contiguously.
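To make the one-table-per-collection-point idea concrete, here is a minimal in-memory sketch in Python. It is a toy model under stated assumptions, not TDengine's storage engine: the PerDeviceStore class and its methods are hypothetical, timestamps are plain integers, and the sketch assumes rows arrive in time order.

```python
from bisect import bisect_left
from collections import defaultdict


class PerDeviceStore:
    """Toy in-memory store: one append-only table per data collection point."""

    def __init__(self) -> None:
        # device_id -> list of (timestamp, value) rows kept in time order
        self.tables: dict[str, list[tuple[int, float]]] = defaultdict(list)

    def append(self, device_id: str, ts: int, value: float) -> None:
        table = self.tables[device_id]
        # Simplification for this sketch: only accept increasing timestamps,
        # so every write is a pure append to the end of the device's table.
        if table and ts <= table[-1][0]:
            raise ValueError("rows must arrive in increasing timestamp order")
        table.append((ts, value))

    def time_range(self, device_id: str, start: int, end: int) -> list[tuple[int, float]]:
        """Return rows with start <= ts < end as one contiguous slice."""
        table = self.tables.get(device_id, [])
        lo = bisect_left(table, (start,))
        hi = bisect_left(table, (end,))
        return table[lo:hi]


store = PerDeviceStore()
for ts, value in [(1, 20.5), (2, 20.7), (3, 21.0)]:
    store.append("sensor_a", ts, value)
store.append("sensor_b", 1, 99.0)

recent = store.time_range("sensor_a", 2, 4)
```

Because each device writes only to its own table, writers never contend for the same structure, and a time-range read for one device is a single contiguous slice.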

Although the number of tables is much higher than in other databases, TDengine uses the supertable concept to make the management of these tables easy and efficient. A supertable is a template used to create tables for a device type, and defines metrics and tags that are used for all devices of that type. You can query supertables to obtain aggregated data without costly join operations and filter by tag to drill down into subsets of your devices.
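The relationship between a supertable and its tables can be modeled roughly as follows. This Python sketch only illustrates the concept; it is not TDengine's API or SQL syntax. The Supertable and Subtable classes, the meters example, and the tag and metric names are hypothetical, and timestamps are omitted for brevity.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Subtable:
    """One table per device, carrying that device's tag values."""
    name: str
    tags: dict[str, str]
    rows: list[dict[str, float]] = field(default_factory=list)


@dataclass
class Supertable:
    """Template for one device type: shared metric names and tag names."""
    name: str
    metrics: tuple[str, ...]
    tag_names: tuple[str, ...]
    subtables: dict[str, Subtable] = field(default_factory=dict)

    def create_subtable(self, name: str, **tags: str) -> Subtable:
        # Every table created from the template must supply all of its tags.
        if set(tags) != set(self.tag_names):
            raise ValueError(f"expected tags {self.tag_names}")
        table = Subtable(name, dict(tags))
        self.subtables[name] = table
        return table

    def insert(self, table_name: str, **values: float) -> None:
        # Rows must match the metric schema defined by the supertable.
        if set(values) != set(self.metrics):
            raise ValueError(f"expected metrics {self.metrics}")
        self.subtables[table_name].rows.append(values)

    def average(self, metric: str, **tag_filter: str) -> float:
        """Aggregate one metric over all subtables that match the tag filter."""
        values = [
            row[metric]
            for table in self.subtables.values()
            if all(table.tags.get(k) == v for k, v in tag_filter.items())
            for row in table.rows
        ]
        return mean(values)


meters = Supertable("meters", metrics=("current", "voltage"), tag_names=("location",))
meters.create_subtable("d1001", location="plant_a")
meters.create_subtable("d1002", location="plant_a")
meters.create_subtable("d1003", location="plant_b")
meters.insert("d1001", current=10.0, voltage=220.0)
meters.insert("d1002", current=12.0, voltage=221.0)
meters.insert("d1003", current=20.0, voltage=219.0)

all_devices_avg = meters.average("current")
plant_a_avg = meters.average("current", location="plant_a")
```

The aggregate is computed by walking the tables that belong to the supertable and filtering on their tags, so no join between device tables is needed.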

By building this data model into the system, TDengine eliminates the need for enterprises to devise their own data models, which can be a time-consuming and expensive process. Instead, TDengine takes advantage of the characteristics of time series data to provide the optimal model for performance and scalability.

Tiered Storage

The TDengine data model also enables tiered storage because it partitions data by time. You can easily define storage policies that keep the newest data on the fastest storage media while placing rarely accessed historical data in low-cost options.

As shown in the figure, the latest data is stored in memory with TDengine’s caching functionality while older data is stored on local disks or even in cloud storage. This method of storing data significantly reduces costs while ensuring that all data is accessible when needed.
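A storage policy of this kind amounts to a rule that maps the age of a time partition to a storage tier. The Python sketch below shows one way such a rule could look. The tier names and age thresholds are hypothetical values chosen for illustration and do not reflect TDengine's configuration options or defaults.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class Tier:
    name: str
    max_age: Optional[timedelta]  # None means the tier has no upper age limit


# Hypothetical policy, ordered from fastest to cheapest storage.
TIERS = (
    Tier("memory_cache", timedelta(days=1)),
    Tier("local_disk", timedelta(days=30)),
    Tier("cloud_storage", None),
)


def tier_for(partition_end: datetime, now: datetime) -> str:
    """Pick the first tier whose age limit covers this time partition."""
    age = now - partition_end
    for tier in TIERS:
        if tier.max_age is None or age <= tier.max_age:
            return tier.name
    raise LookupError("no tier accepts this partition")


now = datetime(2024, 1, 31)
newest = tier_for(datetime(2024, 1, 30, 12), now)
older = tier_for(datetime(2024, 1, 10), now)
oldest = tier_for(datetime(2023, 6, 1), now)
```

Because the data is already partitioned by time, moving it between tiers is a matter of relocating whole partitions as they age, without inspecting individual rows.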