Simplified Time-Series Data Solution


A time-series database (TSDB) stores and analyzes time-series data efficiently. However, time-series data processing requires more than storage and analytics: it requires a full-featured time-series data solution. In a typical time-series data solution, the time-series database is combined with third-party components for stream processing, caching, data subscription, and other features as needed.

Because the overall solution has dependencies on these third-party components, complexity of design and difficulty of maintenance become pain points for time-series data processing. In addition, this kind of design requires additional compute and storage resources.

To simplify system design and reduce operation costs, TDengine integrates its time-series database with built-in caching, stream processing, and data subscription components that take full advantage of the characteristics of time-series data. This integrated design goes beyond efficient storage and analysis of time-series data to provide a comprehensive and simplified solution for time-series data processing.

Stream processing

To gain insight into your operations and detect errors faster, you need to analyze time-series data points as soon as they arrive in the system. Stream processing is therefore a natural fit for time-series data. Stream processing can be time-driven, producing new results at set intervals (known as continuous query), or data-driven (also called event-driven), producing new results whenever a new data point arrives.
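The difference between the two modes can be illustrated with a minimal Python sketch. It is not TDengine code: the function names, the sample points, the 10-unit interval, and the alert threshold are all assumptions chosen for illustration.

```python
from collections import defaultdict


def time_driven_averages(points, interval):
    """Time-driven: produce one average per fixed interval (tumbling window)."""
    sums = defaultdict(lambda: [0.0, 0])  # window start -> [total, count]
    for ts, value in points:
        window_start = ts - ts % interval
        sums[window_start][0] += value
        sums[window_start][1] += 1
    return {start: total / count for start, (total, count) in sorted(sums.items())}


def data_driven_alerts(points, threshold):
    """Data-driven: evaluate every point as soon as it arrives."""
    for ts, value in points:
        if value > threshold:
            yield ts, value


# Hypothetical (timestamp, value) readings.
points = [(0, 10.0), (4, 20.0), (11, 30.0), (15, 50.0), (21, 5.0)]

window_averages = time_driven_averages(points, interval=10)
alerts = list(data_driven_alerts(points, threshold=25.0))
```

The time-driven function yields one result per interval no matter how many points arrive, while the data-driven function can react to each point individually, which is what low-latency cases such as fault detection need.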

Many time-series databases provide a solution for continuous query. Continuous query works well in certain cases, for example in downsampling or in precalculating specific types of expensive queries. However, there are many other situations in which continuous query is intrinsically limited: preprocessing and transformation in scalar functions, session windows, and low-latency use cases like fault detection are all examples of scenarios where continuous query is not up to the task. In these scenarios, tools like Spark and Flink are added into the time-series data solution to provide stream processing. Unfortunately, this results in a complicated system design.

In the TDengine stream processing engine, SQL statements, including user-defined functions, are translated into pipelines of stream operators. Data is then automatically processed on ingestion and results are output in real time, at specified time periods, or at a specified watermark. This engine makes TDengine the only time-series data solution that supports both time-driven and event-driven stream processing out of the box.

TDengine stream processing provides millisecond-level latency while processing high-throughput events, enabling real-time alerting, data transformation, and preprocessing. With TDengine, you can perform aggregations and gain insight into your data fast enough to support real-time dashboards and real-time data analysis. In addition, TDengine can handle out-of-order data by means of user-specified watermarks or automated retrieval of data from the storage engine on demand.
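As a rough illustration of how a watermark tolerates out-of-order data, consider the following Python sketch. It shows the general technique only and makes no claim about TDengine internals: the function name, the window size, and the watermark value are assumptions, and late points are simply dropped here, whereas the text above notes that TDengine can also retrieve data from the storage engine on demand.

```python
def windowed_sums(events, window_size, watermark):
    """Sum values per tumbling window, closing a window only once the
    newest timestamp seen is at least `watermark` past the window end."""
    open_windows = {}  # window start -> running sum
    closed = set()     # starts of windows that have already been emitted
    results = []       # (window start, sum) in emission order
    dropped = []       # points that arrived after their window closed
    max_ts = None

    for ts, value in events:
        start = ts - ts % window_size
        if start in closed:
            dropped.append((ts, value))
            continue
        open_windows[start] = open_windows.get(start, 0) + value
        max_ts = ts if max_ts is None else max(max_ts, ts)
        # Emit every open window whose end is at or before max_ts - watermark.
        for s in sorted(open_windows):
            if s + window_size <= max_ts - watermark:
                results.append((s, open_windows.pop(s)))
                closed.add(s)
    return results, dropped


# Hypothetical out-of-order stream: (8, 3) arrives after (12, 2) but is still accepted.
events = [(1, 1), (12, 2), (8, 3), (16, 4), (3, 5), (27, 6)]
results, dropped = windowed_sums(events, window_size=10, watermark=5)
```

A larger watermark accepts more late data at the cost of delaying each result, so the value is a trade-off between completeness and latency.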

[Figure: stream processing in TDengine]

Stream processing in TDengine is intuitive and easy to use, requiring only a few SQL statements. For more information, see Stream Processing.

Caching

IoT and IIoT scenarios demand that the system return the latest data to the application as soon as possible. For example, a fleet management system always needs to know the current GPS position of each truck in the fleet. And in a smart factory, the system always needs to know the current state of every valve and the current reading of every meter.

A typical time-series data solution solves this problem by writing new data points into the time-series database and into Redis at the same time. The application then retrieves the newest data points from Redis instead of the time-series database. While this design works, it increases system complexity and operating costs.

In TDengine, each vnode is allocated a fixed amount of memory for caching data points. TDengine manages its cache with a first in, first out (FIFO) policy instead of the least recently used (LRU) policy adopted in some systems. This is because it is most important for time-series data applications to have fast access to the newest data in the system.
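The effect of a FIFO policy can be sketched in a few lines of Python. This is an illustration under stated assumptions, not TDengine's implementation: the class name is hypothetical, and capacity is counted in data points rather than in bytes of memory.

```python
from collections import deque


class FifoPointCache:
    """Fixed-capacity cache that always evicts the oldest written point."""

    def __init__(self, capacity):
        # A deque with maxlen discards the oldest item when a new one is appended.
        self._points = deque(maxlen=capacity)

    def write(self, ts, value):
        self._points.append((ts, value))

    def latest(self):
        return self._points[-1] if self._points else None

    def __contains__(self, ts):
        return any(t == ts for t, _ in self._points)


cache = FifoPointCache(capacity=3)
for ts in range(1, 4):
    cache.write(ts, ts * 10.0)

# Reading the oldest point does not protect it from eviction, unlike LRU.
was_cached = 1 in cache
cache.write(4, 40.0)
```

Under an LRU policy, the read of point 1 would have kept it in the cache and evicted point 2 instead. With FIFO, the cache always holds the most recently written points, which matches the access pattern described above.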

By caching the newest data, TDengine can retrieve it in milliseconds, eliminating the need to deploy a separate caching system. TDengine also provides the LAST_ROW function, an extension to standard SQL that retrieves the newest data points. With TDengine, you have access to a simple and straightforward built-in caching system to ensure that your applications get the data they need when they need it. For more information, see Caching.

Data subscription

The message queue plays an important role in many system architectures. Incoming data points are first written into a message queue and then consumed by other components in the system.

In TDengine, all incoming data points are first written into the write-ahead log (WAL) in append mode. Because the purpose of a WAL is to recover data after a system crash, the WAL file is normally removed once the corresponding data in memory has been persisted to the database.

TDengine takes a novel approach in that it does not delete the WAL file immediately, instead keeping it for a specified period during which the WAL file acts as a persistent message queue that can be consumed by other applications.
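The idea of a retained log doubling as a message queue can be sketched as follows. This Python example is a simplified model, not the TDengine WAL: the class name, the retention value, and the offset scheme are assumptions made for illustration.

```python
class RetainedLog:
    """Append-only log that keeps entries for a retention period so that
    consumers can read them by offset, like a persistent message queue."""

    def __init__(self, retention):
        self.retention = retention
        self.entries = []      # (offset, timestamp, payload)
        self.next_offset = 0

    def append(self, ts, payload):
        self.entries.append((self.next_offset, ts, payload))
        self.next_offset += 1
        # Trim only entries older than the retention period, not entries
        # that have merely been persisted elsewhere.
        self.entries = [e for e in self.entries if ts - e[1] <= self.retention]

    def read_from(self, offset):
        """Return all retained entries at or after the given offset."""
        return [e for e in self.entries if e[0] >= offset]


log = RetainedLog(retention=100)
log.append(0, "a")
log.append(50, "b")
first_batch = log.read_from(0)   # a consumer reads everything so far

log.append(120, "c")             # entry "a" now falls outside the retention period
resumed = log.read_from(2)       # the same consumer resumes from its next offset
late = log.read_from(0)          # a new consumer sees only what is still retained
```

Each consumer tracks its own offset, so several applications can read the same log independently for as long as the entries are retained.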

A topic in TDengine is defined by an SQL statement and can be a database, a supertable, a set of tables, or a single table. Once a topic has new data points, they are pushed to consumers that have subscribed to it. If a database has multiple vnodes (shards), multiple consumers in a consumer group can consume the same topic to boost the data throughput.

The main advantage of TDengine data subscription is that you can perform data filtering. Applications can subscribe to only those data points that meet specific filtering conditions, and other data points are not passed to the application at all. In addition, the application can subscribe to a specific column or set of columns instead of all the columns. For time-series applications, this makes TDengine more flexible and efficient than third-party components.
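Filtering and column selection at the topic level can be illustrated with a small Python sketch. It models the concept only and does not use the TDengine client API: the Topic class, the column names, and the voltage threshold are hypothetical.

```python
class Topic:
    """A topic that forwards only matching rows, reduced to selected columns."""

    def __init__(self, predicate, columns):
        self.predicate = predicate  # row -> bool, the filtering condition
        self.columns = columns      # the columns a subscriber receives
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def publish(self, row):
        if not self.predicate(row):
            return  # rows that fail the filter never reach the application
        projected = {column: row[column] for column in self.columns}
        for callback in self.subscribers:
            callback(projected)


received = []
high_voltage = Topic(predicate=lambda row: row["voltage"] > 230,
                     columns=["ts", "voltage"])
high_voltage.subscribe(received.append)

# Hypothetical meter readings.
for row in [
    {"ts": 1, "voltage": 220, "current": 1.2},
    {"ts": 2, "voltage": 240, "current": 1.5},
    {"ts": 3, "voltage": 235, "current": 1.1},
]:
    high_voltage.publish(row)
```

Because the filter and the projection are applied before delivery, the subscriber never has to receive and discard irrelevant rows or columns itself.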

The following figure describes how data subscription works in the TDengine time-series data solution.

[Figure: data subscription (publish/subscribe) in TDengine]

For more information, see Data Subscription.

Time-series data solution for simpler system design

The following figure shows a classical design for a time-series data solution.

[Figure: traditional time-series data system architecture]

In this architecture, incoming data points are first written into a message queue such as Kafka. The data points in the message queue are consumed and written into a database like HBase or MongoDB for persistent storage. In the meantime, the data points are typically written into Redis for caching and Spark or Flink for stream processing. Time-series applications therefore are required to interact with Redis and Spark in addition to the database.

With its built-in caching, stream processing, and data subscription features, TDengine provides a simplified solution for time-series data processing. TDengine frees you from having to maintain Kafka, Redis, Spark, Flink, and other similar tools and allows you to develop applications without having to take into account the particularities of multiple systems.

The architecture of a simplified time-series data solution powered by TDengine is shown in the following figure.

[Figure: simplified system architecture with TDengine]

This simplified system architecture reduces design complexity and operational costs significantly, which is particularly advantageous on the edge, where resources are most limited. With its integrated components purpose-built for time-series data, TDengine is not just a time-series database but a full-fledged time-series data solution.
