Count on It! How TDengine’s Stream Processing Enhances Real-Time Data Handling

Chait Diwadkar

November 20, 2024 / Engineering

With the release of TDengine version 3.2.3.0, we introduced the count window in stream processing, further enhancing our ability to handle streaming data. This article explains how stream processing relates to the various window types, with a focus on the newly added count window, to help you better understand TDengine’s stream processing capabilities and apply them effectively.

What is TDengine Stream Processing?

Stream processing is a data handling approach designed to process and analyze data streams in real-time, generating results with minimal latency as data arrives. With the growing demand for IoT, big data, and real-time analytics, stream processing has become increasingly important in modern data architectures. TDengine, as a database specifically designed for time-series data, offers powerful stream processing capabilities to meet real-time data processing needs.

As data volume grows, querying via SQL can result in significantly longer query times. When query times exceed 5 seconds, user interaction and experience are often negatively impacted. In such cases, pre-generating intermediate results to accelerate queries becomes crucial. Although TDengine provides multiple pre-computation methods, these methods can be limited in flexibility, particularly when defining calculation windows. Thus, for specific query requirements, stream processing emerges as a superior choice.

Stream processing is particularly well-suited for scenarios such as dashboard displays, ad-hoc queries, and real-time alerts. These applications demand quick responses, where delays can impact decision-making or user experience. With stream processing, users can store time-consuming computation results in additional result tables and update the latest window’s results in real-time as data is written. This allows users to query smaller result tables and quickly retrieve the necessary data, significantly improving query efficiency and response times.

While data subscription can achieve similar results, it requires users to write their own subscription programs, adding development complexity and necessitating extra mechanisms to ensure high availability. For example, in the event of unexpected program restarts or migrations to another node, the system must be able to reuse previous computation states to prevent data loss or interruptions. Compared to data subscription, TDengine’s stream processing offers a simpler, more efficient solution, better meeting real-time data processing needs.

Key Features of TDengine Stream Processing

Event-Based Subscription: Stream processing is event-driven, enabling it to handle out-of-order data efficiently while ensuring timely and accurate data processing. This minimizes latency by processing data immediately upon arrival.

Multiple Window Types: TDengine supports a variety of calculation windows, including time windows (tumbling and sliding), state windows, session windows, event windows, and count windows. This flexibility allows users to define suitable calculation ranges based on business needs.

Device-Dimensional Computation: Stream processing can compute data on a per-device basis, enabling quick computations specific to individual devices. This reduces CPU usage and improves efficiency.

TDengine’s stream processing stands out from other frameworks in several ways, particularly in its ability to process historical data. Most stream processing frameworks only handle data generated after stream processing begins, limiting their use for analyzing long-term data. TDengine allows users to perform computations on historical data, making it invaluable for scenarios requiring retrospective analysis or a comprehensive view of historical trends.

Additionally, TDengine excels in handling expired data. Once a window closes, other frameworks may struggle to process expired data effectively. However, TDengine leverages its time-series storage engine to retrieve historical data for closed windows and recompute accurate results. This capability ensures users can obtain complete and precise data analysis even after window closure, enhancing reliability and accuracy.

For hands-on practice with TDengine stream processing, refer to the official guide: TDengine Stream Processing Documentation.

What is a “Window”?

A window is a data processing mechanism used to partition unbounded, continuous streams of data into finite, manageable segments. This segmentation allows stream processing engines to perform aggregate computations on data within each window, generating meaningful statistics such as averages, sums, maximums, or minimums for specific periods.

In stream processing, windows are essential for effectively handling and analyzing real-time data streams. Without windows, operations are typically limited to scalar functions, which apply to individual values (e.g., squaring or taking the absolute value). The introduction of windows enables more complex aggregate computations, significantly enhancing stream processing capabilities and flexibility.
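The difference between a scalar function and a windowed aggregate can be sketched in a few lines of Python. This is only an illustration of the concept, not TDengine code: the sample `readings` and the helper `tumbling_windows` are hypothetical, and fixed-size chunks stand in for whatever rule defines a window.

```python
from statistics import mean

# Hypothetical sample data, invented for this illustration.
readings = [3.0, -4.0, 5.0, -6.0, 7.0, 8.0]

# Scalar function: one output per input row, no window required.
absolute = [abs(v) for v in readings]


def tumbling_windows(values, size):
    """Split a sequence into consecutive, non-overlapping chunks of `size` rows."""
    return [values[i:i + size] for i in range(0, len(values), size)]


# Aggregate functions: one output per window rather than per row.
windows = tumbling_windows(readings, 3)
averages = [mean(w) for w in windows]
maximums = [max(w) for w in windows]

print(absolute)
print(averages)
print(maximums)
```

The scalar result has as many values as the input, while each aggregate collapses a whole window into a single value.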

TDengine supports various window types, including:

Session Window: Groups records into a session based on their timestamps. If the time difference between consecutive records is less than a predefined interval, they belong to the same session.
State Window: Uses an integer (boolean) or string to represent the device’s state at the time of record creation. Records with the same state value are grouped into the same window, and the window closes when the state changes.
Time Window: Divided into sliding time windows and tumbling time windows, where sliding windows dynamically update and tumbling windows aggregate data within fixed intervals.
Event Window: Defined by a start and end condition. The window opens when the start_trigger_condition is met and closes when the end_trigger_condition is satisfied.
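To make the grouping rules concrete, here is a minimal Python sketch of session and state windows as described in the list above. It is a simulation under stated assumptions, not TDengine's implementation: the function names and sample rows are hypothetical, and the handling of a gap exactly equal to the tolerance simply follows this article's "less than" wording rather than verified engine behavior.

```python
def session_windows(rows, tol):
    """Group (ts, value) rows into sessions.

    A row joins the current session when its gap to the previous row is
    less than `tol` (boundary handling is an assumption of this sketch).
    """
    windows = []
    for row in sorted(rows, key=lambda r: r[0]):
        if windows and row[0] - windows[-1][-1][0] < tol:
            windows[-1].append(row)
        else:
            windows.append([row])
    return windows


def state_windows(rows):
    """Group consecutive (ts, state) rows that share the same state value."""
    windows = []
    for row in sorted(rows, key=lambda r: r[0]):
        if windows and windows[-1][-1][1] == row[1]:
            windows[-1].append(row)
        else:
            windows.append([row])  # the state changed, so a new window opens
    return windows


# Hypothetical sample data.
events = [(0, "a"), (5, "b"), (8, "c"), (30, "d"), (33, "e"), (60, "f")]
states = [(1, "on"), (2, "on"), (3, "off"), (4, "off"), (5, "on")]

session_sizes = [len(w) for w in session_windows(events, tol=10)]
state_sizes = [len(w) for w in state_windows(states)]
print(session_sizes)
print(state_sizes)
```

With a tolerance of 10, the gaps of 22 and 27 split the events into three sessions, and each change between "on" and "off" closes one state window and opens the next.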

Detailed Introduction to the Count Window

Introduced in version 3.2.3.0, the count window segments data based on a fixed number of rows. By default, data is first sorted by timestamp, then divided into windows containing a maximum of count_val rows. If the total number of rows isn’t divisible by count_val, the final window will have fewer rows. The optional sliding_val parameter sets the number of rows by which the window slides, playing the same role as the SLIDING value in a sliding time window.

window_clause: {
    SESSION(ts_col, tol_val)
  | STATE_WINDOW(col)
  | INTERVAL(interval_val [, interval_offset]) [SLIDING (sliding_val)] [FILL(fill_mod_and_val)]
  | EVENT_WINDOW START WITH start_trigger_condition END WITH end_trigger_condition
  | COUNT_WINDOW(count_val[, sliding_val])
}

Example SQL syntax for a count window:

select _wstart, _wend, count(*) from t count_window(4);
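The row-based splitting performed by a query like the one above can be simulated in Python. This is a sketch of the semantics described in this section, not TDengine's implementation: `count_windows`, `summarize`, and the sample rows are hypothetical names, and the treatment of trailing partial windows when sliding_val is smaller than count_val is an assumption.

```python
def count_windows(rows, count_val, sliding_val=None):
    """Split rows into windows of at most `count_val` rows.

    `sliding_val` is the number of rows between window starts. It defaults
    to `count_val`, which yields non-overlapping windows.
    """
    if sliding_val is None:
        sliding_val = count_val
    if count_val <= 0 or sliding_val <= 0:
        raise ValueError("count_val and sliding_val must be positive")
    ordered = sorted(rows, key=lambda r: r[0])  # sort by timestamp first
    # Assumption: a shorter trailing window is still emitted.
    return [ordered[i:i + count_val] for i in range(0, len(ordered), sliding_val)]


def summarize(windows):
    """Return (first timestamp, last timestamp, row count) for each window."""
    return [(w[0][0], w[-1][0], len(w)) for w in windows]


# Ten hypothetical rows with timestamps 1..10.
rows = [(ts, ts * 10) for ts in range(1, 11)]

tumbling = summarize(count_windows(rows, 4))
sliding = summarize(count_windows(rows, 4, 2))
print(tumbling)
print(sliding)
```

With ten rows and a count of 4, the non-overlapping case yields windows of 4, 4, and 2 rows, matching the rule that the final window may be shorter. With a slide of 2, a new window starts every two rows, so consecutive windows overlap.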

This mechanism is invaluable in scenarios like:

Highway Toll Stations: Each vehicle passing through generates a record. The count window can monitor and display traffic density in real-time, optimizing toll station operations and ensuring smooth traffic flow.
Manufacturing: For every 100 items produced, a new batch is created. The count window can aggregate parameters such as ambient temperature and production speed for each batch, enhancing production transparency and enabling timely strategy adjustments.

Example application at a toll station:

CREATE STREAM stream_name
  TRIGGER at_once IGNORE EXPIRED 1 IGNORE UPDATE 0 WATERMARK 100s
  INTO stream_stb_name
  AS
    SELECT _wstart AS ts, count(*) c1, sum(b), max(c)
    FROM st
    PARTITION BY tbname, ta, a
    COUNT_WINDOW(9);
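As a rough mental model of what such a stream maintains, the following Python sketch keeps one open count window per partition and refreshes that window's row in a result table on every write. It is an illustration under assumptions, not a description of TDengine internals: the class `CountWindowStream`, its fields, and the single-string partition key are hypothetical, rows are assumed to arrive in timestamp order, and a count of 3 is used instead of 9 to keep the example short.

```python
from collections import defaultdict


class CountWindowStream:
    """Toy model of a partitioned count-window stream (hypothetical)."""

    def __init__(self, count_val):
        self.count_val = count_val
        self.open = defaultdict(list)  # partition key -> rows in the open window
        self.results = {}              # (partition key, window start ts) -> aggregates

    def write(self, key, ts, b, c):
        """Ingest one row and refresh the result row of its window."""
        window = self.open[key]
        if len(window) == self.count_val:
            # The previous window is full, so this row opens a new one.
            window = self.open[key] = []
        window.append((ts, b, c))
        start = window[0][0]
        self.results[(key, start)] = {
            "c1": len(window),
            "sum_b": sum(r[1] for r in window),
            "max_c": max(r[2] for r in window),
        }


stream = CountWindowStream(count_val=3)
for ts in range(1, 6):
    stream.write("d1", ts, b=ts, c=ts * 2)
stream.write("d2", 1, b=10, c=7)

for key in sorted(stream.results):
    print(key, stream.results[key])
```

Each partition fills its own windows independently, and the result table stays small: one row per window rather than one row per input record, which is what makes querying it fast.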

Conclusion

This article highlights the close relationship between stream processing and window mechanisms. The addition of the count window provides users with more flexible and efficient data handling capabilities, enhancing performance in real-time data analytics. We hope this deep dive helps you better understand and apply TDengine’s stream processing features. Give it a try and experience the difference!

Chait Diwadkar previously worked as Director of Solutions Engineering at TDengine.