By 2025, Who’s Still Not Using REAL Stream Processing?

Chait Diwadkar

As IoT, connected vehicles, and industrial IoT continue to expand, the need for efficient time-series data processing is growing rapidly. To address this, time-series databases have become essential, offering robust support for high-frequency data ingestion and real-time analytics. TDengine and InfluxDB are two prominent solutions in this space. While both are powerful in handling time-series data, they differ significantly in their stream processing capabilities.

InfluxDB offers basic Continuous Query functionality, but this is not true “stream processing.” It works well for simple tasks but falls short when it comes to more complex real-time data stream processing. In contrast, TDengine provides full stream processing capabilities, allowing it to handle real-time data streams from various sources without relying on external frameworks like Spark or Flink. This makes the system more streamlined and reduces operational costs.

Comparison of Stream Processing Capabilities

InfluxDB Does Not Provide True Stream Processing

InfluxDB offers only basic Continuous Queries, which support simple sliding window calculations. A sliding window is a computation over fixed time intervals whose results are updated as time progresses; a Continuous Query can, for instance, aggregate data minute by minute. Beyond such windows and simple aggregations, however, InfluxDB has little to offer, which may not be sufficient for more complex use cases.

To handle more advanced data processing and real-time analytics, InfluxDB users often need to rely on external stream processing frameworks such as Spark or Flink. This introduces added complexity to the architecture and increases operational overhead, as multiple systems must be managed and coordinated. Furthermore, data transmission and synchronization between different platforms can cause delays and create performance bottlenecks, particularly in high-concurrency environments with frequent data writes and queries.

TDengine’s Stream Processing

In contrast, TDengine offers a more comprehensive and flexible approach to stream processing. Beyond basic sliding windows, it supports various window types, such as state, session, count, and event windows. These options give users the ability to segment and aggregate data based on their specific needs.

Event-Driven Stream Processing

TDengine supports event-driven stream processing, enabling users to trigger computations based on business events rather than being constrained to fixed time intervals. This feature is particularly valuable for processing data linked to real-world events, as it can significantly reduce latency and enhance system responsiveness.

Various Window Types

TDengine offers significant advantages in its design of window types, making it well-suited for more complex time-series data analysis. Here are some of the window types supported by TDengine:

  • Time Window
    • Sliding Time Window: The window moves forward in fixed steps, continuously aggregating the most recent data. When the step is shorter than the window, consecutive windows overlap.
    • Tumbling Time Window: Windows cover distinct, non-overlapping intervals, so each data point is counted in exactly one window. This is ideal when overlap is not needed.
  • State Window: A state window aggregates consecutive rows that share the same state value and starts a new window when the state changes, which is useful for tracking transitions between states. For example, in monitoring systems, state windows can calculate how long a device remains in a normal state before it moves to a fault state.
  • Session Window: A session window groups data into "sessions," runs of rows separated by no more than a set time gap, making it suitable for analyzing data that clusters around specific events or behaviors. For example, in user behavior analysis, session windows can track the complete activity cycle of each user.
  • Count Window: A count window segments data based on a fixed number of rows. By default, data is first sorted by timestamp and then divided into multiple windows based on count_val (the maximum number of rows per window) for aggregation calculations.
  • Event Window: The event window is more specialized, allowing users to trigger aggregations based on specific events. By defining start and end conditions, users have fine control over the scope and content of the calculation window.

These window types give TDengine the flexibility to handle a wide range of complex data processing needs across various scenarios.
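
As a rough sketch of how these window types look in a query, the statements below follow the window-clause forms listed in TDengine's SQL reference (STATE_WINDOW, SESSION, COUNT_WINDOW, EVENT_WINDOW). The meters table and its voltage column come from the example later in this article; the status column, the 10-minute gap, the row count of 100, and the 240-volt thresholds are assumptions made purely for illustration. The comments use standard SQL "--" syntax and may need to be stripped depending on the client.

```sql
-- State window: one window per run of identical status values.
-- "status" is a hypothetical integer column, not part of the article's schema.
SELECT tbname, _wstart, _wduration, status, count(*)
FROM meters
PARTITION BY tbname
STATE_WINDOW(status);

-- Session window: rows stay in the same window while the gap between
-- consecutive timestamps is at most 10 minutes (assumed tolerance).
SELECT tbname, _wstart, _wend, count(*)
FROM meters
PARTITION BY tbname
SESSION(ts, 10m);

-- Count window: each window holds at most 100 rows (assumed count_val).
SELECT tbname, _wstart, _wend, avg(voltage)
FROM meters
PARTITION BY tbname
COUNT_WINDOW(100);

-- Event window: opens on a row with voltage above 240 and closes on the
-- next row where voltage is back at or below 240 (assumed thresholds).
SELECT tbname, _wstart, _wend, count(*), max(voltage)
FROM meters
PARTITION BY tbname
EVENT_WINDOW START WITH voltage > 240 END WITH voltage <= 240;
```

In each case PARTITION BY tbname keeps the windows of different devices separate, mirroring the time-window example in this article.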

Computational Performance and Latency

High Throughput and Low Latency

TDengine’s stream processing engine offers exceptionally high throughput, maintaining millisecond-level latency even with frequent data writes. This capability is crucial for real-time applications like monitoring and predictive maintenance. For instance, in smart metering systems, where data is collected every 10 seconds and users need to query average temperature every minute, TDengine’s stream processing handles these tasks efficiently and in real time.

Lightweight Alternative

Unlike traditional, complex stream processing systems, TDengine provides a lightweight solution. With its built-in window clauses and straightforward SQL syntax, users can perform real-time data processing without relying on external stream processing engines. This approach simplifies the system architecture and reduces resource consumption.
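
To make the smart-metering case above concrete, a stream along the following lines could maintain the per-minute average continuously. This is only a sketch: it assumes the CREATE STREAM ... INTO ... AS SELECT form documented for TDengine 3.0, and the syntax can differ between releases, so check the reference for your version. The names avg_temp_stream, avg_temp_1m, sensors, and temperature are hypothetical.

```sql
-- Hypothetical stream: continuously writes a 1-minute average per device
-- into the result table avg_temp_1m (all names here are illustrative).
CREATE STREAM IF NOT EXISTS avg_temp_stream
  INTO avg_temp_1m AS
  SELECT _wstart, avg(temperature) AS avg_temperature
  FROM sensors
  PARTITION BY tbname
  INTERVAL(1m);
```

With readings arriving every 10 seconds, each 1-minute window would aggregate roughly six rows per device, and downstream queries can read the precomputed averages from avg_temp_1m instead of rescanning the raw data.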

Syntax and Usage of Window Clauses

TDengine’s window clauses offer great flexibility, allowing for a variety of window computations using SQL syntax. Here’s a basic example of how a window clause is used:

SELECT tbname, _wstart, _wend, avg(voltage)
FROM meters
WHERE ts >= "2022-01-01T00:00:00+08:00"
AND ts < "2022-01-01T00:05:00+08:00"
PARTITION BY tbname
INTERVAL(1m, 5s) SLIDING(2s)
SLIMIT 1;

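In this query, the WHERE clause restricts the data to a five-minute range, PARTITION BY splits it by table name, and each partition is aggregated over 1-minute windows that advance every 2 seconds. The second argument to INTERVAL, 5s, is a window offset that shifts the window boundaries by 5 seconds, and SLIMIT 1 limits the output to a single partition. TDengine supports a variety of window types and aggregation functions, enabling users to customize their queries to meet specific requirements.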
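
For contrast, a tumbling version of the same query can be sketched by dropping the SLIDING clause and the offset. This relies on the behavior described in the TDengine reference, where the step defaults to the window width when SLIDING is omitted, so the windows do not overlap.

```sql
-- Tumbling variant of the query above: 1-minute windows, no overlap.
SELECT tbname, _wstart, _wend, avg(voltage)
FROM meters
WHERE ts >= "2022-01-01T00:00:00+08:00"
AND ts < "2022-01-01T00:05:00+08:00"
PARTITION BY tbname
INTERVAL(1m)
SLIMIT 1;
```

Over the five-minute range this returns at most five rows for the selected partition, one per minute that actually contains data.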

Conclusion

Compared to InfluxDB, TDengine not only offers more advanced stream processing capabilities but also possesses strong ETL (Extract, Transform, Load) functionality. It can process time-series data while automatically performing data cleaning and transformation, helping users achieve more efficient and flexible data processing. This advantage allows TDengine to deliver exceptional performance in more complex use cases, particularly in industries that require real-time data analysis and high-efficiency data processing.

TDengine’s built-in stream processing capabilities enable users to perform real-time data analysis more effectively, reducing the overhead of multi-platform integration, operational monitoring, and other additional costs. This results in better performance and lower operational complexity, especially in large-scale IoT, vehicle networking, and other real-time data processing scenarios.

If you want to experience TDengine’s stream processing capabilities, visit the official documentation to learn more about its configuration and usage, and fully leverage TDengine’s powerful real-time data processing features.

  • Chait Diwadkar

    Chait Diwadkar is Director of Solution Engineering at TDengine. Prior to joining TDengine, he worked in the biotechnology industry in technical marketing, professional services, and product management roles, supporting customers in pharma, medical devices, and diagnostics on analytical chemistry and genetic platforms.