Cache In, Speed Out: InfluxDB vs. TDengine in Real-Time Data

Jim Fan

In Industrial Internet of Things (IIoT) and IoT big data scenarios, the ability to write and query real-time data efficiently is crucial. Quick access to the latest device status and real-time data processing directly impact the operational efficiency of businesses. This article focuses on the differences in caching mechanisms between TDengine and InfluxDB, providing insights to help you better understand the strengths and weaknesses of these two leading time-series databases (TSDBs) in performance optimization.

TDengine’s Read Cache Mechanism

In IoT and IIoT applications, real-time data retrieval and processing are critical. Compared to historical data, real-time values hold higher priority and practical value. Many systems rely on real-time monitoring to drive decisions and actions. For instance, in smart manufacturing, real-time equipment status data directly influences production efficiency and product quality. Delayed decisions can result in equipment failures, production downtime, or even safety incidents. Similarly, in smart transportation and connected cars, real-time data is vital for traffic management and driving behavior adjustments, affecting traffic flow and safety. Real-time data analysis enables more precise reflections of system needs and conditions, facilitating rapid response and optimized decision-making.

To address the demand for real-time data retrieval, some architects integrate Redis as an external caching layer alongside time-series databases. However, this approach significantly increases system complexity and imposes additional burdens on limited IT resources in industrial enterprises. Many companies lack dedicated teams to deploy and maintain such complex components. Thus, an ideal data platform must provide built-in real-time analytical capabilities, eliminating the need for external tools. TDengine effectively addresses this requirement by offering robust time-series data storage and native real-time analytics, simplifying system architecture, reducing maintenance costs, and ensuring data can be promptly transformed into actionable insights.

TDengine employs a time-driven cache management strategy, prioritizing the storage of the most recent data in cache. This allows the latest data to be retrieved quickly without frequent disk access. When the cache reaches its preset threshold, older data is written to disk in batches, optimizing cache usage. This design not only ensures efficient real-time query performance but also reduces the write load on the disk, extending hardware lifespan.
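The size of this in-memory write cache is configurable per database. As a purely illustrative sketch (the database name power_demo and the value 512 are hypothetical), the per-vnode write buffer, in MB, can be set with the BUFFER option when creating a database:

taos> create database power_demo buffer 512;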

For data retrieval, TDengine users can configure the cachemodel parameter to select different caching modes. This mechanism allows caching of the latest row of data, the most recent non-NULL value of each column, or both, so that TDengine can be tuned precisely to business needs. This flexibility is especially valuable in IoT scenarios, enabling quick access to the current status of devices.
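In TDengine 3.x, the supported cachemodel values are none (no read caching), last_row (cache the latest row of each subtable), last_value (cache the most recent non-NULL value of each column), and both. As a sketch, continuing with the hypothetical power_demo database from above, the mode could be switched as follows:

taos> alter database power_demo cachemodel 'last_row';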

InfluxDB’s Caching Mechanism

Unlike TDengine, InfluxDB lacks a dedicated read cache for data retrieval. Every query in InfluxDB fetches data directly from the storage engine, which can lead to query latency in high real-time demand scenarios. To compensate for this limitation, users often integrate external caching systems such as Redis to improve query performance. However, this increases system complexity and maintenance overhead.

Practical Cache Implementation for Real-Time Queries

To illustrate TDengine’s cache capabilities, let’s consider a smart meter application. We’ll use the taosBenchmark tool to generate the time-series data required for testing. The following command creates a database named power in TDengine and generates 1 billion time-series data points, starting from the timestamp September 13, 2020, 20:26:40 (+08:00). The supertable meters contains 10,000 subtables (devices), each recording 10,000 data points at 10-second intervals:

taosBenchmark -d power -Q --start-timestamp=1600000000000 --tables=10000 --records=10000 --time-step=10000 -y
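To sanity-check the generated data, you can count the rows in the supertable; the expected total follows from the parameters above (10,000 subtables × 10,000 records each):

taos> select count(*) from power.meters;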

After data generation, we can query the latest current value and its timestamp across all meters using the following SQL:

taos> select last(ts,current) from meters;

The result is as follows:

taos> select last(ts,current) from meters;
        last(ts)         |    last(current)     |
=================================================
 2020-09-15 00:13:10.000 |            1.1294620 |
Query OK, 1 row(s) in set (0.353815s)

Similarly, using the last_row function, which returns the values of the latest row (whereas last returns the most recent non-NULL value of each column):

taos> select last_row(ts,current) from meters;

Results:

      last_row(ts)       |  last_row(current)   |
=================================================
 2020-09-15 00:13:10.000 |            1.1294620 |
Query OK, 1 row(s) in set (0.344070s)

To enhance query performance, enable read caching with the 'both' cache mode, which caches both the latest row and the most recent non-NULL value of each column:

taos> alter database power cachemodel 'both';

Verify the database configuration, including the cache settings, by using show create database power\G :

*************************** 1.row ***************************
       Database: power
Create Database: CREATE DATABASE `power` BUFFER 256 CACHESIZE 1 CACHEMODEL 'both' COMP 2 DURATION 14400m WAL_FSYNC_PERIOD 3000 MAXROWS 4096 MINROWS 100 STT_TRIGGER 2 KEEP 5256000m,5256000m,5256000m PAGES 256 PAGESIZE 4 PRECISION 'ms' REPLICA 1 WAL_LEVEL 1 VGROUPS 10 SINGLE_STABLE 0 TABLE_PREFIX 0 TABLE_SUFFIX 0 TSDB_PAGESIZE 4 WAL_RETENTION_PERIOD 3600 WAL_RETENTION_SIZE 0 KEEP_TIME_OFFSET 0
Query OK, 1 row(s) in set (0.000282s)
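The output also shows CACHESIZE 1, that is, 1 MB of memory per vnode reserved for the last/last_row cache. If the cached values for all subtables do not fit in this space, the cache can be enlarged; the value below is only an illustration:

taos> alter database power cachesize 16;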

Once caching is enabled, query the latest real-time data of the smart meters again. The first query triggers the cache computation, and subsequent queries return with significantly reduced latency:

taos> select last(ts,current) from meters;

Results:

        last(ts)         |    last(current)     |
=================================================
 2020-09-15 00:13:10.000 |            1.1294620 |
Query OK, 1 row(s) in set (0.044021s)

Running the last_row query also returns results with similarly low latency:

taos> select last_row(ts,current) from meters;

Results:

      last_row(ts)       |  last_row(current)   |
=================================================
 2020-09-15 00:13:10.000 |            1.1294620 |
Query OK, 1 row(s) in set (0.046682s)

With caching enabled, query latency dropped from the initial 353 ms and 344 ms to just 44 ms and 47 ms, roughly an 8x performance improvement. This example demonstrates the significant impact of TDengine’s caching mechanism on real-time query efficiency.
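If the read cache is no longer needed, it can be disabled again by reverting the setting (shown here only for completeness):

taos> alter database power cachemodel 'none';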

Conclusion

  • Real-Time Query Performance: TDengine’s read cache allows for rapid access to the latest data, significantly improving query performance. InfluxDB, lacking a dedicated read cache, may experience delays during real-time queries.
  • System Complexity and Maintenance Costs: TDengine’s built-in read cache reduces system complexity and operational costs. In contrast, InfluxDB often requires external caching solutions (e.g., Redis), adding to system complexity and maintenance efforts.
  • Cache Strategy Flexibility: TDengine offers flexible cache configurations tailored to business needs, such as caching the latest row or the most recent non-NULL values per column. InfluxDB provides fewer options in this regard.

In Industrial IoT and big data scenarios, TDengine’s thoughtfully designed caching mechanism ensures efficient real-time data ingestion and querying while simplifying system architecture and reducing operational costs. For applications requiring high real-time data query performance, TDengine’s caching strategy offers clear advantages over InfluxDB.

Jim Fan is the VP of Product at TDengine. With a Master's Degree in Engineering from the University of Michigan and over 12 years of experience in manufacturing and Industrial IoT spaces, he brings expertise in digital transformation, smart manufacturing, autonomous driving, and renewable energy to drive TDengine's solution strategy. Prior to joining TDengine, he worked as the Director of Product Marketing for PTC's IoT Division and Hexagon's Smart Manufacturing Division. He is currently based in California, USA.