TDengine and Prometheus are two of the most popular time-series databases (TSDBs) on GitHub, and both have developed their own time-series storage engines to achieve very good performance. In some aspects the two storage engines take the same approach, but there are many differences too. This blog compares the two, to help developers choose the right time-series database for their applications.
TDengine adopts a relational data model, while Prometheus adopts a tag-set model. TDengine requires a table for each time series, and the table name is the unique ID for that time series. To handle labels, TDengine introduces a new concept called the supertable: the metric name is usually used as the supertable name, and each table can be assigned a set of labels. In the case of Prometheus, every time series is uniquely identified by a metric name and a set of labels.
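The two identity schemes can be sketched as follows. This is illustrative code, not actual TDengine or Prometheus internals; the SQL strings follow TDengine's documented syntax, but the schema and table names are made up for the example.

```python
# Prometheus tag-set model: a series is identified by its metric name plus
# the full set of label key/value pairs.
def prometheus_series_id(metric, labels):
    # Labels are sorted so that {"host": "h1", "dc": "east"} and
    # {"dc": "east", "host": "h1"} map to the same series.
    parts = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    return f"{metric}{{{parts}}}"

# TDengine relational model: one table per series; the table name is the
# unique ID, and the labels (tags) are declared on the supertable.
create_supertable = (
    "CREATE STABLE meters (ts TIMESTAMP, current FLOAT, voltage INT) "
    "TAGS (location BINARY(64), group_id INT);"
)
create_table = (
    "CREATE TABLE d1001 USING meters TAGS ('California.SanFrancisco', 2);"
)

print(prometheus_series_id("cpu_usage", {"host": "h1", "dc": "east"}))
# cpu_usage{dc="east",host="h1"}
```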
For both TDengine and Prometheus, the metric name and labels are used to filter out the set of time series for aggregation. Prometheus limits label values to strings, but TDengine supports labels of any data type, including numeric types. By allowing numeric labels, TDengine can filter on a range of values, which is a very useful feature in some cases.
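A small sketch of why numeric labels matter. With string-only labels, a range condition has to be expressed as a regex or an enumeration of exact values; with numeric tags, the engine can compare values directly. The device records below are invented for the example.

```python
# Each entry stands for one time series with its static labels (tags).
series = [
    {"id": "d1001", "model": "m1", "max_current": 10},
    {"id": "d1002", "model": "m1", "max_current": 25},
    {"id": "d1003", "model": "m2", "max_current": 40},
]

# Numeric tag filter, analogous to a TDengine query such as:
#   SELECT ... FROM meters WHERE max_current > 20
matched = [s["id"] for s in series if s["max_current"] > 20]
print(matched)  # ['d1002', 'd1003']
```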
Changing, adding, or deleting a label in Prometheus creates a new time series, while this is not the case for TDengine. In IoT and many other applications, the time-series data is generated by sensors or devices, and static attributes like brand, model, and color can be used as labels. Applications may add a new label or change labels after the system is in production. TDengine can handle label changes without any problem, but Prometheus can only handle them by importing the data from the old time series into the new one.
For more details on the data models in different TSDBs, see Data Model Comparison between Time-Series Databases.
Both TDengine and Prometheus store each time series in chunks in a data file. Each chunk contains data from only one time series, while a data file contains chunks from many time series. The data file is append-only. An index file stores the list of chunks for each time series; for each chunk, the list records its offset in the data file, its start (minimum) and end (maximum) timestamps, its size, and other information. Once a time series is identified, the index file is used to find the offset of each data chunk that falls within the query time range, and the saved time-series data can then be retrieved from the data file. TDengine and Prometheus share the same design here.
For both TDengine and Prometheus, this design is important and works very well for time-series data. It brings several benefits:
- The data ingestion rate is higher, since data for each time series is appended to a chunk, which is the most efficient way to write data.
- Query latency is lower, since a single read operation retrieves a large number of data points for one time series. In other words, the data hit rate is much higher.
- The data compression ratio is higher, since within each time series the delta between two consecutive data points is much smaller.
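The compression benefit can be illustrated with simple delta encoding: consecutive points in one series are close together, so a regular timestamp column collapses into a stream of small, highly repetitive values. Real engines go further with delta-of-delta and bit packing; this only shows the idea.

```python
def delta_encode(values):
    # Keep the first value, then store only the difference to the previous one.
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]

# Timestamps at a steady 1-second interval (in milliseconds).
timestamps = [1000, 2000, 3000, 4000, 5000]
print(delta_encode(timestamps))  # [1000, 1000, 1000, 1000, 1000]
```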
Both TDengine and Prometheus store labels in a file separate from the time-series data file: Prometheus stores them in an index file, and TDengine in a meta file. This is an optimal design for time-series databases. Unlike key-value databases, separating the labels (keys) from the time-series data can improve query performance significantly. Given the labels in a query, the engine first filters out the matching set of time series. This step takes minimal time, since the label data set is much smaller than the time-series data set. Following the filter operation, it scans only a subset of data chunks instead of all the rows in a data file, so query latency is significantly reduced.
WAL and Memory Buffer
Both TDengine and Prometheus use a WAL (write-ahead log) mechanism to guarantee that written data is not lost if the system crashes. Data points are always written into the memory buffer first, and in the meantime they are appended to the WAL file.
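This shared write path can be sketched as follows. All class and field names are illustrative; a real engine batches WAL writes and uses a binary format.

```python
import json
import os
import tempfile

class WalBuffer:
    def __init__(self, wal_path):
        self.wal = open(wal_path, "a")
        self.buffer = []  # in-memory buffer, flushed to data files later

    def write(self, series_id, ts, value):
        record = {"series": series_id, "ts": ts, "value": value}
        # Append to the WAL first so the point survives a crash ...
        self.wal.write(json.dumps(record) + "\n")
        self.wal.flush()
        # ... then to the memory buffer, the fast path for hot-data queries.
        self.buffer.append(record)

path = os.path.join(tempfile.mkdtemp(), "wal.log")
db = WalBuffer(path)
db.write("d1001", 1000, 10.5)
print(len(db.buffer), os.path.getsize(path) > 0)  # 1 True
```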
But TDengine and Prometheus take different approaches to the memory buffer. Prometheus allocates in-memory chunks per time series; data points from a time series are appended to its corresponding chunk. The number of memory chunks is always larger than the number of time series.
TDengine allocates a big memory buffer for all time-series in a virtual node. The data points from all time series are written into the same big memory buffer by appending. Practically, for TDengine, the big memory buffer is just a copy of WAL in memory. TDengine maintains a skip list for each time-series to keep track of the order and offset for data points.
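TDengine's buffer-plus-skip-list design can be sketched like this. A real skip list is a probabilistic multi-level linked list; here Python's `bisect.insort` stands in for it, since both keep each series' (timestamp, offset) entries sorted as points arrive, including out of order. The names are illustrative.

```python
import bisect
from collections import defaultdict

buffer = []                # one shared append-only buffer for all time series
order = defaultdict(list)  # series_id -> sorted list of (ts, offset_in_buffer)

def write(series_id, ts, value):
    offset = len(buffer)
    buffer.append((series_id, ts, value))          # append-only, like the WAL copy
    bisect.insort(order[series_id], (ts, offset))  # keep per-series time order

write("d1001", 3000, 1.0)
write("d1001", 1000, 2.0)  # out-of-order point: no chunk rewrite needed
write("d1001", 2000, 3.0)
print([ts for ts, _ in order["d1001"]])  # [1000, 2000, 3000]
```

The buffer itself stays append-only; only the small per-series ordering structure absorbs the out-of-order inserts.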
The Prometheus storage design seems more efficient, because data can be written into memory without a skip-list operation. But the gain is very small: the bottleneck is never memory operations but disk I/O. In addition, this design has a few drawbacks.
- If the data ingestion rates of different time series vary widely, then when the buffer is flushed to disk, some chunks contain only a few data points while others are full, so memory usage is inefficient.
- If the number of time series is very large, a huge amount of memory is required, since each time series needs its own memory chunk. The system may run into high-cardinality problems.
- It is very hard to process out-of-order data points, because doing so requires rewriting the data chunks. To avoid this issue, Prometheus simply discards out-of-order data points. In many scenarios, such as IoT and connected vehicles, the order of data points in a time series cannot be guaranteed: even if a device or sensor generates points in order, a message queue like Kafka may deliver them out of order because of its partitioning.
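The Kafka effect mentioned in the last point can be sketched in a few lines: Kafka guarantees ordering only within one partition, so if one device's points are spread across partitions, a consumer may see them interleaved out of order. The data is invented for the example.

```python
# One device emits four points in timestamp order.
in_order = [(1000, "a"), (2000, "b"), (3000, "c"), (4000, "d")]

# The points land in two partitions (e.g. keyed by something other than
# the device ID). Order is preserved within each partition only.
p0 = [m for i, m in enumerate(in_order) if i % 2 == 0]
p1 = [m for i, m in enumerate(in_order) if i % 2 == 1]

# A consumer that happens to drain p1 first sees timestamps out of order.
consumed = p1 + p0
print([ts for ts, _ in consumed])  # [2000, 4000, 1000, 3000]
```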
Earlier versions of TDengine took the same approach as Prometheus. After encountering the above problems in several applications, the TDengine team switched to the new design in version 2.0.
With the current TDengine design, out-of-order data points are processed efficiently, since a skip list is maintained for each time series, and memory usage is efficient because all time series share the same memory buffer.
Both TDengine and Prometheus partition the time-series data into non-overlapping blocks, each covering a time range. This strategy is common to time-series databases. For Prometheus, the default time range for a block is 2 hours; for TDengine, it is 5 days.
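Mapping a point to its block is just integer division on the block duration, sketched below with the default durations mentioned above.

```python
HOUR = 3600 * 1000  # one hour in milliseconds
PROMETHEUS_BLOCK = 2 * HOUR        # default 2-hour blocks
TDENGINE_BLOCK = 5 * 24 * HOUR     # default 5-day blocks

def block_id(ts_ms, block_duration):
    # All points in the same non-overlapping time range share a block ID.
    return ts_ms // block_duration

ts = 7 * HOUR  # 07:00 on day 0
print(block_id(ts, PROMETHEUS_BLOCK), block_id(ts, TDENGINE_BLOCK))  # 3 0
```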
However, there are differences in how TDengine and Prometheus partition. In Prometheus storage, a block is a fully independent database: besides the time-series data, each block also contains all the metadata, including the labels for each time series. TDengine, in contrast, maintains only one copy of the metadata; all blocks share it.
This difference has several impacts:
- In some DevOps scenarios, a time series can be very short-lived; for example, monitoring a running task may last only a few minutes to a few hours. In such scenarios, new time series are created all the time and the total number of time series becomes very high. This is a challenge for TDengine, since it has to process a large volume of metadata, which may slow down queries. For Prometheus this is not a problem, since each block contains metadata only for the time series that actually have data in that block, which is a much smaller set to process. TDengine can mitigate the problem by setting a shorter data retention period, so that once a time series has no remaining data, its metadata is removed.
- If a query spans two or more blocks, Prometheus takes longer to retrieve the data, since it must read the metadata of each block. This increases query latency, and it is why many users complain about the performance of queries on historical data.
- Prometheus’s design takes more storage space since each block has its own metadata.
Based on partitioning, TDengine introduces an attractive feature: multi-tier storage. Hot data blocks can be kept on SSD, warm data on local disk, and cold data on object storage such as S3.
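Multi-tier placement amounts to mapping each block's age to a storage tier, as in this sketch. The thresholds are made up for the example; they are not TDengine defaults.

```python
DAY = 24 * 3600  # one day in seconds

def tier_for(block_age_seconds):
    # Illustrative policy: recent blocks stay on fast storage,
    # old blocks migrate to cheaper tiers.
    if block_age_seconds < 7 * DAY:
        return "ssd"         # hot data
    if block_age_seconds < 90 * DAY:
        return "local-disk"  # warm data
    return "s3"              # cold data

print(tier_for(1 * DAY), tier_for(30 * DAY), tier_for(365 * DAY))
# ssd local-disk s3
```

Because blocks are non-overlapping time ranges, a whole block can be moved between tiers without touching any other block.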
Data Sharding/Horizontal Scalability
Prometheus does not support data sharding and so its processing power is limited by the resources of a single node.
In contrast, TDengine supports sharding, and its cluster feature is open source. A shard in TDengine is called a vnode (virtual node). Each vnode contains the data for a number of time series. The data for one time series is stored in a single vnode only; it never spans two or more vnodes. A vnode is essentially a fully independent time-series database.
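The series-to-vnode mapping can be sketched by hashing the table name, as below. The hash scheme is illustrative, not TDengine's actual placement algorithm.

```python
import zlib

NUM_VNODES = 4

def vnode_for(table_name):
    # crc32 is stable across runs (unlike Python's built-in hash()),
    # so a series always maps to the same vnode.
    return zlib.crc32(table_name.encode()) % NUM_VNODES

tables = ["d1001", "d1002", "d1003"]
placement = {t: vnode_for(t) for t in tables}
# Every series lands on exactly one vnode.
print(all(0 <= v < NUM_VNODES for v in placement.values()))  # True
```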
Furthermore, two or more vnodes on different nodes can form a vnode group for data replication and thus high availability is provided. More details can be found on the TDengine website.
There are other differences between TDengine and Prometheus as follows:
- TDengine supports multiple columns per time series, but Prometheus supports only a single column. In IoT and many other scenarios, each device may generate multiple metrics at the same time, for example the GPS position (x, y, z) of a moving vehicle or the current and voltage of a smart power meter. In these scenarios, a multi-column model is definitely more efficient.
- Prometheus only supports a numeric data type, while TDengine supports many data types, including numeric, string, boolean, and varchar.
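The multi-column difference can be sketched in a few lines: one multi-column row per reading versus one single-value sample per metric, so a reading with N metrics becomes N separate series in the tag-set model. The metric names are invented for the example.

```python
# One reading from a smart power meter.
reading = {"ts": 1000, "current": 10.5, "voltage": 220, "phase": 0.31}

# Multi-column model: a single row stores the whole reading.
row = (reading["ts"], reading["current"], reading["voltage"], reading["phase"])

# Single-column model: each metric becomes its own (series, ts, value) sample,
# repeating the timestamp (and, in practice, the labels) per metric.
samples = [(f"meter_{k}", reading["ts"], v)
           for k, v in reading.items() if k != "ts"]
print(len(samples))  # 3
```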
From the comparisons above, we can say the following:
- TDengine is a more generic time-series database and can be applied to a variety of scenarios like IoT, connected vehicles, smart hardware and more.
- Prometheus is more optimized for DevOps scenario and especially for the management of short-lived time series.
- From a performance perspective, the data ingestion rates and the query latency on hot data are similar, but TDengine delivers higher performance on cold data.
- TDengine has a higher data compression ratio.