As our business grows, the data storage performance requirements of Li Auto's IoT applications keep increasing. Our internal teams actively follow industry best practices to match products to requirements and optimize both. This article explores why we chose TDengine for our IoT use cases and how we implemented it.
Introduction to Business Requirements
Essentially, our application scenario involves signal writing and querying. We need to write timestamped data from collection points into the back-end database through the cloud and then perform aggregate queries on this data. This is a typical high-concurrency insert scenario, with many more writes than reads. We currently process 70,000 writes per second and expect to exceed 200,000 writes per second within the next three years.
Our previous system used MongoDB, but due to its limitations we migrated the business to TiDB to facilitate expansion. Even after migrating to TiDB on Baidu Cloud SSD virtual machines, the pure write performance of the TiDB cluster could not meet our requirements. TiDB is an HTAP (Hybrid Transactional and Analytical Processing) database: it does not perform well under highly concurrent writes and could not scale without adding resources.
Overall, TiDB is suitable for TP or light AP scenarios, but it has high hardware requirements, and for time series data the cost-effectiveness of writing to TiDB is very low. It also imposes unnecessary complexity: a separate underlying table must be created for each month, each collection point must be labeled, and large batch writes are suboptimal at best. In general, that architecture could not meet our requirements, which are summarized below.
- The system must support continuous high-concurrency writes, with tags and out-of-order timestamps.
- The business is expected to grow by orders of magnitude and the system should be able to scale automatically.
- The system must support aggregated queries over large time-spans.
- Because the amount of data is very large, it is necessary to support data compression to reduce costs and increase efficiency.
- There should be no need to separate data based on month and the system should support configurable, automatic time-based data expiration.
- Collection points should have separate tables.
- The system should support batch data writing with low latency.
Based on these requirements, we decided to try the time series database (TSDB) TDengine. The performance and functionality of TDengine exceeded our expectations when we performed in-depth technical and business tests. Special thanks to Xiao Bo, Chen Weican and Yang Lina for their strong support. The following features of TDengine are well suited for our requirements.
- Two-level storage structure, row-based in RAM and column-based on disk, with very high data ingestion performance and efficient resource utilization.
- Very high compression ratio for time series data.
- Creates a separate table for each collection point, which matches our business model.
- Supports large multi-threaded batch writes.
- Transparent scale-out and scale-in.
- Supports automatic time-based data expiration (TTL).
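To illustrate the supertable and per-collection-point table features listed above, here is a minimal Go sketch that builds the corresponding DDL statements. All names (`signals`, `point_1001`, `vin`, `signal_val`) are illustrative assumptions, not Li Auto's actual schema; in a real application these statements would be executed through the TDengine Go connector rather than printed.

```go
package main

import "fmt"

// stableSQL returns the DDL for a supertable: shared columns for the
// time series values plus tags that label each collection point.
func stableSQL() string {
	return "CREATE STABLE IF NOT EXISTS signals " +
		"(ts TIMESTAMP, signal_val DOUBLE) " +
		"TAGS (vin BINARY(32), signal_type INT)"
}

// subTableSQL returns the DDL for one collection point's table,
// created under the supertable with that point's tag values.
func subTableSQL(table, vin string, signalType int) string {
	return fmt.Sprintf(
		"CREATE TABLE IF NOT EXISTS %s USING signals TAGS ('%s', %d)",
		table, vin, signalType)
}

func main() {
	fmt.Println(stableSQL())
	fmt.Println(subTableSQL("point_1001", "TESTVIN0001", 3))
}
```

Each collection point gets its own subtable under one supertable, so tags are stored once per point rather than once per row, which is part of what makes ingestion and compression so efficient.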
TDengine's excellent high-concurrency writing and data compression capabilities greatly reduce business costs, and its native clustering features ensure scalability. With all of this in mind, we decided to migrate from TiDB to TDengine.
Migration Process and Usage Costs
- Start writing new data to TDengine while keeping historical read traffic on TiDB.
- Gradually import historical data into TDengine.
- Deployment scheme: domain name -> load balancer -> firstEp + secondEp. For details, please refer to the blog "Best Practices of TDengine Containerized Deployment".
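On the client side, the firstEp/secondEp entry points mentioned above are set in taos.cfg; a minimal fragment might look like the following. The hostnames are placeholders, not our actual endpoints, and in our scheme the first entry point resolves through the load balancer's domain name.

```
# taos.cfg (client side) -- placeholder hostnames
firstEp   tdengine-lb.example.com:6030
secondEp  tdengine-node2.example.com:6030
```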
Cost of Use
| | TiDB | TDengine |
| --- | --- | --- |
| Cluster configuration | 17 nodes in total: 5 TiDB servers, 3 PD servers, and 12 TiKV servers (deployed on 12 nodes); 32-core CPU, 64 GB RAM, SSD | 32-core CPU, 128 GB RAM, SSD; except for a larger SSD, the configuration is the same as the TiKV nodes |
| Storage architecture | | Row-based storage in RAM, column-based storage on disk |
| Write latency | | Much lower latency |
| Storage usage | Three replicas at 1.3 TB/month | Two replicas at 87 GB/month (over 10x lower storage cost) |
| Scalability | Additional nodes are needed for faster insertion speed | Stress-tested on multiple Kubernetes nodes with no degradation in performance; the cluster supports 200,000 writes/second |
Advantages of TDengine
- Excellent time series database engine with better performance than InfluxDB.
- The two-level storage design (row-based storage in RAM, column-based storage on disk) is very efficient.
- Designed from scratch for IoT big data, with novel concepts such as the supertable.
- Reduces storage and compute costs.
- Elastic expansion and contraction using the firstEP mechanism.
- Aggregate queries are lightning fast.
- Automatic time-based data expiration (TTL) and label mechanisms are very convenient to automate business requirements.
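The fast aggregate queries mentioned above are the kind our workload depends on: downsampling a signal over a long time span, grouped per vehicle. The sketch below builds such a query as a string; table and column names are illustrative assumptions, and the syntax (`_wstart`, `PARTITION BY`, `INTERVAL`) follows TDengine 3.x.

```go
package main

import "fmt"

// hourlyAvgQuery builds a downsampling query: hourly averages of a
// signal over the last `days` days, one series per vehicle (vin tag).
func hourlyAvgQuery(days int) string {
	return fmt.Sprintf(
		"SELECT _wstart, AVG(signal_val) FROM signals "+
			"WHERE ts > NOW - %dd PARTITION BY vin INTERVAL(1h)", days)
}

func main() {
	fmt.Println(hourlyAvgQuery(30))
}
```

Because each collection point is its own subtable under the supertable, this single statement aggregates across all points without any per-month table juggling.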
Areas for Improvement
- Only basic monitoring metrics are currently available and need to be improved. Performance troubleshooting still requires reading logs, and write latency needs to be monitored.
- Support for ecosystem tools needs to be improved to make administration and management easier.
- Server and client upgrades should be consistent, packaged, and automated. Currently, clients have to be updated in rolling batches, which is not very user friendly.
- Various types of error messages need to be further improved to be more user-friendly and easier to troubleshoot.
- The Go SDK does not support prepared statements. (Editor's note: the new version already supports this.)
- Account isolation support needs to be improved. Currently, one cluster cannot support multiple accounts.
In the end, MySQL, MongoDB, TiDB, and TDengine are all excellent database products, but no database is a silver bullet. The product that adapts to and meets business requirements is ultimately the right solution.
For Li Auto's extremely high performance and future scalability requirements, that product is TDengine.