Replacing TiDB with a high-performance and scalable IoT platform

In the driving learning and testing business, it is necessary to store and analyze data generated by sensors in learning vehicles and correlate it with driving instructors. This is a typical time-series data scenario. The developers at 58.com were dissatisfied with the performance of TiDB for this workload, so the DBA team started looking into time-series databases and chose TDengine from the 6 that we investigated. After in-depth research, development and testing, we deployed TDengine, and it resolved our pain points.

Project Introduction

58.com's portfolio of businesses covers many different segments with varied needs and requirements. As a result, the DBA team maintains a variety of databases, including MySQL, Redis, MongoDB, ELK, TiDB, StarRocks, and NebulaGraph. Between them, they serve several trillion requests per day. What we were missing was a dedicated time-series database (TSDB), and this was becoming apparent to both developers and DBAs.

One of the businesses in our portfolio is a driving learning and testing business, in which it is necessary to store and analyze the time-series data generated by sensors in learning vehicles. This use case involves highly concurrent insertions and data processing. It is essentially an IoT use case, characterized by very heavy writes, fewer reads, and aggregate analysis within specific time periods. 58.com was using TiDB for persistent storage, but the developers were not satisfied with its performance. The specific requirements of the business are: 10 million new records per day, write latency in milliseconds, and sub-second response for queries that return 30,000 records.

Of course, the performance issues we faced could not be blamed on TiDB, since TiDB was simply not the best database for this kind of task. No single database platform can meet the needs of all use cases. To meet the time-series data needs of our portfolio of businesses, the DBA team began to investigate time-series databases.

Database Selection and High-Level Requirements

The DBA team looked at 6 databases and rated them based on the above high-level requirements.

  • InfluxDB: The most popular time series database. Single-machine version is open source but the high-availability cluster is not open source.
  • OpenTSDB: Mature cluster solution, dependent on HBase, complex operation and maintenance, weak aggregation analysis ability.
  • Prometheus: Simple maintenance, integrated monitoring and alarm functions, but no cluster solution, weak aggregation analysis capabilities.
  • DolphinDB: Fast to run, fast to develop, fast to deploy. Closed source, free version only works on limited CPU and memory.
  • ES: Clustered, easy to use, low maintenance cost, but high memory consumption, and performance drops significantly when historical data is calculated.
  • TDengine: Powerful performance, built from scratch for IoT and Big Data time series. The cluster version is open source and free, and a paid Enterprise version offers additional functions. It is still under active, iterative development.

Based on the above, after preliminary analysis, we finally chose to conduct in-depth research and testing on TDengine.

Test Environment

Cluster Architecture: 3 replicas + load balancing

Fault tolerance: Writes go to the WAL (write-ahead log) first, and then to the cache and the data files, depending on configuration

Test tool: Official tool taosdemo (now known as taosBenchmark)
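To illustrate the WAL-first ordering mentioned under fault tolerance, here is a minimal sketch in Python. It is a generic toy example and not TDengine's implementation: the `WalStore` class, the JSON log format, and the file name are all assumptions made for illustration. The idea it shows is that a write is appended and synced to the log before the in-memory cache is updated, so the cache can be rebuilt from the log after a restart.

```python
import json
import os
import tempfile


class WalStore:
    """Toy key-value store (hypothetical) that logs each write before caching it."""

    def __init__(self, wal_path):
        self.wal_path = wal_path
        self.cache = {}
        # Replay an existing log so the cache survives a restart.
        if os.path.exists(wal_path):
            with open(wal_path, "r", encoding="utf-8") as wal:
                for line in wal:
                    entry = json.loads(line)
                    self.cache[entry["key"]] = entry["value"]

    def write(self, key, value):
        # Step 1: append the record to the log and force it to disk.
        with open(self.wal_path, "a", encoding="utf-8") as wal:
            wal.write(json.dumps({"key": key, "value": value}) + "\n")
            wal.flush()
            os.fsync(wal.fileno())
        # Step 2: only after the log is durable, update the in-memory cache.
        self.cache[key] = value


def demo():
    """Write one record, then rebuild a fresh store from the log alone."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "wal.log")
        store = WalStore(path)
        store.write("car-1", 42.5)
        # Simulate a restart: a new instance recovers its cache from the log.
        recovered = WalStore(path)
        return recovered.cache
```

Syncing on every write is the most conservative choice and also the slowest. A real system would typically make the sync policy configurable, which is presumably what "depending on configuration" refers to here.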

Machine Configuration

Server  Processor                                             Memory  Disk
1       Intel Xeon Silver 4214 2.2 GHz (2 cores 24 threads)   256 GB  SSD
2       Intel Xeon Silver 4214 2.2 GHz (2 cores 24 threads)   256 GB  SSD
3       Intel Xeon Silver 4214 2.2 GHz (2 cores 24 threads)   256 GB  SSD

Write Concurrency

Number of Fields  Bind Parameters  Number of Devices  Total Records   Write Throughput (records/second)  Latency
3                 Yes              10,000             100,000,000     12,120,036.84                      2.49 ms
3                 No               10,000             100,000,000     5,932,514.09                       6.86 ms
20                Yes              10,000             100,000,000     5,883,183.51                       5.09 ms
20                No               10,000             100,000,000     2,078,181.61                       19.20 ms
20                Yes              100,000            1,000,000,000   2,673,966.95                       174.34 ms
20                No               100,000            1,000,000,000   2,621,617.26                       57.08 ms
50                No               100,000            1,000,000,000   1,216,557.03                       57.19 ms

Maximum Server Load

Server  CPU Load  Memory  Disk IO  Network
ALL     <10       <7G     <10%     <280M

It can be seen from the above test results that TDengine can meet our business requirements in terms of response time, insertion performance, server load, and low-cost operation and maintenance.
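As a rough sanity check on the insertion requirement, the short Python sketch below compares the stated business need (10 million new records per day) with the lowest write throughput measured in the tests. It assumes writes are spread evenly across the day, which real traffic will not be, and the benchmark used synthetic data, so the result is only an order-of-magnitude indication.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Business requirement stated earlier in the article.
required_per_day = 10_000_000
# Average rate under the (simplifying) assumption of evenly spread writes.
required_avg_rate = required_per_day / SECONDS_PER_DAY  # records/second

# Write throughput of each test run, in records/second.
measured_rates = [
    12120036.84,
    5932514.09,
    5883183.51,
    2078181.61,
    2673966.95,
    2621617.26,
    1216557.03,
]

slowest = min(measured_rates)
# How many times the slowest run exceeds the average required rate.
headroom = slowest / required_avg_rate

if __name__ == "__main__":
    print(f"required average rate: {required_avg_rate:.1f} records/second")
    print(f"slowest measured rate: {slowest:.2f} records/second")
    print(f"headroom factor: {headroom:.0f}x")
```

The required average works out to roughly 116 records per second, so even the slowest test run leaves several orders of magnitude of headroom for traffic peaks and growth.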

Deployment in Production

In the actual production environment, we did not use the officially recommended single-machine single-instance deployment. Instead, we used a single-machine multi-instance deployment with a three-node, three-replica architecture. We used TDinsight + Grafana for monitoring.

In practice, based on our requirements, we reduced the step size and the number of sub-tables to achieve higher concurrency. Regarding the pain points we had with TiDB, the developers reported that TDengine returned the required data within 1 second with no code changes, and within 300 milliseconds after the code was optimized. Additionally, we were able to use TDengine's built-in time-series aggregation functions instead of implementing additional business code, which saved us time and effort.
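As an illustration of the kind of built-in aggregation mentioned above, the Python sketch below builds a TDengine time-window query using the `INTERVAL` clause. The table name `vehicle_sensors`, the column `speed`, and the timestamp column `ts` are hypothetical and not taken from our production schema. The helper only assembles the SQL text; running it would require a TDengine client connection, which is not shown.

```python
def build_window_query(table, column, start, end, window):
    """Build a windowed aggregation query for a time range.

    table and column are hypothetical names; 'ts' is assumed to be the
    timestamp column. window is a TDengine duration such as '1m' or '10s'.
    """
    # Basic guard: identifiers are interpolated directly, so allow only
    # plain names. This is an illustration, not full input validation.
    for name in (table, column):
        if not name.isidentifier():
            raise ValueError(f"unexpected identifier: {name!r}")
    return (
        f"SELECT AVG({column}), MAX({column}) FROM {table} "
        f"WHERE ts >= '{start}' AND ts < '{end}' "
        f"INTERVAL({window})"
    )


if __name__ == "__main__":
    print(build_window_query(
        "vehicle_sensors", "speed",
        "2023-01-01 00:00:00", "2023-01-01 01:00:00", "1m",
    ))
```

With a query like this, the per-window averaging and maximum are computed by the database, so the application does not need to fetch the raw records and bucket them itself.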

We did encounter some issues, which are summarized here as improvement suggestions for the TDengine team:

  • The single-machine multi-instance deployment method we use does not have good command support in the currently released version. The location of the log output also needs to be explicitly specified. As business needs change, deployment methods will change and we hope TDengine will provide more flexible deployment methods to allow better predictability of resource costs.
  • TDinsight can monitor the performance of TDengine very intuitively, but only one cluster can be monitored in one view. If there are dozens or hundreds of clusters, they cannot all be added to the integrated TDinsight components. It is possible to modify the Grafana view variables manually, one by one, but this is extremely tedious. We hope there will be better integration to allow easier monitoring of multiple clusters.
  • When importing with taosdump, if the database name has changed, a backup file made under the original database name cannot be used for the import. We hope that this will be changed in future versions.

Summary

TDengine has become one of the options in the database services matrix of the DBA team at 58 Group, so that time-series data is ingested, analyzed and stored in an optimized and cost-effective manner. In the future, monitoring services will also be moved to TDengine. 58 Group is exploring more avenues for cooperation with TDengine as an integral part of its database service offerings.

About the Author

Zhang Guangyuan is a DBA at 58 Group and a ten-year veteran of database operations, maintenance and management.