To future-proof its real-time vehicle tracking platform for massive growth, Jikesoft replaced its complex and slower architecture of InfluxDB, Redis, and MySQL with TDengine. Besides meeting the platform's high-performance and low-latency requirements, TDengine significantly reduced operational and maintenance costs.
1. Business Background
The vehicle location and monitoring project aims to provide unified vehicle supervision, trajectory tracking, big data analysis, and visualization through vehicle trajectory supervision, anomaly detection and warning, and historical trajectory playback. On-board location devices report the location data of tens of thousands of vehicles to the GIS gateway service. The GIS gateway forwards this data to the location processing service via a message queue. This service then writes the location data to InfluxDB and the real-time information to Redis. Other query and analysis services, such as real-time vehicle location tracking and historical track playback, are also provided, as can be seen in the figure below.
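The data flow described above can be sketched with in-memory stand-ins. This is only an illustration, not Jikesoft's code: `queue.Queue` stands in for the message queue, a list for the InfluxDB history store, and a dict for the Redis latest-position store. The record fields (`car_id`, `ts`, `lon`, `lat`) and the sample values are assumed for the example.

```python
import queue
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    car_id: str   # hypothetical field names, not taken from the project
    ts: int       # epoch milliseconds
    lon: float
    lat: float


def process(mq, history, latest):
    """Drain the queue: store every point, keep the newest point per vehicle."""
    while True:
        try:
            loc = mq.get_nowait()
        except queue.Empty:
            break
        history.append(loc)                  # stand-in for the time-series store
        current = latest.get(loc.car_id)
        if current is None or loc.ts >= current.ts:
            latest[loc.car_id] = loc         # stand-in for the real-time store


# The gateway side: three made-up reports from two vehicles.
mq = queue.Queue()
for loc in (
    Location("car-a", 1000, 113.26, 23.13),
    Location("car-b", 1000, 114.06, 22.54),
    Location("car-a", 2000, 113.27, 23.14),
):
    mq.put(loc)

history, latest = [], {}
process(mq, history, latest)
```

After the call, `history` holds every reported point for trajectory playback, while `latest` holds one entry per vehicle for real-time monitoring, which is the split between the two stores in the original architecture.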
2. Time-series Database Selection
- InfluxDB: An open source time series database developed by InfluxData. It is written in Go and focuses on storing and querying time series data with high performance. It is widely used in IT monitoring and in the IoT industry. Its disadvantages are that the open source version supports only a single node, has no high-availability features, and suffers from compatibility problems between versions.
- OpenTSDB: A distributed and scalable time series database based on HBase. As a typical example of a time series database built on general-purpose storage, it started relatively early and is well recognized in the field, but it cannot avoid the high TCO of HBase.
- TDengine: An open source time series database that supports standard SQL with extensions. It also supports continuous queries and stream computing based on sliding windows. The novel concept of super tables makes data aggregation across devices simple and flexible through tags. It has an embedded cache mechanism that allows the latest status or records of each device to be obtained instantaneously. Its native distributed architecture supports horizontal expansion, so that data of any size can be processed. The open source version supports high availability, so there is no single point of failure. These functions and features are exactly what is needed in a high-performance, high-availability production environment.
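To make the super table idea concrete, here is a minimal Python sketch that builds the two kinds of DDL statement involved: one super table that defines the shared schema and tag, and one sub-table per vehicle. The names `d_track`, `car_id`, and `time` appear in the queries later in this article; the `lon`, `lat`, and `speed` columns, the tag length, and the rule that a sub-table is named `_` plus the vehicle ID are assumptions made for illustration, not the project's actual schema.

```python
import re


def create_stable_sql(stable="d_track"):
    """DDL for the super table: shared columns plus a car_id tag (assumed schema)."""
    return (
        f"CREATE STABLE IF NOT EXISTS {stable} "
        "(time TIMESTAMP, lon DOUBLE, lat DOUBLE, speed FLOAT) "
        "TAGS (car_id BINARY(64))"
    )


def create_subtable_sql(car_id, stable="d_track"):
    """DDL for one vehicle's sub-table, tagged with its car_id."""
    # Accept only lowercase hex IDs so the value can be inlined safely.
    if not re.fullmatch(r"[0-9a-f]+", car_id):
        raise ValueError(f"unexpected car_id: {car_id!r}")
    # Assumed naming rule: underscore prefix so the name does not start with a digit.
    return f"CREATE TABLE IF NOT EXISTS _{car_id} USING {stable} TAGS ('{car_id}')"
```

Because every sub-table carries its vehicle ID as a tag, a query against the super table can filter or aggregate across vehicles by tag, while a query against a single sub-table touches only one vehicle's data.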
After comparing the three options and weighing factors such as minimizing migration changes, we selected TDengine as the storage solution for vehicle trajectory data.
The architecture after migrating to TDengine is illustrated in the following figure:
3. TDengine in Practice
In terms of resources, we chose three servers and built a 3-node cluster with 3 replicas. Currently, real-time vehicle operation monitoring runs on TDengine for a first batch of vehicles, and we are gradually migrating the data of all vehicles to TDengine.
Historical data is being migrated from InfluxDB to TDengine using the DataX data synchronization solution. We extended the InfluxDB reader plug-in and the TDengine writer plug-in. With these extensions, a single synchronization process reaches about 60,000 records per second. The speed is bounded by reads from InfluxDB, which consumed memory very rapidly, so we could not get past 60,000 records per second. It should be noted that TDengine has released a migration tool based on DataX.
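Since reader memory was the limiting factor, it may help to see the general shape of a bounded-memory copy loop. The sketch below is not the DataX plug-in code and does not talk to InfluxDB or TDengine; `read_rows` and `write_batch` are hypothetical stand-ins for a reader iterator and a writer callback, and the batch size is arbitrary.

```python
from itertools import islice


def batched(rows, size):
    """Yield lists of at most `size` rows without loading the whole source."""
    if size < 1:
        raise ValueError("size must be positive")
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def migrate(read_rows, write_batch, size=1000):
    """Copy rows from a reader iterator to a writer callback, batch by batch."""
    total = 0
    for batch in batched(read_rows, size):
        write_batch(batch)
        total += len(batch)
    return total


# Usage with stand-ins: a generator as the source, a list as the sink.
sink = []
copied = migrate((("car-a", ts) for ts in range(2500)), sink.append, size=1000)
```

Only one batch is held in memory at a time, so peak memory depends on the batch size rather than on the total volume being migrated.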
Analysis of some of the migrated data: the total data volume is 650 million records, distributed across 14,742 sub-tables and occupying 4.7 GB of disk space. The compression ratio reaches 4%. When the cachelast option is enabled, the latest row of data in each sub-table is cached. When querying the latest location of a specified vehicle for real-time monitoring, the data is read directly from cache.
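A quick back-of-the-envelope calculation shows what these figures imply. It rests on assumptions that the article does not state: that the 650 million figure counts rows, that 4.7G means decimal gigabytes, that the 4% figure is on-disk size divided by raw size, and that a single process sustains the 60,000 records per second reported for the migration.

```python
rows = 650_000_000       # total rows (assumed unit for "650 million")
sub_tables = 14_742
disk_bytes = 4.7e9       # 4.7 GB, assuming decimal gigabytes
ratio = 0.04             # assuming 4% = on-disk size / raw size
sync_rate = 60_000       # records per second, single process

raw_bytes = disk_bytes / ratio                 # implied uncompressed size
bytes_per_row_on_disk = disk_bytes / rows
bytes_per_row_raw = raw_bytes / rows
rows_per_sub_table = rows / sub_tables
migration_hours = rows / sync_rate / 3600      # time to copy this data set once

print(f"implied raw size:      {raw_bytes / 1e9:.1f} GB")
print(f"on-disk bytes per row: {bytes_per_row_on_disk:.1f}")
print(f"raw bytes per row:     {bytes_per_row_raw:.1f}")
print(f"rows per sub-table:    {rows_per_sub_table:,.0f}")
print(f"migration time:        {migration_hours:.1f} h")
```

Under these assumptions the data would occupy roughly 117.5 GB uncompressed, each row costs about 7 bytes on disk, and copying this data set takes about three hours with one process.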
In the prior architecture, the latest position data of a vehicle was stored in Redis and MySQL at the same time. With the caching mechanism on TDengine we have simplified our architecture and reduced our maintenance and operational costs by eliminating Redis and MySQL.
4. Performance of TDengine
In this project, we mainly focus on real-time location monitoring of the vehicle and trajectory playback within a time range for a particular vehicle or set of vehicles.
4.1. Vehicle real-time location monitoring queries
Query the latest GIS location data for single or multiple vehicles. The latest location query of a single vehicle:
select last_row(*) from d_track where car_id in ('dc8a9a0d7b634c9ba4446445c6c');
When the latest location of a single vehicle is queried through the supertable, the query execution time is 11 milliseconds. One can also query the subtable directly, as follows:
select last_row(*) from _018d16c826cb405ea4a94a14cd4f95a9;
The response time is 1 millisecond.
Querying the supertable for the latest location of multiple vehicles has a response time of about 12 ms.
If we query the supertable for the latest location of 500 to 1,000 vehicles, the response time is under 1 second, which fully meets our latency requirements.
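For applications that need to issue this query for a varying set of vehicles, a small helper can assemble the statement. The sketch below only reproduces the `in (...)` filter of the single-vehicle supertable query shown above; the function name is hypothetical, and the original multi-vehicle query is not given in this article. Whether an extra grouping clause is needed to get one row per vehicle depends on the TDengine version and is not covered here.

```python
import re


def last_row_sql(car_ids, stable="d_track"):
    """Build a last_row query on the supertable for one or more vehicle IDs."""
    ids = list(car_ids)
    if not ids:
        raise ValueError("at least one car_id is required")
    for car_id in ids:
        # Accept only lowercase hex IDs so the values can be inlined safely.
        if not re.fullmatch(r"[0-9a-f]+", car_id):
            raise ValueError(f"unexpected car_id: {car_id!r}")
    in_list = ", ".join(f"'{c}'" for c in ids)
    return f"select last_row(*) from {stable} where car_id in ({in_list});"
```

Validating the IDs before inlining them keeps the helper safe to use with values that come from user input.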
4.2. Time based tracking of vehicles
To retrieve the trajectory data of a specified vehicle within a specified time range, we query that vehicle's sub-table directly:
select * from _0128a4d193424dcfb217242f054716d4 where time > '2021-09-08 10:34:44.000' and time < '2021-09-23 21:38:18.000';
The query response time on the test data is about 0.07 seconds.
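A trajectory playback feature typically receives its time range as datetime values rather than as strings. The following Python sketch formats them into the millisecond literal style used in the query above and assembles the statement; the helper names are hypothetical, and the sub-table name check assumes names consist of an underscore followed by lowercase hex characters, as in the examples in this article.

```python
import re
from datetime import datetime


def fmt_ms(dt):
    """Format a datetime as 'YYYY-MM-DD HH:MM:SS.mmm'."""
    return dt.strftime("%Y-%m-%d %H:%M:%S.") + f"{dt.microsecond // 1000:03d}"


def track_sql(sub_table, start, end):
    """Build the time-range query for one vehicle's sub-table."""
    if not re.fullmatch(r"_[0-9a-f]+", sub_table):
        raise ValueError(f"unexpected sub-table name: {sub_table!r}")
    if start >= end:
        raise ValueError("start must be earlier than end")
    return (
        f"select * from {sub_table} "
        f"where time > '{fmt_ms(start)}' and time < '{fmt_ms(end)}';"
    )


sql = track_sql(
    "_0128a4d193424dcfb217242f054716d4",
    datetime(2021, 9, 8, 10, 34, 44),
    datetime(2021, 9, 23, 21, 38, 18),
)
```

With these inputs the helper produces the same statement as the one used in the test above.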
In the above two query scenarios, TDengine not only fully meets our performance requirements but also performs significantly better than the original InfluxDB+Redis+MySQL solution.
Some screenshots from the application are shown below:
I am very grateful to TDengine, which has met our stringent needs in performance, cost reduction, and smooth migration. We are also very grateful to Mr. Luo Ge from TDengine for his expert and careful guidance as we went through the process of trying, developing and deploying TDengine. His guidance accelerated our mastery and deployment of TDengine.
At present, TDengine supports real-time location monitoring of tens of thousands of vehicles and we will continue to expand the number of vehicles.
Jikesoft Information Technology Co., Ltd. is a high-tech enterprise whose mission is “a world with safe food” and whose vision is “being a leading enterprise in using digital technology for food safety”. Using leading-edge technologies to ensure full traceability from the farm to the table, we take advantage of satellite-based remote sensing, 5G communication, IoT, machine learning and AI, and blockchain to provide solutions for smart agriculture, smart food safety, and smart cities.
About the Author
Sun Yunsheng is an IT Architect and is part of a new generation of knowledge workers at Jikesoft Technology Research Institute who are deeply engaged in software, communication and information technology services.