China Mobile IoT Reduces Storage Resources by over 80% for Connected Car Data

Chait Diwadkar

September 18, 2021 / Case Studies, Connected Cars, Industrial Data

Usage Scenario

The connected car is a typical scenario in China Mobile IoT. In such a scenario, storing and querying large amounts of the location data generated by automotive IoT devices is a very necessity.

The location data has the following significant characteristics:

High writing frequency: over 200 million location data records need to be written per day.
Low reading frequency: tracing the location data in recent days is the most commonly used query.
Enterprise customers usually require to customize the retention policy (RP) based on their usage scenarios.
Multi-Site high availability has to be considered.
The location data demonstrates low-value density, hence Online Analytical Processing (OLAP) is not a key requirement.

The Diagram of Data Storage Solutions

Our storage systems went through a series of changes as shown below.

Initially, we used Oracle running on the minicomputer to store data in 366 single-table partitions (each partition only included one table, a single table stored the total time-series data for a day), which could only stored data for one year (delete data by partitions). But Oracle posed certain difficulties in operations & maintenance (O&M). When using Oracle, we partitioned tables by date and numbered the partitions from 1 to 366 for queries. If we failed to delete the historical data in time, for instance, 2020.7.21 and 2021.7.21 would be assigned into the same partition.

In 2017, to resolve problems in the Oracle solution, we decided to employ Mycat to build a MySQL cluster.

By 2019, the production department proposed that RP for different businesses should be tailored for usage scenarios, hence we decided to create independent databases for different business systems. However, problems like the complexity of configuration and the difficulty of management started to emerge.

When the time came to 2020, we initiated a testing project to verify the writing performance and stability of TiDB. Unfortunately, TiDB is not a database specialized for time-series data, so its deficiency in the connected car scenario is quite significant.

In October 2020, we made the decision to find a professional solution for location data storage.

Pain Points

Mycat+MySQL as a location data storage solution has been served for over 2 years, but there are still many pain points:

First, Mycat is a type of middleware based on MySQL, so it does not support auto-scaling.
Performing data deletion is complicated and it is extremely inconvenient to design different retention policies if there is a necessity to tailor data lifecycle for customers’ specified requirements.
Configuring Mycat involves many painful and complex operations.
Mycat+MySQL has become an outdated solution in the industry.

In the last year, we conducted a testing project to examine the performance of TiDB in the connected car scenario and acquired some insights as shown below.

Advantages:

Efficient writing performance and horizontal scalability
High availability and fault-tolerance design
Supporting multi-site and multiple data centers

Bottlenecks:

Solid-state drivers (SSD) are needed, which renders high storage costs, so it is not cost-effective to store massive data with a low-value density.
TiDB cannot provide tailored RP for customers’

Based on the connected car scenario, location data demonstrates the following attributes:

Large writing throughput (it can reach 8,000 records per second)
Data only used for queries while no need for alteration
Massive data volumes (over 200 million data records per day)
Query focusing on recently generated data
No need for database transaction support
Only simple queries required (no JOIN query needed)
Customized RP based on customers’ requirements

Referring to the attributes mentioned above, we can conclude that traditional relational database management systems (RDMS) can hardly handle location data. Nevertheless, we found time-series databases (TSDB) could be a better answer.

Research & Selection

Driven by the purpose to find a better solution, we have conducted research on prevailing time-series databases in the IoT and connected car industry. The remarks have been stated as follows.

InfluxDB: It is the leading time-series database in the industry, while its community edition does not support clustering and high availability design in actual use.
Prometheus: Although Prometheus is one of the prevailing time-series databases, its “Pulling” design is not applicable in our usage scenario. Moreover, we keep a suspect attitude toward the high availability of its Federation
TDengine: An open-source time-series database of industry-leading writing and storage efficiency that has a professional and active developer community as well as numerous reliable use cases.

With a discreet decision-making process, our team eventually decided to focus on TDengine. Besides, we obtained efficient technical support from the TDengine development team when we expressed our interest in TDengine on GitHub.

Testing on TDengine

The deployment of TDengine is quite simple. The writing speed of TDengine is as fast as the continuously writing data to SSD, meanwhile, its compression algorithms present high-performance storage efficiency, helping save a lot of storage spaces. Also, TDengine uses user-friendly SQL-like syntax.

As for the requirement of customized RP design, it can be implemented through the configuration of the parameter “keep” and the parameter “days”.

In the testing, we tested the storage efficiency of TDengine based on a data set with 20 million data records. The MySQL table schema has been provided below:

As shown in the following chart, the testing results suggest that TDengine significantly outperforms MySQL in terms of storage efficiency.

Therefore, given the testing outcomes, we determined to employ TDengien as an alternative to Mycat+MySQL.

Data Modeling

TDengine supports automatic table creation (“one table for one data collection point”). We took full advantage of such a function during data modeling: the table will be automatically created when writing location data generated by a single device (data collection point) to TDengine for the first time. In this way, what we only need to do is “CREATE DATABASE” and “CREATE STABLE” for location data.

The STable we created for location data:

create stable device_statushis (    pos_time TIMESTAMP,    sample_time TIMESTAMP,    record_time TIMESTAMP,    online_status SMALLINT,    alarm_status SMALLINT,    pos_method SMALLINT,    pos_precision SMALLINT,    pos_longitude DOUBLE,    pos_latitude DOUBLE,    pos_altitude DOUBLE,    pos_speed FLOAT,    pos_direction FLOAT,    acc_forward FLOAT,    acc_side FLOAT,    acc_verticle FLOAT,    rollover_level SMALLINT,    power_voltage FLOAT,    acc_status SMALLINT,    satellite_num SMALLINT    )    tags(        device_id BINARY(32)    ) ;

After data modeling, we have to migrate the historical data from Mycat+MySQL to TDengine.

The overall architecture of IOV is presented below:

Conclusion & Future Plans

After data migration, TDengine demonstrates excellent performance:

The writing speed can reach 12,000 to 13,000 records per second.
Storage spaces required for storing location data are only 1/7 of the former solution.
Querying the data from a single device by one-day interval will respond within 0.1 seconds.

Currently, TDengine successfully resolved all of our pain points in the connected car scenario. Thus, in the coming future, we will apply TDengine to handle time-series data in more scenarios and keep growing with the TDengine community.

Chait Diwadkar

Chait Diwadkar previously worked as Director of Solutions Engineering at TDengine.