Migrating a disaster prediction and monitoring platform from Oracle to time-series database

TDengine Team

Original Author: Sky Digital Technology., Ltd.

Introduction: An Oracle-based solution could not meet the increasing requirements of Sky Digital’s geological disaster monitoring and prediction platform. They chose to migrate to TDengine for this mission critical application. This article elaborates on their business pain points, solutions and the positive outcomes.

1. Company Profile

Sky Digital Technology Co., Ltd. is a high-tech enterprise specializing in 3S technology research, development and application services, dedicated to government, national defense and enterprise information solutions. It is also the first in the industry to create GIS solutions for the entire industrial chain. Sky Digital Technology Co. is a platform-level product, technology and service provider with core competitive advantages in the field of information technology.

Geological disasters are classified into 12 types and a total of 48 sub-types in China. Among all geological disasters, collapses, landslides, and debris flows are the most serious. They are widely distributed, catastrophic, destructive, hard to predict, and prone to cascading effects. They cause direct harm to residents’ lives and property, and also indirectly affect social stability, cause ecological deterioration, and aggravate other various natural disasters.

TDengine time series database | 22.024 01 landslide

In China’s “13th Five-Year Plan” for the construction of a national emergency response system, prevention and preparation have long been emphasized as the most effective means to deal with geological disasters. Thus, it is imperative to build a professional monitoring IoT platform for geological disasters.

2. Use-cases & Pain-points Analysis and Solution Selection

An IoT monitoring solution should generally have the following five functions:

  • Data Collection:
    • Devices may transmit data at different frequencies and business requirements might change and the frequency of collection and transmission may change. An IoT monitoring solution has to automatically adapt to these changes without requiring manual configuration or changes.
  • The solution should be able to monitor and notify about anomalous data.
  • Data Transmission:
    • IoT solutions should automatically adapt to network, power or other outages. These could be due to weather. For e.g. hilly areas are often cloudy and rainy and prone to outages. A solution should be able to deal with missing data points and should also have high-availability built-in.
  • Data Storage:
    • A robust IoT solution should be easily configurable to store historical and real-time data in respectively optimized storage that balances efficiency of access with cost of storage. At the same time, it should make querying historical and real-time data transparent to the user.
  • Data Analysis:
    • Query latency needs to be very low and a solution should be flexible to accommodate various devices and have device-based policies. For AI related to monitoring and predicting geological disasters, a solution should be able to adapt to various AI models.
  • Early Warning and Forecast: It is better for a solution in this case to err on the side of false positives. False negatives would be unacceptable.

With the above main requirements in mind, the first thing to be solved is the storage and analysis of massive amounts of time series data with extremely high performance requirements for writing and querying data. Traditional relational databases have been unable to meet real-time writing and high-performance querying requirements. Our professional monitoring platform for the Internet of Things for Earth Disasters was developed in 2018. At that time we had selected Oracle as a large-scale Enterprise RDBMS. As more and more devices & sensors were connected and the amount of data was increasing by leaps and bounds, the system was becoming slower. This was highly apparent during the rainy season when the frequency of sensor data collection was increased to the second or millisecond level, data storage was often blocked, and the efficiency of the system was extremely low. From 2019, we began to investigate various time series databases (TSDB). We found that TimescaleDB and TDengine were more suitable for our requirements. We then looked into indicators such as data writing, querying, and difficulty in getting started for TimescaleDB and TDengine. The following is a comparison and analysis of the two databases from the commonly used application scenarios using the hourly rainfall data of rainfall stations in the province in the past 10 years as the test data.

  • The test machine configuration information is as follows:
TDengine time series database | 22.024 02 configuration
  • Comparison results of historical data batch storage scenarios:
TDengine time series database | 22.024 03 comparison 1
  • Data file size comparison results after storage:
TDengine time series database | 22.024 04 comparison 2
  • Comparison of common query scenarios:
  1. Query the cumulative rainfall (10-30 days) of all stations in the province, group by station, and summarize the cumulative rainfall
  2. Query the annual data of a single site, all sites in districts and counties, and summarize statistics by year, month, day, and hour (2138 rainfall sites in the province)
TDengine time series database | 22.024 05 comparison 3

Conclusion: From the perspective of storage, compression ratio and query latency, TDengine is a handsdown winner. A more comprehensive comparison is shown below.

Time-series storage engine
Open-source storage engine
Open-source cluster solution
Traditional relational database query support
Query performance
Write performance
Data compression ratio
Child tables
Aggregate querying functions (SUM, COUNT, MIN, MAX, AVG)
Sliding window
Learning curve

3. Database Design and Application

The architecture of the data acquisition part is shown below.

TDengine time series database | 22.024 06 architecture

We implement storage using the hierarchical capabilities of TDengine, both in terms of space and time. For the same type of sensor, we use a supertable to represent it. For each sensor, we create a subtable for it.

Build the Database and Schema

  1. Create database

The parameters shown below are:

  • Data is retained for 20 years
  • Time span in days for a datafile to keep data – 90 days
  • Vnode has 30 memory blocks (increased depending on machine memory).
  • Updates are allowed i.e. Time-stamped data can be updated.

The database building statement is as follows:


2. Create a supertable for each sensor type. Taking the water pressure gauge as an example, assuming that the sensor type code is 201, the supertable is named m_201, and other sensor supertables are named m_xxx (sensor type code). We designated the acquisition time as t20, and synchronous storage time as t30. The supertable contains 2 tags, and the sensors are coded and numbered (note that the tag value of each subtable is the same). The statement to create a supertable is as follows:

CREATE STABLE IF NOT EXISTS m_201 (t20 TIMESTAMP, t30 TIMESTAMP ,v_water FLOAT) TAGS (sensor_code BINARY(20), sensor_id INT);

3. Data storage The supertable is used as a template and the subtable naming rule is m_XXX (sensor code). The insert statement is as follows. Note that TDengine will automatically create the subtable if it doesn’t exist.

INSERT INTO m_05162300008 USING m_201 TAGS ('05162300008', 2) VALUES ('2021-01-01 00:00:00.000','2021-01-01 00:00:00.001', 0.02);

4. Queries When performing a time window (interval) avg aggregation rolling query, the time window unit can be b (nanosecond), u (microsecond), a (millisecond), s (second), m (minute), h (hour), d (day), w (week), n (calendar month) and y (calendar year). The statement for querying the average water content of each day is as follows.

SELECT AVG(v_water) FROM m_05162300008 WHERE t20 >= '2020-01-01 00:00:00.000' AND t20<'2020-01-03 00:00:00.000' interval(1d) ;

Development Experience

For our Java based application, we chose the JDBC-RESTful for our development environement, and the JDBC-JNI interface for client installation in our production environment. Compared to the RESTful interface, JNI has performance advantages and maximizes the performance of TDengine. However the RESTful interface has the advantages of being light and easy to debug. The switch from JDBC-RESTful to JDBC-JNI needs only a simple modification of the configuration file as shown below.

TDengine time series database | 22.024 07 configuration

Typical Applications

For this project, the important and highly relevant features of TDengine are:

  • Using the supertable template, data is quickly stored in the database, and subtables are automatically created.
  • Sliding window statistics function.
  • Very high performance with very low resource requirements and this leads to substantial energy savings and emissions reductions.
  • Development and deployment are very easy and convenient.

4. Conclusion

As we have seen, TDengine shows extremely high read and write performance and data compression capabilities.The aggregation query speed is very fast and we are able to reduce resource use and costs. Novel concepts in TDengine such as supertable and subtable along with improved storage for labels and metadata, and built-in statistical functions are very suitable for IoT big data application scenarios. TDengine has the potential to be the Oracle of IoT Big Data with improvements and new features and I wish it well.