Original Author: Sky Digital Technology., Ltd.
Introduction: An Oracle-based solution could not meet the increasing requirements of Sky Digital’s geological disaster monitoring and prediction platform. They chose to migrate to TDengine for this mission critical application. This article elaborates on their business pain points, solutions and the positive outcomes.
1. Company Profile
Sky Digital Technology Co., Ltd. is a high-tech enterprise specializing in 3S technology research, development and application services, dedicated to government, national defense and enterprise information solutions. It is also the first in the industry to create GIS solutions for the entire industrial chain. Sky Digital Technology Co. is a platform-level product, technology and service provider with core competitive advantages in the field of information technology.
Geological disasters are classified into 12 types and a total of 48 sub-types in China. Among all geological disasters, collapses, landslides, and debris flows are the most serious. They are widely distributed, catastrophic, destructive, hard to predict, and prone to cascading effects. They cause direct harm to residents’ lives and property, and also indirectly affect social stability, cause ecological deterioration, and aggravate other various natural disasters.
In China’s “13th Five-Year Plan” for the construction of a national emergency response system, prevention and preparation have long been emphasized as the most effective means to deal with geological disasters. Thus, it is imperative to build a professional monitoring IoT platform for geological disasters.
2. Use-cases & Pain-points Analysis and Solution Selection
An IoT monitoring solution should generally have the following five functions:
- Data Collection:
- Devices may transmit data at different frequencies and business requirements might change and the frequency of collection and transmission may change. An IoT monitoring solution has to automatically adapt to these changes without requiring manual configuration or changes.
- The solution should be able to monitor and notify about anomalous data.
- Data Transmission:
- IoT solutions should automatically adapt to network, power or other outages. These could be due to weather. For e.g. hilly areas are often cloudy and rainy and prone to outages. A solution should be able to deal with missing data points and should also have high-availability built-in.
- Data Storage:
- A robust IoT solution should be easily configurable to store historical and real-time data in respectively optimized storage that balances efficiency of access with cost of storage. At the same time, it should make querying historical and real-time data transparent to the user.
- Data Analysis:
- Query latency needs to be very low and a solution should be flexible to accommodate various devices and have device-based policies. For AI related to monitoring and predicting geological disasters, a solution should be able to adapt to various AI models.
- Early Warning and Forecast: It is better for a solution in this case to err on the side of false positives. False negatives would be unacceptable.
With the above main requirements in mind, the first thing to be solved is the storage and analysis of massive amounts of time series data with extremely high performance requirements for writing and querying data. Traditional relational databases have been unable to meet real-time writing and high-performance querying requirements. Our professional monitoring platform for the Internet of Things for Earth Disasters was developed in 2018. At that time we had selected Oracle as a large-scale Enterprise RDBMS. As more and more devices & sensors were connected and the amount of data was increasing by leaps and bounds, the system was becoming slower. This was highly apparent during the rainy season when the frequency of sensor data collection was increased to the second or millisecond level, data storage was often blocked, and the efficiency of the system was extremely low. From 2019, we began to investigate various time series databases. We found that TimescaleDB and TDengine were more suitable for our requirements. We then looked into indicators such as data writing, querying, and difficulty in getting started for TimescaleDB and TDengine. The following is a comparison and analysis of the two databases from the commonly used application scenarios using the hourly rainfall data of rainfall stations in the province in the past 10 years as the test data.
- The test machine configuration information is as follows:
- Comparison results of historical data batch storage scenarios:
- Data file size comparison results after storage:
- Comparison of common query scenarios:
- Query the cumulative rainfall (10-30 days) of all stations in the province, group by station, and summarize the cumulative rainfall
- Query the annual data of a single site, all sites in districts and counties, and summarize statistics by year, month, day, and hour (2138 rainfall sites in the province)
Conclusion: From the perspective of storage, compression ratio and query latency, TDengine is a handsdown winner. A more comprehensive comparison is shown below.
|Time series storage engine||✓||✓|
|Open source storage engine||✓||✓|
|Open-source cluster solution||✓||✓|
|Traditional relational database query support||✓||X|
|Data compression ratio||X||✓|
|Aggregate querying functions (sum, count, min, max, avg)||✓||✓|
3. Database Design and Application
The architecture of the data acquisition part is shown below.
We implement storage using the hierarchical capabilities of TDengine, both in terms of space and time. For the same type of sensor, we use a super table to represent it. For each sensor, we create a sub table for it.
Build the Database and Schema
- Create database
The parameters shown below are:
- Data is retained for 20 years
- Time span in days for a datafile to keep data – 90 days
- Vnode has 30 memory blocks (increased depending on machine memory).
- Updates are allowed i.e. Time-stamped data can be updated.
The database building statement is as follows:
CREATE DATABASE IF NOT EXISTS geohazard_monitor KEEP 7300 DAYS 90 BLOCKS 30 UPDATE 1 ;
2. Create a hypertable Create a super table for each sensor type. Taking the water pressure gauge as an example, assuming that the sensor type code is 201, the super table is named m_201, and other sensor super tables are named m_xxx (sensor type code). We designated the acquisition time as t20, and synchronous storage time as t30. The super table contains 2 tags, and the sensors are coded and numbered (note that the tag value of each sub-table is the same). The statement to create a super table is as follows:
CREATE STABLE IF NOT EXISTS m_201 (t20 TIMESTAMP, t30 TIMESTAMP ,v_water FLOAT) TAGS (sensor_code BINARY(20), sensor_id INT);
3. Data storage The super table is used as a template and the subtable naming rule is m_XXX (sensor code). The insert statement is as follows. Note that TDengine will automatically create the subtable if it doesn’t exist.
INSERT INTO m_05162300008 USING m_201 TAGS ('05162300008', 2) VALUES ('2021-01-01 00:00:00.000','2021-01-01 00:00:00.001', 0.02);
4. Queries When performing a time window (interval) avg aggregation rolling query, the time window unit can be b (nanosecond), u (microsecond), a (millisecond), s (second), m (minute), h (hour), d (day), w (week), n (calendar month) and y (calendar year). The statement for querying the average water content of each day is as follows.
SELECT AVG(v_water) FROM m_05162300008 WHERE t20 >= '2020-01-01 00:00:00.000' AND t20<'2020-01-03 00:00:00.000' interval(1d) ;
For our Java based application, we chose the JDBC-RESTful for our development environement, and the JDBC-JNI interface for client installation in our production environment. Compared to the RESTful interface, JNI has performance advantages and maximizes the performance of TDengine. However the RESTful interface has the advantages of being light and easy to debug. The switch from JDBC-RESTful to JDBC-JNI needs only a simple modification of the configuration file as shown below.
For this project, the important and highly relevant features of TDengine are:
- Using the super table template, data is quickly stored in the database, and sub-tables are automatically created.
- Sliding window statistics function.
- Very high performance with very low resource requirements and this leads to substantial energy savings and emissions reductions.
- Development and deployment are very easy and convenient.
As we have seen, TDengine shows extremely high read and write performance and data compression capabilities.The aggregation query speed is very fast and we are able to reduce resource use and costs. Novel concepts in TDengine such as super table and sub table along with improved storage for labels and metadata, and built-in statistical functions are very suitable for IoT big data application scenarios. TDengine has the potential to be the Oracle of IoT Big Data with improvements and new features and I wish it well.