Migrating from OpenTSDB for a Simplified IoT Platform

Zhongyuan Ai (RisingStar IoT)

RisingStar is a SaaS platform that provides IoT solutions covering the full closed loop, from sensors and edge gateways to cloud platforms and applications. To fit various usage scenarios, RisingStar has built applications such as a smart water management platform, a smart emergency response system, a smart environment monitoring system, a smart agriculture system, and a smart home platform to meet the requirements of our customers.

Pain Points Caused by OpenTSDB

At first, RisingStar used OpenTSDB to handle time-series data, but OpenTSDB exposed the following pain points:

  • OpenTSDB depends on third-party software such as HBase, HDFS, and ZooKeeper, which raises deployment, management, and maintenance costs.
  • When a problem occurs during development or testing, locating it requires troubleshooting not only OpenTSDB but also HBase, HDFS, and ZooKeeper.
  • Two or more copies are commonly required in an IoT scenario. In a clustered OpenTSDB deployment, the relationships between servers are extremely complex, so the more servers the cluster contains, the higher the management and maintenance costs.
  • Beyond the management and maintenance costs mentioned above, an OpenTSDB deployment requires more hardware resources, which adds hardware cost.

Looking for an Upgrade Solution

To resolve these pain points, our team looked for a better solution to replace OpenTSDB. Based on our requirements and usage scenarios, we drew up the following criteria:

  • Open source and free for commercial use
  • A stable, high-performance, and professional time-series database (TSDB) with convincing use cases
  • An active DevOps community that provides Q&A and quick responses
  • Regular new releases, plus IT operations and support services
  • High write performance (over 100,000 inserts per second) and reliable scalability
  • A lower total cost of ownership and lower hardware overhead
  • Support for cluster mode and scaling out

After some research, three popular time-series databases made our shortlist:

  • InfluxDB: InfluxDB ranks first in the DB-Engines Ranking for Time-Series DBMS and satisfies most of our requirements. However, its enterprise edition is closed-source, so InfluxDB could not be our first choice.
  • TimescaleDB: TimescaleDB ranks highly in the DB-Engines Ranking for Time-Series DBMS and meets nearly all of our criteria. However, it is a time-series extension built on PostgreSQL (an RDBMS) and did not offer a well-supported cluster version or scaling out.
  • TDengine: TDengine has a well-supported cluster version and excellent scalability. It has also demonstrated strong write and query performance in real usage scenarios.

Given the criteria above, TDengine was clearly our best choice.

Data Modeling

Because the data model had to fit the existing system architecture and functions, it needed to meet the following requirements:

  • Schema-free: OpenTSDB is a schema-free database, and we wanted to keep that schemaless model so that data can be ingested and queried without first running “CREATE TABLE”.
  • Single-column model: The IoT platform architecture must support many usage scenarios rather than one specific scenario, so flexible configuration is needed. For example, a level sensor reports two metrics, liquid level and distance. Scenario A needs only the liquid level, while scenario B needs only the distance. The single-column model (each metric value is its own record, and values of the same metric are stored in a single column) handles such cases.
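The single-column requirement can be illustrated with a minimal Java sketch that splits one multi-metric sensor reading into one record per metric. The class names (MetricRecord, SingleColumnModel), the device id, and the metric codes are hypothetical and chosen only for illustration; they are not RisingStar's actual identifiers.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical record type: one metric value of one device at one point in time.
final class MetricRecord {
    final String deviceId;
    final String metricCode;
    final long tsMillis;   // epoch milliseconds
    final float val;

    MetricRecord(String deviceId, String metricCode, long tsMillis, float val) {
        this.deviceId = deviceId;
        this.metricCode = metricCode;
        this.tsMillis = tsMillis;
        this.val = val;
    }
}

final class SingleColumnModel {
    // Splits a reading that carries several metrics into one record per metric,
    // so each scenario can keep only the metrics it is configured to use.
    static List<MetricRecord> flatten(String deviceId, long tsMillis, Map<String, Float> reading) {
        List<MetricRecord> records = new ArrayList<>();
        for (Map.Entry<String, Float> entry : reading.entrySet()) {
            records.add(new MetricRecord(deviceId, entry.getKey(), tsMillis, entry.getValue()));
        }
        return records;
    }

    public static void main(String[] args) {
        Map<String, Float> reading = new LinkedHashMap<>();
        reading.put("liquid_level", 1.25f);
        reading.put("distance", 3.75f);
        for (MetricRecord r : flatten("level-sensor-01", 1700000000000L, reading)) {
            System.out.println(r.deviceId + " " + r.metricCode + " " + r.tsMillis + " " + r.val);
        }
    }
}
```

In this sketch, scenario A would simply keep the records whose metric code is liquid_level, and scenario B those whose code is distance.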

However, TDengine is not a schema-free database: tables must be created with “CREATE TABLE” before data can be written or queried. Did we have to give up on TDengine, or was there a solution? Luckily, through our own research and technical support from the TDengine DevOps team, we found an approach. Three key features of TDengine solved the problem.

Powerful STable

STable is a unique and powerful feature of TDengine. An STable is normally created to hold all data for one particular type of data point (sensor), which makes aggregation across multiple tables easier, but that usage does not meet our schema-free requirement. To solve this, our team designed a single STable that covers all types of data points (sensors). This way, “CREATE STABLE” is executed only once at deployment and is never needed again in later operations.

Single-column Model

TDengine supports both the single-column model and the multi-column model. The TDengine DevOps team recommends the multi-column model for its higher ingestion performance and more cost-effective storage, but we eventually chose the single-column model in order to implement schema-free behavior. The schema is like this:

As shown in the statement, each metric record consists of “ts timestamp” and “val float”. The “metric_code” tag in TAGS is used for filtering.
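A statement matching that description can be sketched as follows. Only the ts and val columns and the metric_code tag come from the text above; the STable name metric_data and the BINARY(64) tag width are assumptions made for illustration, and the real schema may carry additional tags.

```java
final class SchemaSketch {
    // Assumed name for the single "all-in-one" STable; not taken from the article.
    static final String STABLE_NAME = "metric_data";

    // Builds a CREATE STABLE statement with one value column (single-column model)
    // and metric_code as a tag. BINARY(64) is an assumed tag width.
    static String createStableSql() {
        return "CREATE STABLE IF NOT EXISTS " + STABLE_NAME
                + " (ts TIMESTAMP, val FLOAT)"
                + " TAGS (metric_code BINARY(64))";
    }

    public static void main(String[] args) {
        System.out.println(createStableSql());
    }
}
```

Because the statement uses IF NOT EXISTS, running it at every deployment is harmless, which fits the goal of creating the STable exactly once.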

Automatically “CREATE TABLE”

A “table” in TDengine is a different concept from an “STable”: each table stores all the data of one data point (sensor). TDengine can also create a table automatically when data is ingested, for example:
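A minimal sketch of such an insert, built as a SQL string in Java, relies on TDengine's INSERT INTO ... USING ... TAGS ... VALUES form, which creates the subtable on first write if it does not exist yet. The STable name metric_data, the table-naming scheme (one table per device and metric), and the helper names are assumptions for illustration only.

```java
import java.util.Locale;

final class AutoCreateInsert {
    // Hypothetical naming scheme: one subtable per (device, metric) pair.
    // Characters outside [a-z0-9_] are replaced so the result is a plain identifier.
    static String tableName(String deviceId, String metricCode) {
        String raw = "t_" + deviceId + "_" + metricCode;
        return raw.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9_]", "_");
    }

    // Builds an insert that auto-creates the subtable under the assumed
    // STable "metric_data" when it does not exist yet.
    static String insertSql(String deviceId, String metricCode, long tsMillis, float val) {
        if (!metricCode.matches("[A-Za-z0-9_]+")) {
            // Keep the sketch safe from quoting problems in the tag literal.
            throw new IllegalArgumentException("unsupported metric code: " + metricCode);
        }
        return "INSERT INTO " + tableName(deviceId, metricCode)
                + " USING metric_data TAGS ('" + metricCode + "')"
                + " VALUES (" + tsMillis + ", " + val + ")";
    }

    public static void main(String[] args) {
        System.out.println(insertSql("level-sensor-01", "liquid_level", 1700000000000L, 1.25f));
    }
}
```

With this form, the writer never issues a separate “CREATE TABLE”: the first reading from a new sensor creates its table as a side effect of the insert.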

With an “all-in-one” STable, the single-column model, and automatic “CREATE TABLE”, we successfully configured TDengine to behave as a schema-free database. This let us adopt TDengine without altering the existing architecture.

Code Modification

In the existing architecture of the RisingStar IoT platform, a program called “Time-Series Data Service” provides data processing, writing, and querying. All other modules must access the database through the Time-Series Data Service.

This design makes the upgrade much easier. We only needed to modify the code of the Time-Series Data Service so that it connects to TDengine via a JDBC driver. In this way, TDengine integrates with the RisingStar IoT platform without any changes to applications or existing functions.
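The connection step can be sketched with the standard java.sql API. This sketch assumes the TDengine JDBC connector (taos-jdbcdriver) is on the classpath and uses its documented URL prefixes, jdbc:TAOS:// for the native (JNI) driver and jdbc:TAOS-RS:// for the RESTful one; the class name, host, port, and database name are placeholders rather than RisingStar's actual configuration.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

final class TdengineConnectionSketch {
    // Builds a JDBC URL for either the native (JNI) or the RESTful driver.
    static String jdbcUrl(String host, int port, String database, boolean restful) {
        String prefix = restful ? "jdbc:TAOS-RS://" : "jdbc:TAOS://";
        return prefix + host + ":" + port + "/" + database;
    }

    // Opens a connection through DriverManager; this fails with SQLException
    // if the driver jar is missing or the server is unreachable.
    static Connection open(String url, String user, String password) throws SQLException {
        return DriverManager.getConnection(url, user, password);
    }

    public static void main(String[] args) {
        // Placeholder host and database; 6030 and 6041 are the usual default ports.
        System.out.println(jdbcUrl("tdengine.example.local", 6030, "iot", false));
        System.out.println(jdbcUrl("tdengine.example.local", 6041, "iot", true));
    }
}
```

Keeping this code inside the Time-Series Data Service means no other module needs to know which driver or database sits behind it.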

Field Application

Migrating all historical data to TDengine is a necessity in the upgrade project. To meet this requirement, our team designed and optimized a data migration tool to migrate all historical data from OpenTSDB to TDengine. Based on this tool, we divided the process of field deployment & testing into the following three steps:

Step 1: Data Migration

In this step, data is written to OpenTSDB and TDengine simultaneously, while all query requests are still sent to OpenTSDB, because it holds all real-time and historical data. At the same time, the data migration tool migrates all historical data from OpenTSDB to TDengine. Once the migration is complete, we move to step 2.

Step 2: Testing

During testing, OpenTSDB and TDengine still run in parallel, but all queries are now directed to TDengine. Testing lasts about two weeks to check whether any problems or bugs appear in actual use.

Step 3: Post-Deployment

If the whole data processing pipeline runs smoothly and stably during the testing stage, we shut down OpenTSDB and reclaim the resources it occupied. After that, we focus on monitoring the running status of the whole system.
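The three steps above amount to a routing rule inside the Time-Series Data Service, which the following minimal Java sketch makes explicit. All names (TimeSeriesStore, CountingStore, MigrationPhase, PhaseRouter) are hypothetical, the stores are in-memory stand-ins rather than real clients, and the sketch assumes that dual writes continue throughout the testing step.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical abstraction over a time-series backend.
interface TimeSeriesStore {
    String name();
    void write(String point, long tsMillis, float val);
}

// Stand-in for a real OpenTSDB or TDengine client: it only counts writes.
final class CountingStore implements TimeSeriesStore {
    private final String name;
    int writes = 0;

    CountingStore(String name) { this.name = name; }

    @Override public String name() { return name; }
    @Override public void write(String point, long tsMillis, float val) { writes++; }
}

enum MigrationPhase { DATA_MIGRATION, TESTING, POST_DEPLOYMENT }

final class PhaseRouter {
    private final TimeSeriesStore openTsdb;
    private final TimeSeriesStore tdengine;
    private MigrationPhase phase = MigrationPhase.DATA_MIGRATION;

    PhaseRouter(TimeSeriesStore openTsdb, TimeSeriesStore tdengine) {
        this.openTsdb = openTsdb;
        this.tdengine = tdengine;
    }

    void advanceTo(MigrationPhase next) { this.phase = next; }

    // Steps 1 and 2 write to both stores; step 3 writes to TDengine only.
    List<TimeSeriesStore> writeTargets() {
        List<TimeSeriesStore> targets = new ArrayList<>();
        if (phase != MigrationPhase.POST_DEPLOYMENT) {
            targets.add(openTsdb);
        }
        targets.add(tdengine);
        return targets;
    }

    // Step 1 queries OpenTSDB; steps 2 and 3 query TDengine.
    TimeSeriesStore queryTarget() {
        return phase == MigrationPhase.DATA_MIGRATION ? openTsdb : tdengine;
    }

    void write(String point, long tsMillis, float val) {
        for (TimeSeriesStore store : writeTargets()) {
            store.write(point, tsMillis, val);
        }
    }
}
```

Keeping OpenTSDB as a write target during testing means a rollback only requires switching the query target back, without any data backfill.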

Through the three steps above, TDengine formally replaced OpenTSDB and became the data-processing engine in the RisingStar IoT architecture. After deployment, the TDengine DevOps team continues to provide professional technical support, which greatly reduces management and maintenance costs.

Comprehensive Improvements Brought by TDengine

TDengine has significantly improved the RisingStar IoT platform: it occupies only one fifth of the hardware resources used by the OpenTSDB-based solution and delivers much higher performance, along with the following advantages:

  • Stress testing no longer requires a high-performance server.
  • The TDengine DevOps team gives reliable, professional, and quick responses whenever we ask for technical support.
  • The lightweight installation package makes deployment and maintenance much easier. Developers can also deploy TDengine on their own PCs without occupying extra resources.
  • Deployment is simple and can be completed within a few minutes in Docker.
  • Lower operations and management costs
  • Lower storage and host costs (only 1/5 of the former solution)

The upgrade project was implemented successfully. Although our team was stuck on data modeling for a while, with support from TDengine developers we eventually configured TDengine to behave as a schema-free database and satisfied all requirements of the RisingStar IoT platform.

Conclusion & Suggestions

Right now, TDengine stores all time-series data and supports all usage scenarios for the RisingStar IoT platform.

Since the switch from OpenTSDB to TDengine, hardware usage has dropped to one fifth of its previous level, costs have been greatly reduced, and performance has improved markedly, especially in development, testing, and operations and maintenance.

Moreover, many TDengine features remain for us to explore, such as “UNION”, “GROUP BY”, “JOIN”, and aggregate queries. Given TDengine's excellent performance on the RisingStar IoT platform, we will consider using TDengine to replace Hadoop in more big data scenarios.

Additionally, in light of the problems we encountered during deployment, we have two suggestions for TDengine:

  • JDBC-JNI is not pure Java and relies on a DLL file at runtime, which makes installation and deployment more difficult. Although JDBC-RESTful avoids this problem, the RESTful connector comes with somewhat lower performance. We therefore believe TDengine needs a pure-Java JDBC driver.
  • The TDengine client only offers a command-line interface, which is not very friendly to developers, especially novice developers who are more familiar with graphical user interfaces. Although the development community has contributed two GUIs, we still recommend that the TDengine DevOps team develop an official one.

Finally, I would like to thank the TDengine DevOps team for their unreserved assistance and support.