Hydropower plants are already equipped with computer-based monitoring systems for production operations and unit stability analysis, used to track the operating condition of critical equipment through indicators such as vibration and air gap. In addition, turbine-generator units are fitted with fault recorders that capture waveforms and switching actions when abnormalities or faults occur.
However, these monitoring systems evaluate unit conditions from limited perspectives and do not provide a comprehensive, multi-dimensional view of operational safety. At the same time, the large volume of electrical and protection-related event data recorded by fault recorders has yet to be fully analyzed, leaving much of its potential value untapped.
Key Challenges
The most fundamental—and most critical—requirement of this project was to bring all electrical system components across the entire plant into a unified monitoring system and to store high-sampling-rate raw signal data. With access to raw data, a wide range of advanced analytical applications can then be performed.
In this case, the signal sampling rate is 10 kHz, meaning 10,000 floating-point data points per second. Across the plant’s electrical system, there are an estimated 500 signal channels, resulting in approximately 1.7 TB of data per day (10,000 × 4 bytes × 500 × 24 × 60 × 60).
This massive data volume introduced two major challenges:
- Data transmission from monitoring devices to the system servers
- Data storage at scale
The first challenge—data transmission—was resolved relatively quickly by leveraging Shenzhen Shuanghe’s strong hardware R&D capabilities, using direct hardware-based data transmission links.
The second challenge—data storage—presented two possible approaches: database storage or file-based storage. Based on experience, it was clear that storing such massive volumes of data in traditional mainstream databases would be difficult to sustain. File-based storage, on the other hand, is a common approach for this type of power-system data.
We proposed the file-storage option to the customer, but it was ultimately not selected. On one hand, storing continuous, large-scale raw sampling data as files would introduce significant limitations for downstream data applications. On the other hand, the customer was looking for a solution with a higher degree of innovation.
Database Selection and Validation
Since the customer had ruled out file-based storage, we returned to database-based solutions, which led us to explore newer database technologies.
After repeated comparisons, the strong performance metrics of TDengine, the time-series database developed by Taos Data, stood out. We focused on three core performance indicators that mattered most for this project: write throughput, query efficiency, and storage capability. Before formally adopting TDengine, we carried out validation tests against these three criteria.
For testing, we created 10 tables, each with 50 data columns, and deployed the database in single-node mode. We then developed a simulation program that continuously inserted 10,000 rows per second into each table, resulting in sustained high-ingest workloads across all 10 tables. At the same time, SQL queries were executed to evaluate database behavior under concurrent write and query pressure.
SELECT COUNT(*) FROM tablename WHERE ts > now - 1s;
We used this setup to verify whether data generated within the current one-second window was being written to the database in a timely manner.
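For reference, the test setup corresponded roughly to the sketch below. The database, table, and column names are illustrative rather than the actual test code, and only 5 of the 50 data columns are shown.

```sql
-- Illustrative sketch only; the real test tables and simulation program differ.
-- Each test table actually had 50 FLOAT data columns; only 5 are shown here.
CREATE DATABASE IF NOT EXISTS perf_test;
USE perf_test;

CREATE TABLE IF NOT EXISTS t01 (
  ts  TIMESTAMP,
  c01 FLOAT, c02 FLOAT, c03 FLOAT, c04 FLOAT, c05 FLOAT
);

-- The simulation program issued INSERT statements continuously, at roughly
-- 10,000 rows per second per table; a single small batch looks like this:
INSERT INTO t01 VALUES
  ('2024-01-01 00:00:00.000', 0.12, 3.45, 6.78, 9.01, 2.34),
  ('2024-01-01 00:00:00.001', 0.13, 3.46, 6.79, 9.02, 2.35);
```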
If the database were unable to keep up with the write load, data would begin to accumulate, and the most recent data would no longer be queryable. To check for this issue, we ran the system continuously for several days and then executed the same SQL queries. The results showed that the latest data from the most recent second was still available in the database, indicating that any backlog remained within one second. In other words, the data was being written in near real time, with no noticeable delay.
We then doubled the data generation rate and repeated the test. The results were the same, further demonstrating TDengine’s strong write performance.
Query performance was relatively straightforward to validate. We randomly queried hundreds of thousands of records from tables containing tens of billions of rows, and all queries completed in under one second. This confirmed that TDengine also delivers excellent query efficiency.
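The queries were along the following lines (illustrative, with hypothetical table and column names): raw-record scans over an arbitrary time range, plus simple aggregations over the same range.

```sql
-- Illustrative only; the actual queries and table names differ.
-- Pull a large slice of raw records by time range
-- (at 10,000 rows per second, one minute is roughly 600,000 rows):
SELECT * FROM t01
WHERE ts >= '2024-01-01 08:00:00' AND ts < '2024-01-01 08:01:00';

-- Aggregations over the same range:
SELECT COUNT(*), AVG(c01), MAX(c02) FROM t01
WHERE ts >= '2024-01-01 08:00:00' AND ts < '2024-01-01 08:01:00';
```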
Finally, we evaluated storage efficiency. Based on our calculations, the system generates approximately 1.7 TB of data per day. Using the du -sh * command to check the TDengine data directory after 24 hours, we found that the actual storage increase was only about 30% of 1.7 TB. This demonstrated TDengine’s strong data compression capabilities.
Based on these outstanding results across write performance, query efficiency, and storage efficiency, we quickly decided to formally adopt TDengine TSDB for the project.
TDengine in Practice
After putting TDengine into production, we found its C++ and Java APIs to be clean and intuitive, and its SQL syntax closely aligned with standard SQL. As a result, the learning curve was almost zero, and adopting TDengine did not introduce any additional workload or complexity to the project.
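As a rough illustration of that familiarity (the statement below is an example, not project code), a typical query reads as standard SQL plus a time-window clause:

```sql
-- Illustrative only: standard SQL aggregation with a time-series extension,
-- here a one-minute windowed aggregation over a day of data.
SELECT AVG(c01), MAX(c02)
FROM t01
WHERE ts >= '2024-01-01 00:00:00' AND ts < '2024-01-02 00:00:00'
INTERVAL(1m);
```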
In the end, TDengine was successfully deployed in the production environment. Its real-world performance matched our validation tests, delivering excellent write throughput and fast query response times. This solid foundation enables the system to support deeper analysis and advanced applications based on raw signal data.
In this project, we stored only high-frequency, continuous sampling data in TDengine. Other low-frequency, discrete time-series data was not written to TDengine and instead remained in traditional databases such as MySQL. Although this discrete time-series data is generated at much larger intervals (for example, once every three minutes), so volume is not the concern, the heterogeneous nature of industrial data still makes it a poor fit for traditional row-based databases, where data modeling becomes extremely difficult.
For example, in an industrial production system, different devices generate very different types of data: some produce voltage and frequency data, others generate current and power data, while still others produce pressure and temperature data. If all these data types are stored as separate columns in a single table, most rows will contain a large number of empty columns, which significantly degrades database performance.
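A hypothetical wide-table schema of this kind might look as follows; the column names are ours, chosen only to illustrate how sparse each row becomes:

```sql
-- Hypothetical wide-table design (not our schema): all device categories
-- share one table, so for any given row most columns stay NULL.
CREATE TABLE device_data (
  ts           TIMESTAMP,
  device_id    INT,
  voltage      FLOAT,   -- populated only by voltage/frequency devices
  frequency    FLOAT,
  line_current FLOAT,   -- populated only by current/power devices
  active_power FLOAT,
  pressure     FLOAT,   -- populated only by pressure/temperature devices
  temperature  FLOAT
);
```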
An alternative approach is to store each data point as a separate row. This is the design we adopted in practice, where each row contains only two meaningful fields: timestamp and value. However, this design results in massive duplication of timestamps and causes the total number of rows to grow by hundreds or even thousands of times, which again has a serious impact on database performance. Its only real advantage is flexibility in schema extension.
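A minimal sketch of this row-per-point layout is shown below; the point_id column and all names are our assumptions, added only to show how signals are told apart and how timestamps end up duplicated:

```sql
-- Hypothetical row-per-point (key-value) layout: one row per measurement.
-- point_id is assumed here to distinguish signals; the timestamp is repeated
-- once for every point sampled at the same moment.
CREATE TABLE point_data (
  ts       TIMESTAMP,
  point_id INT,
  val      FLOAT
);

-- One sampling instant across three different points becomes three rows:
INSERT INTO point_data (ts, point_id, val) VALUES
  ('2024-01-01 00:00:00', 101, 10.5),
  ('2024-01-01 00:00:00', 102, 50.0),
  ('2024-01-01 00:00:00', 203, 96.3);
```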
As a result, it is extremely difficult to design an optimal solution for industrial data using traditional row-based databases.
Column-oriented databases like TDengine, however, are well suited to these challenges. We can use wide tables to store all data in a single table, and because TDengine is column-based, empty columns have virtually no impact on performance. Alternatively, different categories of data can be modeled within the same logical schema (even if implemented as multiple physical tables), and BigTable-style techniques can be used to aggregate data across tables into a single large table, significantly improving query efficiency.
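One natural way to express this in TDengine is its supertable concept: one supertable per device category, with a subtable per physical device identified by tags. The schema below is a simplified sketch under that assumption, not the project's actual model.

```sql
-- Simplified sketch (not the project's actual schema): each device category
-- becomes a supertable, and each physical device gets its own subtable.
CREATE STABLE electrical (ts TIMESTAMP, voltage FLOAT, frequency FLOAT)
  TAGS (device_id INT, location BINARY(32));

CREATE STABLE hydraulic (ts TIMESTAMP, pressure FLOAT, temperature FLOAT)
  TAGS (device_id INT, location BINARY(32));

-- Each device writes to its own subtable under the matching supertable.
CREATE TABLE gen_01  USING electrical TAGS (1, 'unit1');
CREATE TABLE pump_07 USING hydraulic  TAGS (7, 'unit2');

-- A query on the supertable still aggregates across all devices of a category:
SELECT AVG(voltage) FROM electrical
WHERE ts >= '2024-01-01 00:00:00'
GROUP BY device_id;
```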
Future Plans
TDengine is purpose-built for time-series data. Looking ahead, we plan to gradually migrate discrete time-series data into TDengine as well. By leveraging TDengine’s technical capabilities, we will redesign the data model to dramatically improve both write and query performance for this category of data.