TDengine Powers Vibration Data Analysis and Fault Prediction at Hydroelectric Plant

Chait Diwadkar

November 4, 2024 / Case Studies

Highlights

TDengine ingests operations data for a hydroelectric plant monitoring system at 10,000 Hz without significant delay.
Queries on this massive, high-frequency dataset return within 1 second.
The 1.7 TB of data generated per day by the monitoring system is compressed to approximately 30% of its original size by TDengine’s storage engine.

In hydroelectric plants, it’s common to install systems that monitor and assess the operational status of key equipment based on metrics like vibration. Fault recorders are also installed on turbine generator units to capture waveforms and actions when equipment is not operating normally.

A hydroelectric operator was dissatisfied with its current monitoring solution because it failed to provide a comprehensive, multidimensional safety assessment. Furthermore, its fault recorders captured large amounts of operations data, but the company was unable to make full use of this valuable information.

Key Challenges

The operator’s requirements for the new monitoring system were to include all power system components in the plant and to store high-frequency raw signal data. With access to this raw data, they would be able to deploy new applications and perform advanced analytics.

The sampling rate at the plant was 10,000 Hz, so each signal channel generated 10,000 floating-point data records every second. With approximately 500 signal channels in operation and each data record occupying 4 bytes, the plant would ingest about 20 MB per second, or roughly 1.7 TB per day. This high volume of data would require specialized hardware and software to store and process.
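The daily volume follows directly from the figures above. The short calculation below is a back-of-the-envelope sketch: it assumes decimal units (1 TB = 10^12 bytes) and counts only the 4-byte sample values, not timestamps or any storage overhead. The constant names are illustrative.

```python
# Rough estimate of raw data volume from the figures quoted in the article.
SAMPLE_RATE_HZ = 10_000      # samples per second, per channel
CHANNELS = 500               # approximate number of signal channels
BYTES_PER_SAMPLE = 4         # one 4-byte floating-point value
SECONDS_PER_DAY = 24 * 60 * 60

bytes_per_second = SAMPLE_RATE_HZ * CHANNELS * BYTES_PER_SAMPLE
bytes_per_day = bytes_per_second * SECONDS_PER_DAY
tb_per_day = bytes_per_day / 1e12   # decimal terabytes (assumption)

print(f"{bytes_per_second / 1e6:.0f} MB/s, {tb_per_day:.2f} TB/day")
# prints "20 MB/s, 1.73 TB/day"
```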

Database Selection

In the new monitoring system, the plant operator defined three key performance indicators for the database component: ingestion performance, query efficiency, and storage capacity. TDengine was tested against these indicators as follows:

TDengine was deployed in single-node mode with 10 data tables, each with 50 columns. A simulator was then developed that inserted 10,000 rows per second into each of the 10 tables. The following SQL statement was then executed continuously:

SELECT count(*) FROM tablename WHERE ts > now - 1s;

This query checks whether data generated within the past second has been successfully ingested into the database. If the database could not keep up with ingestion, unwritten data would gradually accumulate, and recent data would be missing from query results. After the test had run for several days, the latest second of data was always available, indicating that there was no accumulation of unwritten data and that the database was handling ingestion without significant delay.
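The idea behind this probe can be sketched in a few lines. This is not the operator's test harness: `run_count_query` is a hypothetical caller-supplied function that executes the SQL and returns the count as an integer, and the 10% tolerance is an assumption added here to allow for timing jitter at the edges of the one-second window.

```python
from typing import Callable

# The probe statement from the test; "tablename" is a placeholder.
PROBE_SQL = "SELECT count(*) FROM tablename WHERE ts > now - 1s;"


def ingestion_keeping_up(
    run_count_query: Callable[[str], int],
    expected_rows_per_second: int,
    tolerance: float = 0.1,
) -> bool:
    """Return True if the most recent second holds roughly the expected rows.

    run_count_query is a hypothetical function that runs the SQL against the
    database and returns the row count. tolerance is an assumed allowance,
    not a value taken from the original test.
    """
    count = run_count_query(PROBE_SQL)
    minimum_rows = expected_rows_per_second * (1 - tolerance)
    return count >= minimum_rows


# Stand-in for a real database client, used only to show the call shape.
def fake_query(sql: str) -> int:
    return 10_000


print(ingestion_keeping_up(fake_query, expected_rows_per_second=10_000))
```

A count that stays near the insert rate means writes are landing within a second of being generated; a count that drifts toward zero would suggest a growing backlog.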

The data load was then doubled to 20,000 rows per second, and the results remained the same, confirming that the ingestion performance of TDengine was acceptable for this business scenario.

To test query efficiency, random queries were run, each covering millions of records within a table. All queries completed in under a second, demonstrating that TDengine’s query performance also met the operator’s requirements.

Lastly, storage capacity was evaluated. An estimated 1.7 TB of data would be generated daily under normal operation. After 24 hours of ingestion, the size of the TDengine data directory was measured and compared with this estimate to determine storage efficiency. The data occupied approximately 30% of the expected 1.7 TB, indicating that TDengine’s data compression capabilities were up to the task.
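A measurement of this kind could be scripted roughly as follows. This is a minimal sketch rather than the procedure used at the plant: the function names are illustrative, and the example figures are derived from the article's approximate numbers (about 30% of 1.7 TB), not from a real measurement.

```python
import os


def directory_size_bytes(path: str) -> int:
    """Sum the sizes of all regular files under path, skipping symlinks."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def storage_ratio(actual_bytes: int, estimated_raw_bytes: int) -> float:
    """Fraction of the estimated raw volume actually occupied on disk."""
    return actual_bytes / estimated_raw_bytes


# Illustrative figures only: 0.51 TB on disk against a 1.7 TB raw estimate.
print(f"{storage_ratio(510_000_000_000, 1_700_000_000_000):.0%}")
```

In practice, `directory_size_bytes` would be pointed at the database's data directory after a full day of ingestion, and the result passed to `storage_ratio` together with the raw estimate.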

Given the strong performance across these three indicators, the hydroelectric operator deployed TDengine as their time-series database for this project.

User Experience

TDengine was successfully deployed and performed as expected in production at this operator, supporting further analysis and application of raw signal data. The operator was particularly pleased with TDengine’s high performance and its ease of use for both ordinary users and developers.

High performance: Compared with the row-oriented relational databases also in use in the project, the column-oriented design of TDengine was considered much more suitable for the data being processed. All data could be consolidated into a single table, and empty columns within rows had no performance impact. Different data types could also be organized under the same schema: in practice they may be spread across multiple tables and then aggregated into one large table to improve query efficiency.

User- and developer-friendly: The R&D team noted that TDengine’s C++ and Java client libraries were straightforward and easy to use, and its standard SQL support was a major factor in eliminating the learning curve for users. Deploying TDengine added no significant workload or complexity to the project.

Future Plans

In this project, only high-frequency, continuous sampling data was stored in TDengine, while low-frequency time-series data was stored in relational databases like MySQL. Although the frequency of this discrete data was as low as one data point every three minutes, the unique characteristics of industrial data make traditional row-oriented databases unsuitable for storing it. Going forward, the operator plans to move all time-series data storage to TDengine and to leverage TDengine’s technical features to redesign its data model, further improving efficiency for its use case.

Chait Diwadkar previously worked as Director of Solutions Engineering at TDengine.