Can Typical Time-Series Databases Replace Data Historians?

Jeff Tao

The time-series database (TSDB) market has grown rapidly over the past two decades: over 40 products have been released since RRDtool in 1999, and according to DB-Engines, time-series databases have experienced the second-highest growth in popularity since 2013. As purpose-built time-series solutions become more full-featured, support more use cases, and offer higher performance, their adoption in various fields is likely to continue increasing for the foreseeable future.

Why Industry Hesitates

At present, time-series databases have already become core components of the data infrastructure in some markets: it’s hard to imagine DevOps and IT monitoring environments without Prometheus or InfluxDB, for example, and kdb+ is the gold standard in the fintech industry. But the same cannot be said for traditional industries such as manufacturing and energy. Time-series databases have opened up a wealth of new possibilities in the IT sector and are clearly superior to existing solutions – often hacked together by internal teams or dependent on relational databases – but industrial enterprises have been using data historians with advanced time-series data processing capabilities since the 1980s.

While there has been some interest in time-series databases among these enterprises, adoption has been slow. This is because typical time-series databases like InfluxDB are not comprehensive solutions like traditional data historians and cannot fully meet industrial customers’ data processing needs in terms of performance or functionality. Industrial enterprises are accustomed to consolidated solutions that cover all aspects of data processing, from ingestion to analytics, and the time-series databases on the market today are only a single part of such a solution — not a potential replacement for the system as a whole.

In addition, while time-series database vendors have worked hard to improve performance, some typical industrial use cases — for example, calculations on real-time data — have not been addressed, and industrial customers often find that traditional historians actually query and analyze their data faster than modern TSDBs.

The fact is that the current generation of time-series databases was designed for tech companies by tech companies, and the pain points and particular requirements of traditional industries were an afterthought at best. The key factors preventing industrial customers from deploying typical time-series databases are described below.

Ingesting Data

In an IT monitoring environment, it’s often sufficient to install a data collector like Telegraf on the servers or containers that you want to monitor. Your time-series database can then ingest data from the collector. However, industrial environments are much more complex, with PLCs and other systems potentially running multiple proprietary or legacy data protocols at each site.

Traditional data historians like PI System have strong support for ingesting data over a wide variety of protocols, making it easy to get data from an industrial site into the system. Typical time-series databases in use today, however, are unprepared for complex environments, and data cannot be ingested without significant development effort.

Contextualizing Data

Context is invaluable when it comes to industrial data, but typical time-series databases are ill-equipped to maintain it. In products like InfluxDB, for example, adding or modifying a tag creates an entirely new time series, which separates new data from the history recorded under the old tags. And even cutting-edge solutions like TimescaleDB are unable to support hierarchies that have been in place for decades in traditional historians like PI System. Without a data platform that supports contextualization, even the latest industrial analytics solutions cannot provide real insight into operations.
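
To make the tag problem concrete, the sketch below models a series identity as the measurement name plus the sorted set of tag key/value pairs. This loosely mirrors how InfluxDB derives series keys, but it is a simplified illustration rather than InfluxDB code, and the measurement and tag names are hypothetical.

```python
def series_key(measurement: str, tags: dict[str, str]) -> str:
    """Build a series identifier from a measurement and its sorted tag set."""
    tag_part = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    return f"{measurement},{tag_part}" if tag_part else measurement


# Hypothetical meter reading before any extra context is attached.
before = series_key("flow_rate", {"site": "plant1", "meter": "m42"})

# Adding one context tag (the production line) changes the key, so new
# points are stored in a different series from the meter's existing history.
after = series_key("flow_rate", {"site": "plant1", "meter": "m42", "line": "L3"})

print(before)           # flow_rate,meter=m42,site=plant1
print(after)            # flow_rate,line=L3,meter=m42,site=plant1
print(before == after)  # False
```

Because the identity is derived from the full tag set, any change to the context attached to a sensor effectively starts a new series, which is why rich, evolving hierarchies are hard to maintain in this model.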

A related concept that also affects industrial customers is extract, transform, and load (ETL). Most time-series databases concern themselves with high-performance querying of historical data but neglect to consider the quality of that data. However, data governance is highly important for industrial customers hoping to achieve digital transformation and integrate new technologies into their infrastructure.

Analyzing Data in Real Time

For industrial enterprises to gain insight into their operations, real-time analytics is a necessity, but typical time-series databases provide only a time-driven option, known as continuous query, that produces new results at set intervals.

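Continuous query is intrinsically limited in many situations. Because it runs on a fixed schedule over fixed time windows, its results can lag behind incoming data by up to a full interval, and it cannot express windows whose boundaries depend on the data itself. Preprocessing and transformation with scalar functions, session windows, and low-latency use cases like fault detection are all scenarios where continuous query is not up to the task. In an industrial environment, data must be available for analysis in real time.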
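
The difference between time-driven and data-driven windows is easy to see in a small example. The sketch below is a hypothetical plain-Python illustration, not how any particular database implements either feature: `fixed_windows` aligns windows to clock boundaries, as a continuous query does, whereas `session_windows` closes a window only after a period of inactivity. The event timestamps, interval, and gap threshold are made up for illustration.

```python
def fixed_windows(timestamps: list[int], interval: int) -> list[list[int]]:
    """Group timestamps into windows aligned to fixed interval boundaries."""
    windows: dict[int, list[int]] = {}
    for ts in sorted(timestamps):
        start = ts - ts % interval  # window start aligned to the interval
        windows.setdefault(start, []).append(ts)
    return [windows[k] for k in sorted(windows)]


def session_windows(timestamps: list[int], gap: int) -> list[list[int]]:
    """Start a new window whenever the gap between events exceeds `gap`."""
    sessions: list[list[int]] = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= gap:
            sessions[-1].append(ts)  # still within the current session
        else:
            sessions.append([ts])  # idle period exceeded: open a new session
    return sessions


# Three bursts of machine activity (times in seconds) separated by idle periods.
events = [5, 8, 12, 58, 61, 140, 143, 147]
print(fixed_windows(events, interval=60))
# [[5, 8, 12, 58], [61], [140, 143, 147]]
print(session_windows(events, gap=30))
# [[5, 8, 12], [58, 61], [140, 143, 147]]
```

The fixed windows split the second burst across a boundary and merge part of it into the first, while the session windows follow the actual activity. Producing the session result requires evaluating each event as it arrives, which is what stream processing provides and a scheduled query does not.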

Although some architects may attempt to avoid the issue by deploying Spark or Flink alongside a time-series database, this greatly increases system complexity and is a needless drain on the typically limited IT resources at industrial enterprises, most of which do not have teams of systems and network administrators ready to deploy and debug these additional components. For industrial customers, the data platform itself must be responsible for enabling real-time analytics.

Quickly Accessing New Data

Industrial scenarios demand that the system return the latest data to the application as soon as possible. For example, a fleet management system always needs to know the current GPS position of each truck in the fleet. And in a smart factory, the system always needs to know the current state of every valve and the current reading of every meter.

In fact, many queries in industrial use cases do not touch historical data at all. However, time-series databases tend to store historical and real-time data together, so retrieving the latest value of each series can require reading through stored data, leading to unacceptably slow responses for real-time queries.

A typical time-series database deployment works around this problem by writing each new data point to both the database and a caching product, such as Redis. While this design works, it increases system complexity and operating costs. An industrial data solution must instead provide caching as a built-in, first-class element.
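
As a rough illustration of why an integrated cache matters, the sketch below keeps a last-value cache inside the same write path as historical storage, so there is no second system to keep in sync. It is a hypothetical in-memory toy with assumed names (`Point`, `SeriesStore`), not the implementation used by TDengine, Redis, or any other product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Point:
    series: str     # hypothetical series identifier, e.g. "valve_7.position"
    timestamp: int  # epoch milliseconds
    value: float


class SeriesStore:
    """Toy store that keeps full history plus a built-in last-value cache."""

    def __init__(self) -> None:
        self.history: dict[str, list[Point]] = {}
        self.latest: dict[str, Point] = {}

    def write(self, point: Point) -> None:
        # Historical storage and the cache are updated in one write path,
        # so there is no separate caching system to keep consistent.
        self.history.setdefault(point.series, []).append(point)
        current = self.latest.get(point.series)
        # Late (out-of-order) points must not overwrite a newer cached value.
        if current is None or point.timestamp >= current.timestamp:
            self.latest[point.series] = point

    def last(self, series: str) -> Optional[Point]:
        # Dictionary lookup; the stored history is never scanned.
        return self.latest.get(series)


store = SeriesStore()
store.write(Point("valve_7.position", 1000, 0.4))
store.write(Point("valve_7.position", 3000, 0.9))
store.write(Point("valve_7.position", 2000, 0.6))  # arrives late
print(store.last("valve_7.position"))
# Point(series='valve_7.position', timestamp=3000, value=0.9)
```

The timestamp check is the detail that a dual-write setup must also get right in two places; keeping it in a single write path is the design advantage the paragraph above argues for.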

Exporting Data in Real Time

To have an impact on operational efficiency and enable data-driven decision-making, external applications and algorithms need access to data as soon as it arrives at the system. Especially for emerging technologies like AI and ML, simply polling the database is not acceptable; an industrial data platform must push data to these applications.

While time-series databases are adept at storing time-series data, they often have difficulty distributing that data in real time. IT environments often add Kafka to the system, but this is not a technologically or economically viable solution for the OT world. Sharing or distributing data to applications must be handled by the data platform itself and cannot rely on non-real-time concepts like views.
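
The contrast between polling and pushing can be sketched with a minimal in-process publish/subscribe pattern. The `Broker` class, topic name, and record fields below are hypothetical and only illustrate the idea; they are not TDengine's data subscription API. The consumer blocks on its queue and receives each record as soon as it is published, instead of repeatedly querying for new rows.

```python
import queue
import threading
from collections import defaultdict


class Broker:
    """Toy push-based subscription: writers publish, subscribers receive."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[queue.Queue]] = defaultdict(list)

    def subscribe(self, topic: str) -> queue.Queue:
        q: queue.Queue = queue.Queue()
        self._subscribers[topic].append(q)
        return q

    def publish(self, topic: str, record: dict) -> None:
        # Push the new record to every subscriber as soon as it is written.
        for q in self._subscribers[topic]:
            q.put(record)


broker = Broker()
inbox = broker.subscribe("meters")
received = []


def consumer() -> None:
    # Blocks until a record is pushed; no polling loop against the database.
    received.append(inbox.get(timeout=5))


t = threading.Thread(target=consumer)
t.start()
broker.publish("meters", {"meter": "m42", "ts": 1000, "kwh": 12.5})
t.join()
print(received)  # [{'meter': 'm42', 'ts': 1000, 'kwh': 12.5}]
```

In a real platform the same idea would be backed by durable storage and consumer offsets, but the essential property is the same: delivery is triggered by the write, not by the reader's schedule.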

Rearchitecting the Time-Series Database for Industry

TDengine is not a typical time-series database: it is a time-series database purpose-built for Industry 4.0 and the Industrial IoT. In addition to its high-performance cloud-native TSDB, it delivers all necessary components for industrial data processing integrated within a single product and accessible through familiar SQL statements. These components include:

  • Connectors for traditional historians like PI System and Wonderware Historian as well as protocols such as MQTT and OPC
  • ETL functionality to enable good data governance and implement a unified namespace
  • Robust support for contextualization, allowing the creation, deletion, and modification of multiple tags and hierarchies on any time series
  • Real-time analytics capabilities through stream processing
  • Fast access to new data with built-in caching
  • Real-time exporting of data via data subscription

At the same time, TDengine is not intended as a one-to-one replacement for traditional historians. Importantly, instead of attempting to provide auxiliary components such as analytics and visualization, TDengine interoperates with an open ecosystem of products such as Seeq and Grafana that offer best-in-class solutions for every element of the industrial data architecture. This is why TDengine represents the next generation of data historians: it delivers the comprehensive data processing, contextualization, and performance of a traditional historian combined with the openness, flexibility, and scalability of a modern time-series database.

  • Jeff Tao

    With over three decades of hands-on experience in software development, Jeff has had the privilege of spearheading numerous ventures and initiatives in the tech realm. His passion for open source, technology, and innovation has been the driving force behind his journey.

    As one of the core developers of TDengine, he is deeply committed to pushing the boundaries of time series data platforms. His mission is crystal clear: to architect a high performance, scalable solution in this space and make it accessible, valuable and affordable for everyone, from individual developers and startups to industry giants.