In the previous article, we discussed the evolution of industrial data infrastructure: from data historians, to industrial data platforms, and now to AI-native industrial data foundations. This evolution is not just about adding new capabilities—it is fundamentally about rethinking the underlying architecture. At the center of this shift lies a critical but often overlooked change: the data storage layer itself is being replaced.
For decades, the Data Archive has been the core component of industrial data historians. It is responsible for ingesting time-series data from the field, compressing it, and making it available for query and visualization. This architecture has been remarkably stable and has supported industrial operations successfully for many years.
However, when viewed through the lens of the industrial internet, IoT, and AI, the assumptions behind the Data Archive no longer hold.
The Strengths and Limitations of Data Archive
Data Archive was designed in an era when both compute and storage were scarce. Its primary objective was to store as much data as possible using as few resources as possible. To achieve this, it adopted compression techniques such as the swinging door algorithm, which reduces data volume by discarding redundant points while preserving the overall shape of the signal.
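The swinging door idea can be sketched in a few lines of Python. This is an illustrative implementation of the textbook algorithm, not any vendor's code: a point is archived only when the incoming sample can no longer be reconstructed from the last archived point within a deviation band `dev`.

```python
def swinging_door(points, dev):
    """Swinging door trending over (timestamp, value) pairs.

    Keeps only the points needed to reconstruct the signal by linear
    interpolation to within +/- dev. Timestamps must be strictly increasing.
    """
    if len(points) <= 2:
        return list(points)
    kept = [points[0]]
    anchor_t, anchor_v = points[0]
    slope_lo, slope_hi = float("-inf"), float("inf")
    prev = points[0]
    for t, v in points[1:]:
        dt = t - anchor_t
        new_lo = max(slope_lo, (v - dev - anchor_v) / dt)
        new_hi = min(slope_hi, (v + dev - anchor_v) / dt)
        if new_lo > new_hi:
            # The "doors" have closed: archive the previous point and
            # restart the corridor from it.
            kept.append(prev)
            anchor_t, anchor_v = prev
            dt = t - anchor_t
            slope_lo = (v - dev - anchor_v) / dt
            slope_hi = (v + dev - anchor_v) / dt
        else:
            slope_lo, slope_hi = new_lo, new_hi
        prev = (t, v)
    kept.append(prev)
    return kept

# A linear ramp followed by a sudden drop: only the turning points survive.
compressed = swinging_door([(0, 0), (1, 1), (2, 2), (3, 3), (4, 0)], dev=0.1)
print(compressed)  # [(0, 0), (3, 3), (4, 0)]
```

Note that the discarded points are gone for good: reconstruction by interpolation is only guaranteed to be within `dev` of the originals, which is exactly the lossiness discussed below.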
This approach was highly effective at the time. It allowed systems to retain long periods of historical data within limited storage constraints, while still enabling engineers to understand trends and system behavior. This efficiency was one of the key reasons why data historians became widely adopted across industries.
But from today’s perspective, this design comes with clear trade-offs. First, the compression is inherently lossy. Raw data is not fully preserved. While this may be acceptable for trend visualization, it becomes a major limitation in the AI era. Machine learning models, anomaly detection, and fine-grained analysis all depend on high-fidelity data. Losing original data points can directly impact accuracy and lead to incorrect conclusions.
Second, Data Archive does not follow an open query model. It typically does not support standard SQL, and instead relies on proprietary APIs or tools for data access. While this may work within a closed system, it becomes a significant barrier when integrating with external systems such as BI tools, modern data platforms, or AI pipelines. Additional middleware or custom integration is often required, increasing both complexity and cost.
At the architectural level, Data Archive was not designed for horizontal scalability. It is typically based on single-node or limited-scale deployment models. In an era where industrial internet and IoT are driving exponential growth in the number of tags and data points, this limitation becomes increasingly difficult to manage.
Overall, Data Archive solved the problem of efficient data storage extremely well—but it did so under the assumptions of a resource-constrained, closed-system environment. In today’s world of open, scalable, and AI-driven data systems, those same design choices are becoming constraints.
TSDB: Built for Scale, Openness, and Modern Infrastructure
Modern time-series databases (TSDBs) are built in the era of cloud computing and big data. Their goal is not only to store data efficiently, but to support large-scale processing, open integration, and future-facing data applications.
At the storage level, TSDBs typically use columnar storage combined with multi-stage compression techniques. This allows them to achieve higher compression ratios while preserving raw data. TDengine TSDB uses a unique "one table per data collection point" data model to achieve an even higher compression ratio. This is especially important in the AI era, where model training, anomaly detection, and behavioral analysis all require complete and high-fidelity datasets.
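To make the contrast with lossy swinging-door compression concrete, here is a toy sketch of one stage of such a pipeline, delta encoding, in Python. Real TSDB codecs are more elaborate (delta-of-delta timestamps, XOR float compression, then a general-purpose codec), but the key property is the same: the round trip is lossless.

```python
def delta_encode(values):
    # Store the first value, then successive differences. For slowly
    # changing sensor readings the deltas are small and repetitive,
    # so a later bit-packing or entropy-coding stage shrinks them well.
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]

def delta_decode(deltas):
    # Exact inverse: running sum restores every original point.
    out = [deltas[0]]
    for d in deltas[1:]:
        out.append(out[-1] + d)
    return out

readings = [100, 101, 103, 103, 104]
encoded = delta_encode(readings)
print(encoded)                    # [100, 1, 2, 0, 1]
assert delta_decode(encoded) == readings  # nothing was thrown away
```

Unlike the swinging door approach, no original data point is discarded; compression comes from representation, not from deletion.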
In terms of query capability, TSDBs commonly support standard SQL. This is not just a matter of convenience—it fundamentally changes how industrial data can be used. Engineers can use familiar query languages, and data can be directly integrated with BI tools, visualization platforms, and modern data systems without relying on proprietary interfaces. For the first time, industrial data becomes truly interoperable within a broader data ecosystem.
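As a minimal illustration of what "standard SQL over time-series data" buys you, the snippet below uses SQLite purely as a stand-in for a SQL-capable TSDB; the table and device names are invented for the example. The point is that the same query could be issued by an engineer, a BI tool, or an AI pipeline with no proprietary API in between.

```python
import sqlite3

# In-memory SQLite stands in for any SQL-capable time-series store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (ts INTEGER, device TEXT, temp REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [(0, "pump-1", 71.2), (60, "pump-1", 74.9),
     (0, "pump-2", 68.0), (60, "pump-2", 69.5)],
)

# A standard aggregation any downstream tool could run over the same interface.
rows = conn.execute(
    "SELECT device, MAX(temp) FROM readings GROUP BY device ORDER BY device"
).fetchall()
print(rows)  # [('pump-1', 74.9), ('pump-2', 69.5)]
```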
Architecturally, TSDBs are designed to scale horizontally. They can distribute data across multiple nodes and handle increasing data volumes by adding more capacity. This is essential in industrial environments where the number of devices and data points is growing exponentially.
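One common mechanism behind this horizontal distribution is hash-based sharding: each series is assigned to a node by a stable hash of its tag or device name. The sketch below is a simplified illustration of the idea, not any particular product's placement logic (production systems typically add replication and consistent hashing to limit data movement when nodes are added).

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # illustrative cluster members

def node_for(tag):
    # A stable hash of the tag name decides which node owns its series,
    # so writes and queries for one tag always land on the same node.
    digest = hashlib.sha256(tag.encode("utf-8")).digest()
    return NODES[int.from_bytes(digest[:8], "big") % len(NODES)]

# Millions of tags spread across the cluster without central coordination.
placement = {tag: node_for(tag) for tag in ("TI-4021", "PI-1100", "VI-0007")}
print(placement)
```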
Even more importantly, TSDBs align with modern infrastructure. They run natively on Linux, support containerized deployment, and can operate seamlessly in cloud environments. This is in sharp contrast to traditional historian systems that often depend on Windows-based or proprietary environments. In today’s cloud-native world, the ability to deploy, scale, and manage systems through automation is no longer optional—it is a baseline requirement.
TSDBs also provide open interfaces and integrate easily with data pipelines, messaging systems, and analytics tools. They are no longer isolated storage engines, but active components within modern data infrastructure.
From this perspective, TSDB is not just a better Data Archive – it is a fundamentally different class of system.
Data Archive was born in an era of resource constraints. TSDB was born in an era of data explosion.
TSDB Alone Is Not Enough: Toward TSDB + IDMP
Despite its advantages, TSDB alone is still not a complete industrial data foundation. It is a powerful data engine—but it does not solve the full problem.
The value of industrial data does not come from the data itself, but from its context. A temperature, pressure, or vibration signal only becomes meaningful when it is associated with a specific asset, process, and operating condition. Without this context, even perfectly stored and efficiently queried data still requires significant effort to interpret—and remains difficult for AI to use effectively.
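The gap between a raw point and a meaningful one can be shown in a few lines. Everything here is hypothetical (the tag ID, asset names, and fields are invented): the point is that the enrichment lives outside the time-series store itself.

```python
# Hypothetical asset registry: the context a bare TSDB does not carry.
ASSETS = {
    "TI-4021": {
        "asset": "Compressor C-200",
        "unit": "degC",
        "process": "gas recompression",
        "high_limit": 95.0,
    },
}

def contextualize(tag, value):
    """Turn a raw (tag, value) pair into an interpretable observation."""
    meta = ASSETS[tag]
    return {
        "asset": meta["asset"],
        "measurement": f"{value} {meta['unit']}",
        "process": meta["process"],
        "alarm": value > meta["high_limit"],
    }

# Raw point: ("TI-4021", 98.3) -- meaningless without the registry.
print(contextualize("TI-4021", 98.3))
```

Without the registry, `98.3` is just a number; with it, the same point is a high-temperature alarm on a specific compressor in a specific process. That mapping is precisely what the storage layer alone does not provide.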
This highlights a critical limitation. TSDB can answer how data is stored and accessed, but it cannot answer what the data represents, how different data points relate to each other, or what the system state actually is.
A true industrial data foundation must go beyond storage. On top of TSDB, it must provide higher-level capabilities such as asset-centric modeling, event modeling and analysis, real-time processing, engineer-friendly visualization, and context-aware analytics and AI.
This is where the next step in architecture emerges: the combination of TSDB and IDMP (Industrial Data Management Platform). In this model, TSDB provides scalable, high-performance data storage and access, while IDMP provides structure, semantics, and application-layer intelligence.
This layered architecture combines the openness and scalability of modern data systems with the contextual richness required for industrial operations. More importantly, it creates the right foundation for AI—because AI does not just need data, it needs structured, contextualized, and meaningful data.
From this perspective, TSDB is a necessary condition, but not a sufficient one. Only when combined with IDMP does it become a true AI-native industrial data foundation.
Conclusion
Data Archive played a critical role in the development of industrial systems, but it was designed for a different era—one defined by limited resources and closed architectures.
Today, industrial internet, IoT, and AI are fundamentally changing how data is generated, processed, and used. Data volume is growing exponentially, and expectations for data are shifting from storage to intelligence.
In this context, TSDB is replacing Data Archive as the new data engine. But the real transformation happens when TSDB is combined with IDMP, enabling a shift from storing data to understanding data.
This is not just a technical upgrade — it is a generational shift in industrial data infrastructure.