As defined in Time-Series Databases: New Ways to Store and Access Data, a time series is data recorded “as measurements or observations of events as a function of the time at which they occurred.” At its most basic, time-series data is simply a measurement and the time at which it was taken. Although simple in concept, time-series data has become a key component of data analysis in many industries.
Time-series data is useful because it can answer questions about trends, patterns, and correlation over time. The ability to see how a measurement changes over a period of time enables powerful insight in a wide variety of areas ranging from business trends to the operating status of industrial equipment. In addition, as the cost of communication continues to decrease and smart devices and sensors become commonplace, more time-series data is being generated than ever before.
However, in order to make use of this data – to monitor devices, generate reports, trigger alarms, make predictions, and more – businesses need data platforms that can handle its scale. The amount of time-series data generated is growing to an extent that even traditional analysis is becoming difficult for legacy data historians and general-purpose relational databases. Instead, forward-thinking businesses are increasingly adopting the purpose-built time-series database (TSDB) as the platform for processing their time-series data.
Time-series databases can offer significantly higher performance than other database management systems because they take advantage of the characteristics of modern time-series data sets. These characteristics are as follows:
Top 10 Characteristics of Time-Series Data
- Timestamp: The generation of time-series data is triggered by a predefined timer or event, and when devices collect time-series data, a timestamp is always associated with each record. Time-series data can therefore be indexed by timestamp, and the timestamp of each record serves as the key for computation and analysis.
- Structure: Time-series data generated by devices is always structured, often having a predefined data type or fixed length.
- Stream-like nature: Time-series data from connected devices can be thought of as a data stream, being continuously collected and flowing into the database. These data streams are independent from each other.
- Stability: Although the scale is large, overall traffic in a time-series scenario will remain stable and can be predicted and calculated given the number of devices and the sampling period.
- Trend focus: Trends over time are more important than values at any specific time, and essentially the same analysis results can be obtained even if some data points are lost. For time-series data, the main challenges are storing, processing, and analyzing data sets due to their massive scale.
- High write/read ratio: In general, raw time-series data is only read occasionally by analytics software and other similar tools to generate reports and run algorithms, but it is written constantly to the database.
- Immutability: Specific time-series data records are almost never updated or deleted. Time-series data generated by devices can be considered append-only, similar to log files.
- Retention policy: In most scenarios, a lifecycle is defined for the collected time-series data, after which it is deleted to reduce storage costs. It is rare that raw data is stored forever.
- Real-time computing: To meet business requirements, data processing systems must be able to perform operations on time-series data in real time; for example, a monitoring system must be able to trigger an alert as soon as the conditions for an alarm are met.
- Aggregation: Most queries are performed on a specified time range, not all historical time-series data. In addition, data aggregation is always needed over a time window for all or a subset of devices. Filtering a subset of devices for aggregation is mandatory in IoT applications.
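Two of the characteristics above, stability and aggregation, involve simple arithmetic that a short sketch can make concrete. The Python below is illustrative only and does not reflect the API of any particular time-series database: the function names, the `(timestamp, device_id, value)` record layout, and the sample readings are all assumptions made for this example. It estimates steady-state write traffic from the device count and sampling period, then averages values per fixed time window for a chosen subset of devices.

```python
from collections import defaultdict
from statistics import mean


def estimated_writes_per_second(device_count, sampling_period_s):
    """Steady-state ingest rate, assuming each device emits one record per period."""
    return device_count / sampling_period_s


def windowed_mean(records, start, end, window_s, devices=None):
    """Average value per fixed window over [start, end).

    records: iterable of (timestamp_s, device_id, value) tuples (assumed layout).
    devices: optional set of device ids; None means all devices.
    """
    buckets = defaultdict(list)
    for ts, device_id, value in records:
        if not (start <= ts < end):
            continue  # outside the queried time range
        if devices is not None and device_id not in devices:
            continue  # not in the requested subset of devices
        # Align the timestamp to the start of its window.
        window_start = start + ((ts - start) // window_s) * window_s
        buckets[window_start].append(value)
    return {w: mean(vs) for w, vs in sorted(buckets.items())}


# Made-up sample readings, for illustration only.
records = [
    (0, "pump-1", 10.0),
    (30, "pump-1", 20.0),
    (45, "pump-2", 99.0),
    (60, "pump-1", 30.0),
    (90, "pump-1", 50.0),
    (120, "pump-1", 70.0),
]

print(estimated_writes_per_second(10_000, 10))  # 1000.0
print(windowed_mean(records, 0, 120, 60, devices={"pump-1"}))  # {0: 15.0, 60: 40.0}
```

In this sketch, 10,000 devices sampling every 10 seconds produce a predictable 1,000 writes per second, and the query touches only the requested time range and devices rather than the full history.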
In addition to making use of these characteristics, time-series databases can also save resources by eliminating mechanisms that are not necessary for their specialized use cases. For example, unlike typical relational database workloads, time-series data does not require database transactions.
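To illustrate how timestamp ordering, immutability, and a retention policy can simplify storage, here is a minimal in-memory sketch in Python. It is not how any specific time-series database is implemented; the `AppendOnlySeries` class, its method names, and the choice to reject out-of-order writes are assumptions made purely for illustration.

```python
import bisect


class AppendOnlySeries:
    """Hypothetical store for one device's readings, kept in timestamp order."""

    def __init__(self, retention_s):
        self.retention_s = retention_s
        self.timestamps = []
        self.values = []

    def append(self, ts, value):
        # Records are only ever added at the end and never updated, so this
        # sketch models no update, rollback, or transaction logic.
        if self.timestamps and ts < self.timestamps[-1]:
            raise ValueError("out-of-order write rejected in this sketch")
        self.timestamps.append(ts)
        self.values.append(value)

    def expire(self, now):
        """Drop records older than the retention period; return the count dropped."""
        cutoff = now - self.retention_s
        n = bisect.bisect_left(self.timestamps, cutoff)
        del self.timestamps[:n]
        del self.values[:n]
        return n

    def range(self, start, end):
        """Return (timestamp, value) pairs with start <= timestamp < end."""
        lo = bisect.bisect_left(self.timestamps, start)
        hi = bisect.bisect_left(self.timestamps, end)
        return list(zip(self.timestamps[lo:hi], self.values[lo:hi]))


series = AppendOnlySeries(retention_s=100)
for ts in (0, 50, 100, 150):
    series.append(ts, float(ts))

print(series.expire(now=200))   # 2
print(series.range(100, 200))   # [(100, 100.0), (150, 150.0)]
```

Because the timestamps stay sorted, both the time-range lookup and the retention cutoff are found by binary search, and expired data is removed in one contiguous slice from the front.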