Characteristics of Time-Series Data

Time-series data has long been important across many industries, and its significance is growing with the rapid expansion of IoT. The number of IoT connected devices is projected to increase to 43 billion by 2023 according to a Gartner report. Time-series data and analysis have been used for forecasting in sectors including astronomy, finance, utilities, agriculture, census, supply chain, manufacturing, environment and many more. Several platforms and libraries have sprung up that can persist and analyze time-series data. For example, TDengine, a popular TSDB on GitHub, provides an open-source database engine that is built from scratch and highly optimized for IoT time-series data, with off-the-shelf scalability and SQL as the query language.

In this article we will look at some inherent characteristics of general time-series as well as characteristics of IoT time-series data.

As the name implies, time is effectively the primary index of time-series data, and time-series data describes how things change over time. Statistical and, increasingly, machine learning methods are used to characterize and predict the behavior of systems over time. Forecasting, i.e., predicting the future state of complex systems based on past and present data, is one of the most important applications of time-series data.

Time-series data in most applications is sampled at discrete (as opposed to continuous) time points, and the interval between consecutive time points is usually uniform.
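
A quick way to see whether a series is uniformly sampled is to look at the gaps between consecutive timestamps. The following is a minimal sketch; the function names and the sample timestamps are made up for illustration and timestamps are assumed to be plain numbers (for example, seconds since the epoch).

```python
def sampling_intervals(timestamps):
    """Return the gaps between consecutive timestamps."""
    return [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]


def is_uniform(timestamps, tolerance=0):
    """True if every gap equals the first gap, within the given tolerance."""
    gaps = sampling_intervals(timestamps)
    return all(abs(gap - gaps[0]) <= tolerance for gap in gaps)


# Hypothetical readings taken every 10 seconds, then one late reading.
regular = [0, 10, 20, 30, 40]
irregular = [0, 10, 25, 30, 40]

print(is_uniform(regular))    # all gaps are 10
print(is_uniform(irregular))  # gaps are 10, 15, 5, 10
```

The tolerance parameter is there because real devices rarely report at exactly the same interval; how much jitter is acceptable depends on the application.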

Inherent Characteristics of Time-series

A trend refers to the tendency of values in a time-series to increase or decrease over time. A trend that persists over a long time span is a long-term trend, while one that holds only over a short time span is a short-term trend. What counts as long or short depends on the particular use case. To discover these trends, one needs to conduct time-series analysis.
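
One simple way to quantify a trend is to fit a straight line through the values and look at the sign and size of its slope. The sketch below uses ordinary least squares with the sample index as the x-axis; this is only one of many possible techniques, and the function name and the readings are illustrative assumptions.

```python
def linear_trend(values):
    """Return (slope, intercept) of the least-squares line through (i, values[i])."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxy = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(values))
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical readings that wobble but drift upward.
readings = [10.0, 12.0, 11.0, 13.0, 15.0, 14.0, 16.0]
slope, intercept = linear_trend(readings)
print("upward trend" if slope > 0 else "flat or downward trend")
```

A positive slope suggests an increasing trend over the span that was fitted; fitting the same line over a shorter or longer span may give a different answer, which is the long-term versus short-term distinction in practice.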

Long-term Trends

As an example, Ukraine’s population is expected to drop significantly due to the huge disparity between birth rates and death rates. On the other hand, even when a population is increasing, its growth rate may be falling, as is the case in several countries such as China. World GDP has tended to grow in the long term even though it declined in 2020 due to the impact of SARS-CoV-2 and in 2008 due to the Global Financial Crisis.

Short-term Trends

Short-term trends can be categorized into seasonal and cyclical trends. Seasonal trends repeat with a fixed period, typically less than a year, and recur at the same time every year.

Seasonal trends can be the result of several things:

1. Social conventions/holidays – Halloween and Christmas related increases in consumer activity are quite prominent.

Fig 1. Interest over time in “Vera Bradley” from Google Trends. Clear seasonal spikes in July/August and December are visible.

2. Weather/climate/geography – sales of certain products go up depending on the season. Everything from umbrellas and ice-cream to vacations varies by season. This variation is also affected by geography: seasonal trends in the Southern hemisphere occur at different times from those in the Northern hemisphere.
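
Seasonal patterns like the ones above can be made visible by averaging all values that fall at the same position within the period (for example, all Decembers) and comparing each average with the overall mean. The following is a minimal sketch under the assumption that the period is known in advance; the function name and the quarterly sales figures are invented for illustration.

```python
from collections import defaultdict


def seasonal_profile(values, period):
    """Average deviation from the overall mean at each position in the period."""
    if period < 1 or len(values) < period:
        raise ValueError("need at least one full period of data")
    buckets = defaultdict(list)
    for i, value in enumerate(values):
        buckets[i % period].append(value)
    overall = sum(values) / len(values)
    return [sum(buckets[p]) / len(buckets[p]) - overall for p in range(period)]


# Hypothetical quarterly sales over three years; the fourth quarter spikes.
sales = [100, 90, 95, 150,
         105, 92, 98, 160,
         110, 95, 100, 165]
profile = seasonal_profile(sales, period=4)
peak_quarter = profile.index(max(profile)) + 1
print(f"strongest quarter: Q{peak_quarter}")
```

If the series also has a strong long-term trend, the trend is usually removed first so that it does not distort the per-position averages.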

Check out how time-series analysis can deliver insights here:  Understanding the shifting seasonality of apparel

Cyclical trends also recur, but not necessarily with a fixed period. Business cycles, for example, are short-term trends, but it is not known exactly when they will start or how long they will last.

Long-term and short-term trends are often referred to as signals by data scientists because they can be derived from the data and are deterministic.

Random Fluctuations

Random variations also lead to rises or falls in the values of a time-series, but these variations are uncontrollable and unpredictable and could be the result of just about anything. Even though these variations are random, they can still be identified using mathematical techniques.

Random fluctuations are often referred to as noise because the causes of the variation are difficult to observe.
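
One common way to separate signal from noise is to smooth the series with a centered moving average and treat whatever is left over as noise. The sketch below assumes an odd window size so that each smoothed value lines up with a single original point; the function names and the sample values are illustrative, and a moving average is only one of several smoothing techniques.

```python
def moving_average(values, window):
    """Mean of each run of `window` consecutive values."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def split_signal_noise(values, window):
    """Return (signal, noise), both aligned to the center of each window."""
    if window % 2 == 0:
        raise ValueError("use an odd window so each average has a center point")
    half = window // 2
    signal = moving_average(values, window)
    noise = [values[k + half] - s for k, s in enumerate(signal)]
    return signal, noise


# Hypothetical readings: a slow rise with small random-looking wiggles.
readings = [20.1, 20.4, 20.2, 20.9, 20.7, 21.2, 21.0, 21.6]
signal, noise = split_signal_noise(readings, window=3)
print([round(s, 2) for s in signal])
print([round(n, 2) for n in noise])
```

A wider window gives a smoother signal but discards more points at the edges and can blur genuine short-term changes.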

Stationarity

Time-series can also be characterized as stationary and non-stationary.

A stationary time-series is one whose statistical properties do not change with time. That is, the distribution of its values does not depend on when they are observed, and basic properties like the mean and variance remain constant over time. Mathematically, if a stationary time-series is split into equally sized pieces, each piece will show the same probability distribution. Most forecasting methods expect stationary time-series data, so one of the first steps in time-series analysis is to transform a non-stationary time-series into a stationary one.

A non-stationary time-series is the result of a random process that doesn’t have a consistent mean or distribution across time. For example, the time-series showing new home sales in the US is not stationary because it has both a trend and a seasonal pattern.
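
Differencing is one widely used transformation for making a series with a trend or seasonal pattern closer to stationary: each value is replaced by its change relative to an earlier value. The sketch below is illustrative only; the series is synthetic, the helper names are assumptions, and comparing the means of the two halves is a rough sanity check rather than a formal stationarity test.

```python
def difference(values, lag=1):
    """Return values[i] - values[i - lag] for every i >= lag."""
    if lag < 1 or lag >= len(values):
        raise ValueError("lag must be between 1 and len(values) - 1")
    return [values[i] - values[i - lag] for i in range(lag, len(values))]


def half_means(values):
    """Mean of the first half and mean of the second half of the series."""
    mid = len(values) // 2
    return sum(values[:mid]) / mid, sum(values[mid:]) / (len(values) - mid)


# Synthetic series: a linear trend plus a pattern that repeats every 4 steps.
series = [2 * i + (i % 4) for i in range(16)]

# Differencing at the seasonal lag cancels the repeating pattern and
# reduces the linear trend to a constant.
differenced = difference(series, lag=4)

print(half_means(series))       # the two halves have different means
print(half_means(differenced))  # the two halves have the same mean
```

Real data will not become perfectly constant after differencing; the goal is only that the mean and variance stop drifting over time.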

Characteristics of IoT Time-Series

Time-stamps

IoT data always has time-stamps and, depending on the frequency of collection, the time-stamp may have various resolutions, even up to nanosecond resolution. Time-series databases should be able to write data very efficiently in order to accommodate massive numbers of devices collecting and transmitting data very frequently. TDengine has extremely high write performance resulting from innovations like “Super Tables” and efficient schema design, among others.

Structured

Data from sensors and devices is structured, even if the structure may change as devices become more advanced and can gather more or different types of data. This also means that platforms designed for time-series data must be able to handle various data types, apply compression optimized for each data type and efficiently query different data types. Additionally, they must provide the flexibility needed to accommodate changes in schema without any additional complexity. With support for multiple data types, TDengine is able to optimize compression and reduce storage costs significantly.

Streams

IoT data is constantly streaming, so a single piece of data at a specific time-point is not necessarily important, and it is acceptable for data to be missing at certain time-points. What is important is the aggregate trends in the data. Time-series databases must provide functions that can deal easily with streams, perform aggregation and capture statistics. TDengine has very convenient and advanced features such as “continuous queries” and built-in lightweight publish/subscribe to enable easy computing with streams.
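
To illustrate the kind of aggregation involved, the sketch below averages readings over fixed, non-overlapping time windows in plain Python. It is not TDengine's API or implementation; the function name, the integer-second timestamps and the sample readings are assumptions made for the example. Note that a missing reading simply means fewer values in that window, so gaps do not need special handling.

```python
def tumbling_average(readings, window_seconds):
    """Average (timestamp, value) pairs per fixed-size, non-overlapping window.

    Returns a dict mapping each window's start time to the mean of its values.
    """
    totals = {}
    for timestamp, value in readings:
        start = timestamp - timestamp % window_seconds
        total, count = totals.get(start, (0.0, 0))
        totals[start] = (total + value, count + 1)
    return {start: total / count
            for start, (total, count) in sorted(totals.items())}


# Hypothetical temperature readings, nominally every 20 s; the reading
# expected at t=80 never arrived.
readings = [(0, 21.0), (20, 21.4), (40, 21.2), (60, 22.0), (100, 22.4)]
for start, mean in tumbling_average(readings, window_seconds=60).items():
    print(start, round(mean, 2))
```

A database with built-in windowed aggregation does this work close to the data, so the application does not have to pull every raw point just to compute an average.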

Stable Data Rates

The amount of data and the rate at which a time-series database receives and needs to write data are very predictable if the number of devices and the frequency of collection are known. This simple fact, along with a good time-series database with support for multiple data types, optimized compression and optimized performance for IoT, can be used for easy capacity planning and an optimized infrastructure. TDengine has great support for capacity planning.
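
The arithmetic behind such an estimate is straightforward. The sketch below computes the write rate and the raw (uncompressed) daily volume from the device count, collection interval and row size; all three input numbers are hypothetical, and actual storage will depend on the compression and overhead of the database in use.

```python
SECONDS_PER_DAY = 86_400


def raw_ingest_estimate(devices, interval_seconds, bytes_per_row):
    """Return (rows_per_second, raw_bytes_per_day) before any compression."""
    rows_per_second = devices / interval_seconds
    raw_bytes_per_day = rows_per_second * bytes_per_row * SECONDS_PER_DAY
    return rows_per_second, raw_bytes_per_day


# Hypothetical deployment: 10,000 devices, one 32-byte row every 10 seconds.
rows_per_second, raw_bytes_per_day = raw_ingest_estimate(
    devices=10_000, interval_seconds=10, bytes_per_row=32
)
print(f"{rows_per_second:.0f} rows/s, {raw_bytes_per_day / 1e9:.2f} GB/day raw")
```

Multiplying the daily figure by the retention period, and dividing by an assumed compression ratio, gives a first-pass storage budget.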

Massive Volume

The volume of data is massive, so time-series databases need to be both high-performance and highly available. Ideally they provide scalability and performance features out of the box while also providing configurable features that can be used to optimize for specific applications and use cases.

Traditional relational database management systems and general-purpose NoSQL databases can handle IoT data, but at the expense of high storage and computational costs. A database engine like TDengine that is purpose-built for IoT can exploit the characteristics of IoT time-series as well as provide functions to handle the inherent characteristics of time-series. This leads to a highly functional, high-performance and highly scalable database platform that can reduce complexity, reduce storage and computing costs and provide the flexibility needed to handle IoT time-series data.