Supporting SQL with extension to facilitate time series data processing

Many time series databases and traditional real-time databases use proprietary query languages and some only provide APIs for querying. The TDengine team chose SQL as the query language from day one since it the most popular query language and is familiar to many programmers. It’s ubiquity reduces the learning curve and reduces application migration costs. At the same time TDengine has extended the standard SQL syntax to facilitate analysis and processing of time series data.

A simple comparison of queries with InfluxDB

InfluxDB has adopted the Flux query language since version 2.0. Let us take a simple example to let everyone know how easy it is to use SQL versus Flux. Suppose we want to query the average voltage per minute of all smart meters in Chicago in the past hour from the database power, the query statement of Flux will be:


from(bucket:"power")
|> range(start:-1h)
|> filter(fn:(r) =>
r._measurement == "smeter" and
r._field = "voltage" and
r.location == "chicago"
)
|> aggregateWindow(every: 1m, fn: mean)

If written in TDengine, the query statement will be:


select avg(voltage) from power.smeter where ts > now -1h and location = "chicago" interval(1m)

From the above, it’s easy to tell which one is more user friendly.

Data Ingestion:

InfluxDB, Prometheus, OpenTSDB and many other time series databases support schemaless data ingestion which makes it easy for developers not familiar with databases, to get started. TDengine adopts the traditional relational model, which requires users to define a schema. While this disciplined approach imposes a certain amount of difficulty at the beginning, it pays off in the long run. But since v2.4, TDengine does also support schemaless data ingestion which makes TDengine flexible enough to accommodate those getting started as well as advanced developers.

In addition, TDengine now supports data ingestion using InfluxDB’s LINE format. Existing Telegraf or other data collection applications do not need to modify their code but instead they can simply change the URL endpoints to those of the TDengine clusters. Furthermore, TDengine also supports multiple protocols of OpenTSDB, including Telnet, JSON and other data ingestion protocols. For details, please check Efficient Data Ingestion.

SQL Extension

The analysis of time series data has its own characteristics and requires some special analysis functions, which are not supported by traditional relational databases. For the convenience of developers, TDengine provides the following SQL extensions:

  • interval: TDengine aggregates the collected data by time interval, such as calculating the average voltage every 5 minutes
  • fill: At a specific time point, when there is no data, TDengine can perform interpolation calculation based on the data before and after this time point and provide it to the application.
  • sliding: Sliding window. TDengine can perform query calculation at specified time intervals.
  • last_row: Returns the last non-NULL value of a column in a table. This is very important for IoT applications, because the current state of the device needs to be quickly obtained.
  • tavg: Time-weighted average, which is often used in the Industrial Internet.
  • spread: count the difference between the maximum and minimum values ​​of a column.

TDengine 3.0 will also provide some unique time series analysis functions for forecasting, such as moving average.

For more details on SQL queries, check TAOS SQL.