TDengine Storage and Architecture

TDengine is a lightweight, highly efficient, open-source, single-node data processing engine oriented toward the Internet of Things (IoT). As a data engine designed for IoT, TDengine has significant advantages in writing, querying, and storage. This article describes the architecture and storage design of TDengine to help users understand how it works.

TDengine architecture

The TDengine server has two main modules, as shown in Picture 1: the management module (MGMT) and the data module (DNODE). The overall TDengine architecture also includes a client module.

Picture 1 TDengine Architecture

MGMT module

The MGMT module stores and queries metadata, which includes information about users, databases, and tables. An application connects to the MGMT module first when it connects to the TDengine server. When a database or table is created or dropped, the request goes to the MGMT module first, which creates or deletes the metadata and then asks the data module to allocate or free the required resources. For writes and queries, the application still obtains the metadata from the MGMT module first and then uses it to access the DNODE module.
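The ordering described above (metadata first, then data resources, then data access) can be illustrated with a minimal in-memory sketch. Every class and function name here (ManagementModule, DataModule, client_insert) is hypothetical and chosen only for illustration; none of them is part of the TDengine code base or API.

```python
class DataModule:
    """Hypothetical stand-in for DNODE: holds the rows of each table."""

    def __init__(self):
        self.tables = {}  # table name -> list of rows

    def allocate(self, table):
        self.tables[table] = []

    def free(self, table):
        del self.tables[table]

    def insert(self, table, row):
        self.tables[table].append(row)


class ManagementModule:
    """Hypothetical stand-in for MGMT: holds metadata only."""

    def __init__(self, data_module):
        self.meta = {}  # table name -> list of column names
        self.data_module = data_module

    def create_table(self, table, schema):
        self.meta[table] = list(schema)    # 1. create the metadata
        self.data_module.allocate(table)   # 2. ask the data module for resources

    def drop_table(self, table):
        del self.meta[table]               # 1. delete the metadata
        self.data_module.free(table)       # 2. ask the data module to free resources

    def get_meta(self, table):
        return self.meta[table]


def client_insert(mgmt, dnode, table, row):
    """A write consults MGMT for metadata, then accesses DNODE."""
    schema = mgmt.get_meta(table)
    if len(row) != len(schema):
        raise ValueError("row does not match the table schema")
    dnode.insert(table, row)
```

In this sketch a table that was never created through the management module cannot be written, because the metadata lookup fails before the data module is touched.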

DNODE module

The DNODE module is responsible for storing and querying data. To support future scaling and efficient resource usage, TDengine virtualizes the resources it uses. It introduces the concept of the virtual node (vnode), which is the unit of storage, resource allocation, and data replication (enterprise version). As shown in Picture 2, TDengine treats each data node as a collection of vnodes. Each vnode holds all the data for a set of tables and has its own cache and its own data directory. These resources are exclusive to the vnode and are never shared with other vnodes, but they are shared among all the tables inside the same vnode. Virtualization lets TDengine distribute resources sensibly across vnodes and improves resource usage and concurrency. The number of vnodes on a dnode is configurable according to its hardware resources.
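As a rough illustration of this virtualization, the sketch below models a dnode as a list of vnodes, each with a private cache and a private directory. The class names, the fixed table capacity per vnode, and the "first vnode with room" placement policy are assumptions made for the example; the text does not say how TDengine actually assigns tables to vnodes.

```python
from dataclasses import dataclass, field


@dataclass
class VNode:
    """One virtual node: a private cache and a private data directory."""
    vnode_id: int
    data_dir: str
    max_tables: int  # assumed per-vnode capacity
    cache: dict = field(default_factory=dict)  # table name -> buffered rows

    def has_room(self):
        return len(self.cache) < self.max_tables

    def add_table(self, table):
        # All tables in this vnode share this one cache and directory.
        self.cache[table] = []


class DNode:
    """A data node treated as a collection of vnodes."""

    def __init__(self, num_vnodes, tables_per_vnode, root="/var/lib/taos/data"):
        self.vnodes = [
            VNode(i, f"{root}/vnode{i}", tables_per_vnode)
            for i in range(num_vnodes)
        ]
        self.table_to_vnode = {}

    def create_table(self, table):
        # Assumed placement policy: first vnode that still has a free slot.
        for vnode in self.vnodes:
            if vnode.has_room():
                vnode.add_table(table)
                self.table_to_vnode[table] = vnode.vnode_id
                return vnode
        raise RuntimeError("no vnode has a free table slot")
```

Because each VNode owns its cache dictionary, writes to tables in different vnodes never touch the same buffer, which is the isolation property the paragraph describes.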

Picture 2 TDengine Virtualization

Client module

The TDengine client module accepts requests (mainly in SQL form) from applications, converts them to an internal representation, and sends them to the server. TDengine supports multiple interfaces, all of which are built on top of the client module.

Writing process

Picture 3 shows the full writing process of TDengine. TDengine uses a write-ahead log (WAL) strategy to ensure data safety and integrity. Data sent by clients is first cleansed and then written to the commit log. When TDengine recovers from a crash caused by power loss or another failure, the commit log is used to recover the data. After the commit log write, the data is written to the cache of the corresponding vnode, and then a success message is sent to the application. Two mechanisms flush cached data to disk for persistent storage:

  1. Flush driven by timer: A background timer periodically flushes cached data to disk. The period is configurable.
  2. Flush driven by data: Cached data is also flushed to disk when its size exceeds a threshold. A data-driven flush can also reset the timer of the timer-driven flush.
Picture 3 TDengine Writing Process

A new commit log file is opened when the commit process begins. When the commit process finishes, the old commit log file is removed.
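The write path above can be sketched in a few lines. This is a simplified, single-threaded model under stated assumptions: Python lists stand in for the commit log, the cache, and the data files; the threshold counts rows rather than bytes; and the class name VnodeWriter and its parameters are hypothetical. Because the model is single-threaded, the swap to a new commit log during a commit is reduced to clearing the old one.

```python
import time


class VnodeWriter:
    """Toy model of log-first writes with timer- and data-driven flushes."""

    def __init__(self, flush_threshold, flush_interval_s, clock=time.monotonic):
        self.flush_threshold = flush_threshold    # rows in cache that force a flush
        self.flush_interval_s = flush_interval_s  # period of the timer-driven flush
        self.clock = clock
        self.commit_log = []  # stand-in for the on-disk commit log
        self.cache = []       # stand-in for the vnode cache
        self.disk = []        # stand-in for the data files
        self.last_flush = clock()

    def write(self, row):
        self.commit_log.append(row)  # 1. commit log first
        self.cache.append(row)       # 2. then the vnode cache
        if len(self.cache) >= self.flush_threshold:
            self.flush()             # flush driven by data
        return "success"             # 3. acknowledge the application

    def tick(self):
        """Called periodically by a background timer (flush driven by timer)."""
        if self.cache and self.clock() - self.last_flush >= self.flush_interval_s:
            self.flush()

    def flush(self):
        self.disk.extend(self.cache)
        self.cache.clear()
        self.commit_log.clear()          # the old commit log is removed after the commit
        self.last_flush = self.clock()   # any flush restarts the timer period

    def recover(self):
        """After a crash the cache is lost; rebuild it from the commit log."""
        self.cache = list(self.commit_log)
```

Passing a fake clock makes the timer behaviour easy to exercise without sleeping, for example `VnodeWriter(3, 10.0, clock=lambda: now[0])` with `now` being a one-element list that the caller advances.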

Metadata storage

Metadata includes information about databases, tables, and so on. By default, metadata is saved in the /var/lib/taos/mgmt/ directory. Metadata files are append-only: even a drop operation only appends a delete record to the end of the file.

/var/lib/taos/
+--mgmt/
    +--db.db
    +--meters.db
    +--user.db
    +--vgroups.db
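The append-only behaviour of the metadata files can be illustrated with a small sketch in which a drop is recorded as one more appended record and the live state is rebuilt by replaying the file. The JSON-lines encoding and the function names are assumptions for the example only; the actual on-disk format of the .db files is not described in this article.

```python
import json


def append_record(path, op, name, payload=None):
    """Append one record; existing bytes in the file are never rewritten."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps({"op": op, "name": name, "payload": payload}) + "\n")


def load_metadata(path):
    """Replay the file from the start to rebuild the live metadata."""
    live = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            if record["op"] == "create":
                live[record["name"]] = record["payload"]
            elif record["op"] == "drop":
                # A delete record cancels the earlier create record.
                live.pop(record["name"], None)
    return live
```

After `append_record(p, "create", "db1", {...})` followed by `append_record(p, "drop", "db1")`, the file holds two records but `load_metadata(p)` no longer lists `db1`.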

Data storage

Data in TDengine is sharded by time range. Data from tables in the same vnode that falls within a given time range is saved in the same file group, such as the files v0f1804*. This sharding strategy effectively speeds up data lookup. By default, each file group holds 10 days of data; this can be changed with *daysPerFile* in the configuration file or with the DAYS keyword in the CREATE DATABASE statement.
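To make the time-range sharding concrete, the sketch below maps a timestamp to a file group. It assumes that the number after "f" in a name like v0f1804 is the timestamp divided by the length of one file group, counted from the Unix epoch; that derivation is a guess that happens to be consistent with the example name, not something this article states. The function names are hypothetical.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000


def file_id(timestamp_ms, days_per_file=10):
    """Index of the time window that contains timestamp_ms (assumed scheme)."""
    return timestamp_ms // (days_per_file * MS_PER_DAY)


def file_group_prefix(vnode_id, timestamp_ms, days_per_file=10):
    """Prefix shared by the head, data, and last files of one file group."""
    return f"v{vnode_id}f{file_id(timestamp_ms, days_per_file)}"
```

Under this assumption, all rows of vnode 0 whose timestamps fall in the same 10-day window share one prefix, so a query with a time filter only needs to open the file groups whose windows overlap the filter.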

Data in files is organized in blocks. A data block contains data from only one table. Records within a data block are sorted by the primary timestamp.

By default, TDengine data is saved in the /var/lib/taos/data/ directory. The /var/lib/taos/tsdb/ directory contains vnode information and links to the data files.

/var/lib/taos/
+--tsdb/
|   +--vnode0
|       +--meterObj.v0
|       +--db/
|           +--v0f1804.head->/var/lib/taos/data/vnode0/v0f1804.head1
|           +--v0f1804.data->/var/lib/taos/data/vnode0/v0f1804.data
|           +--v0f1804.last->/var/lib/taos/data/vnode0/v0f1804.last1
|           +--v0f1805.head->/var/lib/taos/data/vnode0/v0f1805.head1
|           +--v0f1805.data->/var/lib/taos/data/vnode0/v0f1805.data
|           +--v0f1805.last->/var/lib/taos/data/vnode0/v0f1805.last1
|           :
+--data/
    +--vnode0/
        +--v0f1804.head1
        +--v0f1804.data
        +--v0f1804.last1
        +--v0f1805.head1
        +--v0f1805.data
        +--v0f1805.last1
        :

meterObj file

Each vnode has exactly one meterObj file. It stores information about the vnode, such as its creation time, configuration, and statistics. Its structure is as follows:

<start_of_file>
 [file_header]
 [table_record1_offset&length]
 [table_record2_offset&length]
 …
 [table_recordN_offset&length]
 [table_record1]
 [table_record2]
 …
 [table_recordN]
<end_of_file>

The file header takes 512 bytes, which mainly contains information about the vnode. Each table record is the representation of a table on disk.
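The layout above (a fixed header, then an offset and length table, then the records) allows any table record to be read without scanning the others. The sketch below builds and reads such a file in memory. Only the 512-byte header size comes from the text; the 8-byte little-endian offset and length fields, and the record count stored in the first four bytes of the header, are assumptions introduced so that the example is self-contained.

```python
import struct

HEADER_SIZE = 512              # stated in the text
ENTRY = struct.Struct("<QQ")   # assumed: 8-byte offset followed by 8-byte length
COUNT = struct.Struct("<I")    # assumed: record count at the start of the header


def build_file(records):
    """Lay out header, offset/length table, then the records themselves."""
    header = COUNT.pack(len(records)).ljust(HEADER_SIZE, b"\0")
    position = HEADER_SIZE + ENTRY.size * len(records)  # where record 1 starts
    entries = []
    for record in records:
        entries.append(ENTRY.pack(position, len(record)))
        position += len(record)
    return header + b"".join(entries) + b"".join(records)


def read_record(blob, index):
    """Fetch one table record through the offset/length table."""
    (count,) = COUNT.unpack_from(blob, 0)
    if not 0 <= index < count:
        raise IndexError(index)
    offset, length = ENTRY.unpack_from(blob, HEADER_SIZE + index * ENTRY.size)
    return blob[offset:offset + length]
```

The design point is that the table of offsets and lengths has fixed-size entries, so entry N sits at a position that can be computed directly, even though the records themselves have variable length.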

head file

The head files contain the index of the data blocks in the data file. They are organized as follows:

<start_of_file>
 [file_header]
 [table1_offset]
 [table2_offset]
 …
 [tableN_offset]
 [table1_index_block]
 [table2_index_block]
 …
 [tableN_index_block]
<end_of_file>

The table offset array in the head file records the offset of each table's index block. Block indices for the same table are stored contiguously, which makes it efficient to load all the indices for a table. A table index block has the following structure:

[index_block_info]
 [block1_index]
 [block2_index]
 …
 [blockN_index]

The index block info part holds information about the index block itself, such as the number of blocks it indexes. Each block index corresponds to a real data block in the data file or the last file. The block index stores the location of that data block, its primary timestamp range, and similar information. The block indices are sorted in ascending order by primary timestamp, so algorithms such as binary search can be applied to them to find blocks by time efficiently.
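A minimal sketch of that lookup follows. It assumes the block indices of one table are already loaded into a list, are sorted by timestamp, and do not overlap in time. The BlockIndex fields (key_first, key_last, offset) are hypothetical names for "primary timestamp range" and "location".

```python
from typing import NamedTuple


class BlockIndex(NamedTuple):
    key_first: int  # smallest primary timestamp in the block
    key_last: int   # largest primary timestamp in the block
    offset: int     # location of the block in the data (or last) file


def blocks_in_range(indices, start_ts, end_ts):
    """Return the block indices whose time range overlaps [start_ts, end_ts]."""
    # Binary search for the first block that ends at or after start_ts.
    lo, hi = 0, len(indices)
    while lo < hi:
        mid = (lo + hi) // 2
        if indices[mid].key_last < start_ts:
            lo = mid + 1
        else:
            hi = mid
    # Walk forward while blocks still begin inside the queried range.
    result = []
    while lo < len(indices) and indices[lo].key_first <= end_ts:
        result.append(indices[lo])
        lo += 1
    return result
```

Only the index entries are inspected here; the data blocks themselves are read afterwards, and only for the entries that the search returns.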

data file

The data files store the actual data blocks. They are append-only and organized as follows:

<start_of_file>
 [file_header]
 [block1]
 [block2]
 …
 [blockN]
<end_of_file>

A data block in a data file belongs to exactly one table in the vnode, and its records are sorted in ascending order by the primary timestamp key. Data blocks are column-oriented: values from the same column are stored contiguously, which improves read speed and, because values within a column tend to be similar, the compression ratio. A data block has the following organization:

[column1_info]
 [column2_info]
 …
 [columnN_info]
 [column1_data]
 [column2_data]
 …
 [columnN_data]

The column info part includes the column type, the column compression algorithm, the offset and length of the column data in the data file, and so on. It also holds pre-calculated results for the column data in the block, which improves read speed by avoiding unnecessary loading of the data block.
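The benefit of pre-calculated results can be shown with a small sketch: if each column's info already carries a count, minimum, maximum, and sum, some queries can be answered (or a block can be ruled out) from the column info alone. Which aggregates TDengine actually stores is not listed in this article, so the set used here, like the names ColumnInfo and make_block, is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ColumnInfo:
    """Per-column summary kept next to the block (assumed set of aggregates)."""
    name: str
    count: int
    minimum: float
    maximum: float
    total: float


def make_block(rows, column_names):
    """Turn row tuples into a column-oriented block plus its column infos."""
    columns = {
        name: [row[i] for row in rows] for i, name in enumerate(column_names)
    }
    infos = {
        name: ColumnInfo(name, len(values), min(values), max(values), sum(values))
        for name, values in columns.items()
    }
    return infos, columns


def block_average(infos, name):
    """Average of a column computed without reading the column data."""
    return infos[name].total / infos[name].count


def may_contain_above(infos, name, threshold):
    """False means the block can be skipped for a 'value > threshold' filter."""
    return infos[name].maximum > threshold
```

Both helper functions take only `infos`, which mirrors the point of the paragraph: the column data part of the block never has to be loaded for these two cases.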

last file

To avoid storage fragmentation and to improve query speed and compression ratio, TDengine introduces an extra file, the last file. When the number of records in a data block is below a threshold, TDengine flushes the block to the last file for temporary storage. When new data arrives, the data in the last file is merged with it to form a larger data block, which is then written to the data file. The organization of the last file is similar to that of the data file.
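The role of the last file can be sketched for a single table as follows. Lists stand in for the two files, rows are assumed to arrive in timestamp order, and the class name TableFiles is hypothetical. One detail goes beyond the text: when the merged block is still below the threshold, this sketch parks it in the last file again, which is an assumption.

```python
class TableFiles:
    """Toy model of one table's blocks in the data file and the last file."""

    def __init__(self, min_rows_per_block):
        self.min_rows = min_rows_per_block  # threshold below which a block is "small"
        self.data_file = []                 # list of full-size blocks
        self.last_block = []                # small block parked in the last file

    def commit(self, rows):
        # Merge whatever sits in the last file with the newly flushed rows.
        merged = self.last_block + list(rows)
        if len(merged) >= self.min_rows:
            self.data_file.append(merged)   # large enough: goes to the data file
            self.last_block = []
        else:
            self.last_block = merged        # still small: stays in the last file
```

With a threshold of 4, committing 2 rows and then 3 rows produces one block of 5 rows in the data file instead of two small blocks, which is the fragmentation the last file is meant to avoid.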

Summary

The innovations in the architecture and storage design of TDengine significantly improve resource usage. On the one hand, virtualization makes it easy to distribute resources among vnodes and to scale up in the future. On the other hand, sorted, column-oriented storage gives TDengine high speed in writing and querying and a good compression ratio.