Performance: TDengine vs InfluxDB

Abstract: In this test, TDengine is compared with InfluxDB in the terms of writing throughput, query throughput, aggregation query response time and on-disk compression. The result demonstrates that TDengine outperforms InfluxDB with 5x greater write throughput, 35x larger query throughput, 140x faster in aggregation query (250x when grouping by tags and 12x when grouping by time) while using 2.1x less disk space.

About the hardware

Servers and the testing program are running on the same Dell desktop of the model type ” OptiPlex- 3050″, with  4 cores and 8G memory. Detailed configurations are as follows:

OS: Ubuntu 16.04 x64
CPU: Intel(R) Core(TM) i3-7100 CPU @ 3.90GHz
Memory: 8GB
Disk: 1TB HDD

About the dataset

1.data description

Two popular datasets were investigated before this test:

  • New York taxi running information: cannot design the data model becuase there is no information for the single car
  • faker generation tool: unfit for the IoT scenario because only strings can be generated.

To make this test repeatable, we write a specific data generation program, which simulates the temperature (int) and humidity (float) meausrements from thermohydrometers. Each thermohydrometer has three tags: device ID, device group, and device name. To make the simulation look authentic, the dataset distributes normally, not randomly.

All measurements are sampled every 1 second, containing 10k devices with 10k records from each device. There are 3 tags, 2 numeric values and 1 timestamp in each record.

2.data generatation code

The data generation code is written in Java, which can be downloaded here and executed as follows:

cd tests/comparisonTest/dataGenerator
javac com/taosdata/generator/DataGenerator.java 

3.options for the data generation code

  • dataDir: filepath to store output data files 
  • numOfFiles: number of output data files
  • numOfDevices: number of devices
  • rowsPerDevice: number of records from each device

4.data generation

Execute the following commands and then 100 data files would be created. Each data file includes measurements from 100 devices. Totally there are 10k devices with 10k records from each device.

mkdir ~/testdata
java com/taosdata/generator/DataGenerator -dataDir ~/testdata -numOfDevices 10000 -numOfFiles 100 -rowsPerDevice 10000

TDengine preparation

TDengine is an  open-source big data platform designed and optimized for Internet of Things ( (IoT), Connected Vehicles, and Industrial IoT. Besides the 10x faster time-series database, it provides caching, stream computing, message queuing and other functionalities to reduce the complexity and costs of development  and operations.

1.installation

  • Download tdengine-1.6.1.0.tar.gz.
  • Unzip and then run “install.sh” to install TDengine
  • Start TDengien by executing “sudo systemctl start taosd”
  • If the installation succeeds, enter “taos” in the terminal and then the following texts will be displayed:
Welcome to the TDengine shell, server version:1.6.1.0  client version:1.6.1.0
Copyright (c) 2017 by TAOS Data, Inc. All rights reserved.

taos> 

2.data model

For TDengine, a super table would be created for all devices in the same type and then one table  for one device. Thus, for the super table, the data records include measurement  time, temperature and humidity; the static device atributes in tags include device ID, device group, and device name.

SQL syntax for creating a super table

create table devices(ts timestamp, temperature int, humidity float) tags(devid int, devname binary(16), devgroup int);

SQL syntax for dynamically creating one table using the super table as template and insert one record

insert into dev1 using devices tags(1,'d1',0) values(1545038786000,1,3.560000);

3.testing code

The TDengine C driver is used to insert and query reocrds. In the future the testing code based on JDBCdriver will also be provided.

Currently the testing code can be downloaded here. Enter the following commands in the terminal, an executable file “./tdengineTest” will be created:

cd tdengine
make

4.how to use the testing code

Writing Options

  • writeClients: number of client connections to insert data concurrently, default 1
  • rowsPerRequest: number of records in one request ranging 1-1000, default 100
  • dataDir: data file path, same with the dataDir in the data generation code
  • numOfFiles: number of files read from dataDir

For example

./tdengineTest -dataDir ./data -numOfFiles 10 -writeClients 2 -rowsPerRequest 100

Query Options 

  • sql: path of the files which store all SQL statements to executed.

For example

./tdengineTest -sql ./sqlCmd.txt

InfluxDB preparation

InfluxDB is an open source time series database, purpose-built for monitoring metrics and events, providing real-time visibility into stacks, sensors, and systems.

1.installation

wget https://dl.influxdata.com/influxdb/releases/influxdb_1.7.7_amd64.deb
sudo dpkg -i influxdb_1.7.7_amd64.deb
  • Start InfluxDB
sudo systemctl start influxdb 
  • If the installation succeeds, enter “influx” in the terminal and then the following texts will displayed:
Connected to http://localhost:8086 version 1.7.7
InfluxDB shell version: 1.7.7
> 

2.data model

Build a measurement named “devices”, to which all devices belong. Different devices are identified by tags. Each device has three tags: device ID, device group and device name. Each record includes three columns: timestamp (ms), (int) and humidity (float).

3.testing code

The InfluxDB GO driver is used to insert and query reocrds. The testing code can be downloaded here
First install GO

sudo apt install golang
mkdir -p $HOME/local/Golang
export GOPATH=$HOME/local/Golang
go get github.com/influxdata/influxdb1-client/v2

Then execute the following command and then the execuatable file “./influxdbTest” will be created

cd tests/comparisonTest/influxdb
go build -o influxdbTest

4.how to use testing code

Writing Options

  • writeClients: number of client connections to insert data concurrently, default 1
  • rowsPerRequest: number of records in one request ranging 1-1000, default 100
  • dataDir: data file path, same with the dataDir in the data generation code
  • numOfFiles: number of files read from dataDir

For example

./influxdbTest -dataDir ~/testdata -numOfFiles 1 -writeClients 2 -rowsPerRequest 100

Querying Options

  • sql: path of the files which store all SQL statements to be executed.

For example

./influxdbTest -sql ./sqlCmd.txt

Write performance

One writing request can send one record or multiple records, denoting as “R/R” or “Records/Request”. The writing speed would increase with the “R/R”. Meanwhile, one database server can connect to many clients. The more connection, the larger write throughput. Thus, both one-connection case and multi-connection case would be tested.

1.TDengine

Writing tests are taken in multiple scanarios, including 1R/R, 100R/R, 500R/R, 1000R/R and 2000R/R with different number of connections. You can change the options in the example and take different tests.

1.clean up the existing dataset
Drop the existing database in the TDengine shell client "taos"
Welcome to the TDengine shell, server version:1.6.1.0  client version:1.6.1.0
Copyright (c) 2017 by TAOS Data, Inc. All rights reserved.

taos>drop database db;

2.start testing
The example command to read 100 data files in the ~/testdata and insert 1000 records per request by 5 clients:
./tdengineTest -dataDir ~/testdata -numOfFiles 100 -writeClients 5 -rowsPerRequest 1000

Write throughput is as follows,unit in records/second

R/R1 client2 clients3 clients4 clients5 clients6 clients7 clients
126824436995513762869645296864772277
100415800734484895522976085108790211710741192199
50047984688261210830321195100126919613642561417004
100050075191449411219141239157136798914181041476560
2000512820105552011741641306904142663514584341477208
Figure 1 TDengine write throughput

2.InfluxDB

Writing tests are taken in multiple scenarios, including 1R/R, 100R/R, 1000R/R, 10000R/R, 20000R/R and 50000R/R with different numbers of connections.  You can change the options in the example and take different tests.

1.clean up the existing dataset
Drop the existing database in the InfluxDB shell client "influx"
Connected to http://localhost:8086 version 1.7.7
InfluxDB shell version: 1.7.7
> drop database db;

2.start testing
The example command to read 100 data files in the "~/testdata" and insert 10000 records per request by 5 clients:
./influxdbTest -dataDir ~/testdata -numOfFiles 100 -writeClients 5 -rowsPerRequest 10000

Write throughput is as follows,unit in records/second

R/R1 client2 clients3 clients4 clients5 clients6 clients7 clients
1314355678092106
10030244325570968198013920410173
100021940306594082550622605677031177174
1000088686155154209377234124245141257454261542
2000096277179492234413255805263160268466271249
50000125187200552243861264780271101270364273820
100000130108197202240059254973265922272275270859
Figure 2 InfluxDB write throughput

3.Best Write Performance: TDengine vs InfluxDB

Compare the best writting performance of TDengine and InfluxDB. Results are as follows:

R/R1 client2 clients3 clients4 clients5 clients6 clients7 clients
TDengine512820105552011741641306904142663514584341477208
InfluxDB130108200552243861264780271101272275273820
Figure 3 TDengine vs InfluxDB largest write throughput

Figure 3 demonstrates that the writing speed of TDengine is in the order of 1million records per second while that of InfluxDB is in the order of 100k records per second. In conclusion, it writes about 5 times faster in the TDengine than in the Influx DB.

It’s worthy to note that InfluxDB writes too slow when sent one row of record in one quest. Thus Kafka and other softwares about the message queue is necessary for batch insertion. However, this will increase the cost and complexity in the system development and maintenance.

Read Performance

For reading performance, this test takes a simple tranversing query, that is, reading all the data haveing been written into the database. Because of the size limit of parsing JSON results in the InfluxDB GO client, this test only ingests 1million records in every query. The testing dataset has been already divided into 100 groups according to devgroup in the preparation and the following test just randomly selects 10 groups in every query.

1.how to use the TDengine testing code

SQL expresssions are in the "tdengine/q1.txt", for example,
select * from db.devices where devgroup=0;

Execute the following command:
./tdengineTest -sql ./q1.txt

2.how to use the InfluxDB testing code

SQL expresssions are in the "influxdb/q1.txt", for example,
select * from devices where devgroup='0';

Execute the following command:
./influxDBTest -sql ./q1.txt

Reading speed is as follows,unit in second

group0group10group20group30group40group50group60group70group80group90
TDengine0.2350.2120.2080.2180.2090.2100.2090.2090.2160.208
InfluxDB7.567.217.647.287.647.527.527.527.327.42
Figure 4 Read Performance TDengine vs InfluxDB

Figure 4 demonstrates that the stable reading speed of TDengine is about 0.21s, that is, reading 5million records per second. Meanwhile, the stable reading speed of InfluxDB is about 7.5s, that is, reading 130k records per second. In conclusion, the query throughput size in TDengine is 35 times larger than that in InfluxDB.

Aggregation Performance

This part tests six aggregation function: COUNT, AVERAGE, SUM, MAX, MIN and SPREAD which are shared by TDengine and InfluxDB. Each aggratation query would be paired with a filter to select 1/10, 2/10, 3/10, …, or all of the 100 devices.

1.TDengine

SQL expresssions are in the “tdengine/q2.txt”, for example,

select count(*) from db.devices where devgroup<10;

Execute the following command:

./tdengineTest -sql ./q2.txt

Query response time is as follows,unit in second

10%20%30%40%50%60%70%80%90%100%
count0.0180.0260.0160.0180.0170.0240.0240.0270.0300.033
avg0.0070.0140.0150.0200.0240.0380.0440.0500.0570.060
sum0.0060.0100.0190.0180.0310.0360.0340.0370.0430.046
max0.0070.0130.0150.0200.0250.0300.0350.0390.0450.049
min0.0060.0100.0160.0240.0320.0390.0450.0410.0430.049
spread0.0070.0100.0150.0190.0330.0380.0460.0520.0590.066
Figure 5 TDengine Query response

2.InfluxDB

SQL expresssions are in the “influxdb/q2.txt”. Because in the InfluxDB the tags are srings, the filters in the SQL expresssions are regular expressions.  For example, to select the data in the group 10-19:

select count(*) from devices where devgroup=~/[1-1][0-9]/;

Execute the following command:

./influxdbTest -sql ./q2.txt

Query response time is as follows,unit in second,

10%20%30%40%50%60%70%80%90%100%
count1.062.143.284.155.266.197.018.099.069.92
mean0.992.052.773.684.515.356.146.957.708.44
sum1.022.042.893.754.645.506.387.187.948.72
max1.011.992.853.774.695.526.357.177.958.80
min1.032.022.953.814.645.486.337.188.018.72
spread7.3816.9227.4438.2549.8660.6871.6182.5094.68105.26
Figure 6 InfluxDB Aggretation Response

3.Comparison

Compare the query response time between TDengine and InfluxDB based on the 1 million records.

countaveragesummaxminspread
TDengine0.0330.060.0460.0490.0490.066
InfluxDB9.928.448.728.88.72105.26
Figure 7 Aggregation Response TDengine vs InfluxDB 

Figure 7 demonstrates that the response time of aggregation query in TDengine is within 100ms while the response time in InfluxDB is about 10 seconds. In conclusion, the response time of aggregation query in the TDengine is more than 100 times shorter than that in the InfluxDB.

Performance of Aggregation grouped by tag

This part tests the aggregation performance grouped by tags. Each aggratation query would be paired with a filter to select 1/10, 2/10, 3/10, …, or all of the 100 devices.

1.how to use the TDengine testing code

SQL expresssions are in the “tdengine/q3.txt”, for example,

select count(temperature), sum(temperature), avg(temperature) from db.devices where devgroup<10 group by devgroup;

Execute the following command:

./tdengineTest -sql ./q3.txt

2.how to use the InfluxDB testing code

SQL expresssions are in the “influxdb/q3.txt”, for example,

select count(temperature), sum(temperature), mean(temperature) from devices where devgroup=~/[1-1][0-9]/ group by devgroup;

Execute the following command:

./influxdbTest -sql ./q3.txt

Query response time is as follows,unit in second,

10%20%30%40%50%60%70%80%90%100%
TDengine0.0300.0280.0310.0410.0690.0660.0770.0910.1020.123
InfluxDB3.196.379.6012.9515.9319.1622.0525.2028.0631.52
Figure 8 Query Response grouped by tags TDengine vs InfluxDB

Testing results show that the response time of aggretaion query grouped by tags is 300 times shorter in the TDengine than that in the InfluxDB.

Performance of Aggregation  grouped by time

This part tests the aggregation performane grouped by time. Each aggratation query would be paired with a filter to select 1/10, 2/10, 3/10, …, or all of the 100 devices.

1.how to use the TDengine testing code

SQL expresssions are in the “tdengine/q4.txt”, for example,

select count(temperature), sum(temperature), avg(temperature) from db.devices where devgroup<10 interval(1m);

Execute the following command:

./tdengineTest -sql ./q4.txt

2.how to use the InfluxDB testing code

SQL expresssions are in the “influxdb/q4.txt”, for example,

select count(temperature), sum(temperature), mean(temperature) from devices where devgroup=~/[1-1][0-9]/ group by time(1m);

Execute the following command:

./influxdbTest -sql ./q4.txt

Query response time is as follows,unit in second,

10%20%30%40%50%60%70%80%90%100%
TDengine0.2370.4720.6530.9021.1341.4221.7531.7842.0852.549
InfluxDB3.266.509.5912.8516.0719.0222.3225.4428.2931.44
Figure 9 Query Response grouped by time TDengine vs InfluxDB

Testing results show that the response time of aggretaion query grouped by time is 12 times shorter in the TDengine than that in the InfluxDB.

On-disk Compression

1.Original dataset size

In this test 100 data files are generated and stored in the folder “~/testdata”, whose size can be checked by command “du”. 

cd ~/testdata
du -m .

Results are shown in Fig. 10.

Figure 10 Original dataset size

2.Disk space using by TDengine

In TDengine all data are saved in the directory “/var/lib/taos/data” by default. Stop TDengine server before checking the data size.

sudo systemctl stop taosd

Then check the size of data in the folder “/var/lib/taos/data/” by command “du”.

cd /var/lib/taos/data
du -h .
Figure 11 Disk space usage in the TDengine

3.Disk space using by InfluxDB

In InfluxDB all data are saved in the directory “/var/lib/influxdb/data/db” by default. Stop InfluxDB server before checking the data size.

sudo systemctl stop influxDb

In this test, data is stored in the folder “/var/lib/taos/data/autogen/84”, whose size can be checked by command “du”.

cd /var/lib/influxdb/data/db/autogen/84
du -h .
Figure 12 Disk space usage in the InfluxDB 

4.Comparison of disk usage

Original test dataset occupies 3941MB in the disk, data in the InfluxDB 855MB anddata in the TDengine 459MB. The on-disk compression ratio in the TDengine is 1.86 times of that in the InfluxDB.

In the real scenarios of IoT, the on-disk compression ratio of the TDengine is expected to be larger because of the limited spread of real measurements and columns-based storage of TDengine.

Feature Comparison

Because both TDengine and InfluxDB are designed to process time-series data, they have some similar features. 

Feature supportTDengineInfluxDB
SQL syntaxYesYes
private deploymentYesYes
scalabilityYesYes
system connection managementYesYes
query task managementYesYes
data importYesYes
data exportYesYes
Web managementYesYes
multi-layer storageYesYes
Telegraf data collectionYesYes
Grafana data visualizationYesYes
RESTFulYesYes
C/C++YesNo
JDBC/ODBCYesNo
GOYesYes
PythonYesYes
database configurationYesYes
replica configuationYesYes
data alive timeYesYes
data partitionYesYes
streamingYesNo
subscriberYesNo
us precisionYesYes
aggregationYesYes
downsamplingYesYes
limit/offsetYesYes
interpolationYesYes
data updatedNoYes
tag updatedYesNo
time deleteYesYes
data cleanupYesYes

Conclusion

In this test, TDengine is compared with InfluxDB in the terms of writing throughput, query throughput, aggregation query response time and on-disk compression. Test dataset, codes and SQL expressions can be downloaded here. Thus, anyone can repeat this test again.

This test demonstates that TDengine outperforms InfluxDB with 5x greater write throughput, 35x larger data ingestion, 140x faster in aggrgation query (250x when grouping by tags and 12x when grouping by time) while using 2.1x less disk space.

TDengineInfluxDB
write throughput1477208 rows/second273820 rows/second
time to ingest 1million rows0.21s7.5s
response time to average 1E8 rows0.06s8.44s
response time to average 1E8 rows according to tags0.123s31.52s
response time to average 1E8 rows according to timestamp2.549s31.44s
disk usage by 1E8 rows of records459MB855MB