Performance: TDengine vs OpenTSDB

TDengine Team

Abstract: In this test, TDengine is compared with OpenTSDB in terms of write throughput, query throughput, aggregation query response time and on-disk compression. The results demonstrate that TDengine outperforms OpenTSDB with 25x greater write throughput, 32x greater query throughput and 1000x faster aggregation queries (1000x when grouping by tags and 40x when grouping by time), while using about one fifth of the disk space.

About the hardware

The servers and the testing program run on the same Dell “OptiPlex 3050” desktop, with 4 logical CPU cores and 8 GB of memory. The detailed configuration is as follows:

  • OS: Ubuntu 16.04 x64
  • CPU: Intel(R) Core(TM) i3-7100 CPU @ 3.90GHz
  • Memory: 8GB
  • Disk: 1TB HDD

About the dataset

Two popular datasets were investigated before this test:

  • New York taxi running information: a data model cannot be designed because there is no information for individual cars
  • faker generation tool: unsuitable for the IoT scenario because it can only generate strings

To make this test repeatable, we wrote a dedicated data generation program that simulates temperature (int) and humidity (float) measurements from thermohydrometers. Each thermohydrometer has three tags: device ID, device group, and device name. To make the simulation look authentic, the generated values follow a normal distribution rather than a uniform random one.

The dataset contains 10k devices with 10k records from each device, 100 million records in total, sampled once per second. Each record has 3 tags, 2 numeric values and 1 timestamp.
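As a rough illustration of the record shape described above, the following Python sketch produces normally distributed readings for one device. It is not the Java generator used in the test: the means and standard deviations, the device naming scheme, the group assignment and the start timestamp are all assumptions chosen for illustration.

```python
import random

def generate_device_rows(dev_id, num_groups, rows, start_ms, rng):
    """Yield (devid, devname, devgroup, ts, temperature, humidity) tuples.

    Assumed for illustration only: temperature ~ N(25, 5) rounded to int,
    humidity ~ N(50, 10) as float, one sample per second.
    """
    dev_name = f"d{dev_id}"            # hypothetical naming scheme
    dev_group = dev_id % num_groups    # hypothetical group assignment
    for i in range(rows):
        ts = start_ms + i * 1000       # 1-second sampling interval, in ms
        temperature = int(round(rng.gauss(25, 5)))
        humidity = round(rng.gauss(50, 10), 2)
        yield (dev_id, dev_name, dev_group, ts, temperature, humidity)

rng = random.Random(42)  # fixed seed so repeated runs give the same data
rows = list(generate_device_rows(1, 100, 5, 1545038786000, rng))
for row in rows:
    print(row)
```

Each tuple carries the three tags, the timestamp and the two numeric values, which matches the 3 tags, 1 timestamp and 2 values per record used in this test.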

2.data generation code

The data generation code is written in Java. It can be downloaded here and compiled as follows:

cd tests/comparisonTest/dataGenerator
javac com/taosdata/generator/DataGenerator.java 

3.options for the data generation code

  • dataDir : filepath to store output data files
  • numOfFiles : number of output data files
  • numOfDevices : number of devices
  • rowsPerDevice : number of records from each device

4.data generation

Execute the following commands to create 100 data files. Each data file contains the measurements from 100 devices, so in total there are 10k devices with 10k records from each device.

mkdir ~/testdata
java com/taosdata/generator/DataGenerator -dataDir ~/testdata -numOfDevices 10000 -numOfFiles 100 -rowsPerDevice 10000
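The sizes implied by these options can be checked with a few lines of arithmetic. The sketch below only restates the option values from the command above; the variable names are illustrative.

```python
num_devices = 10_000      # -numOfDevices
num_files = 100           # -numOfFiles
rows_per_device = 10_000  # -rowsPerDevice

devices_per_file = num_devices // num_files
rows_per_file = devices_per_file * rows_per_device
total_rows = num_devices * rows_per_device

print(f"devices per file: {devices_per_file}")
print(f"rows per file:    {rows_per_file:,}")
print(f"total rows:       {total_rows:,}")
```

The total of 100 million rows is the 1E8 figure used later in the aggregation tests.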

TDengine preparation

TDengine is an open-source big data platform designed and optimized for the Internet of Things (IoT), Connected Vehicles, and Industrial IoT. Besides its 10x faster time-series database (TSDB), it provides caching, stream computing, message queuing and other functionality to reduce the complexity and cost of development and operations.

1.installation

  • Download tdengine-1.6.1.0.tar.gz.
  • Extract the archive and run “install.sh” to install TDengine.
  • Start TDengine by executing “sudo systemctl start taosd”.
  • If the installation succeeds, entering “taos” in the terminal displays the following text:
Welcome to the TDengine shell, server version:1.6.1.0  client version:1.6.1.0
Copyright (c) 2017 by TAOS Data, Inc. All rights reserved.

taos> 

2.data model

In TDengine, one supertable is created for all devices of the same type, and then one table is created for each device. For the supertable, each data record includes the measurement time, temperature and humidity; the static device attributes stored in tags are device ID, device group, and device name.


SQL syntax for creating a supertable

create table devices(ts timestamp, temperature int, humidity float) tags(devid int, devname binary(16), devgroup int);

SQL syntax for dynamically creating a table from the supertable template and inserting one record

insert into dev1 using devices tags(1,'d1',0) values(1545038786000,1,3.560000);
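To show how a client might assemble such statements for a batch of records, here is a minimal Python sketch that only builds the SQL string. The helper name `build_insert` is hypothetical, and writing several rows as space-separated value tuples in one statement is an assumption about TDengine's multi-row insert syntax; the actual testing code uses the C driver and may build its requests differently.

```python
def build_insert(table, dev_id, dev_name, dev_group, rows):
    """Build one INSERT that auto-creates `table` from the `devices`
    supertable and writes every (ts, temperature, humidity) row."""
    values = " ".join(
        f"({ts},{temperature},{humidity:.6f})"
        for ts, temperature, humidity in rows
    )
    return (
        f"insert into {table} using devices "
        f"tags({dev_id},'{dev_name}',{dev_group}) values{values};"
    )

sql = build_insert("dev1", 1, "d1", 0, [(1545038786000, 1, 3.56)])
print(sql)
```

With a single row, the output reproduces the example statement shown above.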

3.testing code

The TDengine C driver is used to insert and query records. Testing code based on the JDBC driver will also be provided in the future. The current testing code can be downloaded here. Enter the following commands in the terminal to build the executable file “./tdengineTest”:

cd tdengine
make

4.how to use the testing code

Writing Options

  • writeClients : number of client connections to insert data concurrently, default 1
  • rowsPerRequest : number of records in one request, in the range 1-1000, default 100
  • dataDir : data file path, the same as the dataDir used by the data generation code
  • numOfFiles : number of files read from dataDir

For example

./tdengineTest -dataDir ./data -numOfFiles 10 -writeClients 2 -rowsPerRequest 100

Query Options 

  • sql : path of the file that stores the SQL statements to be executed

For example

./tdengineTest -sql ./sqlCmd.txt

OpenTSDB preparation

OpenTSDB is a scalable time series database built on top of Hadoop and HBase. It simplifies the process of storing and analyzing large amounts of time-series data generated by endpoints like sensors or servers.

1.installation

  • Install HBase

Download “hbase-1.4.10-bin.tar.gz” from http://archive.apache.org/dist/hbase/1.4.10/

tar xzvf hbase-1.4.10-bin.tar.gz
cd hbase-1.4.10/bin
./start-hbase.sh

  • Download and install OpenTSDB

git clone https://github.com/OpenTSDB/opentsdb.git
cd opentsdb
./build.sh

  • Create tables in HBase

The first time OpenTSDB runs, its tables need to be created in HBase beforehand.

env COMPRESSION=NONE HBASE_HOME=${HBASE_HOME}/hbase-version ${OpenTSDB_download_path}/src/create_table.sh

  • Start OpenTSDB

sudo service opentsdb start

Open the interactive web page at http://hostIp:424

  • Modify the configuration file
cd /etc/opentsdb
vim opentsdb.conf
tsd.core.auto_create_metrics = true 
tsd.http.request.enable_chunked = true 
tsd.http.request.max_chunk = 30000

If errors like “java.net.NoRouteToHostException: Cannot assign requested address (Address not available)” occur in this test, you can execute the following command:

sudo sysctl -w net.ipv4.tcp_tw_reuse=1

2.data model

For each record, two points are created, one for temperature and one for humidity. Each point includes a metric name, a metric value and a timestamp. Each point carries three tags that describe the measurement device: device ID, device group and device name.
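The mapping from one record to two points can be sketched as follows. The point layout (metric, timestamp, value, tags) follows the JSON format accepted by OpenTSDB's HTTP put endpoint; the metric names `temperature` and `humidity` and the tag keys are assumptions, since the names used by the actual testing code are not shown here.

```python
import json

def record_to_points(dev_id, dev_name, dev_group, ts_ms, temperature, humidity):
    """Turn one measurement record into two OpenTSDB data points."""
    # Tag keys are hypothetical; tag values are sent as strings.
    tags = {"devid": str(dev_id), "devname": dev_name, "devgroup": str(dev_group)}
    return [
        {"metric": "temperature", "timestamp": ts_ms, "value": temperature, "tags": tags},
        {"metric": "humidity", "timestamp": ts_ms, "value": humidity, "tags": tags},
    ]

points = record_to_points(1, "d1", 0, 1545038786000, 1, 3.56)
print(json.dumps(points, indent=2))
```

Because every record becomes two points, writing N records to OpenTSDB means writing 2N points, each repeating the same three tags.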

3.testing code

This test uses the OpenTSDB HTTP client. The testing code can be downloaded here.

4.how to use testing code

Writing Options

  • writeClients : number of client connections to insert data concurrently, default 1
  • rowsPerRequest : number of records in one request, default 100
  • dataDir : data file path, the same as the dataDir used by the data generation code
  • numOfFiles : number of files read from dataDir

For example

cd opentsdb/opentsdbtest/src/target
java -jar opentsdbtest-1.0-SNAPSHOT-jar-with-dependencies.jar -dataDir ~/testdata -numOfFiles 100 -writeClients 2 -rowsPerRequest 30

Querying Options

  • sql : the query set to execute (sqlchoice, for example q1); the OpenTSDB queries are JSON requests already written into the testing code
java -jar opentsdbtest-1.0-SNAPSHOT-jar-with-dependencies.jar -sql sqlchoice

Write performance

One write request can carry one record or multiple records; this is denoted “R/R” (Records/Request). Write speed increases with R/R. In addition, one database server can serve many client connections, and more connections yield higher write throughput. Therefore both the single-connection case and the multi-connection case are tested.
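The R/R setting amounts to splitting the record stream into fixed-size batches, one batch per request. A minimal sketch of that batching step, with a hypothetical helper name, could look like this:

```python
def batches(records, rows_per_request):
    """Split `records` into consecutive requests of at most
    `rows_per_request` records each."""
    if rows_per_request < 1:
        raise ValueError("rows_per_request must be at least 1")
    for start in range(0, len(records), rows_per_request):
        yield records[start:start + rows_per_request]

records = list(range(250))             # stand-in for 250 measurement records
requests = list(batches(records, 100))
print([len(r) for r in requests])      # [100, 100, 50]
```

A larger R/R means fewer requests for the same number of records, which is why per-request overhead matters less as R/R grows.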

1.TDengine

Write tests are run in multiple scenarios, including 1 R/R, 100 R/R, 500 R/R, 1000 R/R and 2000 R/R, each with different numbers of connections. You can change the options in the example below to run different tests.

  1. clean up the existing dataset
    drop database db;
  2. start testing
    Example command that reads the 100 data files in “~/testdata” and inserts 1000 records per request using 5 clients:
    ./tdengineTest -dataDir ~/testdata -numOfFiles 100 -writeClients 5 -rowsPerRequest 1000

Write throughput is as follows, in records/second:

| R/R | 1 client | 2 clients | 3 clients | 4 clients | 5 clients | 6 clients | 7 clients |
|---|---|---|---|---|---|---|---|
| 1 | 26824 | 43699 | 55137 | 62869 | 64529 | 68647 | 72277 |
| 100 | 415800 | 734484 | 895522 | 976085 | 1087902 | 1171074 | 1192199 |
| 500 | 479846 | 882612 | 1083032 | 1195100 | 1269196 | 1364256 | 1417004 |
| 1000 | 500751 | 914494 | 1121914 | 1239157 | 1367989 | 1418104 | 1476560 |
| 2000 | 512820 | 1055520 | 1174164 | 1306904 | 1426635 | 1458434 | 1477208 |

[Figure 1: TDengine write throughput]

2.OpenTSDB

Write tests are run in multiple scenarios, including 1 R/R, 10 R/R, 30 R/R, 50 R/R and 80 R/R, each with different numbers of connections. You can change the options in the following example to run different tests.

  1. clean up the existing dataset
    1. Execute “./hbase shell” to launch HBase shell client.
      disable 'tsdb'; disable 'tsdb-meta'; disable 'tsdb-tree'; disable 'tsdb-uid';
      drop 'tsdb'; drop 'tsdb-meta'; drop 'tsdb-tree'; drop 'tsdb-uid';
      quit
    2. then create the tables for OpenTSDB
      env COMPRESSION=NONE HBASE_HOME=${HBASE_HOME}/hbase-version ${OpenTSDB_download_path}/src/create_table.sh
  2. start testing
    Example command that reads the 100 data files in “~/testdata” and inserts 30 records per request using 5 clients:
    java -jar opentsdbtest-1.0-SNAPSHOT-jar-with-dependencies.jar -dataDir ~/testdata -numOfFiles 100 -writeClients 5 -rowsPerRequest 30

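Write throughput is as follows, in records/second:

| R/R | 1 client | 2 clients | 3 clients | 4 clients | 5 clients | 6 clients | 7 clients |
|---|---|---|---|---|---|---|---|
| 1 | 2370 | 2474 | 2572 | 2710 | 2497 | 2436 | 2371 |
| 10 | 18393 | 22828 | 22656 | 22924 | 22507 | 23200 | 23099 |
| 30 | 37463 | 45649 | 45735 | 46342 | 46795 | 46675 | 44908 |
| 50 | 45255 | 53222 | 50503 | 54475 | 54543 | 54283 | 54970 |
| 80 | 48794 | 56386 | 54564 | 56999 | 57198 | 57318 | 57272 |

[Figure 2: OpenTSDB write throughput]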

3.Best Write Performance TDengine vs OpenTSDB

Compare the best write performance of TDengine and OpenTSDB. Results are as follows, in records/second:

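| Database | 1 client | 2 clients | 3 clients | 4 clients | 5 clients | 6 clients | 7 clients |
|---|---|---|---|---|---|---|---|
| TDengine (2000 R/R) | 512820 | 1055520 | 1174164 | 1306904 | 1426635 | 1458434 | 1477208 |
| OpenTSDB (80 R/R) | 48794 | 56386 | 54564 | 56999 | 57198 | 57318 | 57272 |

[Figure 3: best write throughput, TDengine vs OpenTSDB]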

Figure 3 shows that the write speed of TDengine is on the order of 1 million records per second, while that of OpenTSDB is on the order of 60k records per second. In conclusion, TDengine writes about 25 times faster than OpenTSDB.
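The speedup can be recomputed from the best-case rows of the two write tables (2000 R/R for TDengine, 80 R/R for OpenTSDB). The short sketch below does only that arithmetic; the variable names are illustrative.

```python
# Best-case write throughput in records/second, for 1 to 7 clients.
tdengine = [512820, 1055520, 1174164, 1306904, 1426635, 1458434, 1477208]
opentsdb = [48794, 56386, 54564, 56999, 57198, 57318, 57272]

speedups = [round(t / o, 1) for t, o in zip(tdengine, opentsdb)]
for clients, s in enumerate(speedups, start=1):
    print(f"{clients} client(s): {s}x")
```

The ratio grows with the number of clients, from about 10.5x with one client to about 25.8x with seven, because TDengine throughput keeps scaling with connections while OpenTSDB levels off after two.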

Read Performance

For read performance, this test uses a simple traversal query, that is, reading back data that has been written into the database. Each query reads only 1 million records. The test dataset was divided into 100 groups by devgroup during preparation, so each group holds 1 million records, and the test queries 10 of these groups, one group per query.

  1. how to use the TDengine testing code
    SQL expressions are in “tdengine/q1.txt”, for example, select * from db.devices where devgroup=0;
    Execute the following command:
    ./tdengineTest -sql ./q1.txt
  2. how to use the OpenTSDB testing code
    The OpenTSDB query statements are in JSON format and are already written into the testing code. The option “sqlchoice” selects which query to run.
    1. set sqlchoice to be q1
    2. Execute the following command:
      java -jar opentsdbtest-1.0-SNAPSHOT-jar-with-dependencies.jar -sql q1

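Query latency is as follows, in seconds:

| Latency | G-0 | G-10 | G-20 | G-30 | G-40 | G-50 | G-60 | G-70 | G-80 | G-90 |
|---|---|---|---|---|---|---|---|---|---|---|
| TDengine | 0.235 | 0.212 | 0.218 | 0.209 | 0.210 | 0.209 | 0.209 | 0.209 | 0.216 | 0.208 |
| OpenTSDB | 6.69 | 5.92 | 6.58 | 6.65 | 7.29 | 7.21 | 5.98 | 6.20 | 7.03 | 6.57 |

[Figure 4: read latency, TDengine vs OpenTSDB]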

Figure 4 shows that the stable query latency of TDengine is about 0.21 s, that is, about 5 million records read per second. The stable query latency of OpenTSDB is about 6.7 s, that is, about 150k records read per second. In conclusion, the query throughput of TDengine is about 32 times that of OpenTSDB.
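The conversion from latency to throughput is a simple division of the 1 million records per query by the query time. The sketch below applies it to the mean of the ten measured latencies; using the mean rather than a hand-picked "stable" value is a choice made here for illustration.

```python
records_per_query = 1_000_000

# Latencies in seconds for groups G-0 to G-90.
tdengine_latency = [0.235, 0.212, 0.218, 0.209, 0.210, 0.209, 0.209, 0.209, 0.216, 0.208]
opentsdb_latency = [6.69, 5.92, 6.58, 6.65, 7.29, 7.21, 5.98, 6.20, 7.03, 6.57]

def mean(values):
    return sum(values) / len(values)

td_mean = mean(tdengine_latency)
ot_mean = mean(opentsdb_latency)
ratio = ot_mean / td_mean

print(f"TDengine: {td_mean:.3f} s per query, {records_per_query / td_mean:,.0f} records/s")
print(f"OpenTSDB: {ot_mean:.3f} s per query, {records_per_query / ot_mean:,.0f} records/s")
print(f"Throughput ratio: {ratio:.1f}x")
```

With the means (about 0.21 s and 6.6 s) the ratio comes out at roughly 31x; the rounded figures of 0.21 s and 6.7 s quoted in the text give roughly 32x.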

Aggregation Performance

This part tests five aggregation functions shared by TDengine and OpenTSDB: COUNT, AVERAGE, SUM, MAX and MIN. Each aggregation query is paired with a filter that selects 1/10, 2/10, 3/10, …, or all of the 100 device groups.

1.TDengine

SQL expressions are in “tdengine/q2.txt”, for example,

select count(*) from db.devices where devgroup<10;

Execute the following command:

./tdengineTest -sql ./q2.txt
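The filter pattern can be illustrated by generating the queries programmatically. This is a sketch only: the real contents of “tdengine/q2.txt” are not shown in this article, and aggregating the `temperature` column for AVG is an assumption; only the count(*) example with devgroup<10 comes from the text above.

```python
def aggregation_queries(functions, percentages, num_groups=100):
    """Return one query per (function, percentage) pair, filtering on the
    first `percentage` percent of the device groups."""
    queries = []
    for func in functions:
        for pct in percentages:
            limit = num_groups * pct // 100
            queries.append(f"select {func} from db.devices where devgroup<{limit};")
    return queries

queries = aggregation_queries(["count(*)", "avg(temperature)"], range(10, 101, 10))
print("\n".join(queries[:3]))
```

The first generated statement matches the example above; the last filter for each function, devgroup<100, selects the whole dataset.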

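Query response time is as follows, in seconds:

| Function | 10% | 20% | 30% | 40% | 50% | 60% | 70% | 80% | 90% | 100% |
|---|---|---|---|---|---|---|---|---|---|---|
| COUNT | 0.018 | 0.026 | 0.016 | 0.018 | 0.017 | 0.024 | 0.024 | 0.027 | 0.030 | 0.033 |
| AVG | 0.007 | 0.014 | 0.015 | 0.020 | 0.024 | 0.038 | 0.044 | 0.050 | 0.057 | 0.060 |
| SUM | 0.006 | 0.010 | 0.019 | 0.018 | 0.031 | 0.036 | 0.034 | 0.037 | 0.043 | 0.046 |
| MAX | 0.007 | 0.013 | 0.015 | 0.020 | 0.025 | 0.030 | 0.035 | 0.039 | 0.045 | 0.049 |
| MIN | 0.006 | 0.010 | 0.016 | 0.024 | 0.032 | 0.039 | 0.045 | 0.041 | 0.043 | 0.049 |
| SPREAD | 0.007 | 0.010 | 0.015 | 0.019 | 0.033 | 0.038 | 0.046 | 0.052 | 0.059 | 0.066 |

[Figure 5: TDengine aggregation query response time]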

2.OpenTSDB

The OpenTSDB query statements are in JSON format and are already written into the testing code. The option “sqlchoice” selects which query to run.

  1. set sqlchoice to be q2
  2. Execute the following command:
    java -jar opentsdbtest-1.0-SNAPSHOT-jar-with-dependencies.jar -sql q2

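Query response time is as follows, in seconds:

| Function | 10% | 20% | 30% | 40% | 50% | 60% | 70% | 80% | 90% | 100% |
|---|---|---|---|---|---|---|---|---|---|---|
| COUNT | 67.82 | 67.3 | 66.87 | 67.17 | 66.67 | 67.23 | 67.17 | 66.88 | 67.1 | 66.72 |
| MEAN | 66.62 | 67.3 | 67.21 | 67.1 | 67.07 | 66.76 | 67.31 | 67.00 | 66.52 | 66.99 |
| SUM | 67.12 | 66.79 | 67.68 | 66.90 | 67.41 | 66.59 | 66.95 | 67.1 | 66.74 | 66.59 |
| MAX | 66.55 | 67.13 | 66.93 | 67.12 | 66.96 | 67.15 | 66.91 | 66.73 | 67.1 | 67.29 |
| MIN | 66.82 | 67.03 | 66.66 | 66.5 | 66.82 | 66.64 | 67.36 | 67.04 | 66.51 | 66.67 |

[Figure 6: OpenTSDB aggregation query response time]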

3.Comparison

Compare the query response time of TDengine and OpenTSDB over the full 1E8 (100 million) records, in seconds.

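| Database | Count | Average | Sum | Max | Min |
|---|---|---|---|---|---|
| TDengine | 0.033 | 0.06 | 0.046 | 0.049 | 0.049 |
| OpenTSDB | 66.72 | 66.99 | 66.59 | 67.29 | 66.67 |

[Figure 7: aggregation query response time, TDengine vs OpenTSDB]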

Figure 7 shows that the response time of aggregation queries in TDengine is within 100 ms, while the response time in OpenTSDB is about 67 seconds. In conclusion, aggregation queries respond more than 1000 times faster in TDengine than in OpenTSDB.

Performance of aggregation grouped by tags

This part tests the aggregation performance when grouping by tags. Each aggregation query is paired with a filter that selects 1/10, 2/10, 3/10, …, or all of the 100 device groups.

  1. how to use the TDengine testing code
    SQL expressions are in “tdengine/q3.txt”, for example, select count(temperature), sum(temperature), avg(temperature) from db.devices where devgroup<10 group by devgroup;
    Execute the following command:
    ./tdengineTest -sql ./q3.txt
  2. how to use the OpenTSDB testing code
    1. set sqlchoice to be q3
    2. Execute the following command:
      java -jar opentsdbtest-1.0-SNAPSHOT-jar-with-dependencies.jar -sql q3

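Query response time is as follows, in seconds:

| Database | 10% | 20% | 30% | 40% | 50% | 60% | 70% | 80% | 90% | 100% |
|---|---|---|---|---|---|---|---|---|---|---|
| TDengine | 0.030 | 0.028 | 0.031 | 0.041 | 0.069 | 0.066 | 0.077 | 0.091 | 0.102 | 0.123 |
| OpenTSDB | 125.91 | 127.39 | 126.79 | 126.42 | 125.73 | 126.85 | 127.77 | 126.99 | 127.16 | 126.41 |

[Figure 8: response time of aggregation grouped by tags]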

Testing results show that the response time of aggregation queries grouped by tags is about 1000 times shorter in TDengine than in OpenTSDB.

Performance of aggregation grouped by time

This part tests the aggregation performance when grouping by time. Each aggregation query is paired with a filter that selects 1/10, 2/10, 3/10, …, or all of the 100 device groups.

  1. how to use the TDengine testing code
    SQL expressions are in “tdengine/q4.txt”, for example, select count(temperature), sum(temperature), avg(temperature) from db.devices where devgroup<10 interval(1m);
    Execute the following command:
    ./tdengineTest -sql ./q4.txt
  2. how to use the OpenTSDB testing code
    1. set sqlchoice to be q4
    2. Execute the following command:
      java -jar opentsdbtest-1.0-SNAPSHOT-jar-with-dependencies.jar -sql q4

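Query response time is as follows, in seconds:

| Database | 10% | 20% | 30% | 40% | 50% | 60% | 70% | 80% | 90% | 100% |
|---|---|---|---|---|---|---|---|---|---|---|
| TDengine | 0.237 | 0.472 | 0.653 | 0.902 | 1.134 | 1.422 | 1.753 | 1.784 | 2.085 | 2.549 |
| OpenTSDB | 82.53 | 83.04 | 83.93 | 82.74 | 82.96 | 82.75 | 82.14 | 82.37 | 83.29 | 82.46 |

[Figure 9: response time of aggregation grouped by time]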

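Testing results show that the response time of aggregation queries grouped by time is on the order of 40 times shorter in TDengine than in OpenTSDB (from about 32 times when all device groups are selected to about 350 times when 10% are selected).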
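Because the TDengine response time grows with the amount of selected data while the OpenTSDB time stays nearly flat, the ratio depends on the filter. The following sketch recomputes it for each percentage from the measured values in the table above; the variable names are illustrative.

```python
# Response times in seconds for filters selecting 10% to 100% of the groups.
tdengine = [0.237, 0.472, 0.653, 0.902, 1.134, 1.422, 1.753, 1.784, 2.085, 2.549]
opentsdb = [82.53, 83.04, 83.93, 82.74, 82.96, 82.75, 82.14, 82.37, 83.29, 82.46]

ratios = [o / t for t, o in zip(tdengine, opentsdb)]
for pct, r in zip(range(10, 101, 10), ratios):
    print(f"{pct:3d}%  {r:6.1f}x")
```

The smallest ratio occurs at 100%, where TDengine has the most data to process.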

On-disk Compression

1.Original dataset size

In this test 100 data files are generated and stored in the folder “~/testdata”, whose size can be checked by command “du”. 

cd ~/testdata
du -h .

Results are shown in Fig. 10.

[Figure 10: size of the original dataset as reported by du]

2.Disk space used by TDengine

In TDengine all data are saved in the directory “/var/lib/taos/data” by default. Stop TDengine server before checking the data size.

sudo systemctl stop taosd

Then check the size of data in the folder “/var/lib/taos/data/” by command “du”.

cd /var/lib/taos/data
du -h .

Results are shown in Fig. 11.

[Figure 11: size of the TDengine data directory as reported by du]

3.Disk space used by OpenTSDB

In OpenTSDB all data are saved in the directory “/var/lib/hbase/data” by default. Stop OpenTSDB server before checking the data size.

sudo service opentsdb stop

Check the data folder size by command “du”.

cd /var/lib/hbase/data/
du -sh 

Results are shown in Fig. 12.

[Figure 12: size of the HBase data directory as reported by du]

4.Comparison of disk usage

The original test dataset occupies 3941 MB on disk, the data in OpenTSDB occupies 2.3 GB and the data in TDengine occupies 459 MB. The on-disk compression ratio of TDengine is therefore about 5 times that of OpenTSDB.


In real IoT scenarios, the on-disk compression ratio of TDengine is expected to be even higher because of the limited spread of real measurements and TDengine's column-based storage.
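The compression ratios behind this comparison can be worked out from the three sizes reported above. The sketch assumes 1 GB = 1024 MB when converting the OpenTSDB figure, which is how du -h reports sizes.

```python
raw_mb = 3941                # original data files
tdengine_mb = 459            # TDengine data directory
opentsdb_mb = 2.3 * 1024     # HBase data directory, assuming 1 GB = 1024 MB

tdengine_ratio = raw_mb / tdengine_mb      # raw size / stored size
opentsdb_ratio = raw_mb / opentsdb_mb
relative = opentsdb_mb / tdengine_mb       # how much more space OpenTSDB uses

print(f"TDengine compression ratio: {tdengine_ratio:.1f}")
print(f"OpenTSDB compression ratio: {opentsdb_ratio:.1f}")
print(f"OpenTSDB uses {relative:.1f}x the disk space of TDengine")
```

Relative to the raw files, TDengine stores the data in roughly one ninth of the space and OpenTSDB in a bit more than half, which gives the factor of about 5 between the two.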

Feature Comparison

Both TDengine and OpenTSDB can be used to process time-series data and they have some similar features.

Feature | TDengine | OpenTSDB
SQL syntax
Private deployment
Scalability
System connection management
Query task management
Data import
Data export
Web management
Multi-layer storage
Telegraf data collection
Grafana data visualization
RESTful
C/C++
JDBC/ODBC
Go
Python
Database configuration
Replica configuration
Data alive time
Data partition
Streaming
Subscriber
Microsecond precision
Aggregation
Downsampling
Limit/offset
Interpolation
Data updated
Tag updated
Time delete
Data cleanup

Conclusion

In this test, TDengine is compared with OpenTSDB in terms of write throughput, query throughput, aggregation query response time and on-disk compression. The test dataset, code and SQL expressions can be downloaded here, so anyone can repeat this test.

This test demonstrates that TDengine outperforms OpenTSDB with 25x greater write throughput, 32x greater query throughput and 1000x faster aggregation queries (1000x when grouping by tags and 40x when grouping by time), while using about one fifth of the disk space.

| Metric | TDengine | OpenTSDB |
|---|---|---|
| Write throughput | 1,477,208 rows/second | 57,272 rows/second |
| Time to read 1 million rows | 0.21s | 6.57s |
| Response time to average 100 million rows | 0.06s | 66.99s |
| Response time to average 100 million rows by tag | 0.123s | 126.41s |
| Response time to average 100 million rows by timestamp | 2.549s | 82.46s |
| Disk usage for 100 million rows | 459 MB | 2.3 GB |