Abstract: In this test, TDengine is compared with Cassandra in terms of write throughput, query throughput, aggregation query response time and on-disk compression. The results demonstrate that TDengine outperforms Cassandra with 20x greater write throughput, 17x greater query throughput, and 4000x faster aggregation queries (2500x when grouping by tags and 119x when grouping by time), while using 26.7x less disk space.
About the hardware
The servers and the testing program run on the same Dell desktop, model “OptiPlex-3050”, with 4 cores and 8 GB of memory. The detailed configuration is as follows:
- OS: Ubuntu 16.04 x64
- CPU: Intel(R) Core(TM) i3-7100 CPU @ 3.90GHz
- Memory: 8GB
- Disk: 1TB HDD
About the dataset
1.data description
Two popular datasets were investigated before this test:
- New York taxi running information: no data model could be designed because there is no information identifying individual cars.
- faker generation tool: unfit for the IoT scenario because it can only generate strings.
To make this test repeatable, we wrote a dedicated data generation program that simulates temperature (int) and humidity (float) measurements from thermohydrometers. Each thermohydrometer has three tags: device ID, device group, and device name. To make the simulation look authentic, the generated values follow a normal distribution rather than a uniform random one.
All measurements are sampled once per second. The dataset covers 10k devices with 10k records from each device, and each record contains 3 tags, 2 numeric values and 1 timestamp.
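The description above can be illustrated with a minimal Python sketch. It is not the Java generator used in the test: the mean and standard deviation values, the start timestamp, the `d<id>` naming scheme and the rule of 100 devices per group are all assumptions made for illustration.

```python
import random

def device_tags(devid, devices_per_group=100):
    """Return the three tags of one simulated thermohydrometer."""
    return {
        "devid": devid,
        "devgroup": devid // devices_per_group,  # assumed grouping rule
        "devname": f"d{devid}",                  # assumed naming scheme
    }

def generate_records(devid, rows, start_ms=1545038786000):
    """Yield (timestamp_ms, temperature, humidity) tuples for one device."""
    rng = random.Random(devid)  # seeded per device so reruns are repeatable
    for i in range(rows):
        ts = start_ms + i * 1000                    # one sample per second
        temperature = int(rng.gauss(25, 5))         # int; mean/std are assumptions
        humidity = round(rng.gauss(50.0, 10.0), 2)  # float; mean/std are assumptions
        yield ts, temperature, humidity

records = list(generate_records(devid=1, rows=3))
```

Seeding the generator with the device ID is one simple way to keep the output repeatable across runs, which is the property the test relies on.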
2.data generation code
The data generation code is written in Java. It can be downloaded here and compiled as follows:
cd tests/comparisonTest/dataGenerator
javac com/taosdata/generator/DataGenerator.java
3.options for the data generation code
- dataDir : filepath to store output data files
- numOfFiles : number of output data files
- numOfDevices : number of devices
- rowsPerDevice : number of records from each device
4.data generation
Execute the following commands to create 100 data files. Each data file contains measurements from 100 devices, for a total of 10k devices with 10k records from each device.
mkdir ~/testdata
java com/taosdata/generator/DataGenerator -dataDir ~/testdata -numOfDevices 10000 -numOfFiles 100 -rowsPerDevice 10000
TDengine preparation
TDengine is an open-source big data platform designed and optimized for Internet of Things (IoT), Connected Vehicles, and Industrial IoT. Besides the 10x faster time-series database (TSDB), it provides caching, stream computing, message queuing and other functionalities to reduce the complexity and costs of development and operation.
1.installation
- Download tdengine-1.6.1.0.tar.gz.
- Unzip it and run “install.sh” to install TDengine.
- Start TDengine by executing “sudo systemctl start taosd”.
- If the installation succeeds, enter “taos” in the terminal and the following text will be displayed:
Welcome to the TDengine shell, server version:1.6.1.0 client version:1.6.1.0
Copyright (c) 2017 by TAOS Data, Inc. All rights reserved.
taos>
2.data model
In TDengine, a super table is created for all devices of the same type, and then one table is created for each device. For the super table, each data record includes the measurement time, temperature and humidity; the static device attributes stored as tags are device ID, device group, and device name.
SQL syntax for creating a super table
create table devices(ts timestamp, temperature int, humidity float) tags(devid int, devname binary(16), devgroup int);
SQL syntax for dynamically creating a table from the super table template and inserting one record
insert into dev1 using devices tags(1,'d1',0) values(1545038786000,1,3.560000);
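As a rough illustration, a statement of this shape can be assembled per device with a small helper. The sketch below is in Python rather than the C used by the test program, the function name `build_insert` is hypothetical, and the multi-row form (several value tuples after one `values` keyword) is assumed to be how a request with more than one record would be written.

```python
def build_insert(devid, devname, devgroup, rows):
    """Build one insert statement that auto-creates table dev<devid>.

    rows is a list of (timestamp_ms, temperature, humidity) tuples.
    """
    table = f"dev{devid}"
    # Each row becomes one "(ts,temperature,humidity)" tuple.
    values = " ".join(f"({ts},{temp},{hum:.6f})" for ts, temp, hum in rows)
    return (
        f"insert into {table} using devices "
        f"tags({devid},'{devname}',{devgroup}) values{values};"
    )

single = build_insert(1, "d1", 0, [(1545038786000, 1, 3.56)])
batch = build_insert(1, "d1", 0, [(1545038786000, 1, 3.56),
                                  (1545038787000, 2, 3.61)])
```

With a single row, the helper reproduces the example statement above character for character.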
3.testing code
The TDengine C driver is used to insert and query records. Testing code based on the JDBC driver will also be provided in the future. Currently the testing code can be downloaded here. Enter the following commands in the terminal to build the executable file “./tdengineTest”:
cd tdengine
make
4.how to use the testing code
Writing Options
- writeClients : number of client connections to insert data concurrently, default 1
- rowsPerRequest : number of records in one request, default 100
- dataDir : data file path, same as the dataDir in the data generation code
- numOfFiles : number of files read from dataDir
For example
./tdengineTest -dataDir ./data -numOfFiles 10 -writeClients 2 -rowsPerRequest 100
Query Options
- sql: path of the file that stores all SQL statements to be executed.
For example
./tdengineTest -sql ./sqlCmd.txt
Cassandra preparation
Apache Cassandra is a highly scalable, high-performance distributed database designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. It is a type of NoSQL database.
1.installation
- Download and install Cassandra
echo "deb http://www.apache.org/dist/cassandra/debian 311x main" | sudo tee -a /etc/apt/sources.list.d/cassandra.sources.list
curl https://www.apache.org/dist/cassandra/KEYS | sudo apt-key add -
sudo apt-get update
sudo apt-get install cassandra
- Start Cassandra
sudo service cassandra start
- If the installation succeeds, enter “cqlsh” in the terminal and the following text will be displayed:
Connected to Test Cluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.11.4 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.
cqlsh>
2.data model
Build a table named “test” in the keyspace “cassandra” to hold the data from all devices. Each record includes six columns: timestamp (ms), temperature (int), humidity (float), device ID (int), device group (int) and device name (string). The primary key consists of device group, device ID, device name and timestamp. Because a Cassandra “where” clause can only filter on primary key columns, every column filtered in this test must be part of the primary key. In addition, a Cassandra “group by” clause can only group on primary key columns in their declared order, so the column to be grouped must come first in the primary key.
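One schema consistent with this description could look like the sketch below, written as a small Python helper that renders the CQL text. The column names and the CQL types (`bigint` for the millisecond timestamp, `text` for the device name) are assumptions; only the keyspace name, the table name and the primary key order come from the description above.

```python
# Assumed column names and CQL types; not taken from the actual test code.
COLUMNS = {
    "ts": "bigint",         # timestamp in milliseconds
    "temperature": "int",
    "humidity": "float",
    "devid": "int",
    "devgroup": "int",
    "devname": "text",
}

# Key order from the text: the grouped column (device group) comes first.
PRIMARY_KEY = ["devgroup", "devid", "devname", "ts"]

def create_table_cql(keyspace="cassandra", table="test"):
    """Render a CREATE TABLE statement for the benchmark table."""
    cols = ", ".join(f"{name} {ctype}" for name, ctype in COLUMNS.items())
    key = ", ".join(PRIMARY_KEY)
    return f"CREATE TABLE {keyspace}.{table} ({cols}, PRIMARY KEY ({key}));"

statement = create_table_cql()
```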
3.testing code
The Cassandra Java driver is used to insert and query records. The testing code can be downloaded here.
First install the Cassandra Java client (https://github.com/datastax/java-driver) provided by DataStax. The required dependency has already been added to the pom.xml.
4. how to use the testing code
Preparation
To avoid “timeout” errors caused by slow write/query responses, the default timeout values of the Cassandra server and client need to be increased before this test.
- server: increase all “timeout”-related options in “/etc/cassandra/cassandra.yaml” by 100 to 1000 times.
- client: the default client timeout has already been modified in the “application.conf” file under the folder “cassandra/”. The path to this file needs to be provided when running the test.
Writing Options
- writeClients : number of client connections to insert data concurrently, default 1
- rowsPerRequest: number of records in one request, default 100
- dataDir : data file path, same as the dataDir in the data generation code
- numOfFiles : number of files read from dataDir
- conf: path storing the client configuration file
For example
cd cassandra/cassandratest/target
java -jar cassandratest-1.0-SNAPSHOT-jar-with-dependencies.jar -datadir ./data -numofFiles 100 -rowsperrequest 2000 -writeclients 4 -conf cassandra/application.conf
Querying Options
- sql: path of the files which store all SQL statements to be executed.
For example
cd cassandra/cassandratest/target
java -jar cassandratest-1.0-SNAPSHOT-jar-with-dependencies.jar -sql cassandra/sqlCmd.txt -conf cassandra/application.conf
Write performance
A write request can carry one record or multiple records; this is denoted as “R/R” (“Records/Request”). Write speed generally increases with R/R. In addition, one database server can serve many client connections, and more connections generally yield higher write throughput. Thus, both the single-connection case and the multi-connection case are tested.
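The two knobs described above can be sketched in a few lines of Python. This is only an illustration of the idea: the helper names are hypothetical, and the round-robin assignment of data files to clients is an assumption, since the text does not say how the test programs divide the files.

```python
def batch_records(records, rows_per_request):
    """Group an iterable of records into requests of at most rows_per_request."""
    if rows_per_request < 1:
        raise ValueError("rows_per_request must be at least 1")
    batch = []
    for record in records:
        batch.append(record)
        if len(batch) == rows_per_request:
            yield batch
            batch = []
    if batch:  # final, possibly shorter request
        yield batch

def split_among_clients(files, write_clients):
    """Assign data files to clients round-robin (assumed strategy)."""
    return [files[i::write_clients] for i in range(write_clients)]

requests = list(batch_records(range(5), rows_per_request=2))
assignment = split_among_clients(["f0", "f1", "f2", "f3", "f4"], write_clients=2)
```

A larger R/R means fewer requests for the same number of records, which is why per-request overhead matters less as R/R grows.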
1.TDengine
Writing tests are run in multiple scenarios, including 1R/R, 100R/R, 500R/R, 1000R/R and 2000R/R, with different numbers of connections. You can change the options in the example below to run different tests.
- clean up existing dataset
drop database db;
- start testing
The example command below reads 100 data files from ~/testdata and inserts 1000 records per request using 5 clients:
./tdengineTest -dataDir ~/testdata -numOfFiles 100 -writeClients 5 -rowsPerRequest 1000
Write throughput is as follows, in records/second:
R/R | 1 client | 2 clients | 3 clients | 4 clients | 5 clients | 6 clients | 7 clients |
---|---|---|---|---|---|---|---|
1 | 26824 | 43699 | 55137 | 62869 | 64529 | 68647 | 72277 |
100 | 415800 | 734484 | 895522 | 976085 | 1087902 | 1171074 | 1192199 |
500 | 479846 | 882612 | 1083032 | 1195100 | 1269196 | 1364256 | 1417004 |
1000 | 500751 | 914494 | 1121914 | 1239157 | 1367989 | 1418104 | 1476560 |
2000 | 512820 | 1055520 | 1174164 | 1306904 | 1426635 | 1458434 | 1477208 |

2.Cassandra
Writing tests are run in multiple scenarios, including 1R/R, 10R/R, 50R/R, 100R/R, 500R/R and 1000R/R, with different numbers of connections. You can change the options in the following example to run different tests.
- clean up existing dataset
drop keyspace cassandra;
- start testing
The example command below reads 100 data files from ~/testdata and inserts 1000 records per request using 5 clients:
java -jar cassandratest-1.0-SNAPSHOT-jar-with-dependencies.jar -dataDir ~/testdata -numOfFiles 100 -writeClients 5 -rowsPerRequest 1000 -conf cassandra/application.conf
Write throughput is as follows, in records/second:
R/R | 1 client | 2 clients | 3 clients | 4 clients | 5 clients | 6 clients | 7 clients |
---|---|---|---|---|---|---|---|
1 | 3515 | 4925 | 5529 | 5991 | 6331 | 6380 | 6597 |
10 | 35998 | 35542 | 35124 | 34135 | 35077 | 35886 | 36102 |
50 | 31743 | 49423 | 51626 | 55752 | 57282 | 56815 | 55831 |
100 | 38328 | 50387 | 54519 | 56940 | 57853 | 59335 | 61708 |
500 | 30417 | 36264 | 38078 | 39066 | 39459 | 39758 | 39918 |
1000 | 21555 | 25293 | 26224 | 26559 | 26765 | 26511 | 26693 |

3.Best Write Performance: TDengine vs Cassandra
Compare the best write performance of TDengine and Cassandra for each number of clients. Results are as follows, in records/second:
Database | 1 client | 2 clients | 3 clients | 4 clients | 5 clients | 6 clients | 7 clients |
---|---|---|---|---|---|---|---|
TDengine | 512820 | 1055520 | 1174164 | 1306904 | 1426635 | 1458434 | 1477208 |
Cassandra | 38328 | 50387 | 54519 | 56940 | 57853 | 59335 | 61708 |

Figure 3 demonstrates that the write speed of TDengine is on the order of 1 million records per second, while that of Cassandra is on the order of 10k to 100k records per second. In conclusion, TDengine writes about 20 times faster than Cassandra.
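The best-case rows and the resulting speedup can be recomputed from the two per-database throughput tables. The Python sketch below copies the measured values from those tables; the variable and function names are illustrative only.

```python
# Records/second, keyed by R/R; each list holds the 1..7 client columns.
tdengine = {
    1:    [26824, 43699, 55137, 62869, 64529, 68647, 72277],
    100:  [415800, 734484, 895522, 976085, 1087902, 1171074, 1192199],
    500:  [479846, 882612, 1083032, 1195100, 1269196, 1364256, 1417004],
    1000: [500751, 914494, 1121914, 1239157, 1367989, 1418104, 1476560],
    2000: [512820, 1055520, 1174164, 1306904, 1426635, 1458434, 1477208],
}
cassandra = {
    1:    [3515, 4925, 5529, 5991, 6331, 6380, 6597],
    10:   [35998, 35542, 35124, 34135, 35077, 35886, 36102],
    50:   [31743, 49423, 51626, 55752, 57282, 56815, 55831],
    100:  [38328, 50387, 54519, 56940, 57853, 59335, 61708],
    500:  [30417, 36264, 38078, 39066, 39459, 39758, 39918],
    1000: [21555, 25293, 26224, 26559, 26765, 26511, 26693],
}

def best_per_client(results):
    """Best throughput over all R/R settings for each client count."""
    return [max(column) for column in zip(*results.values())]

td_best = best_per_client(tdengine)
ca_best = best_per_client(cassandra)
ratios = [round(t / c, 1) for t, c in zip(td_best, ca_best)]
```

The per-column ratio grows from roughly 13x with one client to between 20x and 25x with two or more clients, which is where the "about 20 times" summary comes from.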
Read Performance
For read performance, this test uses a simple traversal query, that is, reading back all the data that has been written into the database.
- how to use the TDengine testing code
SQL expressions are in “tdengine/q1.txt”, for example:
select * from db.devices where devgroup=0;
Execute the following command:
./tdengineTest -sql ./q1.txt
- how to use the Cassandra testing code
SQL expressions are in “cassandra/q1.txt”, for example:
select * from devices where devgroup=0;
Execute the following command:
java -jar cassandratest-1.0-SNAPSHOT-jar-with-dependencies.jar -conf cassandra/application.conf -sql cassandra/q1.txt
Query latency is as follows, in seconds:
Latency | G-0 | G-10 | G-20 | G-30 | G-40 | G-50 | G-60 | G-70 | G-80 | G-90 |
---|---|---|---|---|---|---|---|---|---|---|
TDengine | 0.235 | 0.212 | 0.218 | 0.209 | 0.210 | 0.209 | 0.209 | 0.209 | 0.216 | 0.208 |
Cassandra | 3.92 | 3.68 | 3.65 | 3.61 | 3.69 | 3.57 | 3.55 | 3.59 | 3.66 | 3.64 |

Figure 4 demonstrates that the stable read latency of TDengine is about 0.21s, which corresponds to reading about 5 million records per second. Meanwhile, the stable read latency of Cassandra is about 3.6s, which corresponds to about 300k records per second. In conclusion, the query throughput of TDengine is about 17 times that of Cassandra.
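The conversion from latency to throughput can be checked with a short Python calculation. It assumes that each device group holds 100 devices with 10k records each, that is, 1 million records per query; that figure is implied by the throughput numbers in the text rather than stated explicitly.

```python
from statistics import mean

ROWS_PER_GROUP = 100 * 10_000  # assumed: 100 devices x 10k records per group

# Latencies in seconds for G-0 .. G-90, copied from the table above.
tdengine_latency = [0.235, 0.212, 0.218, 0.209, 0.210,
                    0.209, 0.209, 0.209, 0.216, 0.208]
cassandra_latency = [3.92, 3.68, 3.65, 3.61, 3.69,
                     3.57, 3.55, 3.59, 3.66, 3.64]

td_rate = ROWS_PER_GROUP / mean(tdengine_latency)   # records per second
ca_rate = ROWS_PER_GROUP / mean(cassandra_latency)  # records per second
throughput_ratio = td_rate / ca_rate
```

Averaging all ten queries gives a ratio close to the 17x figure quoted above.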
Aggregation Performance
This part tests five aggregation functions shared by TDengine and Cassandra: COUNT, AVERAGE, SUM, MAX and MIN. Each aggregation query is paired with a filter that selects 1/10, 2/10, 3/10, …, or all of the 100 device groups.
1.TDengine
SQL expressions are in “tdengine/q2.txt”, for example:
select count(*) from db.devices where devgroup<10;
Execute the following command
./tdengineTest -sql ./q2.txt
Query response time is as follows, in seconds:
Function | 10% | 20% | 30% | 40% | 50% | 60% | 70% | 80% | 90% | 100% |
---|---|---|---|---|---|---|---|---|---|---|
COUNT | 0.018 | 0.026 | 0.016 | 0.018 | 0.017 | 0.024 | 0.024 | 0.027 | 0.030 | 0.033 |
AVG | 0.007 | 0.014 | 0.015 | 0.020 | 0.024 | 0.038 | 0.044 | 0.050 | 0.057 | 0.060 |
SUM | 0.006 | 0.010 | 0.019 | 0.018 | 0.031 | 0.036 | 0.034 | 0.037 | 0.043 | 0.046 |
MAX | 0.007 | 0.013 | 0.015 | 0.020 | 0.025 | 0.030 | 0.035 | 0.039 | 0.045 | 0.049 |
MIN | 0.006 | 0.010 | 0.016 | 0.024 | 0.032 | 0.039 | 0.045 | 0.041 | 0.043 | 0.049 |
SPREAD | 0.007 | 0.010 | 0.015 | 0.019 | 0.033 | 0.038 | 0.046 | 0.052 | 0.059 | 0.066 |

2.Cassandra
SQL expressions are in “cassandra/q2.txt”, for example:
select count(*) from devices where devgroup<10;
Execute the following command:
java -jar cassandratest-1.0-SNAPSHOT-jar-with-dependencies.jar -sql cassandra/q2.txt -conf cassandra/application.conf
Query response time is as follows, in seconds:
Function | 10% | 20% | 30% | 40% | 50% | 60% | 70% | 80% | 90% | 100% |
---|---|---|---|---|---|---|---|---|---|---|
COUNT | 33.79 | 67.23 | 87.64 | 105.82 | 131.52 | 160.88 | 188.70 | 213.85 | 240.39 | 264.49 |
MEAN | 28.88 | 57.83 | 87.16 | 114.87 | 145.30 | 173.32 | 204.11 | 235.33 | 261.29 | 290.97 |
SUM | 29.35 | 58.19 | 86.24 | 115.56 | 145.73 | 173.81 | 203.94 | 234.15 | 260.41 | 292.51 |
MAX | 28.94 | 57.85 | 85.60 | 115.02 | 145.62 | 175.08 | 202.53 | 232.61 | 260.37 | 288.46 |
MIN | 29.58 | 58.26 | 87.27 | 117.22 | 144.01 | 174.20 | 201.88 | 235.98 | 263.69 | 290.27 |

3.Comparison
Compare the query response time of TDengine and Cassandra over all 1E8 records, in seconds:
Database | Count | Average | Sum | Max | Min |
---|---|---|---|---|---|
TDengine | 0.033 | 0.06 | 0.046 | 0.049 | 0.049 |
Cassandra | 264.49 | 290.97 | 292.51 | 288.46 | 290.27 |

Figure 7 demonstrates that the response time of aggregation queries in TDengine is within 100ms, while the response time in Cassandra is about 200 to 300 seconds. In conclusion, aggregation queries over the full dataset respond several thousand times faster in TDengine than in Cassandra.
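The per-function speedup can be recomputed from the 100% column of the two response-time tables above. The Python sketch below simply divides the Cassandra time by the TDengine time for each function; the dictionary names are illustrative.

```python
# Response times in seconds for the full dataset (100% column).
tdengine_s = {"COUNT": 0.033, "AVG": 0.060, "SUM": 0.046,
              "MAX": 0.049, "MIN": 0.049}
cassandra_s = {"COUNT": 264.49, "AVG": 290.97, "SUM": 292.51,
               "MAX": 288.46, "MIN": 290.27}

# How many times longer Cassandra takes for the same aggregation.
speedup = {name: cassandra_s[name] / tdengine_s[name] for name in tdengine_s}
slowest_gain = min(speedup, key=speedup.get)
fastest_gain = max(speedup, key=speedup.get)
```

Every function comes out above 4000x, with the smallest gain for the average and the largest for the count.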
Performance of Aggregation grouped by tags
This part tests the aggregation performance grouped by tags. Each aggregation query is paired with a filter that selects 1/10, 2/10, 3/10, …, or all of the 100 device groups.
- how to use the TDengine testing code
SQL expressions are in “tdengine/q3.txt”, for example:
select count(temperature), sum(temperature), avg(temperature) from db.devices where devgroup<10 group by devgroup;
Execute the following command:
./tdengineTest -sql ./q3.txt
- how to use the Cassandra testing code
SQL expressions are in “cassandra/q3.txt”, for example:
select count(temperature), sum(temperature), avg(temperature) from db.devices where devgroup<10 group by devgroup;
Execute the following command:
java -jar cassandratest-1.0-SNAPSHOT-jar-with-dependencies.jar -sql cassandra/q3.txt -conf cassandra/application.conf
Query response time is as follows, in seconds:
Database | 10% | 20% | 30% | 40% | 50% | 60% | 70% | 80% | 90% | 100% |
---|---|---|---|---|---|---|---|---|---|---|
TDengine | 0.030 | 0.028 | 0.031 | 0.041 | 0.069 | 0.066 | 0.077 | 0.091 | 0.102 | 0.123 |
Cassandra | 31.40 | 62.21 | 92.12 | 122.01 | 154.95 | 185.03 | 217.46 | 249.59 | 281.86 | 308.89 |

Testing results show that, over the full dataset, the response time of aggregation queries grouped by tags is about 2,500 times shorter in TDengine than in Cassandra (0.123s versus 308.89s).
Performance of Aggregation grouped by time
This part tests the aggregation performance grouped by time. Each aggregation query is paired with a filter that selects 1/10, 2/10, 3/10, …, or all of the 100 device groups.
- how to use the TDengine testing code
SQL expressions are in “tdengine/q4.txt”, for example:
select count(temperature), sum(temperature), avg(temperature) from db.devices where devgroup<10 interval(1m);
Execute the following command:
./tdengineTest -sql ./q4.txt
- how to use the Cassandra testing code
Because of the restrictions on the “where” and “group by” clauses, the data needs to be inserted into the database again, with a new column “minute” added and placed first in the primary key.
Execute the following command:
java -jar cassandratest-1.0-SNAPSHOT-jar-with-dependencies.jar -datadir ~/testdata -numofFiles 100 -rowsperrequest 2000 -writeclients 4 -conf cassandra/application.conf -timetest
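A value for such a “minute” column can be derived from the millisecond timestamp by integer division. The Python sketch below shows one way to do it; how the actual test program computes the column is not stated in the text, so the function name and the bucketing rule are assumptions.

```python
from collections import Counter

MS_PER_MINUTE = 60_000

def minute_bucket(ts_ms):
    """Map a millisecond timestamp to the index of its one-minute bucket."""
    return ts_ms // MS_PER_MINUTE

# Ninety one-second samples starting at the timestamp used in the insert example.
start_ms = 1545038786000
samples = [start_ms + i * 1000 for i in range(90)]

# Counting rows per bucket mimics "count(...) group by minute".
rows_per_minute = Counter(minute_bucket(ts) for ts in samples)
```

Storing the bucket as its own leading key column is what lets Cassandra group by it, at the cost of writing the whole dataset a second time.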
- SQL expressions are in “cassandra/q4.txt”, for example:
select count(temperature), sum(temperature), mean(temperature) from devices where devgroup<10 group by minute;
Execute the following command:
java -jar cassandratest-1.0-SNAPSHOT-jar-with-dependencies.jar -sql cassandra/q4.txt -conf cassandra/application.conf
Query response time is as follows, in seconds:
Database | 10% | 20% | 30% | 40% | 50% | 60% | 70% | 80% | 90% | 100% |
---|---|---|---|---|---|---|---|---|---|---|
TDengine | 0.237 | 0.472 | 0.653 | 0.902 | 1.134 | 1.422 | 1.753 | 1.784 | 2.085 | 2.549 |
Cassandra | 131.35 | 153.87 | 169.40 | 188.86 | 203.47 | 227.61 | 250.41 | 274.53 | 294.87 | 303.51 |

Testing results show that, over the full dataset, the response time of aggregation queries grouped by time is about 119 times shorter in TDengine than in Cassandra (2.549s versus 303.51s).
On-disk Compression
1.Original dataset size
In this test, 100 data files are generated and stored in the folder “~/testdata”, whose size can be checked with the “du” command:
cd ~/testdata
du -h .

2.Disk space used by TDengine
In TDengine all data are saved in the directory “/var/lib/taos/data” by default. Stop TDengine server before checking the data size.
sudo systemctl stop taosd
Then check the size of the data in the folder “/var/lib/taos/data/” with the “du” command:
cd /var/lib/taos/data
du -h .
Results are shown in Fig. 11.

3.Disk space used by Cassandra
In Cassandra, all data is saved in the directory “/var/lib/cassandra/data/keyspace_name” by default. Stop the Cassandra server before checking the data size.
sudo service cassandra stop
Check the data folder size with the “du” command:
cd /var/lib/cassandra/data/cassandra
du -sh .
Results are shown in Fig. 12.

4.Comparison of disk usage
The original test dataset occupies 3941MB on disk, the data in Cassandra 12GB and the data in TDengine 459MB. Cassandra therefore uses about 26.7 times as much disk space as TDengine for the same data.
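These ratios follow directly from the three sizes. The Python sketch below assumes that 1GB means 1024MB, as reported by “du -h”; with that assumption the Cassandra to TDengine ratio lands between 26.7 and 26.8.

```python
ORIGINAL_MB = 3941
TDENGINE_MB = 459
CASSANDRA_MB = 12 * 1024  # 12GB, assuming 1GB = 1024MB

# Compression relative to the raw data files (values above 1 mean smaller on disk).
tdengine_compression = ORIGINAL_MB / TDENGINE_MB
# Cassandra stores more than the raw files, so this is an expansion factor.
cassandra_expansion = CASSANDRA_MB / ORIGINAL_MB
# Disk space used by Cassandra relative to TDengine.
cassandra_vs_tdengine = CASSANDRA_MB / TDENGINE_MB
```

In other words, TDengine shrinks the raw files by a factor of about 8.6, while Cassandra expands them by a factor of about 3.1.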
In real IoT scenarios, the compression ratio of TDengine is expected to be even higher, because real measurements vary within a limited range and TDengine uses column-based storage.
Feature Comparison
Both TDengine and Cassandra can be used to process time-series data and they have some similar features.
Feature | TDengine | Cassandra |
---|---|---|
SQL syntax | ✅ | ❌ |
Private deployment | ✅ | ❌ |
Scalability | ✅ | ❌ |
System connection management | ✅ | ✅ |
Query task management | ✅ | ✅ |
Data import | ✅ | ✅ |
Data export | ✅ | ✅ |
Web management | ✅ | ✅ |
Multi-layer storage | ✅ | ✅ |
Telegraf data collection | ✅ | ✅ |
Grafana data visualization | ✅ | ✅ |
RESTful | ✅ | ✅ |
C/C++ | ✅ | ❌ |
JDBC/ODBC | ✅ | ❌ |
Go | ✅ | ✅ |
Python | ✅ | ✅ |
Database configuration | ✅ | ✅ |
Replica configuration | ✅ | ✅ |
Data alive time | ✅ | ✅ |
Data partition | ✅ | ✅ |
Streaming | ✅ | ❌ |
Subscriber | ✅ | ❌ |
Microsecond precision | ✅ | ✅ |
Aggregation | ✅ | ✅ |
Downsampling | ✅ | ✅ |
Limit/offset | ✅ | ✅ |
Interpolation | ✅ | ✅ |
Data updated | ❌ | ✅ |
Tag updated | ✅ | ❌ |
Time delete | ✅ | ✅ |
Data cleanup | ✅ | ✅ |
Conclusion
In this test, TDengine is compared with Cassandra in terms of write throughput, query throughput, aggregation query response time and on-disk compression. The test dataset, code and SQL expressions can be downloaded here, so anyone can repeat this test.
This test demonstrates that TDengine outperforms Cassandra with 20x greater write throughput, 17x greater query throughput, and 4000x faster aggregation queries (2500x when grouping by tags and 119x when grouping by time), while using 26.7x less disk space.
Metric | TDengine | Cassandra |
---|---|---|
Write throughput | 1,477,208 rows/second | 61,708 rows/second |
Time to read 1 million rows | 0.21s | 3.64s |
Response time to average 100 million rows | 0.06s | 290.97s |
Response time to average 100 million rows by tag | 0.123s | 308.89s |
Response time to average 100 million rows by timestamp | 2.549s | 303.51s |
Disk usage for 100 million rows | 459 MB | 12 GB |