Migrating from InfluxDB for native open-source clustering and improved performance

As a business grows, its IT infrastructure has to grow with it. Sometimes that means replacing a foundational element of the data infrastructure: the database engine. This is a complex project that demands careful research and testing and can take significant time and effort. In this article, the author describes his experience replacing his team's InfluxDB time-series database engine with TDengine, and the due diligence the team went through in order to select, develop with and deploy TDengine.

Starting out with TDengine

I had only been doing Java development for about three years and had never had to use time-series databases. I had heard of time-series databases but had no practical or theoretical experience with them.

In May 2021, I changed industries, and in my first project with my new company I had the opportunity to work with the popular time-series database InfluxDB. I didn’t get to work with TDengine until the next project, in which I was involved from start to finish. The two projects were similar, but the second one involved replacing the time-series database.

The first task assigned to me in the second project was to investigate TDengine and see whether it would meet our business and end-user requirements. At that time, I was still quite apprehensive, since the choice of time-series database would affect the course and outcome of the project and would have long-term ramifications for the business. In order to learn more about TDengine, I got in touch with the staff at TDengine, and my story with TDengine kicked off.

Diving deeper into TDengine

When I learn about a new technology or product, I start by reading the official documentation on the website and then try the product hands-on.

Unfortunately, my initial experience with TDengine was mixed because I ran into an installation issue: the TDengine daemon, taosd, had to be restarted frequently. What impressed me was that the staff at TDengine already knew about this issue and had a workaround, even if it was not optimal.

For some reason, after I downloaded and installed TDengine from the installation package and entered the “taosd” command to start the daemon, the daemon would need to be restarted. After that, the error message “frequent startup cannot be started” kept appearing. I was determined to try TDengine, so I ended up reinstalling it.

Once I had installed TDengine correctly, I ran into a second issue. I wanted to use visual tools to perform operations against the TDengine database. Since I had used IntelliJ IDEA as a development IDE before, and since the TDengine blog has a technical article about using IDEA, I decided to give it a try. However, I ran into some issues and was not able to use the tool. Of course, not being able to use a visual tool doesn’t affect the ability to use TDengine, since TDengine provides a really good CLI called “taos shell” in which one can run ad hoc SQL against the database.

To resolve the IDEA issue, I first went to the GitHub repository of TDengine at https://github.com/taosdata/TDengine to look for an answer. I was unable to find a resolution there, so I submitted the problem to the R&D personnel of TDengine by email. After a few emails, they helped me resolve the issue and connect IDEA to TDengine.

The engineers at TDengine also introduced me to Grafana as a visualization and monitoring tool, which we now use frequently. It is also worth noting that after I upgraded TDengine to 2.1.0.0, I was able to perform operations through IDEA without any issues at all, which shows that the TDengine team iterates on the product often, fixes bugs and makes life easier for users.

It is also worth mentioning that the SQL syntax of TDengine is very good and is similar to that of MySQL, which I use often. The learning cost is relatively small, and it is easy to get started. In some cases we do have to consult the official website for the SQL syntax or ask technical support, but on the whole the ability to use SQL is extremely beneficial. In addition to TDengine’s own technical support, there is plenty of support from the TDengine open-source community.

One of the novel concepts that I did have to study was the idea of super tables and how they relate to ordinary tables, or “sub tables”. Once I understood the advantages of super tables, we put them to good use in our industrial use cases.

Another issue we had to solve was how to use our existing Java code with TDengine. The technical staff at TDengine recommended that we use Spring Boot with MyBatis along with the TDengine JDBC driver. We were able to successfully implement a demonstration application using this suggestion. The JDBC connection information can be found in the TDengine documentation for your reference.
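As a rough illustration of the JDBC layer that Spring Boot and MyBatis sit on top of, the sketch below builds a connection URL for the TDengine JDBC driver and runs a single query over plain JDBC. It is not the demonstration application described above. The host, port and database name are placeholders, and the class and method names are hypothetical. Actually opening the connection assumes the taos-jdbcdriver dependency is on the classpath and a taosd instance is reachable.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

class TdengineJdbcSketch {

    // Builds a URL in the "jdbc:TAOS://host:port/database" form used by the
    // TDengine JDBC driver. 6030 is the default taosd port.
    static String jdbcUrl(String host, int port, String database) {
        return "jdbc:TAOS://" + host + ":" + port + "/" + database;
    }

    // Opens a connection, asks the server for its version and closes everything again.
    // Assumption: the TDengine JDBC driver is on the classpath and the server is running.
    static String serverVersion(String url, String user, String password) throws SQLException {
        try (Connection conn = DriverManager.getConnection(url, user, password);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT SERVER_VERSION()")) {
            return rs.next() ? rs.getString(1) : null;
        }
    }

    public static void main(String[] args) {
        // Only prints the URL, so the sketch runs without a database.
        System.out.println(jdbcUrl("localhost", 6030, "demo"));
    }
}
```

In a Spring Boot project the same URL would normally go into the datasource configuration, and MyBatis mappers would then issue SQL through that datasource.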

Finally, of course, there were performance considerations. We used taos tools such as taosBenchmark to insert a large amount of data and then query it. These tests showcased TDengine’s high ingestion speed and low query latency. We also found that storing such vast amounts of data doesn’t take up very much space, thanks to TDengine’s high compression ratios.

Choosing TDengine

After comprehensive research, testing and comparison with the features and performance of InfluxDB, we decided to choose TDengine. I explain some of the reasoning below.

First of all, the novel concept of super tables and ordinary/sub tables is a very good fit for our use cases. Previously, our project consisted of one site and one unit with multiple data collection points. With the growth in our business, there are now multiple data collection points at multiple sites and multiple units. Using the abstraction of the super table, we can create a super table corresponding to a certain unit of a certain site. From this super table we can create sub-tables that correspond to the data collection points for that unit at that site. This allows us to relate database entities to our business in a very convenient way and also accommodates the way our business will grow.
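The mapping described above could be sketched as follows: one super table per unit of a site, and one sub-table per data collection point created from it. The statements use TDengine’s CREATE STABLE and CREATE TABLE ... USING ... TAGS syntax. The naming scheme, the columns (ts, val) and the tag (point_id) are hypothetical and chosen only for illustration, not taken from our actual schema.

```java
import java.util.regex.Pattern;

class SuperTableSketch {

    // Names are spliced into SQL text, so only plain identifiers are accepted.
    private static final Pattern IDENTIFIER = Pattern.compile("[A-Za-z0-9_]+");

    static String requireIdentifier(String name) {
        if (name == null || !IDENTIFIER.matcher(name).matches()) {
            throw new IllegalArgumentException("not a plain identifier: " + name);
        }
        return name;
    }

    // Hypothetical naming scheme: one super table per (site, unit).
    static String superTableName(String site, String unit) {
        return "st_" + requireIdentifier(site) + "_" + requireIdentifier(unit);
    }

    // Hypothetical naming scheme: one sub-table per data collection point.
    static String subTableName(String site, String unit, String point) {
        return "pt_" + requireIdentifier(site) + "_" + requireIdentifier(unit)
                + "_" + requireIdentifier(point);
    }

    // DDL for the super table: a timestamp column, one value column and one tag.
    static String createSuperTable(String site, String unit) {
        return "CREATE STABLE IF NOT EXISTS " + superTableName(site, unit)
                + " (ts TIMESTAMP, val DOUBLE) TAGS (point_id BINARY(64))";
    }

    // DDL for a sub-table that inherits the schema and stores the point id as its tag value.
    static String createSubTable(String site, String unit, String point) {
        return "CREATE TABLE IF NOT EXISTS " + subTableName(site, unit, point)
                + " USING " + superTableName(site, unit)
                + " TAGS ('" + requireIdentifier(point) + "')";
    }

    public static void main(String[] args) {
        System.out.println(createSuperTable("s1", "u1"));
        System.out.println(createSubTable("s1", "u1", "p7"));
    }
}
```

Adding a new site or unit then means creating one more super table, and adding a collection point means creating one more sub-table, without touching existing tables.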

Secondly, with the TDengine JDBC driver, we were able to reuse and repurpose our existing Java code. The learning curve was very shallow thanks to support for SQL and it did not require a deep understanding of TDengine. This is very convenient for developers who want to get started and also when it comes to training and deployment.

Thirdly, building a cluster is cheaper. InfluxDB’s clustering feature is closed-source, which means that the basic requirements of scalability and high availability in a production environment come at a significant cost. The clustering feature of TDengine, however, is open source, so there is no licensing cost to deploy TDengine in a production environment with robust performance, scalability and high availability.

Fourthly, the performance is outstanding. Compared to our previous system with InfluxDB, even end-users can tell the difference in query latency and response.

Using TDengine

With our historical data visualization function, when the timespan of the query is very large and when the resolution is very high, query latency with InfluxDB was very high. The application would often freeze and the user would have to relaunch their browser. Since we moved to TDengine, the query latency is extremely low, the visualization application is extremely responsive and so the end-user experience is excellent.

TDengine works really well with MyBatis and development was very easy. We have written a lot of utility class methods to create schema and query from TDengine. The novel concept of super tables and sub-tables allows us to have a very simple design and the idea of “one table per device” allows better differentiation between data collection points.
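A query utility of this kind might look like the sketch below, which builds a time-range query that averages values per time window with TDengine’s INTERVAL clause. This is only an illustration under assumptions: the text above does not say that our queries are windowed, and the table name, the column names (ts, val) and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class WindowQuerySketch {

    private static final Pattern IDENTIFIER = Pattern.compile("[A-Za-z0-9_]+");
    // A number followed by a time unit: a (ms), s, m, h, d or w.
    private static final Pattern WINDOW = Pattern.compile("[1-9][0-9]*[asmhdw]");

    // Builds a query that averages the hypothetical "val" column per time window
    // over the half-open range [fromMillis, toMillis) of one sub-table.
    static String averageByWindow(String table, long fromMillis, long toMillis, String window) {
        if (!IDENTIFIER.matcher(table).matches()) {
            throw new IllegalArgumentException("not a plain identifier: " + table);
        }
        if (!WINDOW.matcher(window).matches()) {
            throw new IllegalArgumentException("invalid window: " + window);
        }
        if (fromMillis >= toMillis) {
            throw new IllegalArgumentException("empty time range");
        }
        return "SELECT AVG(val) FROM " + table
                + " WHERE ts >= " + fromMillis + " AND ts < " + toMillis
                + " INTERVAL(" + window + ")";
    }

    public static void main(String[] args) {
        System.out.println(averageByWindow("pt_s1_u1_p7", 1622505600000L, 1622592000000L, "1m"));
    }
}
```

The resulting string can be executed through JDBC or returned from a MyBatis SQL provider. The timestamps are epoch milliseconds, which assumes a database with millisecond precision.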

Our business use case involves data ingestion from several data collection points every second. So far the performance of TDengine has been perfectly fine and we expect it to help us scale.
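For ingestion of this kind, TDengine accepts a single INSERT statement that writes rows into several tables at once. The sketch below builds such a statement for one reading per collection point at a shared timestamp. It is an illustration rather than our production code: the table names, the two-column layout (timestamp, value) and the class and method names are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

class BatchInsertSketch {

    private static final Pattern IDENTIFIER = Pattern.compile("[A-Za-z0-9_]+");

    // Builds "INSERT INTO t1 VALUES (ts, v1) t2 VALUES (ts, v2) ..." so that one
    // round trip writes one row into each sub-table.
    static String buildInsert(long tsMillis, Map<String, Double> readingsByTable) {
        if (readingsByTable.isEmpty()) {
            throw new IllegalArgumentException("no readings to insert");
        }
        StringBuilder sql = new StringBuilder("INSERT INTO");
        for (Map.Entry<String, Double> entry : readingsByTable.entrySet()) {
            if (!IDENTIFIER.matcher(entry.getKey()).matches()) {
                throw new IllegalArgumentException("not a plain identifier: " + entry.getKey());
            }
            sql.append(' ').append(entry.getKey())
               .append(" VALUES (").append(tsMillis).append(", ")
               .append(entry.getValue()).append(')');
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so the statement is deterministic.
        Map<String, Double> readings = new LinkedHashMap<>();
        readings.put("pt_a", 21.5);
        readings.put("pt_b", 3.0);
        System.out.println(buildInsert(1622505600000L, readings));
    }
}
```

Batching the readings of one second into one statement keeps the number of round trips independent of the number of collection points.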

Future Cooperation with TDengine

We are nearly at the end of our development with TDengine and we expect to deploy the new system very soon. We may decide to deploy a cluster in the future and I am sure that we will have to cooperate with the staff of TDengine. Thus far, our choice of TDengine has proven to be very beneficial in terms of easy development and testing and we did not run into any bottlenecks in terms of performance.

While there have been minor ups and downs during our project, I am very grateful to the staff of TDengine who helped me resolve issues quickly.

Our story with TDengine continues, so stay tuned.