TDengine data migration tool based on DataX

One of our priorities at TDengine is to provide tools that allow developers to focus on writing code that delivers value to their users. Developers should not have to waste time on chores such as writing data migration code. So we have implemented a data migration tool that moves data from sources such as OpenTSDB and MySQL into TDengine. The purpose of this article is to give users a quick overview of how to use this data migration tool.

1. Introduction

We have extended the excellent open-source tool DataX by implementing two plug-ins: TDengineReader and TDengineWriter.

Features provided by TDengineReader:

  1. Supports data filtering through SQL.
  2. Divides tasks according to time intervals.
  3. Supports all TDengine data types.
  4. Supports batch reading; the batchSize parameter controls the size of each batch's result set to improve read performance.
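To illustrate feature 2, here is a rough sketch of how a time range can be divided into per-interval sub-tasks. This is illustrative only, not TDengineReader's actual Java implementation:

```python
from datetime import datetime, timedelta

def split_by_interval(begin, end, interval):
    """Divide [begin, end) into sub-ranges of at most `interval` each.
    Each sub-range can then be read as an independent DataX task."""
    tasks = []
    cursor = begin
    while cursor < end:
        nxt = min(cursor + interval, end)
        tasks.append((cursor, nxt))
        cursor = nxt
    return tasks

# A one-hour range split into 15-minute intervals yields 4 sub-tasks
tasks = split_by_interval(
    datetime(2021, 1, 1, 0, 0, 0),
    datetime(2021, 1, 1, 1, 0, 0),
    timedelta(minutes=15),
)
print(len(tasks))  # 4
```

Splitting by time interval lets DataX read the sub-ranges independently, which is what makes parallel reading possible.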

Features supported by TDengineWriter:

  1. Supports OpenTSDB's JSON-format line protocol, using TDengine's schema-less interface to write to TDengine.
  2. Supports batch writing; the batchSize parameter controls the number of rows per write to improve write performance.
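For reference, a single data point in OpenTSDB's JSON format, the kind of record TDengineWriter ingests schema-lessly, looks like the following (the metric, tag names, and values are made up for illustration):

```json
{
    "metric": "weather_temperature",
    "timestamp": 1609459200,
    "value": 24.5,
    "tags": {
        "location": "beijing"
    }
}
```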

2. Implementation Principle

TDengineReader: Uses JNI to pull data from TDengine.

TDengineWriter: Uses JNI to write data to TDengine. Additionally, it uses schema-less writing for OpenTSDB data and batched SQL statements for relational databases such as MySQL.

3. How to Use

3.1 Prepare the environment

(1) Install the TDengine client, which can be downloaded from https://tdengine.com/all-downloads#tdengine_win. Scroll down the page to see clients for non-Windows operating systems.

(2) Install JDK 1.8 (DataX run-time requirement)

(3) Install a Python environment (DataX run-time requirement)

(4) Install the Maven compilation environment (Required to compile DataX)

3.2 Installation

Download the source code

git clone https://github.com/taosdata/DataX.git

Compile and Package

cd DataX
mvn -U clean package assembly:assembly -Dmaven.test.skip=true

Install

cp target/datax.tar.gz your_install_dir
cd your_install_dir
tar -zxvf datax.tar.gz

3.3 How to Configure a Data Migration Job

3.3.1 Migration configuration of time series data

This is an example of a data migration task from OpenTSDB to TDengine and the configuration file opentsdb2tdengine.json looks as follows:

{
    "job": {
        "content": [{
            "reader": {
                "name": "opentsdbreader",
                "parameter": {
                    "endpoint": "http://192.168.1.180:4242",
                    "column": [
                        "weather_temperature"
                    ],
                    "beginDateTime": "2021-01-01 00:00:00",
                    "endDateTime": "2021-01-01 01:00:00"
                }
            },
            "writer": {
                "name": "tdenginewriter",
                "parameter": {
                    "username": "root",
                    "password": "taosdata",
                    "connection": [
                        {
                            "table": [
                                "weather"
                            ],
                            "jdbcUrl": "jdbc:TAOS://192.168.1.101:6030/test?timestampFormat=TIMESTAMP"
                        }
                    ],
                    "batchSize": 1000
                }
            }
        }],
        "setting": {
            "speed": {
                "channel": 1
            }
        }
    }
}

Configuration instructions:

  • The above configuration represents a migration from the OpenTSDB instance at 192.168.1.180 to the TDengine instance at 192.168.1.101. The migrated metric is weather_temperature, over the time range 2021-01-01 00:00:00 to 2021-01-01 01:00:00.
  • The reader uses DataX's opentsdbreader. For parameter configuration, please refer to: opentsdbreader.md#Configuration parameters
  • In the parameters of tdenginewriter, jdbcUrl, username, and password are all required and have no default values. Detailed reference: tdenginewriter.md#Configuration parameters
  • Caution: in TDengine, if the table specified by "weather" does not exist, it must be created before migration.
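The target table can be created in advance through the taos shell. The schema below is only a sketch: it assumes a single temperature value plus a location tag to match the weather_temperature metric above; adjust it to your actual data.

```sql
-- Hypothetical schema for the migration target; adapt to your data.
CREATE DATABASE IF NOT EXISTS test;
CREATE STABLE IF NOT EXISTS test.weather
    (ts TIMESTAMP, value DOUBLE)
    TAGS (location BINARY(64));
```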

3.3.2 Migration configuration of relational data

This is an example of a configuration for a migration from MySQL to TDengine:

{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "username": "root",
                        "password": "123456",
                        "column": [
                            "ts", "dt", "f1", "f2", "f3", "f4", "f5", "f6", "f7", "f8"
                        ],
                        "splitPk": "id",
                        "connection": [
                            {
                                "table": [
                                    "stb1"
                                ],
                                "jdbcUrl": [
                                    "jdbc:mysql://192.168.1.101:3306/db1?useSSL=false&useUnicode=true&characterEncoding=utf8"
                                ]
                            }
                        ]
                    }
                },
                "writer": {
                    "name": "tdenginewriter",
                    "parameter": {
                        "username": "root",
                        "password": "taosdata",
                        "column": [
                            "ts", "dt", "f1", "f2", "f3", "f4", "f5", "f6", "f7", "f8"
                        ],
                        "connection": [
                            {
                                "table": [
                                    "stb2"
                                ],
                                "jdbcUrl": "jdbc:TAOS://192.168.1.105:6030/db2"
                            }
                        ],
                        "batchSize": 1000,
                        "ignoreTagsUnmatched": true
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": 1
            }
        }
    }
}

Configuration instructions:

  • The above configuration represents a data migration from the MySQL instance at 192.168.1.101 to the TDengine instance at 192.168.1.105.
  • The reader uses DataX's mysqlreader. For parameter configuration, please refer to: mysqlreader.md
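As with the OpenTSDB example, the target table in TDengine should exist before the job runs. The DDL below is only a sketch: the column types are assumptions and must be aligned with the MySQL source schema, and it is shown as an ordinary table for simplicity (if stb2 is a super table, add a TAGS clause):

```sql
-- Assumed types; align with the MySQL columns ts, dt, f1..f8.
CREATE DATABASE IF NOT EXISTS db2;
CREATE TABLE IF NOT EXISTS db2.stb2
    (ts TIMESTAMP, dt TIMESTAMP,
     f1 INT, f2 INT, f3 FLOAT, f4 FLOAT,
     f5 DOUBLE, f6 DOUBLE, f7 BINARY(64), f8 NCHAR(64));
```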

3.3.3 Migration configuration between TDengine instances

Taking a data migration from TDengine to TDengine as an example, the configuration file tdengine2tdengine.json is as follows:

{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "tdenginereader",
                    "parameter": {
                        "username": "root",
                        "password": "taosdata",
                        "connection": [
                            {
                                "table": [
                                    "stb1"
                                ],
                                "jdbcUrl": "jdbc:TAOS://192.168.1.82:6030/db1?timestampFormat=TIMESTAMP"
                            }
                        ],
                        "column": [
                            "tbname", "ts", "f1", "f2", "f3", "f4", "f5", "f6", "f7", "f8", "f9", "t1", "t2", "t3", "t4", "t5", "t6", "t7", "t8", "t9", "t10"
                        ],
                        "beginDateTime": "2022-02-15 00:00:00",
                        "endDateTime": "2022-02-16 00:00:00",
                        "splitInterval": "1d"
                    }
                },
                "writer": {
                    "name": "tdenginewriter",
                    "parameter": {
                        "username": "root",
                        "password": "taosdata",
                        "column": [
                            "tbname", "ts", "f1", "f2", "f3", "f4", "f5", "f6", "f7", "f8", "f9", "t1", "t2", "t3", "t4", "t5", "t6", "t7", "t8", "t9", "t10"
                        ],
                        "connection": [
                            {
                                "table": [
                                    "stb2"
                                ],
                                "jdbcUrl": "jdbc:TAOS-RS://192.168.1.105:6041/db2?timestampFormat=TIMESTAMP"
                            }
                        ],
                        "batchSize": 1000,
                        "ignoreTagsUnmatched": true
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": 1
            }
        }
    }
}

Configuration instructions:

  • The above configuration represents a data migration between two TDengine instances, from 192.168.1.82 to 192.168.1.105.
  • The reader uses tdenginereader. For parameter configuration, please refer to: tdenginereader.md#Configuration parameters

3.4 Execute the migration task

Save the configuration file written above in the datax/job directory, and execute the following command to start the data migration task:

python bin/datax.py job/opentsdb2tdengine.json
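Since jdbcUrl, username, and password are required by tdenginewriter, a quick sanity check of the job file before launching can catch misconfiguration early. The helper below is not part of DataX; it is a small illustrative script:

```python
import json

def missing_writer_params(job):
    """Return required tdenginewriter parameters absent from a DataX job dict."""
    missing = []
    for content in job["job"]["content"]:
        params = content["writer"]["parameter"]
        for key in ("username", "password"):
            if key not in params:
                missing.append(key)
        for conn in params.get("connection", []):
            if "jdbcUrl" not in conn:
                missing.append("jdbcUrl")
    return missing

# Usage: missing_writer_params(json.load(open("job/opentsdb2tdengine.json")))
# Example with a deliberately incomplete job configuration:
job = {"job": {"content": [{"writer": {"parameter": {
    "username": "root", "connection": [{"table": ["weather"]}]}}}]}}
print(missing_writer_params(job))  # ['password', 'jdbcUrl']
```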

4. Restrictions

(1) Currently, DataX's own opentsdbreader only supports OpenTSDB version 2.3.x. Detailed reference: opentsdbreader#constraint limit

(2) If you use JDBC-JNI in the jdbcUrl parameter, the data migration tool relies on libtaos.so / taos.dll / libtaos.dylib from the TDengine client.

5. FAQ About the Data Migration Tool

(1) How can I estimate the resources required for a data migration task?

Each DataX reader divides tasks according to its own segmentation strategy; for details, please refer to DataX's task scheduling rules. When estimating resources, consider the amount of data to be migrated, the task segmentation rules, and network bandwidth limits together. It is best to validate estimates against the results of an actual test migration.
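As a back-of-envelope illustration (the row count, row size, and bandwidth figures below are assumptions, not measurements), migration time is bounded below by data volume divided by effective bandwidth:

```python
def estimate_seconds(rows, bytes_per_row, bandwidth_bytes_per_sec):
    """Lower bound on migration time: total volume / effective bandwidth."""
    return rows * bytes_per_row / bandwidth_bytes_per_sec

# e.g. 100 million rows of ~100 bytes each over an effective 50 MB/s link
secs = estimate_seconds(100_000_000, 100, 50 * 1024 * 1024)
print(round(secs / 60, 1))  # about 3.2 minutes, ignoring all other overhead
```

Real migrations add per-request overhead, serialization cost, and contention on both databases, which is why an actual test run is the only reliable estimate.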

(2) What is an efficient batchSize setting for TDengineWriter?

batchSize is the parameter that controls batch writing. After accumulating batchSize rows, TDengineWriter sends a single write request to TDengine, which reduces the number of round trips and improves performance. In our tests, batchSize was most efficient in the range of 500 to 1000.
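The batching behaviour described above can be sketched as follows (illustrative only; TDengineWriter's actual implementation lives in the Java plug-in):

```python
def batches(rows, batch_size):
    """Group rows into chunks of at most batch_size, one write request each."""
    for i in range(0, len(rows), batch_size):
        yield rows[i:i + batch_size]

# 2500 rows with batchSize=1000 become 3 write requests instead of 2500
rows = list(range(2500))
sizes = [len(b) for b in batches(rows, 1000)]
print(sizes)  # [1000, 1000, 500]
```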

(3) What is the appropriate number of channels in the job configuration?

The number of channels in a job is a flow-control parameter, and each channel needs its own block of memory to cache data. Setting the channel count too high can cause OOM (Out of Memory) errors, so more channels is not necessarily better. If the number of channels is increased, the JVM heap size should be increased accordingly. In our tests, 1 to 6 channels were suitable to maximize DataX throughput.