TDengine data migration tool based on DataX

One of our priorities at TDengine is to provide tools that allow developers to focus on writing code that delivers value to their users. Developers should not have to waste time on chores such as writing data migration code. So we have implemented a data migration tool that moves data from sources such as OpenTSDB and MySQL into TDengine. The purpose of this article is to give users a quick overview of how to use this data migration tool.

1. Introduction

We have extended the excellent open-source tool DataX by implementing two plug-ins: TDengineReader and TDengineWriter.

Features provided by TDengineReader:

  1. Supports data filtering through SQL.
  2. Divides tasks according to time intervals.
  3. Supports all TDengine data types.
  4. Supports batch reading; the batchSize parameter controls the size of each batch's result set to improve read performance.
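To illustrate feature 2, here is a rough sketch of how a time range can be divided into per-interval sub-tasks. This is illustrative only, not TDengineReader's actual Java implementation:

```python
from datetime import datetime, timedelta

def split_by_interval(begin, end, interval):
    """Divide [begin, end) into sub-ranges of at most `interval` each.
    Each sub-range can then be read as an independent DataX task."""
    tasks = []
    cursor = begin
    while cursor < end:
        nxt = min(cursor + interval, end)
        tasks.append((cursor, nxt))
        cursor = nxt
    return tasks

# A one-hour range split into 15-minute intervals yields 4 sub-tasks
tasks = split_by_interval(
    datetime(2021, 1, 1, 0, 0, 0),
    datetime(2021, 1, 1, 1, 0, 0),
    timedelta(minutes=15),
)
print(len(tasks))  # 4
```

Splitting by time interval lets DataX read the sub-ranges independently, which is what makes parallel reading possible.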

Features supported by TDengineWriter:

  1. Supports OpenTSDB's JSON-format line protocol, using TDengine's schema-less interface to write to TDengine.
  2. Supports batch writing; the batchSize parameter controls the number of rows per write to improve write performance.
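For reference, a single data point in OpenTSDB's JSON format, the kind of record TDengineWriter ingests schema-lessly, looks like the following (the metric, tag names, and values are made up for illustration):

```json
{
    "metric": "weather_temperature",
    "timestamp": 1609459200,
    "value": 24.5,
    "tags": {
        "location": "beijing"
    }
}
```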

2. Implementation Principle

TDengineReader: Uses JNI to pull data from TDengine.

TDengineWriter: Uses JNI to write data to TDengine. Additionally, it uses schema-less writing for OpenTSDB data and batched SQL statements for relational databases such as MySQL.

3. How to Use

3.1 Prepare the environment

(1) Install the TDengine client, which can be downloaded from https://tdengine.com/all-downloads#tdengine_win. Scroll down the page to see clients for non-Windows operating systems.

(2) Install JDK 1.8 (DataX run-time requirement)

(3) Install a Python environment (DataX run-time requirement)

(4) Install the Maven compilation environment (Required to compile DataX)

3.2 Installation

Download the source code

git clone https://github.com/taosdata/DataX.git

Compile and Package

cd DataX
mvn -U clean package assembly:assembly -Dmaven.test.skip=true

Install

cp target/datax.tar.gz your_install_dir
cd your_install_dir
tar -zxvf datax.tar.gz

3.3 How to Configure a Data Migration Job

3.3.1 Migration configuration of time series data

This is an example of a data migration task from OpenTSDB to TDengine and the configuration file opentsdb2tdengine.json looks as follows:

{
    "job": {
        "content": [{
            "reader": {
                "name": "opentsdbreader",
                "parameter": {
                    "endpoint": "http://192.168.1.180:4242",
                    "column": [
                        "weather_temperature"
                    ],
                    "beginDateTime": "2021-01-01 00:00:00",
                    "endDateTime": "2021-01-01 01:00:00"
                }
            },
            "writer": {
                "name": "tdenginewriter",
                "parameter": {
                    "username": "root",
                    "password": "taosdata",
                    "connection": [
                        {
                            "table": [
                                "weather"
                            ],
                            "jdbcUrl": "jdbc:TAOS://192.168.1.101:6030/test?timestampFormat=TIMESTAMP"
                        }
                    ],
                    "batchSize": 1000
                }
            }
        }],
        "setting": {
            "speed": {
                "channel": 1
            }
        }
    }
}

Configuration instructions:

  • The above configuration represents a migration from the OpenTSDB instance at 192.168.1.180 to the TDengine instance at 192.168.1.101. The migrated metric is weather_temperature, over the time range 2021-01-01 00:00:00 to 2021-01-01 01:00:00.
  • The reader uses DataX's opentsdbreader. For parameter configuration, please refer to: opentsdbreader.md#Configuration parameters
  • In the parameters of tdenginewriter, jdbcUrl, username, and password are all required and have no default values. Detailed reference: tdenginewriter.md#Configuration parameters
  • Caution: in TDengine, if the table specified by "weather" does not exist, it must be created before migration.
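The target table can be created in advance through the taos shell. The schema below is only a sketch: it assumes a single temperature value plus a location tag to match the weather_temperature metric above; adjust it to your actual data.

```sql
-- Hypothetical schema for the migration target; adapt to your data.
CREATE DATABASE IF NOT EXISTS test;
CREATE STABLE IF NOT EXISTS test.weather
    (ts TIMESTAMP, value DOUBLE)
    TAGS (location BINARY(64));
```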

3.3.2 Migration configuration of relational data

This is an example of a configuration for a migration from MySQL to TDengine:

{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "username": "root",
                        "password": "123456",
                        "column": [
                            "ts", "dt", "f1", "f2", "f3", "f4", "f5", "f6", "f7", "f8"
                        ],
                        "splitPk": "id",
                        "connection": [
                            {
                                "table": [
                                    "stb1"
                                ],
                                "jdbcUrl": [
                                    "jdbc:mysql://192.168.1.101:3306/db1?useSSL=false&useUnicode=true&characterEncoding=utf8"
                                ]
                            }
                        ]
                    }
                },
                "writer": {
                    "name": "tdenginewriter",
                    "parameter": {
                        "username": "root",
                        "password": "taosdata",
                        "column": [
                            "ts", "dt", "f1", "f2", "f3", "f4", "f5", "f6", "f7", "f8"
                        ],
                        "connection": [
                            {
                                "table": [
                                    "stb2"
                                ],
                                "jdbcUrl": "jdbc:TAOS://192.168.1.105:6030/db2"
                            }
                        ],
                        "batchSize": 1000,
                        "ignoreTagsUnmatched": true
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": 1
            }
        }
    }
}

Configuration instructions:

  • The above configuration represents a data migration from the MySQL instance at 192.168.1.101 to the TDengine instance at 192.168.1.105.
  • The reader uses DataX's mysqlreader. For parameter configuration, please refer to: mysqlreader.md
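As with the OpenTSDB example, the target table in TDengine should exist before the job runs. The DDL below is only a sketch: the column types are assumptions and must be aligned with the MySQL source schema, and it is shown as an ordinary table for simplicity (if stb2 is a super table, add a TAGS clause):

```sql
-- Assumed types; align with the MySQL columns ts, dt, f1..f8.
CREATE DATABASE IF NOT EXISTS db2;
CREATE TABLE IF NOT EXISTS db2.stb2
    (ts TIMESTAMP, dt TIMESTAMP,
     f1 INT, f2 INT, f3 FLOAT, f4 FLOAT,
     f5 DOUBLE, f6 DOUBLE, f7 BINARY(64), f8 NCHAR(64));
```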

3.3.3 Migration configuration between TDengine instances

Taking a data migration from TDengine to TDengine as an example, the configuration file tdengine2tdengine.json is as follows:

{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "tdenginereader",
                    "parameter": {
                        "username": "root",
                        "password": "taosdata",
                        "connection": [
                            {
                                "table": [
                                    "stb1"
                                ],
                                "jdbcUrl": "jdbc:TAOS://192.168.1.82:6030/db1?timestampFormat=TIMESTAMP"
                            }
                        ],
                        "column": [
                            "tbname", "ts", "f1", "f2", "f3", "f4", "f5", "f6", "f7", "f8", "f9", "t1", "t2", "t3", "t4", "t5", "t6", "t7", "t8", "t9", "t10"
                        ],
                        "beginDateTime": "2022-02-15 00:00:00",
                        "endDateTime": "2022-02-16 00:00:00",
                        "splitInterval": "1d"
                    }
                },
                "writer": {
                    "name": "tdenginewriter",
                    "parameter": {
                        "username": "root",
                        "password": "taosdata",
                        "column": [
                            "tbname", "ts", "f1", "f2", "f3", "f4", "f5", "f6", "f7", "f8", "f9", "t1", "t2", "t3", "t4", "t5", "t6", "t7", "t8", "t9", "t10"
                        ],
                        "connection": [
                            {
                                "table": [
                                    "stb2"
                                ],
                                "jdbcUrl": "jdbc:TAOS-RS://192.168.1.105:6041/db2?timestampFormat=TIMESTAMP"
                            }
                        ],
                        "batchSize": 1000,
                        "ignoreTagsUnmatched": true
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": 1
            }
        }
    }
}

Configuration instructions:

  • The above configuration represents a data migration between two TDengine instances, from 192.168.1.82 to 192.168.1.105.
  • The reader uses tdenginereader. For parameter configuration, please refer to: tdenginereader.md#Configuration parameters

3.4 Execute the migration task

Save the configuration file written above in the datax/job directory, and execute the following command to start the data migration task:

python bin/datax.py job/opentsdb2tdengine.json
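Since jdbcUrl, username, and password are required by tdenginewriter, a quick sanity check of the job file before launching can catch misconfiguration early. The helper below is not part of DataX; it is a small illustrative script:

```python
import json

def missing_writer_params(job):
    """Return required tdenginewriter parameters absent from a DataX job dict."""
    missing = []
    for content in job["job"]["content"]:
        params = content["writer"]["parameter"]
        for key in ("username", "password"):
            if key not in params:
                missing.append(key)
        for conn in params.get("connection", []):
            if "jdbcUrl" not in conn:
                missing.append("jdbcUrl")
    return missing

# Usage: missing_writer_params(json.load(open("job/opentsdb2tdengine.json")))
# Example with a deliberately incomplete job configuration:
job = {"job": {"content": [{"writer": {"parameter": {
    "username": "root", "connection": [{"table": ["weather"]}]}}}]}}
print(missing_writer_params(job))  # ['password', 'jdbcUrl']
```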

4. Restrictions

(1) Currently, DataX's own opentsdbreader only supports OpenTSDB version 2.3.x. Detailed reference: opentsdbreader#constraint limit

(2) If you use JDBC-JNI in the jdbcUrl parameter, the data migration tool relies on libtaos.so / taos.dll / libtaos.dylib from the TDengine client.

5. FAQ About the Data Migration Tool

(1) How can I estimate the resources required for a data migration task?

Each DataX reader divides tasks according to its own segmentation strategy; for details, please refer to DataX's task scheduling rules. When estimating resources, consider the amount of data to be migrated, the task segmentation rules, and network bandwidth limits together. It is best to validate estimates against the results of an actual test migration.
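As a back-of-envelope illustration (the row count, row size, and bandwidth figures below are assumptions, not measurements), migration time is bounded below by data volume divided by effective bandwidth:

```python
def estimate_seconds(rows, bytes_per_row, bandwidth_bytes_per_sec):
    """Lower bound on migration time: total volume / effective bandwidth."""
    return rows * bytes_per_row / bandwidth_bytes_per_sec

# e.g. 100 million rows of ~100 bytes each over an effective 50 MB/s link
secs = estimate_seconds(100_000_000, 100, 50 * 1024 * 1024)
print(round(secs / 60, 1))  # about 3.2 minutes, ignoring all other overhead
```

Real migrations add per-request overhead, serialization cost, and contention on both databases, which is why an actual test run is the only reliable estimate.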

(2) What is an efficient batchSize setting for TDengineWriter?

batchSize is the parameter that controls batch writing. After accumulating batchSize rows, TDengineWriter sends a single write request to TDengine, which reduces the number of round trips and improves performance. In our tests, batchSize was most efficient in the range of 500 to 1000.
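The batching behaviour described above can be sketched as follows (illustrative only; TDengineWriter's actual implementation lives in the Java plug-in):

```python
def batches(rows, batch_size):
    """Group rows into chunks of at most batch_size, one write request each."""
    for i in range(0, len(rows), batch_size):
        yield rows[i:i + batch_size]

# 2500 rows with batchSize=1000 become 3 write requests instead of 2500
rows = list(range(2500))
sizes = [len(b) for b in batches(rows, 1000)]
print(sizes)  # [1000, 1000, 500]
```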

(3) What is the appropriate number of channels in the job configuration?

The number of channels in a job is a flow-control parameter, and each channel needs its own block of memory to cache data. Setting the channel count too high can cause OOM (Out of Memory) errors, so more channels is not necessarily better. If the number of channels is increased, the JVM heap size should be increased accordingly. In our tests, 1 to 6 channels were suitable to maximize DataX throughput.