Seamless Data Integration from MQTT and InfluxDB to TDengine

Haojun Liao

May 22, 2024

With TDengine Enterprise and TDengine Cloud’s data integration capabilities, you can now seamlessly convert data from MQTT and InfluxDB into TDengine, reducing costs and greatly simplifying data conversion tasks for users. This functionality is similar to Logstash in its implementation and usage. This article will explain this feature by comparing it with Logstash.


Logstash: Focused on Log Collection and Organization

Logstash is an open-source, real-time data collection and processing engine that is typically used as an ETL tool. It collects data from various sources, transforms it according to conversion rules, and then sends it to the specified storage. Logstash is commonly used with Elasticsearch (ES), Kibana, and Beats, which together form the Elastic Stack (or ELK Stack), a free and open-source toolset for data collection, enrichment, storage, analysis, and visualization.

Logstash can ingest data from sources such as Beats, Kafka, and other data sources, transform it, and write it to storage such as Elasticsearch or MySQL.

A Logstash pipeline has three stages, Input, Filter, and Output, which together cover the entire lifecycle of the data. Both Input and Output support codecs, so data can be decoded as it enters the pipeline or encoded as it leaves without needing a separate filter. In the Input stage, raw data is converted into Events; filters then process those Events, and in the Output stage they are converted into the target format. The configuration file lets you add, delete, modify, and query attributes within an Event.
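To make the Input, Filter, and Output flow concrete, here is a minimal Python sketch of such a pipeline. It is not Logstash's implementation: an Event is assumed to be a plain dict, and the function names (`input_stage`, `filter_stage`, `output_stage`, `drop_empty`, `add_field`) are hypothetical.

```python
import json
from typing import Any, Callable, Dict, Iterable, Iterator, List, Optional

# Hypothetical model: an Event is a dict of attributes (not Logstash's own Event class).
Event = Dict[str, Any]
Filter = Callable[[Event], Optional[Event]]


def input_stage(raw_lines: Iterable[str]) -> Iterator[Event]:
    """Input: decode each raw line (like a line codec) into an Event."""
    for line in raw_lines:
        yield {"message": line.rstrip("\n")}


def filter_stage(events: Iterable[Event], filters: List[Filter]) -> Iterator[Event]:
    """Filter: apply filters in order; a filter that returns None drops the event."""
    for event in events:
        current: Optional[Event] = event
        for f in filters:
            current = f(current)
            if current is None:
                break
        if current is not None:
            yield current


def output_stage(events: Iterable[Event]) -> List[str]:
    """Output: encode each Event with a JSON codec as it leaves the pipeline."""
    return [json.dumps(e, sort_keys=True) for e in events]


def drop_empty(event: Event) -> Optional[Event]:
    """Example filter: drop events whose message is empty."""
    return event if event["message"] else None


def add_field(event: Event) -> Event:
    """Example filter: add an attribute to the event."""
    event["source"] = "demo"
    return event
```

Calling `output_stage(filter_stage(input_stage(["hello", "", "world"]), [drop_empty, add_field]))` returns two JSON strings; the empty line is removed in the filter stage, and the codec only runs at the output boundary.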

The Filter component is the main reason for Logstash’s powerful functionality. It can perform extensive processing on Logstash Events, such as parsing data, deleting fields, and type conversion. Common filters include:

date: Date parsing
grok: Regex pattern matching and parsing
- Regular expression
  - Debuggex: Online visual regex tester. JavaScript, Python, and PCRE.
  - RegExr: Learn, Build, & Test RegEx
- Grok
  - Kibana Grok Debugger
  - https://github.com/elastic/logstash/tree/v1.4.2/patterns
  - https://github.com/logstash-plugins/logstash-patterns-core
  - https://grokdebugger.com/
dissect: Delimiter-based parsing
mutate: Modify event data, including rename, update, replace, convert, split, gsub, uppercase, lowercase, strip, remove_field, join, merge, etc.
json: Parse JSON fields into specified fields
geoip: Add geographic location data
ruby: Use Ruby code to dynamically modify Logstash Events
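The dissect and mutate filters above can be approximated in a few lines of Python. This is a simplified sketch, not the plugins' actual behavior: the hypothetical `dissect` function supports only plain `%{field}` keys (plus `%{}` to skip a value), and the hypothetical `mutate` function applies rename, then convert, then remove_field, whereas Logstash's mutate has its own fixed internal order and more lenient type conversion.

```python
import re
from typing import Any, Dict, Iterable, Optional


def dissect(pattern: str, text: str) -> Optional[Dict[str, str]]:
    """Split text on the literal delimiters between %{field} keys; None if it does not fit."""
    # re.split with one capture group alternates literals and field names:
    # "%{ts} [%{level}] %{msg}" -> ['', 'ts', ' [', 'level', '] ', 'msg', '']
    parts = re.split(r"%\{([^}]*)\}", pattern)
    literals, fields = parts[0::2], parts[1::2]
    if not text.startswith(literals[0]):
        return None
    pos = len(literals[0])
    result: Dict[str, str] = {}
    for i, name in enumerate(fields):
        delim = literals[i + 1]
        if delim:
            end = text.find(delim, pos)
            if end == -1:
                return None
        elif i == len(fields) - 1:
            end = len(text)  # the last field takes the rest of the line
        else:
            return None  # adjacent fields with no delimiter are not supported here
        if name:  # an empty key "%{}" skips the value
            result[name] = text[pos:end]
        pos = end + len(delim)
    return result if pos == len(text) else None


_CASTS = {"integer": int, "float": float, "string": str}


def mutate(event: Dict[str, Any],
           rename: Optional[Dict[str, str]] = None,
           convert: Optional[Dict[str, str]] = None,
           remove_field: Iterable[str] = ()) -> Dict[str, Any]:
    """Return a copy of event with fields renamed, type-converted, then removed."""
    out = dict(event)
    for old, new in (rename or {}).items():
        if old in out:
            out[new] = out.pop(old)
    for name, type_name in (convert or {}).items():
        if name in out:
            out[name] = _CASTS[type_name](out[name])
    for name in remove_field:
        out.pop(name, None)
    return out
```

For example, dissecting `"2024-05-22 [INFO] query took 42ms"` with `"%{ts} [%{level}] %{msg} took %{ms}ms"` and then renaming `ms` to `duration_ms`, converting it to an integer, and removing `ts` leaves `{"level": "INFO", "msg": "query", "duration_ms": 42}`. Delimiter splitting like this avoids regular expressions entirely, which is why dissect is usually cheaper than grok when the log layout is fixed.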

Example Grok filter configuration:

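filter {
    # Extract a service ID from the "message" field into a new "service" field,
    # using the custom SERVICE pattern defined in pattern_definitions.
    grok {
        match => {
            "message" => "%{SERVICE:service}"
        }
        pattern_definitions => {
            "SERVICE" => "[a-z0-9]{10,11}"
        }
    }
}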
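Under the hood, a grok expression is a regular expression with named captures. The following Python sketch shows roughly what the configuration above does: it expands `%{SERVICE:service}` into a named group built from the custom definition and searches the message without anchoring, as grok does by default. The helper names `compile_grok` and `grok` are hypothetical, and Logstash's built-in pattern library (for example `%{IP}`) is not included; only the custom definitions passed in are known.

```python
import re
from typing import Dict, Optional

# Custom pattern definitions, mirroring pattern_definitions in the Logstash config.
PATTERNS = {"SERVICE": r"[a-z0-9]{10,11}"}


def compile_grok(expr: str, patterns: Dict[str, str]) -> "re.Pattern[str]":
    """Expand %{NAME:field} into (?P<field>...) and %{NAME} into (?:...)."""
    def expand(m: "re.Match[str]") -> str:
        name, field = m.group(1), m.group(2)
        body = patterns[name]
        return f"(?P<{field}>{body})" if field else f"(?:{body})"
    return re.compile(re.sub(r"%\{(\w+)(?::(\w+))?\}", expand, expr))


def grok(message: str, expr: str, patterns: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Return the captured fields, or None when the pattern does not match."""
    m = compile_grok(expr, patterns).search(message)
    return m.groupdict() if m else None
```

With this sketch, `grok("service=abc123def45 up", "%{SERVICE:service}", PATTERNS)` returns `{"service": "abc123def45"}`, while a message with no run of 10 to 11 lowercase letters or digits returns `None`. Because the search is unanchored, a loose pattern like this one can match unexpected substrings, which is why the regex and grok debuggers listed above are useful.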

Logstash is suitable for real-time data processing but is not ideal for all scenarios. For large-scale time-series data processing, TDengine offers its own data integration functionality.

TDengine: Data Collection for Multiple Sources

If you use Elasticsearch and Kibana for log management, data analysis, and visualization, Logstash may be a good choice because of its tight integration with Elasticsearch. However, Logstash is not well suited to large-scale time-series data: it consumes significant memory and CPU, and both its performance and its scalability can suffer as data volumes grow. Logstash also has a steep learning curve, because each stage of data processing (input, filter, output) must be configured correctly, and an incorrect configuration can lead to data loss or format errors.

TDengine’s data integration functionality addresses these limitations, providing a more convenient and cost-effective solution for data conversion tasks if you are using TDengine.

Using TDengine’s data integration features, you can easily fetch data from MQTT and InfluxDB servers and efficiently write it into the TDengine database, ensuring smooth data integration and analysis. This feature automates the entire data integration process, minimizing manual operations and offering the following benefits:

Support for JSON Format: Users can ingest and store data in JSON format and take advantage of its flexibility to represent complex, nested data structures from which organizations can extract valuable insights.
JSON Path Extraction: TDengine supports JSON path extraction, simplifying the processing of JSON data. By precisely selecting and capturing required data elements, users can focus on core data sets, maximizing analysis efficiency.
Simple Configuration: Provides easy-to-use configuration files where you can specify TDengine’s super tables, sub-tables, columns, and tags, customizing the data integration process to meet specific needs.
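The JSON path extraction and super table/sub-table mapping described above can be pictured with a short Python sketch that turns one JSON message into a TDengine `INSERT ... USING ... TAGS ... VALUES` statement, which creates the sub-table automatically if it does not exist. The `MAPPING` layout, the field names, and the helper functions are hypothetical and do not reflect TDengine's actual configuration format; the sketch supports only `.key` and `[index]` path steps, and it emits tag and column values positionally, so the mapping must list them in the super table's declared order. A real client would normally use an official connector with parameter binding rather than building SQL strings.

```python
import json
import re
from typing import Any, Dict

# Hypothetical mapping config: a super table, a JSON path for the sub-table name,
# and JSON paths for tags and columns, listed in the super table's declared order.
MAPPING: Dict[str, Any] = {
    "stable": "meters",
    "subtable": "$.device.id",
    "tags": {"location": "$.device.location", "group_id": "$.device.group"},
    "columns": {"ts": "$.ts", "current": "$.readings[0]", "voltage": "$.readings[1]"},
}

_TOKEN = re.compile(r"\.(\w+)|\[(\d+)\]")
_IDENT = re.compile(r"^[A-Za-z_]\w*$")


def json_path_get(doc: Any, path: str) -> Any:
    """Resolve a minimal JSON path made of .key and [index] steps, e.g. $.a.b[0]."""
    if not path.startswith("$"):
        raise ValueError(f"JSON path must start with '$': {path}")
    node, pos = doc, 1
    while pos < len(path):
        m = _TOKEN.match(path, pos)
        if m is None:
            raise ValueError(f"unsupported JSON path syntax: {path}")
        key, index = m.group(1), m.group(2)
        node = node[key] if key is not None else node[int(index)]
        pos = m.end()
    return node


def sql_literal(value: Any) -> str:
    """Render a Python value as a SQL literal (strings are quoted and escaped)."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    text = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return f"'{text}'"


def to_insert_sql(doc: Any, mapping: Dict[str, Any]) -> str:
    """Build one INSERT ... USING ... TAGS ... VALUES statement from a JSON document."""
    subtable = json_path_get(doc, mapping["subtable"])
    if not _IDENT.match(str(subtable)):
        raise ValueError(f"invalid sub-table name: {subtable!r}")
    tags = ", ".join(sql_literal(json_path_get(doc, p)) for p in mapping["tags"].values())
    values = ", ".join(sql_literal(json_path_get(doc, p)) for p in mapping["columns"].values())
    return (f"INSERT INTO {subtable} USING {mapping['stable']} "
            f"TAGS ({tags}) VALUES ({values})")
```

For a message such as `{"ts": 1716336000000, "device": {"id": "d1001", "location": "California", "group": 2}, "readings": [10.3, 219]}`, the sketch produces `INSERT INTO d1001 USING meters TAGS ('California', 2) VALUES (1716336000000, 10.3, 219)`, with the integer timestamp interpreted in the database's precision (milliseconds by default).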

After integrating data into TDengine, users can perform data cleaning and transformation based on business needs, achieving a complete ETL process. With these innovative features, real-time data can seamlessly integrate with the high-performance TDengine database, enabling real-time analysis, predictive maintenance, and data-driven decision-making.

The configuration process is straightforward: log in to the TDengine Enterprise or TDengine Cloud web management interface, select “Data in,” and add MQTT or InfluxDB as a data source. Then configure the parsing rules that map the InfluxDB/MQTT data to TDengine’s database, super tables, and sub-tables.
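As a rough illustration of what such a mapping means for InfluxDB data, the Python sketch below parses one line of InfluxDB line protocol and maps the measurement to a super table, the tag set to TDengine tags, and the field set to columns. It is not how TDengine performs the conversion. The parser assumes no escaped characters and no spaces or commas inside values, the sub-table naming rule is hypothetical, and timestamps are assumed to arrive in nanoseconds (InfluxDB's default) and to be stored in a millisecond-precision database.

```python
import re
from dataclasses import dataclass
from typing import Any, Dict, Optional, Union

FieldValue = Union[float, int, bool, str]
_TRUE = {"t", "T", "true", "True", "TRUE"}
_FALSE = {"f", "F", "false", "False", "FALSE"}


@dataclass
class Point:
    measurement: str
    tags: Dict[str, str]
    fields: Dict[str, FieldValue]
    timestamp: Optional[int]  # nanoseconds, InfluxDB's default precision


def _parse_field_value(raw: str) -> FieldValue:
    if len(raw) >= 2 and raw.startswith('"') and raw.endswith('"'):
        return raw[1:-1]  # string field
    if raw in _TRUE:
        return True
    if raw in _FALSE:
        return False
    if raw.endswith(("i", "u")):
        return int(raw[:-1])  # integer or unsigned integer field
    return float(raw)  # float is the default numeric type


def parse_line(line: str) -> Point:
    """Parse one line of line protocol (no escapes, no spaces or commas in values)."""
    parts = line.strip().split(" ")
    if len(parts) not in (2, 3):
        raise ValueError(f"unexpected line protocol: {line!r}")
    measurement, *tag_pairs = parts[0].split(",")
    tags = dict(pair.split("=", 1) for pair in tag_pairs)
    fields = {k: _parse_field_value(v)
              for k, v in (pair.split("=", 1) for pair in parts[1].split(","))}
    timestamp = int(parts[2]) if len(parts) == 3 else None
    return Point(measurement, tags, fields, timestamp)


def to_tdengine(point: Point) -> Dict[str, Any]:
    """Map a point onto a super table, a sub-table, tags, and columns."""
    # Hypothetical naming rule: measurement plus tag values in sorted tag-key order,
    # with non-identifier characters replaced by underscores.
    raw_name = "_".join([point.measurement] + [point.tags[k] for k in sorted(point.tags)])
    ts_ms = None if point.timestamp is None else point.timestamp // 1_000_000
    return {
        "stable": point.measurement,
        "subtable": re.sub(r"\W", "_", raw_name),
        "tags": dict(sorted(point.tags.items())),
        "columns": {"ts": ts_ms, **point.fields},
    }
```

For the line `cpu,region=us-west,host=server01 usage_user=12.5,cores=8i,online=true 1716336000000000000`, this sketch yields super table `cpu`, sub-table `cpu_server01_us_west`, tags `host` and `region`, and columns `ts` (1716336000000 ms), `usage_user`, `cores`, and `online`. Deriving the sub-table name from the tag values keeps each unique tag combination (each data collection point) in its own sub-table.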

TDengine 3.0 Enterprise Edition and TDengine Cloud provide efficient, reliable data integration methods with easy-to-use command-line operations. Whether you want to migrate data from InfluxDB/MQTT or consolidate data from multiple sources into TDengine, TDengine 3.0 Enterprise Edition and TDengine Cloud can meet your needs.


Haojun Liao
Haojun Liao is Co-Founder & Query Engine Architect at TDengine and is responsible for developing the product’s query processing component. He holds a Ph.D. in Computer Applied Technology from the Institute of Computing Technology, Chinese Academy of Sciences, and focuses on time-series and spatial data analysis and processing.