IoT applications often require the collection of large volumes of data to support functions such as intelligent control, business analytics, and device monitoring. However, frequent changes in application logic or hardware adjustments can lead to the need for continuous updates in the data collection schema. This presents a significant challenge for time-series databases (TSDBs).
To address this dynamic requirement, TDengine offers a schemaless ingestion mode that allows developers to insert data without having to define the schema in advance. With this approach, the system automatically determines the appropriate data storage structure and adjusts the schema when necessary. If new data columns need to be added, the schemaless mode ensures these updates are handled automatically, enabling accurate data recording. For a more detailed understanding of the primary processing logic, mapping rules, and change handling mechanisms of schemaless ingestion, you can refer to the TDengine technical documentation on the official website.
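For readers who have not used the interface before, here is a minimal usage sketch of schemaless ingestion with the TDengine C client. It assumes a connection has already been opened with `taos_connect()` and a database has been selected; the table name, tags, and columns in the sample line are illustrative only.

```c
#include <stdio.h>
#include <taos.h>

void insert_schemaless(TAOS *taos) {
  // One InfluxDB line-protocol record; the super table, child table, and
  // columns are created or extended automatically as the data requires.
  char *lines[] = {
      "meters,location=LosAngeles,groupid=2 current=11.8,voltage=221,phase=0.28 1626006833639000000"};

  TAOS_RES *res = taos_schemaless_insert(taos, lines, 1, TSDB_SML_LINE_PROTOCOL,
                                         TSDB_SML_TIMESTAMP_NANO_SECONDS);
  if (taos_errno(res) != 0) {
    printf("schemaless insert failed: %s\n", taos_errstr(res));
  }
  taos_free_result(res);
}
```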
TDengine has invested significant effort in optimizing its schemaless ingestion functionality to enhance both flexibility and efficiency. This article provides an overview of some of these optimizations, giving developers insights into how they can leverage this functionality to boost application performance. Our goal is to help developers reach new heights in performance optimization.
Performance Optimization Process
Identifying Performance Bottlenecks
By analyzing flame graphs of data ingestion, we can pinpoint the areas where the system is spending the most time. Specifically, functions like `parseSmlKey`, `parseSmlValue`, `addChildTableDataPointsIntoSql`, `taos_query_a`, and `buildDataPointSchema` are responsible for a high proportion of the processing time.
Addressing Performance Bottlenecks
For each of these functions, we can adopt one of two strategies to improve performance: either eliminate the bottleneck entirely or reduce its duration.
How can we eliminate these bottlenecks?
To answer this question, we first need to understand the current data parsing framework. Below is a simplified flowchart of how the existing framework processes data:
Analysis of the Current Framework
In the current process, the system iterates over each record, extracting the measurement (measure), tag, and column (col) key-value pairs. These are stored in custom structures, and the system sorts the tag keys and generates the corresponding sub-table names based on predefined rules. However, the repeated parsing, tag sorting, and sub-table name generation in this workflow add unnecessary time and computational complexity, as the sketch below illustrates.
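As a rough illustration of that per-record overhead, the following sketch shows tag keys being sorted and concatenated into a deterministic child-table name for every line. The helper names and naming scheme are hypothetical and do not reflect the actual TDengine naming rule.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
  const char *key;
  const char *value;
} STagKv;

static int cmp_tag_key(const void *a, const void *b) {
  return strcmp(((const STagKv *)a)->key, ((const STagKv *)b)->key);
}

// Sort the tags and derive a stable child-table name from them. Repeating this
// for every incoming line is the redundant work the optimized framework avoids.
static void build_child_table_name(STagKv *tags, int n, char *out, size_t cap) {
  qsort(tags, n, sizeof(STagKv), cmp_tag_key);
  size_t used = snprintf(out, cap, "t");
  for (int i = 0; i < n && used < cap; ++i) {
    used += snprintf(out + used, cap - used, "_%s=%s", tags[i].key, tags[i].value);
  }
}
```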
Schema Processing
After acquiring the schema metadata for a measurement, the system must check whether the schema needs to be updated. This involves traversing the tags and columns of each data record and determining whether operations such as `create stable`, `add col`, `add tag`, `modify col`, or `modify tag` are required.
Data Insertion
For datasets with fewer than 10 records, the system constructs SQL statements individually and inserts them via the `taos_query` function. For larger datasets, a batch insert approach is adopted using `stmt` structures, which allows for more efficient bulk insertion.
The following code snippets illustrate these key functions:
- Primary Functions: `tscParseLine`, `parseSmlKvPairs`, `tscSmlInsert`, `buildDataPointSchemas`, `modifyDBSchemas`, `applyDataPointsWithSqlInsert`
You can find the detailed code in the following GitHub links:
Now, let’s dive into the optimization of the architecture.
Major Optimizations
- Line Protocol Parsing
Analysis of the data shows that tags with the same prefix often remain consistent. Therefore, we can pre-group the tag strings, placing tags with the same prefix into the same group. Within each group, we only need to parse the tags once, reducing redundant parsing operations.
Additionally, by checking if the data is already ordered, we can bind the data directly in sequence, avoiding the overhead of hash lookups. This reduces computational costs and accelerates the process.
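A minimal sketch of this grouping idea is shown below; the cache structure and helper name are illustrative only and do not correspond to the actual TDengine implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// Cache of the most recently parsed tag section. Lines whose raw tag section is
// byte-for-byte identical can reuse the parsed (and sorted) result directly.
typedef struct {
  const char *rawTags;
  size_t      rawTagsLen;
  void       *parsed;   // whatever structure the parser produced last time
} STagCache;

static bool reuse_cached_tags(STagCache *cache, const char *rawTags, size_t len) {
  if (cache->parsed != NULL && cache->rawTagsLen == len &&
      memcmp(cache->rawTags, rawTags, len) == 0) {
    return true;   // identical tag section: skip parsing, sorting, and name generation
  }
  cache->rawTags = rawTags;   // different tag section: caller re-parses and updates the cache
  cache->rawTagsLen = len;
  return false;
}
```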
- Schema Processing
During parsing, we should check whether the schema requires modification, such as adding new columns or adjusting column lengths. If no changes are needed, we bypass the schema modification logic, further improving efficiency.
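The check can be as simple as the sketch below; the structures and helper are hypothetical stand-ins for the cached schema metadata, not the actual TDengine types.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef struct {
  char    name[65];
  uint8_t type;
  int32_t bytes;
} SFieldMeta;

// Returns true only if a parsed field is new, has a different type, or is longer
// than the cached schema allows; otherwise the schema-modification path is skipped.
static bool schema_needs_update(const SFieldMeta *cached, int cachedCount,
                                const SFieldMeta *parsed, int parsedCount) {
  for (int i = 0; i < parsedCount; ++i) {
    const SFieldMeta *match = NULL;
    for (int j = 0; j < cachedCount; ++j) {
      if (strcmp(parsed[i].name, cached[j].name) == 0) {
        match = &cached[j];
        break;
      }
    }
    if (match == NULL || match->type != parsed[i].type || match->bytes < parsed[i].bytes) {
      return true;   // requires add col/tag or modify col/tag
    }
  }
  return false;
}
```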
- Data Insertion
Data construction is carried out directly during the parsing of each line, binding the data straight to the final `STableDataCxt` structure. This approach eliminates the need for data binding and copying in subsequent stages.
Moreover, we can bypass the SQL and `stmt` interfaces and construct `BuildRow` data directly, avoiding the extra step of second-pass parsing.
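Conceptually, the binding step reduces to copying each parsed value once into a preallocated row buffer, roughly as in the sketch below; `SRowBuf` and `append_col` are illustrative names, not the actual `STableDataCxt` API.

```c
#include <stdint.h>
#include <string.h>

// Illustrative row buffer that the parser writes into directly while walking a
// line, so no SQL text or stmt bind structures need to be built afterwards.
typedef struct {
  char   *buf;      // preallocated storage for one row
  int32_t offset;   // next write position
  int32_t capacity;
} SRowBuf;

static int append_col(SRowBuf *row, const void *value, int32_t bytes) {
  if (row->offset + bytes > row->capacity) {
    return -1;   // caller grows the buffer or rejects the line
  }
  memcpy(row->buf + row->offset, value, bytes);   // single copy, straight from the input
  row->offset += bytes;
  return 0;
}
```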
Below are code snippets from key functions involved in these optimizations:
- Key Functions: `smlParseInfluxString`, `smlParseTagKv`, `smlParseColKv`, `smlParseLineBottom`, `smlModifyDBSchemas`, `smlInsertData`
You can view the full code in the following GitHub links:
We also identified areas where memory usage could be further optimized during data parsing.
Memory Optimization
When converting schemaless input into the internal storage format, numerous memory allocations and copies are made for measurements, tags, and columns. While these steps are common, they represent potential optimization opportunities.
Instead of directly copying and allocating memory during parsing, we can record the pointer locations and lengths of each data item in the original dataset. When the data is ready to be written to the database, we can copy the data from the original dataset based on the recorded pointers.
For example, instead of allocating memory immediately for `t1`/`t2`/`t3`/`c1`/`c3`/`c2`, we can directly record their pointer locations and perform the memory copy only when needed.
Code Optimization Example:
Consider the following line-protocol record:
st,t1=3,t2=4,t3=t3 c1=3i64,c3="passit",c2=false,c4=4 1626006833639000000
In the original approach, each key and value is copied into a separately allocated structure:
typedef struct {
char* key;
uint8_t type;
uint16_t length;
char* value;
uint32_t fieldSchemaIdx;
} TAOS_SML_KV;
Alternatively, we can use pointers to avoid unnecessary memory allocation:
typedef struct {
char *measure;
char *tags;
char *cols;
char *timestamp;
char *measureTag;
int32_t measureLen;
int32_t measureTagsLen;
int32_t tagsLen;
int32_t colsLen;
int32_t timestampLen;
SArray colArray;
} SSmlLineInfo;
Time Precision Conversion Optimization
By indexing directly into a lookup table rather than walking through a chain of conditional checks, we can significantly reduce processing time. For example, when converting time precision, instead of branching on every possible source unit, we can look up the conversion factor for that unit directly.
Before (one branch for each possible source precision):
if (fromPrecision == TSDB_TIME_PRECISION_HOURS) { /* convert from hours */ }
else if (fromPrecision == TSDB_TIME_PRECISION_MINUTES) { /* convert from minutes */ }
else { /* convert from seconds */ }
After (the conversion factor is looked up by unit, with overflow protection):
int64_t smlToMilli[3] = {3600000LL, 60000LL, 1000LL};
int64_t unit = smlToMilli[fromPrecision - TSDB_TIME_PRECISION_HOURS];
if (unit > INT64_MAX / tsInt64) {
  return -1;
}
tsInt64 *= unit;
JSON Parsing Optimization
Since the JSON format for ingestion is usually fixed, we can precompute the offsets for elements like metrics, timestamps, values, and tags. This allows for quicker data handling during parsing.
[
{
"metric": "meter_current",
"timestamp" : 1346846400,
"value": 18,
"tags": {
"location": "LosAngeles",
"id": "d1001"
}
}
]
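A rough sketch of the idea is shown below; the layout cache and helper are hypothetical and do not reflect the actual parser. The key order found in the first object of a batch is remembered, and later objects that match it can be handled positionally instead of searching for each key.

```c
#include <stdbool.h>
#include <string.h>

#define SML_JSON_MAX_KEYS 8

// Key order observed in the first JSON object of a batch.
typedef struct {
  const char *keys[SML_JSON_MAX_KEYS];
  int         count;
} SJsonLayout;

static bool layout_matches(const SJsonLayout *layout, const char **keys, int count) {
  if (count != layout->count) return false;
  for (int i = 0; i < count; ++i) {
    if (strcmp(keys[i], layout->keys[i]) != 0) {
      return false;   // shape changed: fall back to full key lookups
    }
  }
  return true;   // same shape: "metric", "timestamp", "value", "tags" can be read positionally
}
```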
Conditional Logic Optimization
To reduce the expected number of comparisons, place the most likely `if` conditions first and discriminate on the cheapest distinguishing feature. For example, when recognizing value type suffixes such as `i64`, `u64`, `i8`, `u8`, `true`, and `L""`, comparing whole strings one by one requires many checks:
if (strcmp(str, "i64") == 0) { /* ... */ }
else if (strcmp(str, "i32") == 0) { /* ... */ }
else if (strcmp(str, "u8") == 0) { /* ... */ }
// ...
Checking the first character instead narrows the candidates with a single cheap comparison:
if (str[0] == 'i') { /* signed integer suffixes */ }
else if (str[0] == 'u') { /* unsigned integer suffixes */ }
// ...
Other Optimizations
In some cases, using the `likely` and `unlikely` branch-prediction hints can guide the compiler to arrange the instruction pipeline so that the common path runs faster.
if (unlikely(len == 0 || (len == 1 && data[0] == '0'))) {
  return taosGetTimestampNs() / smlFactorNS[toPrecision];
}
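Note that `likely` and `unlikely` are not part of the C language itself; in GCC/Clang code bases they are conventionally defined as thin macros over `__builtin_expect`, roughly as follows.

```c
#if defined(__GNUC__) || defined(__clang__)
#define likely(x)   __builtin_expect(!!(x), 1)   // tell the compiler this branch is the common case
#define unlikely(x) __builtin_expect(!!(x), 0)   // tell the compiler this branch is rare
#else
#define likely(x)   (x)
#define unlikely(x) (x)
#endif
```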
Additionally, excessive logging, especially at high frequencies, can cause performance degradation. We observed that frequent logging increases processing time by about 10%.
Performance Comparison
| Version | SQL | Line Protocol | Telnet Protocol | JSON Protocol |
|---|---|---|---|---|
| 2.6 (4ec22e8) | 4,543,622 | 1,458,304 | 2,161,855 | 1,272,000 |
| ver-3.0.0.0 | 1,638,498 | 1,650,033 | 1,945,982 | 800,000 |
| 3.0 (f6793d5) | 3,740,774 | 3,602,947 | 4,328,447 | 5,520,000 |
In the comparison between TDengine 3.0 and 2.6 (speeds are measured in records per second), we observed a notable improvement in performance:
- Line Protocol: 2.5x faster
- Telnet: 2x faster
- JSON: 5x faster
Performance Analysis Tools
To analyze performance, we used tools such as flame graphs and perf. The `perf top -p <pid>` command helps us identify areas of high CPU usage and optimize them further.
Conclusion
Before undertaking performance optimization, it is essential to thoroughly understand the system’s architecture and flow. Performance improvements should focus on three key areas:
- Architecture Evaluation: Ensure the architecture is designed for optimal performance. It serves as the foundation for any improvements.
- Bottleneck Identification: Address the most significant bottlenecks first and tackle secondary issues as needed.
- Performance Analysis Techniques: Use tools like flame graphs and `perf` to accurately identify areas for improvement.
By considering factors like CPU usage, memory, network, compiler optimizations, and hardware constraints, you can achieve substantial performance gains. We hope this article provides valuable insights for developers working with TDengine’s schemaless ingestion and helps you unlock higher performance in your applications.