IoT applications often require the collection of large volumes of data to support functions such as intelligent control, business analytics, and device monitoring. However, frequent changes in application logic or hardware adjustments can lead to the need for continuous updates in the data collection schema. This presents a significant challenge for time-series databases (TSDBs).
To address this dynamic requirement, TDengine offers a schemaless ingestion mode that allows developers to insert data without having to define the schema in advance. With this approach, the system automatically determines the appropriate data storage structure and adjusts the schema when necessary. If new data columns need to be added, the schemaless mode ensures these updates are handled automatically, enabling accurate data recording. For a more detailed understanding of the primary processing logic, mapping rules, and change handling mechanisms of schemaless ingestion, you can refer to the TDengine technical documentation on the official website.
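For readers who have not used the interface before, here is a minimal usage sketch of schemaless ingestion with the TDengine C client. It assumes a connection has already been opened with `taos_connect()` and a database has been selected; the table name, tags, and columns in the sample line are illustrative only.

```c
#include <stdio.h>
#include <taos.h>

void insert_schemaless(TAOS *taos) {
  // One InfluxDB line-protocol record; the super table, child table, and
  // columns are created or extended automatically as the data requires.
  char *lines[] = {
      "meters,location=LosAngeles,groupid=2 current=11.8,voltage=221,phase=0.28 1626006833639000000"};

  TAOS_RES *res = taos_schemaless_insert(taos, lines, 1, TSDB_SML_LINE_PROTOCOL,
                                         TSDB_SML_TIMESTAMP_NANO_SECONDS);
  if (taos_errno(res) != 0) {
    printf("schemaless insert failed: %s\n", taos_errstr(res));
  }
  taos_free_result(res);
}
```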
TDengine has invested significant effort in optimizing its schemaless ingestion functionality to enhance both flexibility and efficiency. This article provides an overview of some of these optimizations, giving developers insights into how they can leverage this functionality to boost application performance. Our goal is to help developers reach new heights in performance optimization.
Performance Optimization Process
Identifying Performance Bottlenecks
By analyzing flame graphs of data ingestion, we can pinpoint the areas where the system is spending the most time. Specifically, functions like `parseSmlKey`, `parseSmlValue`, `addChildTableDataPointsIntoSql`, `taos_query_a`, and `buildDataPointSchema` are responsible for a high proportion of the processing time.
Addressing Performance Bottlenecks
For each of these functions, we can adopt one of two strategies to improve performance: either eliminate the bottleneck entirely or reduce its duration.
How can we eliminate these bottlenecks?
To answer this question, we first need to understand the current data parsing framework. Below is a simplified flowchart of how the existing framework processes data:
Analysis of the Current Framework
In the current process, the system iterates over each record, extracting the measurement (measure), tag, and column (col) key-value pairs. These are stored in custom structures, and the system sorts the tag keys and generates the corresponding sub-table names based on predefined rules. However, the repeated parsing, tag sorting, and sub-table name generation in this workflow add unnecessary time and computational complexity, as the sketch below illustrates.
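As a rough illustration of that per-record overhead, the following sketch shows tag keys being sorted and concatenated into a deterministic child-table name for every line. The helper names and naming scheme are hypothetical and do not reflect the actual TDengine naming rule.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
  const char *key;
  const char *value;
} STagKv;

static int cmp_tag_key(const void *a, const void *b) {
  return strcmp(((const STagKv *)a)->key, ((const STagKv *)b)->key);
}

// Sort the tags and derive a stable child-table name from them. Repeating this
// for every incoming line is the redundant work the optimized framework avoids.
static void build_child_table_name(STagKv *tags, int n, char *out, size_t cap) {
  qsort(tags, n, sizeof(STagKv), cmp_tag_key);
  size_t used = snprintf(out, cap, "t");
  for (int i = 0; i < n && used < cap; ++i) {
    used += snprintf(out + used, cap - used, "_%s=%s", tags[i].key, tags[i].value);
  }
}
```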
Schema Processing
After acquiring the schema metadata for a measurement, the system must check whether the schema needs to be updated. This involves traversing the tags and columns of each data record and determining whether operations such as `create stable`, `add col`, `add tag`, `modify col`, or `modify tag` are required.
Data Insertion
For datasets with fewer than 10 records, the system constructs SQL statements individually and inserts them via the `taos_query` function. For larger datasets, a batch insert approach is adopted using `stmt` structures, which allows for more efficient bulk insertion.
The following code snippets illustrate these key functions:
- Primary Functions: `tscParseLine`, `parseSmlKvPairs`, `tscSmlInsert`, `buildDataPointSchemas`, `modifyDBSchemas`, `applyDataPointsWithSqlInsert`
You can find the detailed code in the following GitHub links:
Now, let’s dive into the optimization of the architecture.
Major Optimizations
- Line Protocol Parsing
Analysis of the data shows that tags with the same prefix often remain consistent. Therefore, we can pre-group the tag strings, placing tags with the same prefix into the same group. Within each group, we only need to parse the tags once, reducing redundant parsing operations.
Additionally, by checking if the data is already ordered, we can bind the data directly in sequence, avoiding the overhead of hash lookups. This reduces computational costs and accelerates the process.
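A minimal sketch of this grouping idea is shown below; the cache structure and helper name are illustrative only and do not correspond to the actual TDengine implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// Cache of the most recently parsed tag section. Lines whose raw tag section is
// byte-for-byte identical can reuse the parsed (and sorted) result directly.
typedef struct {
  const char *rawTags;
  size_t      rawTagsLen;
  void       *parsed;   // whatever structure the parser produced last time
} STagCache;

static bool reuse_cached_tags(STagCache *cache, const char *rawTags, size_t len) {
  if (cache->parsed != NULL && cache->rawTagsLen == len &&
      memcmp(cache->rawTags, rawTags, len) == 0) {
    return true;   // identical tag section: skip parsing, sorting, and name generation
  }
  cache->rawTags = rawTags;   // different tag section: caller re-parses and updates the cache
  cache->rawTagsLen = len;
  return false;
}
```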
- Schema Processing
During parsing, we should check whether the schema requires modification, such as adding new columns or adjusting column lengths. If no changes are needed, we bypass the schema modification logic, further improving efficiency.
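The check can be as simple as the sketch below; the structures and helper are hypothetical stand-ins for the cached schema metadata, not the actual TDengine types.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef struct {
  char    name[65];
  uint8_t type;
  int32_t bytes;
} SFieldMeta;

// Returns true only if a parsed field is new, has a different type, or is longer
// than the cached schema allows; otherwise the schema-modification path is skipped.
static bool schema_needs_update(const SFieldMeta *cached, int cachedCount,
                                const SFieldMeta *parsed, int parsedCount) {
  for (int i = 0; i < parsedCount; ++i) {
    const SFieldMeta *match = NULL;
    for (int j = 0; j < cachedCount; ++j) {
      if (strcmp(parsed[i].name, cached[j].name) == 0) {
        match = &cached[j];
        break;
      }
    }
    if (match == NULL || match->type != parsed[i].type || match->bytes < parsed[i].bytes) {
      return true;   // requires add col/tag or modify col/tag
    }
  }
  return false;
}
```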
- Data Insertion
Data construction is carried out directly during the parsing of each line, binding the data straight to the final `STableDataCxt` structure. This approach eliminates the need for data binding and copying in subsequent stages.
Moreover, we can bypass the SQL and `stmt` interfaces and construct `BuildRow` data directly, avoiding the extra step of second-pass parsing.
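Conceptually, the binding step reduces to copying each parsed value once into a preallocated row buffer, roughly as in the sketch below; `SRowBuf` and `append_col` are illustrative names, not the actual `STableDataCxt` API.

```c
#include <stdint.h>
#include <string.h>

// Illustrative row buffer that the parser writes into directly while walking a
// line, so no SQL text or stmt bind structures need to be built afterwards.
typedef struct {
  char   *buf;      // preallocated storage for one row
  int32_t offset;   // next write position
  int32_t capacity;
} SRowBuf;

static int append_col(SRowBuf *row, const void *value, int32_t bytes) {
  if (row->offset + bytes > row->capacity) {
    return -1;   // caller grows the buffer or rejects the line
  }
  memcpy(row->buf + row->offset, value, bytes);   // single copy, straight from the input
  row->offset += bytes;
  return 0;
}
```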
Below are code snippets from key functions involved in these optimizations:
- Key Functions: `smlParseInfluxString`, `smlParseTagKv`, `smlParseColKv`, `smlParseLineBottom`, `smlModifyDBSchemas`, `smlInsertData`
You can view the full code in the following GitHub links:
We also identified areas where memory usage could be further optimized during data parsing.
Memory Optimization
When converting schemaless input into the internal storage format, numerous memory allocations and copies are made for measurements, tags, and columns. While these steps are common, they represent potential optimization opportunities.
Instead of directly copying and allocating memory during parsing, we can record the pointer locations and lengths of each data item in the original dataset. When the data is ready to be written to the database, we can copy the data from the original dataset based on the recorded pointers.
For example, instead of allocating memory immediately for `t1`/`t2`/`t3`/`c1`/`c3`/`c2`, we can directly record their pointer locations and perform the memory copy only when needed.
Code Optimization Example:
Consider the following line-protocol record:
st,t1=3,t2=4,t3=t3 c1=3i64,c3="passit",c2=false,c4=4 1626006833639000000
In the original approach, each key and value is copied into a separately allocated structure:
typedef struct {
char* key;
uint8_t type;
uint16_t length;
char* value;
uint32_t fieldSchemaIdx;
} TAOS_SML_KV;
Alternatively, we can use pointers to avoid unnecessary memory allocation:
typedef struct {
char *measure;
char *tags;
char *cols;
char *timestamp;
char *measureTag;
int32_t measureLen;
int32_t measureTagsLen;
int32_t tagsLen;
int32_t colsLen;
int32_t timestampLen;
SArray colArray;
} SSmlLineInfo;
Time Precision Conversion Optimization
By indexing directly into a lookup table rather than walking through a chain of conditional checks, we can significantly reduce processing time. For example, when converting time precision, instead of branching on every possible source unit, we can look up the conversion factor for that unit directly.
Before (one branch for each possible source precision):
if (fromPrecision == TSDB_TIME_PRECISION_HOURS) { /* convert from hours */ }
else if (fromPrecision == TSDB_TIME_PRECISION_MINUTES) { /* convert from minutes */ }
else { /* convert from seconds */ }
After (the conversion factor is looked up by unit, with overflow protection):
int64_t smlToMilli[3] = {3600000LL, 60000LL, 1000LL};
int64_t unit = smlToMilli[fromPrecision - TSDB_TIME_PRECISION_HOURS];
if (unit > INT64_MAX / tsInt64) {
  return -1;
}
tsInt64 *= unit;
JSON Parsing Optimization
Since the JSON format for ingestion is usually fixed, we can precompute the offsets for elements like metrics, timestamps, values, and tags. This allows for quicker data handling during parsing.
[
{
"metric": "meter_current",
"timestamp" : 1346846400,
"value": 18,
"tags": {
"location": "LosAngeles",
"id": "d1001"
}
}
]
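A rough sketch of the idea is shown below; the layout cache and helper are hypothetical and do not reflect the actual parser. The key order found in the first object of a batch is remembered, and later objects that match it can be handled positionally instead of searching for each key.

```c
#include <stdbool.h>
#include <string.h>

#define SML_JSON_MAX_KEYS 8

// Key order observed in the first JSON object of a batch.
typedef struct {
  const char *keys[SML_JSON_MAX_KEYS];
  int         count;
} SJsonLayout;

static bool layout_matches(const SJsonLayout *layout, const char **keys, int count) {
  if (count != layout->count) return false;
  for (int i = 0; i < count; ++i) {
    if (strcmp(keys[i], layout->keys[i]) != 0) {
      return false;   // shape changed: fall back to full key lookups
    }
  }
  return true;   // same shape: "metric", "timestamp", "value", "tags" can be read positionally
}
```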
Conditional Logic Optimization
To reduce the expected number of comparisons, place the most likely `if` conditions first and discriminate on the cheapest distinguishing feature. For example, when recognizing value type suffixes such as `i64`, `u64`, `i8`, `u8`, `true`, and `L""`, comparing whole strings one by one requires many checks:
if (strcmp(str, "i64") == 0) { /* ... */ }
else if (strcmp(str, "i32") == 0) { /* ... */ }
else if (strcmp(str, "u8") == 0) { /* ... */ }
// ...
Checking the first character instead narrows the candidates with a single cheap comparison:
if (str[0] == 'i') { /* signed integer suffixes */ }
else if (str[0] == 'u') { /* unsigned integer suffixes */ }
// ...
Other Optimizations
In some cases, using the `likely` and `unlikely` branch-prediction hints can guide the compiler to arrange the instruction pipeline so that the common path runs faster.
if (unlikely(len == 0 || (len == 1 && data[0] == '0'))) {
  return taosGetTimestampNs() / smlFactorNS[toPrecision];
}
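Note that `likely` and `unlikely` are not part of the C language itself; in GCC/Clang code bases they are conventionally defined as thin macros over `__builtin_expect`, roughly as follows.

```c
#if defined(__GNUC__) || defined(__clang__)
#define likely(x)   __builtin_expect(!!(x), 1)   // tell the compiler this branch is the common case
#define unlikely(x) __builtin_expect(!!(x), 0)   // tell the compiler this branch is rare
#else
#define likely(x)   (x)
#define unlikely(x) (x)
#endif
```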
Additionally, excessive logging, especially at high frequencies, can cause performance degradation. We observed that frequent logging increases processing time by about 10%.
Performance Comparison
| Version | SQL | Line Protocol | Telnet Protocol | JSON Protocol |
|---|---|---|---|---|
| 2.6 (4ec22e8) | 4,543,622 | 1,458,304 | 2,161,855 | 1,272,000 |
| ver-3.0.0.0 | 1,638,498 | 1,650,033 | 1,945,982 | 800,000 |
| 3.0 (f6793d5) | 3,740,774 | 3,602,947 | 4,328,447 | 5,520,000 |
In the comparison between TDengine 3.0 and 2.6 (speeds are measured in records per second), we observed a notable improvement in performance:
- Line Protocol: 2.5x faster
- Telnet: 2x faster
- JSON: 5x faster
Performance Analysis Tools
To analyze performance, we used tools such as flame graphs and perf. The `perf top -p <pid>` command helps us identify areas of high CPU usage and optimize them further.
Conclusion
Before undertaking performance optimization, it is essential to thoroughly understand the system’s architecture and flow. Performance improvements should focus on three key areas:
- Architecture Evaluation: Ensure the architecture is designed for optimal performance. It serves as the foundation for any improvements.
- Bottleneck Identification: Address the most significant bottlenecks first and tackle secondary issues as needed.
- Performance Analysis Techniques: Use tools like flame graphs and `perf` to accurately identify areas for improvement.
By considering factors like CPU usage, memory, network, compiler optimizations, and hardware constraints, you can achieve substantial performance gains. We hope this article provides valuable insights for developers working with TDengine’s schemaless ingestion and helps you unlock higher performance in your applications.