UDF overview and classification
A time-series database (TSDB) can support User-Defined Functions (UDFs) for specialized computation that is not covered by built-in functions. Once registered in the cluster, UDFs can be called directly in SQL like native functions.
UDFs fall into two types:
- Scalar functions: output one value per input row, such as data type conversions or mathematical operations
- Aggregate functions: output one value across multiple input rows, such as SUM or AVG
UDFs support two programming languages: C and Python. UDFs written in C can deliver performance close to built-in functions, making them a good choice for performance-sensitive scenarios. Python UDFs can draw on Python’s library ecosystem to implement complex algorithms quickly.
Process isolation for safety
To keep a misbehaving UDF from affecting the database service, the system runs UDF code in a separate process. If the UDF leaks memory or crashes, the failure stays confined to that process, which helps protect the core database service and overall system stability.
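The idea behind process isolation can be illustrated with a small POSIX sketch. It is a generic demonstration of the technique, not the database's actual implementation: the names run_isolated, good_udf, and crashing_udf are hypothetical, and the pipe-based result passing is an assumption made for the example.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical UDF bodies: one well-behaved, one that crashes. */
int good_udf(int x) { return x * 2; }
int crashing_udf(int x) { (void)x; abort(); }

/*
 * Run fn(arg) in a child process and send the result back over a pipe.
 * Returns 0 and stores the result on success, or -1 if the child failed
 * or was killed by a signal. The calling process keeps running either way.
 */
int run_isolated(int (*fn)(int), int arg, int *result) {
    int fds[2];
    if (pipe(fds) != 0) return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {                       /* child: execute the UDF */
        close(fds[0]);
        int out = fn(arg);
        ssize_t n = write(fds[1], &out, sizeof out);
        _exit(n == (ssize_t)sizeof out ? 0 : 1);
    }

    close(fds[1]);                        /* parent: collect the outcome */
    int out = 0;
    ssize_t n = read(fds[0], &out, sizeof out);
    close(fds[0]);

    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0) return -1;
    if (n != (ssize_t)sizeof out) return -1;

    *result = out;
    return 0;
}
```

Calling run_isolated(crashing_udf, 1, &r) returns -1 while the caller continues normally, which is the property the isolation design relies on.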
C language UDF interface specification
A scalar UDF implements the scalarfn interface function. It receives one row of input data and returns one output value. Developers only need to write the computation logic; the framework handles reading the input and writing the output.
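As a rough illustration of scalar logic, the sketch below computes a bitwise AND across the columns of one row. The DemoRow and DemoCell types and the bit_and_row name are hypothetical stand-ins invented for this example; they are not the types from the real UDF header, which a production UDF would use instead.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one input row (not the real UDF header types). */
typedef struct {
    const int32_t *values;   /* one value per input column */
    const bool    *is_null;  /* true where the column is NULL */
    size_t         num_cols;
} DemoRow;

/* Hypothetical stand-in for a single output cell. */
typedef struct {
    int32_t value;
    bool    is_null;
} DemoCell;

/*
 * Scalar logic: bitwise AND of all columns in the row.
 * Any NULL input makes the output NULL. Returns 0 on success,
 * or -1 if an argument is missing or the row has no columns.
 */
int32_t bit_and_row(const DemoRow *row, DemoCell *out) {
    if (row == NULL || out == NULL || row->num_cols == 0) return -1;

    int32_t acc = -1;  /* all bits set: the identity element for AND */
    for (size_t i = 0; i < row->num_cols; i++) {
        if (row->is_null[i]) {
            out->value = 0;
            out->is_null = true;
            return 0;
        }
        acc &= row->values[i];
    }
    out->value = acc;
    out->is_null = false;
    return 0;
}
```

The function touches only its own row and produces exactly one output cell, which is what distinguishes scalar logic from aggregation.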
An aggregate UDF implements three interface functions that together form the complete aggregation lifecycle:
- aggfn_start: initializes the aggregation state, called at the start of aggregation
- aggfn: processes each input row and updates the aggregation state
- aggfn_finish: outputs the final aggregation result
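The start, update, and finish steps can be sketched as a running maximum. This is a minimal illustration under stated assumptions: MaxState and the demo_max_* functions are hypothetical names, and the caller-provided state buffer only mimics the framework's intermediate buffer. The real interface functions take the framework's own types.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical aggregation state, kept in a caller-provided buffer. */
typedef struct {
    double max;
    bool   seen;   /* false until the first row arrives */
} MaxState;

/* Start: reset the state before any rows are processed. */
int32_t demo_max_start(void *buf, size_t buf_size) {
    if (buf == NULL || buf_size < sizeof(MaxState)) return -1;
    memset(buf, 0, sizeof(MaxState));
    return 0;
}

/* Step: fold one batch of input values into the state. */
int32_t demo_max_step(void *buf, const double *rows, size_t num_rows) {
    if (buf == NULL || (rows == NULL && num_rows > 0)) return -1;
    MaxState *st = buf;
    for (size_t i = 0; i < num_rows; i++) {
        if (!st->seen || rows[i] > st->max) {
            st->max = rows[i];
            st->seen = true;
        }
    }
    return 0;
}

/*
 * Finish: write the final result. Returns 0 when a value was produced,
 * 1 when no rows were seen (the result would be NULL), -1 on bad input.
 */
int32_t demo_max_finish(const void *buf, double *result) {
    if (buf == NULL || result == NULL) return -1;
    const MaxState *st = buf;
    if (!st->seen) return 1;
    *result = st->max;
    return 0;
}
```

Because all state lives in the buffer, the step function can be called once per batch in any number of batches before finish is called.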
The initialization and cleanup lifecycle also requires two functions:
- udf_init: called when the UDF is loaded
- udf_destroy: called when the UDF is unloaded, for resource cleanup
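The load and unload hooks typically bracket any resource the UDF needs for its whole lifetime. The sketch below assumes a precomputed lookup table as that resource; demo_udf_init, demo_udf_destroy, and demo_square are hypothetical names chosen for the example, and the signatures are not taken from the real UDF header.

```c
#include <stdint.h>
#include <stdlib.h>

enum { TABLE_SIZE = 256 };

/* Hypothetical shared resource: a precomputed table of squares. */
static int32_t *g_squares = NULL;

/* Load-time hook: allocate and fill the table once. */
int32_t demo_udf_init(void) {
    if (g_squares != NULL) return 0;   /* already initialized */
    g_squares = malloc(TABLE_SIZE * sizeof *g_squares);
    if (g_squares == NULL) return -1;
    for (int32_t i = 0; i < TABLE_SIZE; i++) g_squares[i] = i * i;
    return 0;
}

/* Unload-time hook: release what init acquired. Safe to call twice. */
int32_t demo_udf_destroy(void) {
    free(g_squares);
    g_squares = NULL;
    return 0;
}

/* Per-row logic may rely on the table only between init and destroy. */
int32_t demo_square(uint8_t x) {
    return g_squares != NULL ? g_squares[x] : -1;
}
```

Keeping allocation in the init hook and release in the destroy hook means the per-row function never allocates, and a missed destroy call is the only way the table can leak.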
Compilation and deployment
After writing the C language UDF source code, compile it into a shared library. Example compilation command:
gcc -g -O0 -fPIC -shared bit_and.c -o libbitand.so
Compilation flags explained:
- -g: generate debug information
- -O0: disable optimization, useful for debugging
- -fPIC: generate position-independent code (required for shared libraries)
- -shared: produce a shared library file
Use GCC 7.5 or later for compatibility. After compilation, copy the .so file to a path on the database server; the registration statement refers to the library by this path.
Registration and usage
After deploying the shared library, register the UDF with a SQL statement:
CREATE AGGREGATE FUNCTION max_vol AS '/root/udf/libmaxvol.so' OUTPUTTYPE BINARY(64) BUFSIZE 10240 LANGUAGE 'C'
The registration statement specifies the function name max_vol (the name used in SQL queries), the library file path, the output type BINARY(64), the size of the intermediate aggregation buffer (10240 bytes), and the programming language C. Once registered, the UDF can be called in SQL queries like a built-in function.
Summary
The UDF mechanism gives TDengine TSDB a flexible extension path for specialized computation. C UDFs are suited to performance-sensitive logic, while process isolation helps reduce risk to the core database service. The workflow from interface implementation to compilation, deployment, registration, and invocation gives developers a clear path for adding custom SQL functions.


