Mastering Python UDFs in TDengine: From Beginner to Pro

Chait Diwadkar

January 2, 2025 / Engineering

With the release of TDengine 3.0.4.0, we unveiled an exciting new feature: the ability to create User-Defined Functions (UDFs) using Python. This innovative capability delivers unparalleled flexibility for database operations while making customization more accessible—even for those new to programming. With Python UDFs, users can easily tailor and manage their databases, seamlessly integrating these custom functions into SQL queries as if they were built-in. In this article, we’ll explore how to leverage this feature effectively and help you take the first steps toward creating your own database customizations.

Getting Started with UDFs

Creating a UDF

Here’s how to create a Python UDF function in TDengine:

       CREATE  [OR REPLACE] [AGGREGATE] FUNCTION function_name
                               as library_path  OUTPUTTYPE output_type  [BUFSIZE buffer_size] [LANGUAGE 'C‘ | 'Python']

Options Explained:

CREATE [OR REPLACE]: Use CREATE for new functions and add OR REPLACE to update existing ones.
AGGREGATE: Optional. Indicates an aggregate function. Without it, the function defaults to a scalar function.
function_name: The name of the UDF, which can be up to 64 bytes long. Names exceeding this limit will be truncated.
OUTPUTTYPE: Specifies the output data type of the UDF. Supported types include:

Serial Number	Supported Data Type
1	TIMESTAMP
2	INT
3	INT UNSIGNED
4	BIGINT
5	BIGINT UNSIGNED
6	FLOAT
7	DOUBLE
8	BINARY
9	SMALLINT
10	SMALLINT UNSIGNED
11	TINYINT
12	TINYINT UNSIGNED
13	BOOL
14	NCHAR

BUFSIZE: Sets the memory buffer size for aggregate functions (up to 256 KB). This buffer is allocated for the lifetime of the aggregation process, making it useful as a global variable.
LANGUAGE: Specifies the programming language. Currently, TDengine supports Python and C.

Deleting a UDF

To remove a UDF, use the following command:

DROP FUNCTION function_name;

Viewing UDFs

To view all created UDFs:

SHOW FUNCTIONS;

For detailed information:

SELECT * FROM information_schema.ins_functions\G;

Setting Up the Environment

Before diving into Python UDF development, ensure your environment is ready:

CMake: Minimum version 3.0.2.
GCC: Required for compiling the shared library (.so file). Minimum version: 7.5.
Python: Version 3.7 or higher.

Install the UDF plugin with the following command:

python3 -m pip install taospyudf
ldconfig

Your development environment is now ready!

Writing UDFs

UDFs in TDengine fall into two categories: scalar and aggregate functions. Before diving into the details of the functions, let’s first take a look at the process of how custom functions are invoked:

How UDFs Work

TDengine processes data in blocks for efficiency. When a UDF is invoked, the input data is a block, and you can access any cell in the block using the data(row, col) method. This approach minimizes the overhead of calls between the C framework and Python, boosting performance.

Scalar Functions: Must return the same number of rows as the input.
Aggregate Functions: Aggregate data and return a single row.

Creating a Scalar Function

Let’s create a Python UDF equivalent of the CONCAT string concatenation function:

Write the Function

The function must include the following methods:

init: Called once during the initialization of the UDF module. Use this to perform any necessary setup tasks.
destroy: Called once when the UDF module exits. Use this for cleanup tasks.
process: The main function that processes each incoming data block. Use the shape() method to retrieve the number of rows and columns in the data block.
- nrows: Returns the number of rows in the data block.
- ncols: Returns the number of columns, which corresponds to the number of parameters passed to the concat() function.

Return Values:

A scalar function must return a list. If the return value is not a list, an error will occur.
The number of elements in the returned list must match the number of rows (nrows) in the data block. Mismatched row counts will result in an error.

Create the function

create function py_concat as '/home/py_concat.py' outputtype varchar(256) language 'Python';

Execute the function Use the UDF like any built-in function:

select sf_concat(factory_name,room_name), concat(factory_name,room_name) from devices;

Creating an Aggregate Function

Aggregate functions perform computations on data and output a single aggregated result. Let’s create a Python UDF for counting rows:

Write the Function

Implementation Principles:

During the start initialization callback, the value 0 is stored in the buffer (buf) as the initial value for counting.
In the reduce function, as data blocks are processed, any non-None values are added to the cumulative count. The result is stored back into the buffer (buf), which is passed as a parameter during subsequent calls to reduce. This allows the buffer to be reused across iterations.
Finally, in the finish function, the buffer (buf) is passed as a parameter. The value stored in the buffer is retrieved and returned as the final result, representing the total count.

Return Value:

The return type of the function must match the OUTPUTTYPE specified when the UDF function was created. If the return type is incorrect, an error will be triggered.
Returning a None object is allowed, but it must be handled appropriately.

Create the UDF

create aggregate function af_count as ''/home/af_count.py'' outputtype bigint bufsize 4096 language 'Python';

Execute the UDF

select af_count(col1) from devices;

Data Type Mappings

Python interacts with C through specific data type mappings. Key points include:

TDengine Data Type	Mapped to Python Object
TINYINT / TINYINT UNSIGNED / SMALLINT / SMALLINT UNSIGNED / INT / INT UNSIGNED / BIGINT / BIGINT UNSIGNED	int
FLOAT / DOUBLE	float
BOOL	bool
BINARY / NCHAR / VARCHAR	bytes
TIMESTAMP	int
JSON	Not supported

Binary, NCHAR, and VARCHAR map to Python’s bytes object. Use appropriate decoding:
1. For binary: bytes.decode('utf-8')
2. For nchar: bytes.decode('utf_32_le')
When returning a str object:
1. For binary: str.encode('utf-8')
2. For nchar: str.encode('utf_32_le')

Development Tips

Updating Function Code

The .py file path specified during UDF creation is only used at that moment. The file’s content is stored in the mnode, making the function accessible across the cluster. After creation, the .py file is no longer needed.

To update the UDF code, you must update mnode with the new content. Use the OR REPLACE option to overwrite the existing function and apply the updated code for future calls.

CREATE [OR REPLACE] FUNCTION function_name AS library_path OUTPUTTYPE output_type [LANGUAGE 'C' | 'Python'];

Logging

Python UDFs in TDengine do not support direct debugging of Python code but do support logging. You can use commonly used libraries like logging within UDF functions. It is recommended to output logs to a file for review. Note that messages printed using the print function will not be visible, so avoid using it for output. For example:

Exception Handling

Raise exceptions to terminate queries if needed. Exceptions are logged in taosudfpy.log.

Checking UDF Framework Logs

If the UDF framework returns an error, check the log files in the TDengine log directory for details:

taospyudf.log Logs Python UDF activity, including function loading, execution errors, and exceptions. Use this log when debugging Python UDFs.
udfdlog.0 Logs framework-level issues for all UDF languages (C, Python, etc.). Check this log for broader framework errors, though it’s rarely needed.

Common Errors and Solutions

Serial Number	Error Code	Error Description	Explanation
1	0x80002901	udf is stopping	The UDF framework is shutting down and cannot provide external services.
2	0x80002902	udf pipe read error	A data read error occurred during communication between the taosd and udfd processes.
3	0x80002903	udf pipe connect error	A connection error occurred during communication between the taosd and udfd processes.
4	0x80002904	udf no pipe	Failed to create a pipe during communication between the taosd and udfd processes.
5	0x80002905	udf load failure	Failed to load the UDF module.
6	0x80002906	udf invalid state	The UDF framework is in an invalid state (e.g., during initialization) and cannot provide external services.
7	0x80002907	udf invalid function input	Invalid parameters were provided to the UDF function.
8	0x80002908	udf no function handle	Internal error: no function handle was found.
9	0x80002909	udf invalid bufsize	The BUFSIZE specified for an aggregate function was invalid or exceeded the limit.
10	0x8000290A	udf invalid output type	The actual data type does not match the OUTPUTTYPE specified when creating the UDF function.
11	0x8000290B	udf program language not supported	The specified programming language is not supported.
12	0x8000290C	udf function execution failure	The UDF function encountered an error during execution. Check logs for more details.
13	0x8000290D	udf raise exception	The UDF function raised an exception or terminated explicitly, causing the query to stop.

Errors 10 and 12 are the most common issues when developing Python UDFs. For detailed causes, refer to the taospyudf.log file.

Examples and Resources

TDengine’s open-source repository provides several Python UDF test examples for reference:

Examples Repository: https://github.com/taosdata/TDengine/tree/3.0/tests/system-test/0-others/udfpy
CI Test Case: https://github.com/taosdata/TDengine/blob/3.0/tests/system-test/0-others/udfpy_main.py

Steps to Run Test Cases:

Ensure Python3 is correctly installed.
Clone the TDengine OSS code.
Navigate to the TDengine/tests/system-test/ directory.
Run the test case using:

python3 test.py -f others/udfpy_main.py

Notes:

1. Common Issues with UDF Scalar Functions

Row Count Mismatch: The number of rows returned by a scalar function must match the number of input rows.
- If the logic is complex, it’s easy to miss rows during processing. Be cautious to avoid such errors.

2. `OUTPUTTYPE` Mismatch

Output Type Consistency: The return type of the UDF must match the OUTPUTTYPE specified during its creation.
- Mismatched types will result in an error.
Scalar Functions: The return value of the process function must be a list, and all elements in the list must match the data type specified by OUTPUTTYPE.
Aggregate Functions: The final return value in the finish function must match the OUTPUTTYPE as a Python data type.

3. Error: Unable to Load `libtaospyudf.so`

Check Installation: Ensure the taospyudf plugin was installed properly.
- The library is typically installed in /usr/local/lib/libtaospyudf.so or Python’s plugin directory.
- Use the command below to locate the file:

find / -name 'libtaospyudf.so'

Missing File: If the file is missing, the plugin installation may have failed. Reinstall the plugin to resolve the issue.
File Exists but Still Fails:
- Use ldd to check the library’s dependencies:

ldd /usr/local/lib/libtaospyudf.so

Missing Python dependencies are a common issue. For example, an improperly installed Python environment can cause errors.

Solution: Reinstall Python (e.g., Python 3.9) correctly to resolve the dependency issue.

Conclusion

The Python UDF feature in TDengine is an exciting step forward, offering unmatched customization for your database. This is just the beginning—we’re continuously improving and expanding its capabilities. We invite you to explore this feature, unleash your creativity, and share your feedback to make it even better!

Chait Diwadkar
Chait Diwadkar previously worked as Director of Solutions Engineering at TDengine.