With the release of TDengine 3.0.4.0, we unveiled an exciting new feature: the ability to create User-Defined Functions (UDFs) using Python. This innovative capability delivers unparalleled flexibility for database operations while making customization more accessible—even for those new to programming. With Python UDFs, users can easily tailor and manage their databases, seamlessly integrating these custom functions into SQL queries as if they were built-in. In this article, we’ll explore how to leverage this feature effectively and help you take the first steps toward creating your own database customizations.
Getting Started with UDFs
Creating a UDF
Here’s how to create a Python UDF function in TDengine:
CREATE [OR REPLACE] [AGGREGATE] FUNCTION function_name
as library_path OUTPUTTYPE output_type [BUFSIZE buffer_size] [LANGUAGE 'C‘ | 'Python']
Options Explained:
CREATE [OR REPLACE]
: UseCREATE
for new functions and addOR REPLACE
to update existing ones.AGGREGATE
: Optional. Indicates an aggregate function. Without it, the function defaults to a scalar function.function_name
: The name of the UDF, which can be up to 64 bytes long. Names exceeding this limit will be truncated.OUTPUTTYPE
: Specifies the output data type of the UDF. Supported types include:
Serial Number | Supported Data Type |
---|---|
1 | TIMESTAMP |
2 | INT |
3 | INT UNSIGNED |
4 | BIGINT |
5 | BIGINT UNSIGNED |
6 | FLOAT |
7 | DOUBLE |
8 | BINARY |
9 | SMALLINT |
10 | SMALLINT UNSIGNED |
11 | TINYINT |
12 | TINYINT UNSIGNED |
13 | BOOL |
14 | NCHAR |
BUFSIZE
: Sets the memory buffer size for aggregate functions (up to 256 KB). This buffer is allocated for the lifetime of the aggregation process, making it useful as a global variable.LANGUAGE
: Specifies the programming language. Currently, TDengine supports Python and C.
Deleting a UDF
To remove a UDF, use the following command:
DROP FUNCTION function_name;
Viewing UDFs
To view all created UDFs:
SHOW FUNCTIONS;
For detailed information:
SELECT * FROM information_schema.ins_functions\G;
Setting Up the Environment
Before diving into Python UDF development, ensure your environment is ready:
- CMake: Minimum version 3.0.2.
- GCC: Required for compiling the shared library (
.so
file). Minimum version: 7.5. - Python: Version 3.7 or higher.
Install the UDF plugin with the following command:
python3 -m pip install taospyudf
ldconfig
Your development environment is now ready!
Writing UDFs
UDFs in TDengine fall into two categories: scalar and aggregate functions. Before diving into the details of the functions, let’s first take a look at the process of how custom functions are invoked:
How UDFs Work
TDengine processes data in blocks for efficiency. When a UDF is invoked, the input data is a block, and you can access any cell in the block using the data(row, col)
method. This approach minimizes the overhead of calls between the C framework and Python, boosting performance.
- Scalar Functions: Must return the same number of rows as the input.
- Aggregate Functions: Aggregate data and return a single row.
Creating a Scalar Function
Let’s create a Python UDF equivalent of the CONCAT
string concatenation function:
- Write the Function
The function must include the following methods:
init
: Called once during the initialization of the UDF module. Use this to perform any necessary setup tasks.destroy
: Called once when the UDF module exits. Use this for cleanup tasks.process
: The main function that processes each incoming data block. Use theshape()
method to retrieve the number of rows and columns in the data block.nrows
: Returns the number of rows in the data block.ncols
: Returns the number of columns, which corresponds to the number of parameters passed to theconcat()
function.
Return Values:
- A scalar function must return a list. If the return value is not a list, an error will occur.
- The number of elements in the returned list must match the number of rows (
nrows
) in the data block. Mismatched row counts will result in an error.
- Create the function
create function py_concat as 'https://eujqw4hwudm.exactdn.com/home/py_concat.py' outputtype varchar(256) language 'Python';
- Execute the function Use the UDF like any built-in function:
select sf_concat(factory_name,room_name), concat(factory_name,room_name) from devices;
Creating an Aggregate Function
Aggregate functions perform computations on data and output a single aggregated result. Let’s create a Python UDF for counting rows:
- Write the Function
Implementation Principles:
- During the
start
initialization callback, the value0
is stored in the buffer (buf
) as the initial value for counting. - In the
reduce
function, as data blocks are processed, any non-None
values are added to the cumulative count. The result is stored back into the buffer (buf
), which is passed as a parameter during subsequent calls toreduce
. This allows the buffer to be reused across iterations. - Finally, in the
finish
function, the buffer (buf
) is passed as a parameter. The value stored in the buffer is retrieved and returned as the final result, representing the total count.
Return Value:
- The return type of the function must match the
OUTPUTTYPE
specified when the UDF function was created. If the return type is incorrect, an error will be triggered. - Returning a
None
object is allowed, but it must be handled appropriately.
- Create the UDF
create aggregate function af_count as ''https://eujqw4hwudm.exactdn.com/home/af_count.py'' outputtype bigint bufsize 4096 language 'Python';
- Execute the UDF
select af_count(col1) from devices;
Data Type Mappings
Python interacts with C through specific data type mappings. Key points include:
TDengine Data Type | Mapped to Python Object |
---|---|
TINYINT / TINYINT UNSIGNED / SMALLINT / SMALLINT UNSIGNED / INT / INT UNSIGNED / BIGINT / BIGINT UNSIGNED | int |
FLOAT / DOUBLE | float |
BOOL | bool |
BINARY / NCHAR / VARCHAR | bytes |
TIMESTAMP | int |
JSON | Not supported |
- Binary, NCHAR, and VARCHAR map to Python’s
bytes
object. Use appropriate decoding:- For binary:
bytes.decode('utf-8')
- For nchar:
bytes.decode('utf_32_le')
- For binary:
- When returning a
str
object:- For binary:
str.encode('utf-8')
- For nchar:
str.encode('utf_32_le')
- For binary:
Development Tips
Updating Function Code
The .py file path specified during UDF creation is only used at that moment. The file’s content is stored in the mnode, making the function accessible across the cluster. After creation, the .py file is no longer needed.
To update the UDF code, you must update mnode with the new content. Use the OR REPLACE option to overwrite the existing function and apply the updated code for future calls.
CREATE [OR REPLACE] FUNCTION function_name AS library_path OUTPUTTYPE output_type [LANGUAGE 'C' | 'Python'];
Logging
Python UDFs in TDengine do not support direct debugging of Python code but do support logging. You can use commonly used libraries like logging
within UDF functions. It is recommended to output logs to a file for review. Note that messages printed using the print
function will not be visible, so avoid using it for output. For example:
Exception Handling
Raise exceptions to terminate queries if needed. Exceptions are logged in taosudfpy.log
.
Checking UDF Framework Logs
If the UDF framework returns an error, check the log files in the TDengine log directory for details:
taospyudf.log
Logs Python UDF activity, including function loading, execution errors, and exceptions. Use this log when debugging Python UDFs.udfdlog.0
Logs framework-level issues for all UDF languages (C, Python, etc.). Check this log for broader framework errors, though it’s rarely needed.
Common Errors and Solutions
Serial Number | Error Code | Error Description | Explanation |
---|---|---|---|
1 | 0x80002901 | udf is stopping | The UDF framework is shutting down and cannot provide external services. |
2 | 0x80002902 | udf pipe read error | A data read error occurred during communication between the taosd and udfd processes. |
3 | 0x80002903 | udf pipe connect error | A connection error occurred during communication between the taosd and udfd processes. |
4 | 0x80002904 | udf no pipe | Failed to create a pipe during communication between the taosd and udfd processes. |
5 | 0x80002905 | udf load failure | Failed to load the UDF module. |
6 | 0x80002906 | udf invalid state | The UDF framework is in an invalid state (e.g., during initialization) and cannot provide external services. |
7 | 0x80002907 | udf invalid function input | Invalid parameters were provided to the UDF function. |
8 | 0x80002908 | udf no function handle | Internal error: no function handle was found. |
9 | 0x80002909 | udf invalid bufsize | The BUFSIZE specified for an aggregate function was invalid or exceeded the limit. |
10 | 0x8000290A | udf invalid output type | The actual data type does not match the OUTPUTTYPE specified when creating the UDF function. |
11 | 0x8000290B | udf program language not supported | The specified programming language is not supported. |
12 | 0x8000290C | udf function execution failure | The UDF function encountered an error during execution. Check logs for more details. |
13 | 0x8000290D | udf raise exception | The UDF function raised an exception or terminated explicitly, causing the query to stop. |
Errors 10 and 12 are the most common issues when developing Python UDFs. For detailed causes, refer to the taospyudf.log
file.
Examples and Resources
TDengine’s open-source repository provides several Python UDF test examples for reference:
- Examples Repository: https://github.com/taosdata/TDengine/tree/3.0/tests/system-test/0-others/udfpy
- CI Test Case: https://github.com/taosdata/TDengine/blob/3.0/tests/system-test/0-others/udfpy_main.py
Steps to Run Test Cases:
- Ensure Python3 is correctly installed.
- Clone the TDengine OSS code.
- Navigate to the
TDengine/tests/system-test/
directory. - Run the test case using:
python3 test.py -f others/udfpy_main.py
Notes:
1. Common Issues with UDF Scalar Functions
- Row Count Mismatch: The number of rows returned by a scalar function must match the number of input rows.
- If the logic is complex, it’s easy to miss rows during processing. Be cautious to avoid such errors.
2. OUTPUTTYPE
Mismatch
- Output Type Consistency: The return type of the UDF must match the
OUTPUTTYPE
specified during its creation.- Mismatched types will result in an error.
- Scalar Functions: The return value of the
process
function must be alist
, and all elements in the list must match the data type specified byOUTPUTTYPE
. - Aggregate Functions: The final return value in the
finish
function must match theOUTPUTTYPE
as a Python data type.
3. Error: Unable to Load libtaospyudf.so
- Check Installation: Ensure the
taospyudf
plugin was installed properly.- The library is typically installed in
/usr/local/lib/libtaospyudf.so
or Python’s plugin directory. - Use the command below to locate the file:
- The library is typically installed in
find / -name 'libtaospyudf.so'
- Missing File: If the file is missing, the plugin installation may have failed. Reinstall the plugin to resolve the issue.
- File Exists but Still Fails:
- Use
ldd
to check the library’s dependencies:
- Use
ldd /usr/local/lib/libtaospyudf.so
- Missing Python dependencies are a common issue. For example, an improperly installed Python environment can cause errors.
- Solution: Reinstall Python (e.g., Python 3.9) correctly to resolve the dependency issue.
Conclusion
The Python UDF feature in TDengine is an exciting step forward, offering unmatched customization for your database. This is just the beginning—we’re continuously improving and expanding its capabilities. We invite you to explore this feature, unleash your creativity, and share your feedback to make it even better!