As every data scientist knows, developing original and sophisticated analyses for massive datasets is only part of the job. Before you can begin your analysis, it’s often necessary to go through a time-consuming process of collecting, retrieving, and cleaning the data. TDengine has a number of features that can simplify this process and let you spend more time on the science.
When you’re working with big datasets, speed matters. TDengine is purpose-built to quickly process massive amounts of time-series data – think millions of rows in milliseconds – so the data collection process doesn’t become a bottleneck.
If you’d like to see how TDengine’s performance compares with other time-series database products, read our Performance Comparison.
Custom Data Retrieval
Instead of making you sift through a massive CSV file, TDengine makes it easy to load exactly the data you need into your DataFrame. Simple SQL commands let you filter data by column, value, tag, or time range.
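As a rough illustration of this kind of filtering, the sketch below assembles a query limited to a time range and a tag value. The table name `turbines`, its columns, and the `location` tag are hypothetical, and the helper itself is not part of any TDengine library; it only builds a string that you would then send through whichever client you use.

```python
from datetime import datetime

def build_filtered_query(table, columns, start, end, tag_filters=None):
    """Build a SELECT statement limited to a time range and optional tag values.

    Illustration only: table, column, and tag names are hypothetical, and
    values containing quotes are rejected instead of being escaped.
    """
    fmt = "%Y-%m-%d %H:%M:%S"
    conditions = [
        f"ts >= '{start.strftime(fmt)}'",
        f"ts < '{end.strftime(fmt)}'",
    ]
    for tag, value in (tag_filters or {}).items():
        if "'" in str(value):
            raise ValueError("quotes are not supported in this sketch")
        conditions.append(f"{tag} = '{value}'")
    column_list = ", ".join(columns)
    return f"SELECT {column_list} FROM {table} WHERE " + " AND ".join(conditions)

query = build_filtered_query(
    "turbines",                           # hypothetical table name
    ["ts", "kilowatts", "temperature"],   # hypothetical columns
    datetime(2024, 1, 1),
    datetime(2024, 1, 2),
    {"location": "north_field"},          # hypothetical tag and value
)
print(query)
```

Only the rows that match the time range and tag reach your notebook, so there is no large file to load and trim afterwards.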
Adjusting multiple time-series datasets to share the same timestamps is one of the most common data cleanup chores, and it’s one you no longer need to spend time on. TDengine’s built-in time-series functions let you interpolate or aggregate your data so that timestamps line up across datasets.
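To make concrete what timestamp alignment involves, here is a plain Python sketch of linear interpolation onto a shared grid. It is not TDengine’s implementation, only an illustration of the work that the built-in functions take off your hands; the series names and sample values are made up.

```python
from bisect import bisect_left

def interpolate_at(samples, t):
    """Linearly interpolate a value at time t from sorted (time, value) samples.

    Returns None when t falls outside the sampled range.
    """
    if not samples:
        return None
    times = [ts for ts, _ in samples]
    if t < times[0] or t > times[-1]:
        return None
    i = bisect_left(times, t)
    if times[i] == t:
        return samples[i][1]
    (t0, v0), (t1, v1) = samples[i - 1], samples[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)

def align(series_by_name, grid):
    """Return one row per grid timestamp with every series interpolated to it."""
    return [
        {"ts": t, **{name: interpolate_at(s, t) for name, s in series_by_name.items()}}
        for t in grid
    ]

# Two sensors sampled at different times (seconds); values are invented.
series = {
    "temperature": [(0, 20.0), (10, 22.0), (20, 21.0)],
    "vibration": [(5, 0.25), (15, 0.75)],
}
aligned = align(series, grid=[5, 10, 15])
for row in aligned:
    print(row)
```

Each output row carries one value per series at the same timestamp, which is the shape most analysis code expects.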
Preserving the context of each data point is another distinctive requirement of time-series analysis, and that context is hard to maintain in a CSV file or data lake. A key metric for an electric utility might be kilowatts generated, but an analysis that yields operational insights also requires each turbine’s location, temperature, vibration, and other factors. TDengine is built to maintain this type of metadata, and features like tags and supertables make it easy to get a full picture of your data.
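As a sketch of how the turbine example might be modeled, the statements below define a supertable whose columns hold the measurements and whose tag holds the static context, then create one child table per turbine. The names `turbines`, `kilowatts`, and `location`, along with the column types, are assumptions chosen for illustration; check the SQL reference for your TDengine version before using them.

```python
# Hypothetical schema: measurements are columns, static context is a tag.
SUPERTABLE_DDL = (
    "CREATE STABLE IF NOT EXISTS turbines "
    "(ts TIMESTAMP, kilowatts FLOAT, temperature FLOAT, vibration FLOAT) "
    "TAGS (location BINARY(64))"
)

def child_table_ddl(turbine_id, location):
    """Build the statement that creates one turbine's table under the supertable."""
    if "'" in location:
        raise ValueError("quotes are not supported in this sketch")
    return (
        f"CREATE TABLE IF NOT EXISTS turbine_{turbine_id:03d} "
        f"USING turbines TAGS ('{location}')"
    )

# Querying the supertable reads every child table and can group by the tag.
PER_LOCATION_QUERY = "SELECT location, AVG(kilowatts) FROM turbines GROUP BY location"

statements = [SUPERTABLE_DDL] + [
    child_table_ddl(turbine_id, location)
    for turbine_id, location in [(1, "north_field"), (2, "south_ridge")]
]
for statement in statements:
    print(statement)
```

Because the tag travels with each child table, a query against the supertable can slice the measurements by location without a separate metadata join.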
Pandas and Jupyter Integration
TDengine includes client libraries for popular programming languages, including Python, and supports Jupyter notebooks. A few lines of code are all you need to install the TDengine Python library in your JupyterLab environment. From there, you can query data through TDengine’s REST API and load the results directly into a pandas DataFrame.
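The sketch below shows one way the REST round trip could look using only the Python standard library rather than the client library. It assumes the REST endpoint is at `http://localhost:6041/rest/sql`, that the default `root` / `taosdata` credentials are in use, and that the JSON response carries `code`, `column_meta`, and `data` fields; verify all three against the REST API documentation for your version.

```python
import base64
import json
import urllib.request

def build_request(sql, url="http://localhost:6041/rest/sql",
                  user="root", password="taosdata"):
    """Prepare an HTTP POST carrying the SQL text with Basic authentication.

    The URL and credentials are assumed defaults, not values from this article.
    """
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=sql.encode("utf-8"),
        headers={"Authorization": f"Basic {token}"},
        method="POST",
    )

def to_records(payload):
    """Turn the assumed response shape into a list of row dictionaries."""
    names = [meta[0] for meta in payload["column_meta"]]
    return [dict(zip(names, row)) for row in payload["data"]]

def run_query(sql, **kwargs):
    """Send the query and return rows; requires a reachable TDengine server."""
    with urllib.request.urlopen(build_request(sql, **kwargs)) as response:
        payload = json.load(response)
    if payload.get("code", 0) != 0:
        raise RuntimeError(payload.get("desc", "query failed"))
    return to_records(payload)

# Offline demonstration with a hand-written payload in the assumed shape.
sample_payload = {
    "code": 0,
    "column_meta": [["ts", "TIMESTAMP", 8], ["kilowatts", "FLOAT", 4]],
    "data": [["2024-01-01T00:00:00.000Z", 1.5], ["2024-01-01T00:01:00.000Z", 1.75]],
    "rows": 2,
}
records = to_records(sample_payload)
print(records)
```

In a notebook, passing the result of `run_query(...)` to `pandas.DataFrame(...)` gives you a DataFrame with one column per returned field.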
Built-In Analytics Functions
No matter which platform you’re working with, complex analysis will likely require custom coding. To simplify the process, TDengine has extended standard SQL with multiple time-series-specific functions, including:
- Cumulative sums
- Rate of change
- Moving average
- Difference between values
- Time-weighted average
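To show what each item in the list above computes, here is a plain Python sketch of the five calculations. These are textbook definitions written for illustration, not TDengine’s implementations; in particular, the time-weighted average uses the trapezoidal rule, and edge-case handling in the database may differ. In current TDengine releases the corresponding SQL functions are named CSUM, DERIVATIVE, MAVG, DIFF, and TWA; check the function reference for their exact arguments.

```python
from itertools import accumulate

def cumulative_sum(values):
    """Running total of the values."""
    return list(accumulate(values))

def rate_of_change(times, values):
    """Change in value per unit of time between consecutive points."""
    return [
        (v1 - v0) / (t1 - t0)
        for t0, t1, v0, v1 in zip(times, times[1:], values, values[1:])
    ]

def moving_average(values, k):
    """Mean of each window of k consecutive values."""
    return [sum(values[i - k + 1 : i + 1]) / k for i in range(k - 1, len(values))]

def difference(values):
    """Each value minus the one before it."""
    return [b - a for a, b in zip(values, values[1:])]

def time_weighted_average(times, values):
    """Area under the piecewise-linear curve divided by the elapsed time."""
    area = sum(
        (t1 - t0) * (v0 + v1) / 2
        for t0, t1, v0, v1 in zip(times, times[1:], values, values[1:])
    )
    return area / (times[-1] - times[0])

# Irregularly spaced readings: timestamps in seconds, invented values.
times = [0, 10, 30]
values = [2.0, 4.0, 1.0]

print(cumulative_sum(values))                # [2.0, 6.0, 7.0]
print(rate_of_change(times, values))         # [0.2, -0.15]
print(moving_average(values, 2))             # [3.0, 2.5]
print(difference(values))                    # [2.0, -3.0]
print(time_weighted_average(times, values))  # about 2.667; the plain mean is about 2.333
```

The gap between the time-weighted average and the plain mean shows why irregularly sampled data needs time-aware functions.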
You can also create your own user-defined functions (UDFs) in Python or C/C++ to meet the requirements of your particular use case and then call them just like standard SQL functions.
TDengine makes it easy to run your own forecasting or anomaly detection algorithms in UDFs and invoke them with simple SQL commands. TDengine’s distributed architecture allows your algorithm to run simultaneously on multiple query nodes.
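As a minimal sketch of what a Python UDF for simple anomaly flagging might look like, the example below marks readings that fall outside a fixed range. It assumes the scalar UDF interface consists of `init`, `destroy`, and `process(block)`, where the block exposes `shape()` and `data(row, col)`; confirm this against the UDF documentation for your version. The limits and the `FakeBlock` stand-in are hypothetical and exist only so the logic can be tried locally.

```python
# Hypothetical acceptable range for a reading; tune for your own data.
LOWER_LIMIT = 0.0
UPPER_LIMIT = 5.0

def init():
    """Called once when the UDF is loaded (assumed interface); nothing to set up."""

def destroy():
    """Called once when the UDF is unloaded (assumed interface); nothing to clean up."""

def process(block):
    """Return 1 for out-of-range values, 0 for normal ones, None for NULL input."""
    rows, _cols = block.shape()
    flags = []
    for i in range(rows):
        value = block.data(i, 0)
        if value is None:
            flags.append(None)
        else:
            flags.append(1 if value < LOWER_LIMIT or value > UPPER_LIMIT else 0)
    return flags

class FakeBlock:
    """Local stand-in for the data block the server would pass in (test helper only)."""

    def __init__(self, rows):
        self._rows = rows

    def shape(self):
        return len(self._rows), (len(self._rows[0]) if self._rows else 0)

    def data(self, row, col):
        return self._rows[row][col]

flags = process(FakeBlock([[1.2], [7.5], [None], [-0.3]]))
print(flags)
```

Once a function like this is registered with a CREATE FUNCTION statement, it can be called in a query like any built-in function; a real detector would replace the fixed limits with your own model.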
Stream processing is fully supported, so you can apply built-in functions or custom UDFs to real-time data. Case statements and built-in window functions for states, sessions, or events simplify the development of powerful analyses.
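To illustrate the idea behind session and state windows, here is a plain Python sketch of both groupings. It shows the general technique rather than TDengine’s implementation: a session window collects points whose gaps stay within a tolerance, and a state window collects consecutive points that share the same state. The sample timestamps and state names are invented.

```python
from itertools import groupby

def session_windows(timestamps, max_gap):
    """Group timestamps so that consecutive points in a window are at most max_gap apart."""
    windows = []
    for ts in sorted(timestamps):
        if windows and ts - windows[-1][-1] <= max_gap:
            windows[-1].append(ts)
        else:
            windows.append([ts])
    return windows

def state_windows(rows):
    """Group consecutive (timestamp, state) rows that share the same state."""
    return [
        (state, [ts for ts, _ in group])
        for state, group in groupby(rows, key=lambda row: row[1])
    ]

sessions = session_windows([0, 5, 8, 30, 33, 90], max_gap=10)
states = state_windows([(0, "idle"), (1, "idle"), (2, "running"), (3, "idle")])
print(sessions)
print(states)
```

An aggregate such as an average or a count is then computed once per window, which is what the windowing clauses in a query do for you.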
Experience TDengine Today
If you’re interested in trying out the features of TDengine for yourself, visit cloud.tdengine.com to get started with our cloud service. You can register for a free trial with no credit card required.
Alternatively, you can download the free, open-source TDengine Community Edition for Linux, macOS, or Windows. This edition, licensed under the AGPLv3, includes all core functionality that you need for time-series analysis and operations.
If you’re an enterprise customer, contact us for more information about how TDengine can improve your data infrastructure.