As every data scientist knows, developing original and sophisticated analyses for massive data sets is only part of the job. Before you can begin your analysis, it’s often necessary to go through a time-consuming process of collecting, retrieving, and cleaning the data. TDengine has a number of features that can simplify this process and let you spend more time on the science.
Getting Started
High-Speed Ingestion
When you’re working with big datasets, speed matters. TDengine is purpose-built to quickly process massive amounts of time-series data – think millions of rows in milliseconds – so the data collection process doesn’t become a bottleneck.
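One practical pattern behind fast ingestion is batching many rows into a single INSERT statement. The sketch below builds such a statement in Python; the table name `d1001` and its (timestamp, value) schema are assumptions for illustration, not a fixed TDengine schema.

```python
# A minimal sketch of batching rows into one multi-row TDengine INSERT.
# Table name "d1001" and its (ts, current) schema are illustrative assumptions.

def build_batch_insert(table, rows):
    """Build a single multi-row INSERT statement from (timestamp, value) pairs.

    Sending many rows per statement amortizes round-trip and parsing
    overhead across the whole batch, which keeps ingestion fast.
    """
    values = " ".join(f"('{ts}', {val})" for ts, val in rows)
    return f"INSERT INTO {table} VALUES {values};"

sql = build_batch_insert(
    "d1001",
    [("2024-01-01 00:00:00.000", 10.3), ("2024-01-01 00:00:01.000", 12.6)],
)
print(sql)
```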
If you’d like to see how TDengine’s performance compares with other time-series database products, read our Performance Comparison.
Custom Data Retrieval
Instead of making you sift through a massive CSV file, TDengine lets you pull exactly the data you need into your dataframe. Simple SQL commands filter data by column, value, tag, or time range.
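To make this concrete, here is a hypothetical filter expressed both ways: the SQL you would send to TDengine, and the equivalent pandas operation applied locally to a sample dataframe. The `meters` table, its columns, and the tag value are assumptions for illustration.

```python
import pandas as pd

# The filter as server-side SQL (table and tag names are illustrative).
query = (
    "SELECT ts, current FROM meters "
    "WHERE location = 'California.SanFrancisco' "
    "AND ts >= '2024-01-01 00:00:00'"
)

# A small sample frame standing in for data you would otherwise
# have to scan from a CSV export.
df = pd.DataFrame({
    "ts": pd.to_datetime(["2023-12-31 23:59:00", "2024-01-01 00:05:00"]),
    "location": ["California.SanFrancisco"] * 2,
    "current": [10.3, 12.6],
})

# The same filter expressed in pandas.
filtered = df[(df["location"] == "California.SanFrancisco")
              & (df["ts"] >= "2024-01-01")]
print(filtered)
```

Pushing the filter into SQL means only the matching rows ever cross the wire into your dataframe.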
Timestamp Alignment
Adjusting multiple time-series datasets to share the same timestamps is one of the most common data-cleaning chores, but it’s not one you need to spend any more time on. TDengine’s built-in time-series functions make it easy to interpolate or aggregate data so that timestamps align across your datasets.
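TDengine can perform this alignment server-side; the pandas sketch below shows the same idea locally, resampling two sensors with mismatched timestamps onto a common one-second grid. The timestamps and values are made-up sample data.

```python
import pandas as pd

# Two sensors sampled at different instants.
a = pd.Series([1.0, 3.0],
              index=pd.to_datetime(["2024-01-01 00:00:00", "2024-01-01 00:00:02"]))
b = pd.Series([10.0, 30.0],
              index=pd.to_datetime(["2024-01-01 00:00:01", "2024-01-01 00:00:03"]))

# A shared 1-second grid to align both series onto.
grid = pd.date_range("2024-01-01 00:00:00", periods=4, freq="1s")

# Reindex onto the union of timestamps, fill interior gaps by
# time-weighted interpolation, then keep only the grid points.
aligned = pd.DataFrame({
    "a": a.reindex(a.index.union(grid)).interpolate(method="time").reindex(grid),
    "b": b.reindex(b.index.union(grid)).interpolate(method="time").reindex(grid),
})
print(aligned)
```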
Data Context
Preserving the context of each data point is another unique aspect of time-series analysis, and one that’s hard to maintain in a CSV file or data lake. A key metric for an electric utility might be kilowatts generated, but an analysis that surfaces operational insights also requires each turbine’s location, temperature, vibration, and other factors. TDengine is built to maintain this type of metadata, and tools like tags and supertables make it easy to get a full picture of your data.
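As a sketch of how tags carry per-device context, the SQL below defines a supertable whose child tables fix the tag values for one turbine each. The schema is a hypothetical example modeled on the turbine scenario above; adjust names and types to your own data.

```python
# Illustrative supertable: measured columns plus TAGS for per-device context.
create_stable = """
CREATE STABLE turbines (ts TIMESTAMP, kilowatts FLOAT, temperature FLOAT)
TAGS (location BINARY(64), turbine_id INT);
"""

# Each child table inherits the schema and pins its tag values, so every
# row it stores stays linked to that turbine's context.
create_child = "CREATE TABLE t1001 USING turbines TAGS ('Texas.Houston', 1001);"

# Tags behave like columns in queries, so context is one WHERE clause away.
query = "SELECT AVG(kilowatts) FROM turbines WHERE location = 'Texas.Houston';"
print(query)
```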
Pandas and Jupyter Integration
TDengine includes client libraries for popular programming languages, including Python, and support for Jupyter notebooks. A few lines of code are all you need to install the TDengine Python library into your JupyterLab. From there, you can query data from TDengine’s REST API directly into a pandas dataframe.
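The snippet below sketches the last step of that workflow: turning a REST-style response into a pandas dataframe. The payload is hand-made sample data modeled on the JSON shape (column metadata plus row data) returned by TDengine's REST endpoint; with a live server you would receive it from an HTTP call instead.

```python
import pandas as pd

# Hand-made sample payload in the shape of a TDengine REST response
# (column_meta describing columns, data holding the rows).
response = {
    "code": 0,
    "column_meta": [["ts", "TIMESTAMP", 8], ["current", "FLOAT", 4]],
    "data": [
        ["2024-01-01T00:00:00.000Z", 10.3],
        ["2024-01-01T00:00:01.000Z", 12.6],
    ],
    "rows": 2,
}

# Column names are the first element of each column_meta entry.
columns = [meta[0] for meta in response["column_meta"]]
df = pd.DataFrame(response["data"], columns=columns)
df["ts"] = pd.to_datetime(df["ts"])
print(df)
```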
Generating Insights
Built-In Analytics Functions
No matter which platform you’re working with, complex analysis will likely require custom coding. To simplify the process, TDengine has extended standard SQL with multiple time-series-specific functions, including:
- Cumulative sums
- Derivatives
- Rate of change
- Moving average
- Difference between values
- Spread
- Time-weighted average
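To make a few of these concrete, the pandas operations below compute local equivalents on a small sample series; the comments name the corresponding TDengine SQL functions.

```python
import pandas as pd

# A small sample series to demonstrate the computations.
s = pd.Series([2.0, 4.0, 8.0, 6.0])

cumulative_sum = s.cumsum()        # CSUM: running total
difference = s.diff()              # DIFF: difference between adjacent values
moving_avg = s.rolling(2).mean()   # MAVG: moving average over a window of 2
spread = s.max() - s.min()         # SPREAD: max minus min

print(cumulative_sum.tolist())  # [2.0, 6.0, 14.0, 20.0]
print(spread)                   # 6.0
```

With TDengine, these run server-side in SQL, so only the results reach your notebook.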
User-Defined Functions
You can easily create your own functions in Python or C/C++ to meet the requirements of your particular use cases and use them just like standard SQL functions.
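Below is a sketch of a scalar Python UDF that squares its input column, modeled on the init/destroy/process interface described in TDengine's UDF documentation; treat the exact signatures and the registration statement as assumptions to verify against the docs for your version. A small stand-in block is included so the logic can be exercised locally.

```python
# Sketch of a scalar Python UDF (interface modeled on TDengine's UDF docs;
# exact signatures are an assumption -- verify against your version).

def init():
    pass

def destroy():
    pass

def process(block):
    # block is the server-provided data block: shape() gives (rows, cols)
    # and data(row, col) reads one value. This UDF squares column 0.
    rows, _cols = block.shape()
    return [block.data(i, 0) ** 2 for i in range(rows)]

# Registration would look roughly like (illustrative path and types):
#   CREATE FUNCTION square AS '/path/to/square.py' OUTPUTTYPE DOUBLE LANGUAGE 'Python';

# Minimal stand-in for the server-provided block, for local testing only.
class _FakeBlock:
    def __init__(self, col):
        self._col = col
    def shape(self):
        return (len(self._col), 1)
    def data(self, i, j):
        return self._col[i]

result = process(_FakeBlock([1.0, 2.0, 3.0]))
print(result)  # [1.0, 4.0, 9.0]
```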
SQL Algorithms
TDengine makes it easy to package your own forecasting or anomaly detection algorithms as UDFs and invoke them with simple SQL commands. TDengine’s distributed architecture allows your algorithm to run simultaneously on multiple query nodes.
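As an example of the kind of logic you might wrap in such a UDF, here is a deliberately simple z-score anomaly detector: it flags readings more than a chosen number of standard deviations from the mean. This is an illustrative algorithm of my own, not one shipped with TDengine.

```python
import statistics

def detect_anomalies(values, threshold=3.0):
    """Flag values whose z-score exceeds the threshold.

    A simple illustration of detection logic suitable for wrapping in a UDF.
    """
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return [False] * len(values)
    return [abs(v - mean) / stdev > threshold for v in values]

readings = [10.1, 10.2, 9.9, 10.0, 42.0, 10.1]
flags = detect_anomalies(readings, threshold=2.0)
print(flags)
```

Once registered as a UDF, the same detection runs in parallel across query nodes via a plain SQL call.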
Real-Time Analysis
Stream processing is fully supported, so you can apply built-in functions or custom UDFs to real-time data. CASE expressions and built-in windows for states, sessions, and events simplify the development of powerful analyses.
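A tumbling time window, the simplest of these windows, can be sketched locally with pandas: each fixed-width interval is aggregated to one result, which is roughly what a streaming interval window emits per window as data arrives. The timestamps and values below are sample data.

```python
import pandas as pd

# Sample readings with uneven timestamps.
ts = pd.to_datetime([
    "2024-01-01 00:00:01", "2024-01-01 00:00:07",
    "2024-01-01 00:00:12", "2024-01-01 00:00:18",
])
values = pd.Series([1.0, 3.0, 5.0, 7.0], index=ts)

# Tumbling 10-second windows: each window's mean becomes one output row,
# analogous to what a streaming time-interval window emits.
windowed = values.resample("10s").mean()
print(windowed)
```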