Simplifying Time Series Analysis for Data Scientists

Jeff Tao

As every data scientist knows, developing original and sophisticated analyses for massive data sets is only part of the job. Before you can begin your analysis, it’s often necessary to go through a time-consuming process of collecting, retrieving, and cleaning the data. TDengine has a number of features that can simplify this process and let you spend more time on the science.

Getting Started

High-Speed Ingestion

When you’re working with big datasets, speed matters. TDengine is purpose-built to quickly process massive amounts of time-series data – think millions of rows in milliseconds – so the data collection process doesn’t become a bottleneck.

If you’d like to see how TDengine’s performance compares with other time-series database products, read our Performance Comparison.

Custom Data Retrieval

Instead of requiring you to sift through a massive CSV file, TDengine makes it easy to get the exact information you need into your dataframe. Simple SQL commands allow you to filter data based on columns, values, tags, or time ranges.
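As a sketch of what such a filtered query might look like, the snippet below builds a TDengine SQL statement that selects columns, filters on a tag, and bounds a time range. The supertable name `power.meters`, the columns, and the `location` tag are hypothetical names used only for illustration:

```python
from datetime import datetime, timezone

def build_query(table, columns, location, start, end):
    """Build a TDengine SQL query that filters by a tag value and a
    time range. All identifiers here are illustrative placeholders."""
    cols = ", ".join(columns)
    return (
        f"SELECT {cols} FROM {table} "
        f"WHERE location = '{location}' "
        f"AND ts >= '{start.isoformat()}' AND ts < '{end.isoformat()}'"
    )

query = build_query(
    "power.meters",
    ["ts", "current", "voltage"],
    "California.SanFrancisco",
    datetime(2024, 1, 1, tzinfo=timezone.utc),
    datetime(2024, 1, 2, tzinfo=timezone.utc),
)
print(query)
```

The resulting string would be passed to the TDengine client or REST endpoint, so only the matching rows ever reach your dataframe.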

Timestamp Alignment

Adjusting multiple time-series datasets to share the same timestamps may be one of the most common data cleanup duties, and it’s not one you need to spend any more time on. TDengine’s built-in time series functions allow for easy interpolation or aggregation to align timestamps across your data.
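To make the alignment problem concrete, here is a small pandas sketch of the same operation done client-side: two sensors sampled at different, irregular timestamps are resampled onto a shared grid and interpolated. TDengine's interpolation and windowed aggregation can produce this result server-side instead, so the aligned data arrives ready to use. The sample readings are invented:

```python
import pandas as pd

# Two sensors sampled at different, irregular timestamps.
a = pd.Series([1.0, 2.0, 4.0],
              index=pd.to_datetime(["2024-01-01 00:00:00",
                                    "2024-01-01 00:00:10",
                                    "2024-01-01 00:00:25"]))
b = pd.Series([10.0, 30.0],
              index=pd.to_datetime(["2024-01-01 00:00:05",
                                    "2024-01-01 00:00:20"]))

# Resample both series onto a shared 5-second grid and interpolate
# the gaps, mirroring what a server-side fill/interpolation query
# would return already aligned.
grid_a = a.resample("5s").mean().interpolate()
grid_b = b.resample("5s").mean().interpolate()
aligned = pd.DataFrame({"a": grid_a, "b": grid_b})
print(aligned)
```

Pushing this step into the database avoids transferring misaligned raw points only to reshape them locally.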

Data Context

The need to preserve the context of each data point is another unique aspect of time series analysis, and one that’s hard to maintain in a CSV or data lake. A key metric for an electric utility might be kilowatts generated, but an analysis that identifies operational insights requires an understanding of each turbine’s location, temperature, vibration, and other factors. TDengine is built to maintain this type of metadata, and tools like tags and supertables make it easy to get a full picture of your data.
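Following the turbine example above, a schema along these lines would keep the static context as tags on one supertable, with each physical turbine as a subtable. The table, column, and tag names are hypothetical; the `CREATE STABLE ... TAGS` and `CREATE TABLE ... USING` statements follow TDengine's SQL syntax:

```python
# Hypothetical schema: one supertable for all turbines, with static
# context (site, model) stored once as tags rather than repeated
# on every row of measurements.
create_stable = """
CREATE STABLE turbines (
    ts TIMESTAMP,
    kilowatts DOUBLE,
    temperature DOUBLE,
    vibration DOUBLE
) TAGS (
    site NCHAR(32),
    model NCHAR(32)
)
"""

# Each physical turbine becomes a subtable carrying its tag values.
create_subtable = (
    "CREATE TABLE turbine_0042 USING turbines "
    "TAGS ('west_ridge', 'GX-7')"
)
print(create_subtable)
```

Because tags live with the data, a query can group or filter every turbine by site or model without joining against a separate metadata store.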

Pandas and Jupyter Integration

TDengine includes client libraries for popular programming languages, including Python, and support for Jupyter notebooks. A few lines of code are all you need to install the TDengine Python library into your JupyterLab. From there, you can query data from TDengine’s REST API directly into a pandas dataframe.
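As a sketch of that last step, the snippet below converts a REST-style response into a pandas dataframe. The payload shape (a `column_meta` list plus `data` rows) follows what TDengine's `/rest/sql` endpoint returns, but the sample values are invented and a real notebook would fetch the JSON with `requests` or the TDengine Python client rather than hardcoding it:

```python
import pandas as pd

# Simplified stand-in for the JSON body returned by TDengine's REST
# endpoint; a live query would retrieve this over HTTP instead.
response = {
    "code": 0,
    "column_meta": [["ts", "TIMESTAMP", 8],
                    ["current", "FLOAT", 4],
                    ["voltage", "INT", 4]],
    "data": [["2024-01-01T00:00:00Z", 10.3, 219],
             ["2024-01-01T00:00:10Z", 10.8, 221]],
    "rows": 2,
}

# Column names come from the metadata; rows drop straight into pandas.
columns = [meta[0] for meta in response["column_meta"]]
df = pd.DataFrame(response["data"], columns=columns)
df["ts"] = pd.to_datetime(df["ts"])
print(df)
```

From here the dataframe behaves like any other pandas object, so the rest of your notebook workflow is unchanged.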

Generating Insights

Built-In Analytics Functions

No matter which platform you’re working with, complex analysis will likely require custom coding. To simplify the process, TDengine has extended standard SQL with multiple time-series-specific functions, including:

  • Cumulative sums
  • Derivatives
  • Rate of change
  • Moving average
  • Difference between values
  • Spread
  • Time-weighted average
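These correspond to TDengine SQL functions such as CSUM, DERIVATIVE, MAVG, DIFF, SPREAD, and TWA. To make concrete what a few of them compute, here is a small pure-Python sketch over an invented series (in practice the database evaluates these server-side in a single query):

```python
values = [3.0, 5.0, 4.0, 8.0, 6.0]

# CSUM: running cumulative sum of the series.
csum, total = [], 0.0
for v in values:
    total += v
    csum.append(total)

# DIFF: difference between each value and the previous one.
diff = [b - a for a, b in zip(values, values[1:])]

# MAVG(3): moving average over a sliding window of three values.
mavg = [sum(values[i - 2:i + 1]) / 3 for i in range(2, len(values))]

# SPREAD: difference between the maximum and minimum.
spread = max(values) - min(values)

print(csum)    # → [3.0, 8.0, 12.0, 20.0, 26.0]
print(diff)    # → [2.0, -1.0, 4.0, -2.0]
print(mavg)
print(spread)  # → 5.0
```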

User-Defined Functions

You can easily create your own functions in Python or C/C++ to meet the requirements of your particular use cases and use them just like standard SQL functions.
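The registration mechanics follow TDengine's own UDF interface (see its UDF documentation); the sketch below shows only the kind of per-value logic a scalar UDF might wrap, using a hypothetical clamp operation on sensor readings:

```python
def clamp(value, lower, upper):
    """Scalar logic one might expose as a UDF: pin a reading to a
    plausible physical range, e.g. to suppress sensor glitches."""
    if value is None:          # SQL NULL passes through as NULL
        return None
    return max(lower, min(upper, value))

# Applied value by value, as the engine would apply a scalar UDF:
readings = [98.6, None, 250.0, -40.0]
cleaned = [clamp(r, 0.0, 150.0) for r in readings]
print(cleaned)  # → [98.6, None, 150.0, 0.0]
```

Once registered, such a function is invoked inside a SELECT just like SPREAD or MAVG.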

SQL Algorithms

TDengine makes it easy to run your own forecasting or anomaly detection algorithms as UDFs and invoke them with simple SQL commands. TDengine’s distributed architecture allows your algorithm to run simultaneously on multiple query nodes.
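As one example of the kind of algorithm that could be wrapped this way, here is a simple z-score anomaly detector; it is a stand-in for whatever model you would actually deploy, and the readings are invented:

```python
from statistics import mean, stdev

def zscore_anomalies(series, threshold=3.0):
    """Flag points more than `threshold` standard deviations from
    the mean -- a minimal anomaly-detection sketch of the sort one
    might package as a UDF and call from SQL."""
    mu = mean(series)
    sigma = stdev(series)
    return [abs(x - mu) > threshold * sigma for x in series]

data = [10.1, 10.3, 9.9, 10.0, 42.0, 10.2]
print(zscore_anomalies(data, threshold=2.0))
# → [False, False, False, False, True, False]
```

Because the engine can evaluate the UDF on each query node holding the data, the detection runs in parallel across the cluster rather than on a single client machine.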

Real-Time Analysis

Stream processing is fully supported, so you can apply built-in functions or custom UDFs to real-time data. Case statements and built-in window functions for states, sessions, or events simplify the development of powerful analyses.
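To illustrate one of those windows, a session window groups rows whose timestamps fall within a given gap of each other. The sketch below reproduces that grouping logic in plain Python on invented event times; in TDengine the equivalent is declared in SQL and applied continuously to the stream:

```python
def session_windows(timestamps, gap):
    """Group timestamps into sessions: a new session starts whenever
    the gap to the previous point exceeds `gap` (same time unit as
    the inputs), mirroring session-window semantics."""
    sessions, current = [], []
    last = None
    for ts in sorted(timestamps):
        if last is not None and ts - last > gap:
            sessions.append(current)
            current = []
        current.append(ts)
        last = ts
    if current:
        sessions.append(current)
    return sessions

events = [0, 2, 3, 30, 31, 90]   # seconds since some epoch
print(session_windows(events, gap=10))
# → [[0, 2, 3], [30, 31], [90]]
```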

Jeff Tao

With over three decades of hands-on experience in software development, Jeff has spearheaded numerous ventures and initiatives in the tech realm. His passion for open source, technology, and innovation has been the driving force behind his journey.

As one of the core developers of TDengine, he is deeply committed to pushing the boundaries of time series data platforms. His mission is clear: to architect a high-performance, scalable solution in this space and make it accessible, valuable, and affordable for everyone, from individual developers and startups to industry giants.