Reducing Energy Costs by 90%: The Future of Battery Manufacturing with TDengine

Sun Yu

Content Summary

  1. The battery cell capacity prediction system primarily uses machine learning to provide services to battery manufacturers.
  2. The system replaces traditional databases and vector databases with TDengine and self-developed algorithms.
  3. With the help of TDengine, this battery capacity prediction system helps users save over 10 million dollars in hardware, space, labor, and other costs, as well as significant time costs.

In the production process of lithium-ion batteries, the capacity data of the batteries is an important indicator to evaluate the quality of individual cells before they leave the factory. Manufacturers classify the cells into different capacity grades, and cells with the same grade are grouped together to form battery packs for better stability and longer lifespan.

Currently, lithium-ion battery capacity data is usually obtained with charging and discharging equipment. The battery is fully charged and then completely discharged in multiple steps at stepped currents, and the capacity discharged in each step is summed to give the factory capacity. This process is known as grading.
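As a minimal sketch of that accumulation, the Rust snippet below sums current × duration over the discharge steps. The `DischargeStep` type, its field names, and the step values in the usage note are illustrative assumptions, not the data format of any real grading equipment.

```rust
/// One constant-current discharge step (hypothetical record layout).
#[derive(Debug, Clone, Copy)]
pub struct DischargeStep {
    /// Discharge current in milliamperes.
    pub current_ma: f64,
    /// Step duration in hours.
    pub duration_h: f64,
}

/// Sums the charge delivered in each step to obtain the graded capacity in mAh.
pub fn graded_capacity_mah(steps: &[DischargeStep]) -> f64 {
    steps.iter().map(|s| s.current_ma * s.duration_h).sum()
}
```

For example, steps of 1600 mA for 1.5 h, 640 mA for 1 h, and 320 mA for 0.5 h add up to 3200 mAh.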

The downside of this method is that the full charge and discharge cycle consumes a great deal of time and energy. For example, for an 18650 battery with a standard capacity of 3200 mAh, the entire process lasts 4–5 hours, and each charge and discharge stage consumes a lot of electricity.

Additionally, grading must be preceded by formation. After the electrolyte is injected and the cell is sealed, the physical structure of the battery is essentially complete. To fully activate the electrodes and ensure they are fully wetted by the electrolyte, the battery undergoes multiple constant-voltage charging and discharging cycles, which activate its discharge capability. This electrochemical process is known as formation. After formation, the battery usually undergoes at least a day of storage at normal temperature or high-temperature aging to stabilize its chemical properties before it is graded. This process incurs significant costs:

  • Both storage and grading consume a lot of time.
  • The multiple charging and discharging steps during grading waste a lot of energy.
  • Grading equipment is massive, occupying a large area and making stacking machines and logistics more challenging.

However, by the time the formation process is completed, the battery’s physical and chemical properties are essentially determined, and these properties are reflected in the charge and discharge curves during the formation process, which correlate highly with the data recorded during earlier production stages.

Thus, the battery cell capacity prediction system, kun, was born. With the help of kun, the vast majority (70%-90%) of battery capacities can be predicted directly through the system. Its advantages include:

  • Removing most of the grading equipment, which significantly reduces land, capital, and other costs for factory construction.
  • Eliminating the need for grading charging and discharging steps, saving a lot of energy.
  • Saving time on storage and grading, which in turn greatly increases production capacity.
  • Simplifying the entire production line system, reducing the probability of production stoppages due to grading equipment malfunctions and the costs associated with equipment maintenance.

Before Deploying TDengine | After Deploying TDengine
Massive Grading Equipment | Only 10% of Grading Equipment
Huge Energy Consumption in Grading Process | Only 10% of Energy Consumption
Fixed Full Capacity, Slow Output | Kun Expands Server Clusters to Achieve Linear Performance Scaling, Faster Production Output
High Equipment Costs | Only 10% of Equipment Budget
Large Workshop Area, High Floors, Requires Many Stacking Machines | Only 10% of Space Occupied

Economic Benefits

Item | Original Design | After Deploying Capacity Prediction
Grading Slots | 300 Grading Slots ($7.44 million) | 30 Grading Slots ($744,000)
Energy Consumption | 7,900,000 kWh/year ($1.09 million/year) | 79,000 kWh/year ($10,900/year)
Total Savings | Total Equipment Investment Reduced by $6.7 Million; Annual Energy Cost Reduced by $1.08 Million

This project launched in November 2022 and has been in normal production for nearly two years, with an average monthly shipment rate of 85% of capacity. The initial debugging and process optimization period lasted two months.

Through the joint efforts of our team and the TDengine team over two months, we established process and production consistency standards to keep the production environment stable, improving the accuracy of predictions while reducing the NG (no-good, i.e., reject) rate. The capacity prediction system now completes the capacity predictions for a large number of cells every day, with an average error rate of 0.37% and an NG rate prediction accuracy of 5.7%.

Currently, the capacity prediction system fully meets customer requirements. In the first year, it saved the customer more than 7.7 million dollars in economic value and passed global acceptance. In the second year, the savings will exceed 8.26 million dollars.
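The headline figures can be cross-checked against the economic benefits table with simple arithmetic. The sketch below is only a sanity check on the published numbers. The function names are hypothetical, and reading the first-year figure as equipment savings plus one year of energy savings is an assumption.

```rust
/// Savings in whole US dollars: cost in the original design minus cost after deployment.
pub fn savings_usd(original: u64, after: u64) -> u64 {
    original.saturating_sub(after)
}

/// Equipment savings plus one year of energy savings, using the table's figures.
pub fn first_year_savings_usd() -> u64 {
    let equipment = savings_usd(7_440_000, 744_000); // 300 vs. 30 grading slots
    let energy = savings_usd(1_090_000, 10_900); // annual electricity cost
    equipment + energy
}
```

This gives 6,696,000 + 1,079,100 = 7,775,100 dollars, in line with the rounded $6.7 million and $1.08 million in the table and with the first-year figure of more than 7.7 million dollars.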

Background for Choosing TDengine

TDengine has played a significant role in our project. By the end of 2021, our system (kun) was nearly complete, and we were using MariaDB for data persistence. However, when I performed delivery testing with a large amount of simulation data, I encountered a performance bottleneck: MariaDB’s single-table storage model resulted in slow queries when the data volume reached a certain scale.

While the main function of our system is to provide services using machine learning, the production data of the battery cells (production time, barcode, equipment ID, channel number, etc.) must be recorded. These “logs” are crucial because our client (the battery factory) needs to ensure quality for customers (such as electric car manufacturers). If the predicted capacity of some cells deviates from the normal distribution, we must trace back to the data to quickly locate the issue on the production line and correct it in time. For example, if the temperature sensor of a formation device at a certain point fails, the measured temperature may be inaccurate. This situation involves several questions:

  1. Which device’s fixture temperature deviated from the normal distribution?
  2. Which charging and discharging point on that device was affected?
  3. When did this issue start?

To ensure production safety, factories always want to locate and solve problems as quickly as possible. The typical approach is to quickly retrieve and analyze data from the suspected time period.
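A minimal sketch of how the three questions could be answered from retrieved records: for each device and channel, find the earliest reading that falls outside mean ± k·σ. The `Reading` type, its field names, and the fixed mean/σ thresholds are illustrative assumptions, not kun's actual implementation.

```rust
use std::collections::BTreeMap;

/// One fixture temperature record (hypothetical layout).
#[derive(Debug, Clone)]
pub struct Reading {
    pub ts_ms: i64,
    pub device_id: String,
    pub channel: u32,
    pub fixture_temp_c: f64,
}

/// For every (device, channel) pair with at least one reading outside
/// mean ± k·sigma, returns the timestamp of the earliest such reading.
pub fn first_deviations(
    readings: &[Reading],
    mean: f64,
    sigma: f64,
    k: f64,
) -> BTreeMap<(String, u32), i64> {
    let mut first = BTreeMap::new();
    for r in readings {
        if (r.fixture_temp_c - mean).abs() > k * sigma {
            first
                .entry((r.device_id.clone(), r.channel))
                .and_modify(|ts: &mut i64| *ts = (*ts).min(r.ts_ms))
                .or_insert(r.ts_ms);
        }
    }
    first
}
```

The map keys identify the device and the charging and discharging point, and the value indicates when the deviation started. The sketch only works if the suspected time window can be fetched quickly, which is exactly where the database matters.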

However, when using MariaDB, as the data volume grows, query performance deteriorates sharply, especially for time-based fields that are hard to index. To continue using MariaDB as the data persistence solution, we would need to build complex sharding and partitioning logic at the application level to facilitate quick data retrieval.

Furthermore, we wanted a database that could perform real-time calculations on newly written data, so we could report errors immediately, which is far more efficient and safer than discovering issues manually and then tracing back through the data.

With these requirements in mind, I reconsidered the database selection. While searching for suitable solutions, I stumbled upon TDengine. After reviewing its features, we were convinced that TDengine was a perfect match for kun.

TDengine Features | Our Requirements | Example
Data Model: One table per data collection point | One process corresponds to one batch, and each batch corresponds to one table | A batch is equivalent to one collection point
Aggregation operations between collection points can be performed via Supertables | Cross-batch statistics | In the previous example, if multiple batches use the faulty equipment simultaneously, cross-batch statistics of the fixture temperature distribution would be required
Stream processing | Timely detection of production anomalies and early warnings | This can be achieved by applying window aggregation in stream processing to retrieve field anomalies in real-time.
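To make the mapping concrete, here is a hedged sketch of the statements such a model might use: one supertable for all batches, one subtable per batch, and a cross-batch aggregation over the supertable. The column names, tag names, and helper functions are assumptions for illustration and are not kun's real schema. The statements are built as strings for brevity, so inputs must be trusted or validated first.

```rust
/// Supertable shared by all batches (hypothetical columns and tags).
pub fn create_stable_sql(db: &str, stable: &str) -> String {
    format!(
        "CREATE STABLE IF NOT EXISTS {db}.{stable} \
         (ts TIMESTAMP, barcode VARCHAR(64), channel INT, fixture_temp FLOAT, capacity FLOAT) \
         TAGS (batch_id VARCHAR(64), device_id VARCHAR(64))"
    )
}

/// One subtable per batch, created from the supertable with its tag values.
pub fn create_batch_table_sql(db: &str, stable: &str, batch_id: &str, device_id: &str) -> String {
    format!(
        "CREATE TABLE IF NOT EXISTS {db}.`{batch_id}` USING {db}.{stable} \
         TAGS ('{batch_id}', '{device_id}')"
    )
}

/// Cross-batch fixture temperature statistics for one device over the last `days` days.
pub fn cross_batch_temp_sql(db: &str, stable: &str, device_id: &str, days: u32) -> String {
    format!(
        "SELECT batch_id, AVG(fixture_temp), STDDEV(fixture_temp) FROM {db}.{stable} \
         WHERE device_id = '{device_id}' AND ts > NOW - {days}d GROUP BY batch_id"
    )
}
```

Because every batch table inherits the supertable's tags, the last query can compare batches that shared a faulty device without any application-level joins.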

We found that TDengine’s features perfectly aligned with our needs, and I believe it solves many pain points in industrial scenarios. Many developers who explore TDengine in this field will likely feel that they have found a long-lost solution.

Operating Environment

As a time-series database, TDengine has modest hardware requirements. Thanks to its excellent data compression ratio, we cut hardware expenses in the production environment by more than half, achieving “cost reduction and efficiency improvement.”

Three rack-mounted servers

Configuration | Parameters
System | Ubuntu 24.04
Processor | Intel® Xeon® Silver 4314
Storage | 2 TB NVMe SSD
Memory | 4 x 16 GB

TDengine Version: 3.2.3.0, three-replica mode

Self-developed algorithms + TDengine replace vector databases

To ensure the accuracy of predicted capacity, we implemented continuous machine learning (CML). This mechanism allows us to automatically keep the model’s behavior aligned with the latest production conditions.

As mentioned earlier, to reduce equipment and space costs, manufacturers typically only purchase 10%-30% of the required grading equipment. This results in a very limited amount of full-process data available for machine learning during the early stages, which can prevent proper machine learning training.

However, with our self-developed curve search algorithm, we successfully solved this issue, ensuring the client’s shipment efficiency. Specifically, filtered full-process data is stored in the database, and when a new cell’s capacity needs to be predicted, the algorithm retrieves similar curves to make the prediction.
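The article does not describe the curve search algorithm itself. As a stand-in, the sketch below uses a plain brute-force k-nearest-neighbor search with Euclidean distance and predicts capacity as the mean of the k most similar reference curves. The `ReferenceCurve` type and the assumption that curves are sampled at the same fixed points are illustrative only.

```rust
/// A full-process reference curve with its measured capacity (hypothetical layout).
#[derive(Debug, Clone)]
pub struct ReferenceCurve {
    /// Curve samples taken at fixed points, e.g. voltage during formation.
    pub samples: Vec<f64>,
    /// Capacity measured by grading, in mAh.
    pub capacity_mah: f64,
}

/// Squared Euclidean distance between two equally long curves.
fn squared_distance(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum()
}

/// Predicts capacity as the mean capacity of the `k` most similar reference curves.
/// Returns `None` if `k` is zero or no reference curve has the query's length.
pub fn predict_capacity(query: &[f64], library: &[ReferenceCurve], k: usize) -> Option<f64> {
    let mut scored: Vec<(f64, f64)> = library
        .iter()
        .filter(|c| c.samples.len() == query.len())
        .map(|c| (squared_distance(query, &c.samples), c.capacity_mah))
        .collect();
    if scored.is_empty() || k == 0 {
        return None;
    }
    scored.sort_by(|a, b| a.0.total_cmp(&b.0));
    let k = k.min(scored.len());
    Some(scored[..k].iter().map(|(_, cap)| cap).sum::<f64>() / k as f64)
}
```

A linear scan like this is only practical for a small, filtered reference set, which matches the early-stage situation described above where little full-process data exists.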

Engineers familiar with vector databases will recognize this as a typical vector database feature: they natively support similarity search, either exact k-NN (k-nearest neighbors) or approximate nearest-neighbor indexes such as HNSW (Hierarchical Navigable Small World) and IVF (Inverted File Index).

But integrating a vector database into our system would have increased maintenance costs and introduced unnecessary complexity. Fortunately, TDengine’s built-in aggregation functions and streaming computation features, along with its flexible user-defined functions (UDFs), allow us to efficiently pre-process the data at the time of writing, making it possible to quickly search for similar curves.

One drawback of vector databases is that the complex metadata of battery cells cannot be directly linked to the feature values used for computation. While evaluating databases, we noticed that, in addition to the features mentioned above, TDengine offers standard SQL syntax and relational query capabilities, so developers can record and use relationships between fields with minimal learning effort, which simplifies development. TDengine’s Rust connector also provides parameter binding, a more efficient way to write data, so data sent by the client can be processed and written quickly. Later, if users need to combine calculation results with metadata for analysis (as in the production anomaly alert section), a single query can retrieve all the necessary information. This data modeling approach significantly improves runtime efficiency.

It’s clear that TDengine provides substantial convenience for developers across the entire data pipeline.

TDengine Rust Connector Adds Extra Value

For those familiar with deep learning, it is important to feed the entire dataset in batches to the model during each epoch to maximize the utilization of model parameters. While pre-processed features are stored in TDengine, as the data scale increases, fetching the entire dataset from the database for training can lead to performance problems or even OOM (Out Of Memory) errors.
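The article does not say how kun bounds memory during training. One common approach, sketched here under that assumption, is to fetch the dataset one time window at a time so that only a single batch is resident in memory. The helper names and the epoch-millisecond timestamp convention are hypothetical.

```rust
/// Splits the half-open range [start_ms, end_ms) into consecutive windows of at most `step_ms`.
/// Yields nothing if `step_ms` is not positive.
pub fn time_windows(
    start_ms: i64,
    end_ms: i64,
    step_ms: i64,
) -> impl Iterator<Item = (i64, i64)> {
    let mut cursor = start_ms;
    std::iter::from_fn(move || {
        if step_ms <= 0 || cursor >= end_ms {
            return None;
        }
        let lo = cursor;
        let hi = cursor.saturating_add(step_ms).min(end_ms);
        cursor = hi;
        Some((lo, hi))
    })
}

/// Builds the query for one window; each result set becomes one training batch.
pub fn window_query(table: &str, lo: i64, hi: i64) -> String {
    format!("SELECT * FROM {table} WHERE ts >= {lo} AND ts < {hi}")
}
```

Each epoch then iterates over `time_windows(...)`, runs one query per window, and drops the rows before fetching the next window.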

kun is fully developed in Rust, and we noticed early on that TDengine also has a Rust connector, which has two highlights:

  1. It supports both synchronous and asynchronous contexts.
  2. Through FFI (Foreign Function Interface), it seamlessly integrates Rust with C language primitive data types.

Since training models is CPU-intensive, we run it within a thread pool powered by rayon. If the model requires data interactions with the database, the system can use a message-passing mechanism to write data in real-time, making it highly efficient.
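The message-passing pattern can be sketched with the standard library alone. In this illustration a plain `std::thread` stands in for the rayon thread pool, and a writer thread that collects messages stands in for the task that would write to TDengine. The `TrainingMetric` type and the placeholder loss value are assumptions.

```rust
use std::sync::mpsc;
use std::thread;

/// A value produced during training that should be persisted (hypothetical).
#[derive(Debug, PartialEq)]
pub struct TrainingMetric {
    pub epoch: u32,
    pub loss: f64,
}

/// Runs a dummy training loop on one thread and hands each metric to a
/// writer thread over a channel, so training never blocks on I/O.
pub fn train_and_record(epochs: u32) -> Vec<TrainingMetric> {
    let (tx, rx) = mpsc::channel::<TrainingMetric>();

    // Writer side: in a real system this loop would write to the database.
    let writer = thread::spawn(move || {
        let mut written = Vec::new();
        for metric in rx {
            written.push(metric);
        }
        written
    });

    // Trainer side: CPU-bound work that only sends messages.
    let trainer = thread::spawn(move || {
        for epoch in 1..=epochs {
            let loss = 1.0 / epoch as f64; // placeholder for a real training step
            tx.send(TrainingMetric { epoch, loss }).expect("writer stopped early");
        }
        // `tx` is dropped here, which closes the channel and ends the writer loop.
    });

    trainer.join().expect("trainer panicked");
    writer.join().expect("writer panicked")
}
```

Keeping database access on the receiving side means the compute threads hold no connection state, which is one reason this split works with either a synchronous or an asynchronous connector context.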

Efficient Data Visualization

The “gold standard” for cell grading is the capacity value. Therefore, production supervisors are highly interested in the distribution of predicted capacity values. With increasing production, a single batch may produce a large number of cells over time, and fetching all this data from the database and aggregating it at the application level can lead to significant delays, damaging the user experience.

Fortunately, TDengine’s window-based queries and aggregation functions allow us to easily generate histograms directly at the database level, returning the results efficiently:

// Bin specification for TDengine's HISTOGRAM function in `linear_bin` mode.
let bins = serde_json::json!({
    "start":    min_capacity as i32,
    "width":    (max_capacity as i32 / total_capacity_intervals),
    "count":    total_capacity_intervals,
    "infinity": false
});

// The aggregation runs inside the database; only the per-bin results are returned.
let capacity_histogram_data: Vec<CapacityIntervalHistogram> = taos
    .query(format!(
        "SELECT histogram(capacity, 'linear_bin', '{}', 0) AS capacity_rang \
         FROM inference.`{}` WHERE ts > (NOW - {}d);",
        bins, table_name, ts_n_days
    ))
    .await?
    .deserialize()
    .try_collect()
    .await?;

The frontend can then directly use the returned data to visualize the results.
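For readers unfamiliar with linear bins, the following sketch shows the kind of counting that the database performs on the application's behalf. It is a plain Rust illustration, not TDengine's implementation. The function name is hypothetical, and treating each bin as open on the left and closed on the right is an assumption based on our reading of the TDengine documentation.

```rust
/// Counts values into `count` equal-width bins starting at `start`.
/// Bin i covers (start + i*width, start + (i+1)*width]; values outside
/// the covered range are ignored, as with "infinity": false.
pub fn linear_bin_histogram(values: &[f64], start: f64, width: f64, count: usize) -> Vec<u64> {
    let mut bins = vec![0u64; count];
    if width <= 0.0 {
        return bins;
    }
    for &v in values {
        let offset = (v - start) / width;
        // A value exactly on an upper edge belongs to the bin it closes.
        if offset > 0.0 && offset <= count as f64 {
            let idx = (offset.ceil() as usize) - 1;
            bins[idx] += 1;
        }
    }
    bins
}
```

With `start = 3000`, `width = 100`, and `count = 3`, the values 3050 and 3100 fall in the first bin, 3150 in the second, and 3300 in the third, while 3000 and 3400 are outside the range. Doing this in the database avoids transferring every cell's capacity to the application.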

Production Abnormality Alerts

Modern battery factories typically use fully automated production lines, but due to equipment and software limitations, these lines often lack real-time feedback capabilities. To ensure shipment speed and quality, production supervisors often need to collect data and generate reports periodically (e.g., every 4–6 hours) to detect abnormal data and locate problems in specific processes or equipment.

This method has two main drawbacks:

  1. It takes too long to detect production anomalies, delaying timely corrections.
  2. It is repetitive and labor-intensive.

A fully automated solution for error detection and alerting is certainly possible. By deploying a scheduled task, we can query key data (e.g., cabinet voltage, fixture temperature) at regular intervals and analyze it. With TDengine’s streaming computation feature, we can perform real-time analysis on these data values as they are written, making anomaly detection highly efficient and timely.

This solution will effectively distribute the computational load throughout the program’s lifecycle, making it a perfect example of the “shaving the peaks, filling the valleys” approach.
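As a hedged illustration of what such a stream might look like, the helper below builds a `CREATE STREAM` statement that maintains per-table windowed statistics of the fixture temperature as data is written. The stream, table, and column names are assumptions, and the statement follows our understanding of the TDengine 3.x stream syntax, so it should be checked against the documentation for the deployed version.

```rust
/// Builds a stream definition that aggregates fixture temperature per subtable
/// over fixed time windows and writes the results into `target`.
/// All identifiers are illustrative; `window` is a TDengine duration such as "1m".
pub fn fixture_temp_stream_sql(
    stream: &str,
    source_stable: &str,
    target: &str,
    window: &str,
) -> String {
    format!(
        "CREATE STREAM IF NOT EXISTS {stream} TRIGGER AT_ONCE INTO {target} AS \
         SELECT _wstart, AVG(fixture_temp) AS avg_temp, MAX(fixture_temp) AS max_temp \
         FROM {source_stable} PARTITION BY tbname INTERVAL({window})"
    )
}
```

An alerting task then only needs to read the small result table and compare `avg_temp` or `max_temp` against its thresholds, instead of rescanning raw data every few hours.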

This is a summary of our experience with TDengine, and we hope this information is helpful.