Choosing a TSDB for Connected Vehicle Data

Juno Qiu

June 23, 2026 / Time-Series Database Essentials-1

Evaluate time-series databases (TSDBs) for connected-vehicle data: high-cardinality telemetry, CAN bus data, geospatial queries, driving behavior analysis, edge-cloud architecture, and how they compare with MySQL and MongoDB.

As intelligent connected vehicle platforms develop, the sensor data generated by a single vehicle has grown from hundreds to tens of thousands of data points per second. Traditional storage architectures face serious challenges under the load of massive concurrent vehicle connections. Time-series databases, with efficient writes, optimized compression, and powerful query capabilities, are becoming an important data infrastructure choice for connected-vehicle platforms.

1. Data characteristics of connected vehicles

Connected-vehicle data imposes exacting demands on storage systems. GPS data reports at 1 to 10 Hz, so a single vehicle operating around the clock generates roughly 86,400 to 864,000 location records per day. CAN bus data covers hundreds of signals (engine RPM, throttle position, brake force, steering angle) with sampling intervals from 10 milliseconds to 1 second. Environmental sensors continuously produce multi-dimensional time series, and total volume scales linearly with fleet size.

A mid-sized platform typically connects 100,000 to 1,000,000 vehicles. Peak write loads reach tens of millions of data points per second. The write pattern is strictly time-ordered with very few updates to existing records, dominated by batch inserts, and queries concentrate on the most recent time windows. Relational databases hit bottlenecks quickly under this pattern. Time-series databases using LSM-Tree storage, time-based partitioning, and specialized compression can substantially reduce storage costs while increasing write throughput, depending on workload and configuration.
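The volumes above follow from simple arithmetic. The sketch below reproduces the per-vehicle GPS figures and estimates an aggregate write rate; the fleet size, signal count, and sampling rate in the last line are illustrative assumptions, not measurements from any real platform.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def gps_records_per_day(rate_hz: float) -> int:
    """Location records one vehicle produces in 24 hours at a given GPS rate."""
    return int(rate_hz * SECONDS_PER_DAY)


def fleet_points_per_second(vehicles: int, signals: int, rate_hz: float) -> float:
    """Aggregate data points per second if every vehicle reports every signal."""
    return vehicles * signals * rate_hz


if __name__ == "__main__":
    print(gps_records_per_day(1))    # 86400
    print(gps_records_per_day(10))   # 864000
    # Hypothetical fleet: 500,000 vehicles, 100 signals each, sampled at 1 Hz.
    print(fleet_points_per_second(500_000, 100, 1.0))  # 50000000.0
```

Real fleets are rarely all online at once, so a sizing exercise would normally scale the last figure by an expected concurrency ratio.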

2. Core selection metrics

2.1 High-concurrency writes

The write-heavy, read-light pattern of connected-vehicle data makes write performance the primary concern. A capable time-series database should support millions of data points per second in writes with horizontal scaling to accommodate fleet growth. Evaluation should consider batch write throughput, network optimization, client-side buffering, and the impact of replica synchronization on write latency.
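Client-side buffering is one of the easier factors to reason about in isolation. The following is a minimal sketch of a buffer that flushes by batch size or batch age. The class name, the parameters, and the flush callback are hypothetical; in practice the callback would wrap whatever batch-insert call the chosen database client provides.

```python
import time
from typing import Callable, List, Tuple

Point = Tuple[str, int, float]  # (series_id, timestamp_ms, value)


class BatchBuffer:
    """Collects points and hands them to a flush callback in batches."""

    def __init__(self, flush: Callable[[List[Point]], None],
                 max_points: int = 1000, max_age_s: float = 1.0) -> None:
        self._flush = flush
        self._max_points = max_points
        self._max_age_s = max_age_s
        self._points: List[Point] = []
        self._oldest = 0.0  # monotonic time of the first buffered point

    def add(self, point: Point) -> None:
        if not self._points:
            self._oldest = time.monotonic()
        self._points.append(point)
        too_big = len(self._points) >= self._max_points
        too_old = time.monotonic() - self._oldest >= self._max_age_s
        if too_big or too_old:
            self.flush()

    def flush(self) -> None:
        """Send whatever is buffered; a no-op when the buffer is empty."""
        if self._points:
            batch, self._points = self._points, []
            self._flush(batch)
```

Note that this sketch only checks batch age when a new point arrives, so a production version would also flush from a timer and on shutdown.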

2.2 High-cardinality handling

With millions of vehicles each producing hundreds of metrics, the number of distinct time series can reach hundreds of millions. Many time-series databases suffer from memory bloat and query degradation at high cardinality. TDengine, for example, addresses this with its Supertable and Subtable model, which uses tag-based filtering for efficient grouped aggregation and is designed to keep query performance stable in high-cardinality scenarios.

2.3 Geospatial query support

Trajectory replay, geofencing, heatmaps, and vehicle dispatch all require spatial query capabilities. Evaluators should check for native GeoJSON support, spatial functions such as ST_Within and ST_Distance, and spatial index efficiency. Some time-series databases combine temporal and spatial range queries into composite time-plus-space retrieval, which is valuable for fleet operations.
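To make the idea of composite time-plus-space retrieval concrete, here is a small sketch of what such a query computes: it keeps the GPS fixes that fall inside both a time range and a circular geofence. This is plain Python rather than any database's spatial API, the function names are hypothetical, and the great-circle distance uses the standard haversine formula with a mean Earth radius.

```python
import math
from typing import Iterable, List, Tuple

Fix = Tuple[int, float, float]  # (timestamp_s, latitude_deg, longitude_deg)
EARTH_RADIUS_M = 6_371_000.0    # mean Earth radius, an approximation


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two latitude/longitude points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def fixes_in_fence(fixes: Iterable[Fix], t_start: int, t_end: int,
                   center_lat: float, center_lon: float,
                   radius_m: float) -> List[Fix]:
    """Fixes with t_start <= timestamp < t_end that lie within the circle."""
    return [
        f for f in fixes
        if t_start <= f[0] < t_end
        and haversine_m(f[1], f[2], center_lat, center_lon) <= radius_m
    ]
```

A database with a spatial index avoids the full scan this sketch performs, which is why index efficiency belongs on the evaluation checklist.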

3. Data model design

A two-layer Supertable-plus-Subtable modeling approach is recommended for connected-vehicle data.

A Supertable defines the common schema for one vehicle data type, such as vehicle_gps with columns for timestamp, longitude, latitude, altitude, speed, and heading. Tags store static vehicle attributes such as model, fleet, region, and owner.

Each Subtable corresponds to one physical vehicle, named by plate number or device ID. Subtables share the Supertable schema but are physically isolated, so writes to one vehicle never contend with writes to another. Tags enable fast filtering across vehicle groups and parallel grouped aggregation.

Tag design must balance query needs with index efficiency. High-frequency filter conditions such as city, vehicle type, and online status make good tags. Fast-changing or highly granular attributes should remain as ordinary data columns or be managed in external dimension tables. For example, driving mode (eco, sport, standard) changes frequently and is better stored as a regular field, while brand and model are stable and work well as tags.
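The model described in this section can be sketched as SQL text generated from Python. The statements follow the general shape of TDengine's CREATE STABLE ... TAGS and CREATE TABLE ... USING ... TAGS syntax, but the column types, tag names and lengths, and the veh_ table prefix are assumptions made for illustration; check them against the documentation and reserved-keyword list of the version in use.

```python
import re
from typing import Dict

# Supertable: shared schema for GPS data plus static vehicle attributes as tags.
SUPERTABLE_DDL = (
    "CREATE STABLE IF NOT EXISTS vehicle_gps ("
    "ts TIMESTAMP, longitude DOUBLE, latitude DOUBLE, "
    "altitude FLOAT, speed FLOAT, heading FLOAT"
    ") TAGS ("
    "vehicle_model BINARY(32), fleet_id BINARY(32), "
    "region_code BINARY(32), owner_id BINARY(64))"
)

# Tag values must be supplied in the order the supertable declares them.
TAG_ORDER = ("vehicle_model", "fleet_id", "region_code", "owner_id")


def subtable_ddl(device_id: str, tags: Dict[str, str]) -> str:
    """Build the statement that creates one subtable per physical vehicle."""
    if not re.fullmatch(r"[A-Za-z0-9_]+", device_id):
        raise ValueError("device_id must contain only letters, digits, or underscores")
    values = []
    for name in TAG_ORDER:
        value = tags[name]
        if "'" in value:
            # This sketch rejects quotes instead of guessing at escaping rules.
            raise ValueError(f"tag {name!r} must not contain a single quote")
        values.append(f"'{value}'")
    return (
        f"CREATE TABLE IF NOT EXISTS veh_{device_id} "
        f"USING vehicle_gps TAGS ({', '.join(values)})"
    )
```

Driving mode is deliberately absent from the tag list: following the guidance above, it would be added as an ordinary column because it changes frequently.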

4. Real-time analytics

4.1 Driving behavior analysis

Driving behavior analysis relies on real-time computation of acceleration, angular velocity, and steering angle. Sliding-window aggregation identifies harsh acceleration, harsh braking, and sharp turning events. Continuous query or stream processing capabilities transform raw sensor signals into behavioral events as the data arrives, which greatly reduces latency compared with post-hoc batch processing.
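As an illustration of sliding-window detection, the sketch below estimates acceleration from speed samples over a trailing window and labels samples that cross a threshold. The window length and the 3 m/s² limits are hypothetical values chosen for the example, not calibrated thresholds, and the function name is an assumption.

```python
from collections import deque
from typing import Deque, Iterable, List, Tuple

Sample = Tuple[int, float]       # (timestamp_ms, speed_m_per_s)
Event = Tuple[int, str, float]   # (timestamp_ms, kind, accel_m_per_s2)


def detect_harsh_events(samples: Iterable[Sample], window_ms: int = 1000,
                        accel_limit: float = 3.0,
                        brake_limit: float = -3.0) -> List[Event]:
    """Flag samples whose mean acceleration over the trailing window is extreme."""
    window: Deque[Sample] = deque()
    events: List[Event] = []
    for t, v in samples:
        window.append((t, v))
        # Evict samples that fall outside the trailing window.
        while t - window[0][0] > window_ms:
            window.popleft()
        t0, v0 = window[0]
        if t == t0:
            continue  # a single sample gives no slope
        accel = (v - v0) / ((t - t0) / 1000.0)
        if accel >= accel_limit:
            events.append((t, "harsh_acceleration", accel))
        elif accel <= brake_limit:
            events.append((t, "harsh_braking", accel))
    return events
```

This version emits one event per offending sample; a production pipeline would usually merge consecutive flags into a single event and wait for the window to fill before evaluating.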

4.2 Vehicle health monitoring

Vehicle health monitoring requires comparing each vehicle’s real-time parameters against dynamic thresholds. Battery voltage, motor temperature, and coolant temperature are tracked per vehicle. If one vehicle’s motor temperature persistently exceeds the statistical average for its model, the system should flag it automatically. This demands tag-grouped aggregation combined with real-time comparison against historical baselines.
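The comparison described above can be sketched in a few lines: group recent motor temperatures by vehicle model, compute a per-model average, and flag vehicles whose readings all exceed that average by a margin. The function name, the 10 °C margin, and the use of "every recent reading" as the definition of "persistently" are assumptions for illustration.

```python
from statistics import mean
from typing import Dict, List


def flag_hot_vehicles(readings: Dict[str, List[float]],
                      model_of: Dict[str, str],
                      margin_c: float = 10.0) -> List[str]:
    """Return vehicle IDs whose recent readings all exceed model mean + margin."""
    # Tag-grouped aggregation: pool recent readings by vehicle model.
    by_model: Dict[str, List[float]] = {}
    for vehicle_id, temps in readings.items():
        if temps:
            by_model.setdefault(model_of[vehicle_id], []).extend(temps)
    baseline = {model: mean(temps) for model, temps in by_model.items()}

    flagged: List[str] = []
    for vehicle_id, temps in readings.items():
        if not temps:
            continue  # nothing to compare for this vehicle
        limit = baseline[model_of[vehicle_id]] + margin_c
        if all(t > limit for t in temps):
            flagged.append(vehicle_id)
    return sorted(flagged)
```

Because the outlier's own readings are included in the model average, a single hot vehicle in a small group raises its own baseline; a historical baseline computed over a longer period would avoid that effect.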

4.3 Fault early warning

Fault warning scenarios have the strictest query latency requirements. When a fault code appears, the platform must complete data ingestion, rule matching, and alert dispatch within seconds, and ideally faster. Integration with message queues and rule engines can bring anomaly detection and notification close to real time, shifting maintenance from reactive repair to proactive prevention.
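The ingestion, rule-matching, and dispatch flow can be sketched with an in-process queue standing in for a message queue. Everything here is hypothetical: the rule table, the severity labels, and the choice of longest-prefix matching are illustrative assumptions rather than a description of any specific rule engine.

```python
import queue
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class FaultEvent:
    vehicle_id: str
    code: str
    timestamp_ms: int


@dataclass(frozen=True)
class Alert:
    vehicle_id: str
    code: str
    severity: str


# Hypothetical rule table: fault-code prefix -> severity.
RULES = {"P0A": "critical", "P0": "warning"}


def match_rule(event: FaultEvent) -> Optional[Alert]:
    """Return an alert for the longest matching prefix, or None if no rule applies."""
    for prefix in sorted(RULES, key=len, reverse=True):
        if event.code.startswith(prefix):
            return Alert(event.vehicle_id, event.code, RULES[prefix])
    return None


def drain(events: "queue.Queue[FaultEvent]") -> List[Alert]:
    """Consume all queued events and collect the alerts to dispatch."""
    alerts: List[Alert] = []
    while True:
        try:
            event = events.get_nowait()
        except queue.Empty:
            return alerts
        alert = match_rule(event)
        if alert is not None:
            alerts.append(alert)
```

In a real deployment the consumer would block on the message queue and push each alert to a notification service as soon as it matches, rather than collecting alerts in a list.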

5. Edge-cloud collaboration

Data processing for connected vehicles is evolving from centralized cloud architectures to a three-tier edge-cloud collaborative model.

Vehicle tier. Onboard computing units filter, sample, and compress raw sensor data. Lightweight or embedded time-series database instances can run on the head unit or T-Box to support local real-time queries for driving assistance functions.

Roadside tier. Deployed at 5G base stations or edge data centers, this tier serves regional vehicle clusters. It performs area-level aggregation such as junction congestion detection and collision avoidance. The database must support edge deployment with limited compute and storage resources while maintaining synchronization with the cloud tier.

Cloud tier. This tier handles global storage and deep analytics. Historical data from the entire fleet converges here for cross-fleet statistics, machine learning model training, and reporting. Elastic scaling is required to add nodes during peak traffic hours and release resources during off-peak periods.

Data synchronization across tiers is a central challenge. Edge-collected data must reliably reach the cloud, while cloud-side configurations and trained models must propagate downward. The time-series database should provide data subscription, incremental synchronization, and conflict resolution mechanisms to maintain data consistency across the multi-tier architecture.
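One common way to combine incremental synchronization with conflict resolution is a sequence-number watermark plus a last-write-wins rule. The sketch below shows that combination under stated assumptions: the edge keeps an append-only log with increasing sequence numbers, each row carries an update timestamp, and the newer update wins. The names and data layout are hypothetical, and real systems may use different strategies.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, int]                # (series_id, timestamp_ms)
LogRow = Tuple[int, Key, float, int]  # (seq, key, value, updated_at_ms)
Stored = Tuple[float, int]           # (value, updated_at_ms)


def sync_incremental(edge_log: List[LogRow], cloud: Dict[Key, Stored],
                     watermark: int) -> int:
    """Apply edge rows with seq > watermark to the cloud store.

    Returns the new watermark, which the caller persists for the next round.
    """
    new_watermark = watermark
    for seq, key, value, updated_at in edge_log:
        if seq <= watermark:
            continue  # already synchronized in an earlier round
        current = cloud.get(key)
        # Last-write-wins: keep whichever version has the newer update time.
        if current is None or updated_at >= current[1]:
            cloud[key] = (value, updated_at)
        new_watermark = max(new_watermark, seq)
    return new_watermark
```

Re-running a round with the same watermark leaves the cloud store unchanged, so retries after a network failure are safe in this sketch.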

6. Comparison with general-purpose databases

Many early-stage platforms start with MySQL or MongoDB and hit scaling bottlenecks as their fleet grows.

MySQL’s B+Tree indexes, particularly secondary indexes on vehicle or device ID, generate heavy random I/O under high-rate time-series writes, and write throughput tends to fall as data volumes grow. Without careful indexing and partitioning, query response can degrade from milliseconds to seconds once tables grow past tens of millions of rows. Sharding adds complexity and forces application-level rewriting. MySQL also lacks native downsampling, interpolation, and time-window aggregation functions needed for time-series workloads.

MongoDB often writes faster than MySQL but can still fall short for time-series scenarios. Although MongoDB 5.0 added time series collections, document-oriented storage tends to carry high storage overhead for time-series data, compresses high-frequency floating-point values less effectively than specialized encodings, and consumes heavy resources in aggregation pipelines. At tens of millions of distinct time series, MongoDB faces severe challenges in both index memory consumption and query performance.

Dedicated time-series databases are optimized for this workload at several levels: columnar storage improves compression ratios, time-based partitioning simplifies lifecycle management, pre-aggregation reduces repeated computation, and specialized query engines accelerate range scans. For data-heavy, write-intensive, time-centric scenarios like connected vehicles, a dedicated time-series database is the prudent choice for long-term scalability.
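Time-window aggregation is the building block behind both downsampling and pre-aggregation. As a rough illustration of what a time-series database does natively, the sketch below buckets points into fixed windows and averages each bucket; the function name and the choice of the mean as the aggregate are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def downsample_mean(points: Iterable[Tuple[int, float]],
                    window_ms: int) -> List[Tuple[int, float]]:
    """Average (timestamp_ms, value) points into fixed windows of window_ms."""
    if window_ms <= 0:
        raise ValueError("window_ms must be positive")
    sums: Dict[int, List[float]] = defaultdict(lambda: [0.0, 0.0])
    for ts, value in points:
        bucket = ts - ts % window_ms  # start of the window containing ts
        acc = sums[bucket]
        acc[0] += value  # running sum
        acc[1] += 1      # running count
    return [(bucket, total / count) for bucket, (total, count) in sorted(sums.items())]
```

Storing such window results once and reusing them is what lets pre-aggregation avoid rescanning raw data for every dashboard query.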

7. Conclusion

The connected-vehicle industry presents demanding challenges for data infrastructure. Time-series databases, with their strengths in high-concurrency writes, high-cardinality handling, and time-dimension query optimization, are often a strong storage choice for this domain. Proper database selection and deployment can significantly improve platform throughput and responsiveness, providing a solid data foundation for intelligent driving, fleet management, and traffic optimization applications.

Teams planning or upgrading their connected-vehicle data platform should evaluate options based on actual data scale, query patterns, and business scenarios. Open-source time-series databases like TDengine provide extensive connected-vehicle case studies and complete documentation as reference points for building efficient, reliable data storage engines for connected-vehicle platforms.