Comprehensive Comparison Between TDengine and MongoDB

Chait Diwadkar

June 6, 2024 / News

In the landscape of database solutions, TDengine and MongoDB stand out as leading options, each with distinct advantages for handling large data sets. TDengine specializes in time-series data, offering optimized performance and scalability for scenarios like IoT and industrial analytics. MongoDB, renowned for its flexible document-oriented approach, excels in applications that require dynamic schema and rapid development, such as content management and e-commerce. This detailed comparison will examine the key features, architectures, and use cases of TDengine and MongoDB, providing a thorough insight into which database might best meet specific operational needs.

About TDengine

TDengine is a next-generation data historian purpose-built for Industry 4.0 and Industrial IoT. It enables real-time ingestion, storage, analysis, and distribution of petabytes of data per day, generated by billions of sensors and data collectors. With TDengine making big data affordable and accessible, digital transformation has never been easier.

About MongoDB

MongoDB is a popular open-source NoSQL database, launched in 2009. Designed to handle large amounts of unstructured and semi-structured data, MongoDB provides a flexible, schema-less data model, horizontal scalability, and high performance. Its ease of use, JSON-based document storage, and support for multiple programming languages have led to its widespread adoption across various industries and applications.

Comprehensive Comparison

Comparison Item	TDengine	MongoDB
Official Website	TDengine	MongoDB
Database Type	Time-series database model with support for the concept of Super Table and subtable	Document model supporting JSON document storage
Technical Documentation	TDengine Technical Documentation	MongoDB Technical Documentation
Open Source	Open Source (AGPLv3)	Source-available (SSPL, which is not an OSI-approved license)
Cloud Service	TDengine Cloud	MongoDB Atlas
Programming Language	C	C++
Supported Operating Systems	Linux, Windows, macOS	Linux, Windows, macOS
Language connectors	· Python · Java · C/C++ · Go · Node.js · Rust · C#	· Python · Java · Node.js · C# · Ruby · PHP · Go · C++
Syntax	Supports standard SQL	MongoDB Query Language
Distributed	Supports distributed architecture	Supports distributed architecture
Open Source License	AGPLv3	SSPL
Use Cases	Industrial big data, IoT platforms, smart manufacturing, energy data management, etc.	Content management systems, mobile applications, real-time analytics, IoT data management, etc.

TDengine Database Functions

Efficient Data Writing: Supports SQL writing and schema-less data entry, seamlessly integrated with various third-party tools, allowing data to be entered into TDengine through configuration without any code.
Efficient Querying: Supports standard SQL and provides a range of time-series specific queries and window functions, as well as support for User-Defined Functions (UDF).
Streaming Computation: TDengine not only supports continuous queries but also event-driven streaming computations, eliminating the need for streaming components like Flink or Spark when processing time-series data.
Data Subscription: Applications can subscribe to data from a table or a set of tables, offering the same API as Kafka, with the ability to specify filter conditions.
Caching Functionality: Caches the last record of each table, enabling efficient time-series data processing without the need for Redis.
Visualization: Supports seamless integration with various third-party visualization components such as Grafana, Seeq, and Google Data Studio.
Clustering: Horizontal scalability can be achieved by adding nodes to enhance processing power, high availability is provided through multiple replicas, and TDengine supports deployment via Kubernetes.
Management: Monitors running TDengine instances and supports various data import/export methods.
Tools: Provides an interactive command-line interface (CLI) for easy management of clusters, system status checks, and ad-hoc querying; includes the taosBenchmark stress testing tool for testing TDengine’s performance.
Language Connectors: Offers connectors in various languages such as C/C++, Java, Go, Node.js, Rust, Python, and C#, supporting REST interfaces.
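
To make the time-series window functions mentioned above more concrete, here is a minimal Python sketch of what a tumbling-window average computes. In TDengine SQL, a comparable query would look roughly like `SELECT _wstart, AVG(current) FROM meters INTERVAL(10s)`. The function name `tumbling_window_avg` and the sample readings are hypothetical, and the code only illustrates the concept; it is not how TDengine implements windowing internally.

```python
from collections import defaultdict

def tumbling_window_avg(rows, width_ms):
    """Group (timestamp_ms, value) rows into fixed-width windows and average each one."""
    buckets = defaultdict(list)
    for ts_ms, value in rows:
        window_start = ts_ms - (ts_ms % width_ms)  # align to the window boundary
        buckets[window_start].append(value)
    # Return windows in time order as (window_start_ms, average) pairs.
    return [(start, sum(vals) / len(vals)) for start, vals in sorted(buckets.items())]

# Hypothetical readings: (epoch milliseconds, current in amperes).
readings = [(0, 10.0), (4_000, 12.0), (10_000, 11.0), (19_999, 13.0), (20_000, 9.0)]
windows = tumbling_window_avg(readings, width_ms=10_000)
print(windows)  # [(0, 11.0), (10000, 12.0), (20000, 9.0)]
```

Each reading falls into exactly one window, which is what distinguishes a tumbling window from a sliding one.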

MongoDB Database Functions

Flexible Document Model: MongoDB stores data as JSON-like documents, supporting complex data structures and nested documents, which makes it suitable for a wide range of data storage needs.
Multiple Query Operations: MongoDB supports various types of query operations, including basic CRUD operations (Create, Read, Update, Delete), complex aggregation queries, and geospatial queries.
High Availability and Fault Tolerance: MongoDB supports replica sets and automatic failover mechanisms, ensuring high availability and continuity of data.
Sharded Clusters: MongoDB supports sharded cluster architecture, enabling horizontal scaling and load balancing of data storage and queries.
Indexes and Aggregation Pipelines: MongoDB supports various types of indexes and aggregation pipeline operations to improve query performance and flexibility.
Geospatial Indexes and Queries: MongoDB provides geospatial indexing and querying functionality, supporting storage and querying operations for geolocation data.
Security and Access Control: MongoDB offers robust security features, including access control, data encryption, and authentication.
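
As a rough illustration of the aggregation pipelines mentioned above, the sketch below builds a pipeline as a plain Python list of stages. `$match`, `$group`, `$avg`, and `$sort` are standard MongoDB operators; the field names (`device_type`, `location`, `temp`) are assumptions made up for this example. No server is contacted here; with the PyMongo driver, a list like this would typically be passed to `collection.aggregate(pipeline)`.

```python
import json

# Hypothetical collection of sensor readings; field names are illustrative only.
pipeline = [
    {"$match": {"device_type": "thermostat"}},                        # keep matching documents
    {"$group": {"_id": "$location", "avg_temp": {"$avg": "$temp"}}},  # average per location
    {"$sort": {"avg_temp": -1}},                                      # highest average first
]

# Print the pipeline as JSON, the form in which it would appear in the MongoDB shell.
print(json.dumps(pipeline, indent=2))
```

Stages run in order, so filtering with `$match` first keeps the later `$group` stage from processing documents it does not need.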

TDengine Key Concepts

Metric: Metrics refer to the physical quantities collected by sensors, devices, or other types of collection points. These quantities, such as current, voltage, temperature, pressure, GPS location, etc., change over time. They can be of various data types including integer, float, boolean, or string.
Label: Labels represent the static attributes of sensors, devices, or other types of collection points. Unlike metrics, labels do not change over time. Examples of labels include device model, color, location of the device, etc. Labels can have any data type.
Data Collection Point: Data collection points are hardware or software entities that collect physical quantities according to predefined time periods or triggered events. Each data collection point can collect one or more metrics, all of which are collected at the same moment and share the same timestamp. Complex devices often have multiple data collection points, each with potentially different collection periods, and they operate independently and asynchronously.
Table: Since metrics are generally structured data, and to reduce the learning curve, TDengine manages data using a traditional relational database model. Users need to create a database first, then create tables, and only then can they insert or query data.
Super Table: When each data collection point has its own table, the number of tables grows quickly and managing them becomes challenging. Moreover, applications often need to aggregate data across collection points, which is cumbersome when that data is spread over many tables. To address this issue, TDengine introduces the concept of Super Tables (STables). An STable is a collection of data collection points of a specific type.
Subtable: When creating a table for a specific data collection point, users can use the definition of the super table as a template and specify the specific label values for that particular collection point to create the table. Tables created using the definition of a super table are called subtables.
Database: A database is a collection of tables. TDengine allows multiple databases in a single running instance, and each database can be configured with different storage policies.
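
The relationship between metrics, labels, super tables, and subtables can be sketched with a small Python helper that generates the corresponding SQL. The `CREATE STABLE ... TAGS (...)` and `CREATE TABLE ... USING ... TAGS (...)` forms follow TDengine's SQL syntax; the helper functions, the `meters` schema, and the table name `d1001` are hypothetical examples. The helpers do no escaping, so treat this as a sketch rather than something to feed untrusted input.

```python
def create_stable_sql(name, metrics, labels):
    """Build a CREATE STABLE statement: metrics become columns, labels become TAGS."""
    cols = ", ".join(f"{col} {typ}" for col, typ in metrics)
    tags = ", ".join(f"{tag} {typ}" for tag, typ in labels)
    return f"CREATE STABLE IF NOT EXISTS {name} (ts TIMESTAMP, {cols}) TAGS ({tags})"

def create_subtable_sql(table, stable, label_values):
    """Build a CREATE TABLE ... USING statement for one data collection point."""
    # Strings are wrapped in single quotes; no escaping is done (sketch only).
    values = ", ".join(f"'{v}'" if isinstance(v, str) else str(v) for v in label_values)
    return f"CREATE TABLE IF NOT EXISTS {table} USING {stable} TAGS ({values})"

# Hypothetical smart-meter schema: two metrics plus two static labels.
stable_sql = create_stable_sql(
    "meters",
    metrics=[("current", "FLOAT"), ("voltage", "INT")],
    labels=[("location", "BINARY(64)"), ("group_id", "INT")],
)
subtable_sql = create_subtable_sql("d1001", "meters", ["California.SanFrancisco", 2])

print(stable_sql)
print(subtable_sql)
```

Every subtable created this way shares the super table's column layout and differs only in its label values, which is what lets a single query against `meters` aggregate across all devices.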

MongoDB Key Concepts

Database: In MongoDB, a database is a container for collections, which are groups of related documents.
Collection: A collection in MongoDB is similar to a table in a relational database and is used to store a group of documents.
Document: A document in MongoDB is a single record stored in BSON (Binary JSON) format. Documents within a collection can have different structures.
Field: A field is a key-value pair within a document, similar to an attribute or column in a relational database.
Index: An index in MongoDB is a data structure used to enhance the performance of queries on specific fields within a collection.
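
To illustrate how documents in the same collection can have different structures, here is a small Python sketch that models a collection as a list of dictionaries. The `products` data and the `find` helper are hypothetical; `find` is only a toy stand-in for a MongoDB equality filter on top-level fields, not the real query engine.

```python
# A "collection" modeled as a plain list of documents (dicts).
# The two documents share some fields but otherwise differ in structure.
products = [
    {"_id": 1, "name": "Laptop", "price": 1200, "specs": {"ram_gb": 16, "cpu": "8-core"}},
    {"_id": 2, "name": "T-shirt", "price": 20, "sizes": ["S", "M", "L"]},
]

def find(collection, query):
    """Return documents whose top-level fields equal every key/value pair in `query`."""
    return [doc for doc in collection if all(doc.get(k) == v for k, v in query.items())]

matches = find(products, {"name": "T-shirt"})
print(matches)
```

Neither document had to declare its fields in advance, which is the practical meaning of a schema-less model.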

TDengine Underlying Architecture

TDengine can be deployed locally, in the cloud, or as a hybrid solution, providing flexibility in deployment and management.

The architectural design of TDengine primarily includes the following components:

Storage Layer: TDengine’s storage layer is responsible for the actual data storage. It adopts a columnar storage structure to enhance query performance and compress data size. Data is stored on local disks to ensure its durability and reliability.
Compute Layer: The compute layer of TDengine is responsible for executing queries and computational tasks. It includes a query processor and computation engine for parsing query statements, performing computations, and returning results to clients.
Distributed Architecture: TDengine supports a distributed architecture that allows data to be sharded and stored across multiple nodes, enabling horizontal scaling and load balancing. Each node can independently handle query requests and perform computational tasks, thus improving the system’s performance and reliability.
Metadata Management: TDengine uses metadata to manage data storage and distribution. Metadata includes information about databases, tables, partitions, and the distribution of data across nodes. Metadata management enables TDengine to effectively manage and route data.
Client Interfaces: TDengine provides various client interfaces, including an SQL interface, an HTTP interface, and client libraries. Developers can use these interfaces to connect to TDengine, write data, and run queries.
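
One reason a columnar layout helps compress time-series data is that values within a single column tend to resemble their neighbors. The Python sketch below pivots rows into columns and delta-encodes the timestamp column. It is a simplified illustration with hypothetical function names and sample data, and it should not be read as TDengine's actual storage format or codec.

```python
def to_columns(rows):
    """Pivot row-oriented records [(ts, value), ...] into one list per column."""
    timestamps, values = zip(*rows)
    return list(timestamps), list(values)

def delta_encode(column):
    """Keep the first value, then store only the difference from the previous value."""
    return [column[0]] + [b - a for a, b in zip(column, column[1:])]

def delta_decode(encoded):
    """Rebuild the original column by accumulating the stored differences."""
    out = [encoded[0]]
    for delta in encoded[1:]:
        out.append(out[-1] + delta)
    return out

# Hypothetical readings taken once per second: (epoch milliseconds, voltage).
rows = [(1_700_000_000_000 + i * 1000, 220 + (i % 2)) for i in range(5)]
ts_col, val_col = to_columns(rows)
ts_deltas = delta_encode(ts_col)
print(ts_deltas)  # [1700000000000, 1000, 1000, 1000, 1000]
```

After encoding, the regularly spaced timestamps collapse into a run of identical small numbers, which a general-purpose compressor can shrink far more effectively than the original 13-digit values.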

MongoDB Underlying Architecture

MongoDB uses a flexible, JSON-like document model for data storage, which allows for dynamic schema changes without downtime. It supports ad-hoc queries, indexing, and real-time aggregation. The architecture primarily includes the following components:

Nodes: A MongoDB cluster consists of multiple nodes, each of which can play different roles, such as primary, secondary, or arbiter nodes.
Replica Sets: MongoDB uses replica sets to provide data redundancy and high availability. A replica set consists of one primary node and multiple secondary nodes. The primary node handles all write operations, while the secondary nodes replicate the data from the primary. If the primary node fails, the replica set automatically elects a new primary to ensure system continuity and availability.
Sharded Clusters: MongoDB uses sharded clusters to achieve horizontal scaling and load balancing. Sharded clusters partition data into multiple shards, each storing a portion of the data and distributed across different nodes. Router nodes are responsible for directing query requests to the appropriate shards and merging the results to return to the client.
Config Servers: Config servers store metadata information about the sharded cluster, including the location and range of each shard. Config servers enable MongoDB to manage and route data effectively.
Drivers: MongoDB drivers are client libraries that interact with the MongoDB database, providing APIs in various languages such as Python, Java, Node.js, etc. Developers use drivers to connect to MongoDB databases and perform various operations, such as inserting documents and querying data.
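
The way a router uses config-server metadata to direct a request to the right shard can be sketched as a range lookup. The chunk boundaries, shard names, and `route` function below are hypothetical, and the sketch covers only range-based partitioning on a numeric shard key; real clusters also handle hashed keys, chunk splits, and data balancing.

```python
import bisect

# Hypothetical chunk metadata, of the kind a config server might hold:
# each chunk covers shard-key values from its lower bound up to the next bound.
chunk_lower_bounds = [0, 1000, 5000]           # sorted lower bounds
chunk_owners = ["shardA", "shardB", "shardC"]  # shard that owns each chunk

def route(shard_key):
    """Return the shard whose chunk range contains `shard_key`."""
    idx = bisect.bisect_right(chunk_lower_bounds, shard_key) - 1
    if idx < 0:
        raise ValueError("shard key below the lowest chunk bound")
    return chunk_owners[idx]

print(route(42))    # shardA
print(route(1000))  # shardB
print(route(7000))  # shardC
```

Because the lookup needs only the metadata, the router can pick a target shard without touching the data itself.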

TDengine Main Features

Because it takes advantage of the characteristics of time-series data (structured data, no need for transactions, infrequent deletions or updates, and more writes than reads), TDengine offers the following features compared to other time-series databases:

High Performance: TDengine is the only time-series database that has solved the high-cardinality challenge associated with time-series data storage. It supports hundreds of millions of data collection points and outperforms other time-series databases in data insertion, querying, and compression.
Simplified Time-Series Data Platform: TDengine includes built-in caching, stream computing, and data subscription features, providing a simplified solution for processing time-series data that significantly reduces the complexity and operational costs of business systems.
Cloud-Native: With native distributed design, data sharding and partitioning, separation of storage and computing, RAFT protocol, Kubernetes deployment, and complete observability, TDengine is a cloud-native time-series database that can be deployed on public, private, and hybrid clouds.
Ease of Use: For system administrators, TDengine significantly reduces the cost of management and maintenance. For developers, TDengine offers simple interfaces, simplified solutions, and seamless integration with third-party tools. For data analysts, TDengine provides convenient data access capabilities.
Analytical Capabilities: With super tables, separation of storage and computation, partitioning and sharding, pre-computation, and other technologies, TDengine efficiently browses, formats, and accesses data.
Core Open Source: The core code of TDengine, including its clustering capabilities, is openly available under open-source licenses. It has over 528.7k running instances worldwide and 22.9k GitHub stars (data as of May 10, 2024), with an active developer community.

MongoDB Main Features

Flexible Data Model: MongoDB’s schema-less data model allows for the storage and querying of various data types, enabling it to handle complex and evolving data structures.
High Availability: MongoDB’s replica set feature ensures high availability through automatic failover and data redundancy.
Horizontal Scaling: MongoDB’s sharded cluster architecture facilitates horizontal scaling and load balancing, making it capable of handling large-scale data processing and querying.

TDengine Application Scenarios

Internet of Things (IoT): As the volume of data in the IoT sector continues to grow, traditional big data solutions and relational database-centered solutions are becoming increasingly insufficient. Storing, querying, and analyzing large volumes of data in real time has become an urgent problem, and it is one that a time-series database tailored for IoT platforms is designed to solve.
Industrial Internet: In the field of industrial big data, production, testing, and operational stages may generate massive amounts of time-stamped sensor data, which are inherently time-series data. This data is primarily collected or generated by various types of real-time monitoring, inspection, and analysis equipment, involving sectors like industrial manufacturing, power, chemicals, engineering operations, and smart manufacturing. These applications typically feature heavy write operations, minimal read operations, and handle very large data volumes.
Connected Vehicles: Through the analysis of vehicle telemetry data, services such as real-time vehicle network quality monitoring, component health monitoring, driver behavior monitoring, vehicle system security analysis, and compliance monitoring can be implemented. As the number of vehicles and onboard sensors increases, choosing the right time-series database can prevent data storage bottlenecks on vehicle messaging platforms.
Power and Energy: With the development of the power IoT, the data generated in various stages of generation, transmission, transformation, distribution, and consumption is increasing dramatically, posing serious challenges to traditional relational database-centered solutions. How to manage the storage, querying, and analysis of power and energy data under large data volumes and how to choose a suitable power time-series database has become a critical issue.
IT Operations and Maintenance: As the number of servers, IoT devices, and new types of sensors grows, traditional operation and maintenance methods struggle to keep up and can limit business development. Operating and maintaining hardware systems has become a widespread concern and a heavy burden for information service departments across all industries, creating an urgent need for a platform based on massive time-series data to support complex operational tasks.
Finance: The storage and computation of financial market data, which often involves numerous subtables, large volumes of real-time data, fixed data formats, and long retention periods, require a suitable time-series database. TDengine can support investment research services built on market data centers, including asset management, real-time monitoring, performance analysis, risk analysis, sentiment analysis, stock backtesting, signal simulation, and report generation.

MongoDB Application Scenarios

Content Management Systems (CMS): MongoDB’s flexible data model makes it an ideal choice for content management systems that often need to store and manage various types of content such as articles, images, and videos. MongoDB’s schema-less nature allows it to easily adapt to changing content structures and requirements.
IoT Platform Data Storage and Analysis: MongoDB’s support for high data volumes and horizontal scalability makes it suitable for storing and processing data generated by IoT devices, such as sensor readings and device logs. Its efficient indexing and querying capabilities enable real-time data analysis and monitoring of IoT devices.
E-commerce Platforms: The flexibility and performance features of MongoDB make it a preferred choice for e-commerce platforms that require efficient storage and querying of diverse product information, customer data, and transaction records. MongoDB’s flexible data model easily adapts to changes in product attributes and customer preferences, while its high availability and scalability ensure a smooth and responsive user experience.

TDengine Pricing Model

TDengine OSS is licensed under AGPLv3 and is open-source software that can be used for free under the terms of the AGPL license. In addition, TDengine also offers two commercial versions, TDengine Enterprise and TDengine Cloud, which provide additional features and professional technical support on top of TDengine OSS.

Pricing for TDengine Enterprise depends on the usage scenario and is offered as either an annual subscription or a perpetual license. TDengine Cloud is billed monthly, based on the computing resources required for the actual data volume. It offers tiers including Starter, Basic, Standard, Professional, and Flagship editions. A pricing calculator is available to estimate costs.

MongoDB Pricing Model

MongoDB offers a variety of pricing options, including a free open-source Community Edition and a commercial Enterprise Edition. The Enterprise Edition includes advanced features, management tools, and technical support.

MongoDB also offers a fully managed cloud-based database service, MongoDB Atlas, which operates on a pay-as-you-go pricing model. Charges are based on storage, data transfer, and computing resources. MongoDB Atlas provides a free tier that allows users to try the service without incurring costs, although resources are limited.

Chait Diwadkar

Chait Diwadkar previously worked as Director of Solutions Engineering at TDengine.