Mastering Memory Leak Detection in TDengine

Shengliang Guan

Memory leaks are a common issue: a program’s memory usage grows gradually until system resources are exhausted or the program crashes. Tools like AddressSanitizer (ASan) and Valgrind are excellent for memory detection, and TDengine’s CI process uses ASan. However, the leak described here occurred on Windows, which our CI currently doesn’t cover, so the TDengine development team chose WinDbg to tackle the problem. The results show that WinDbg is also a good choice for dealing with memory leaks on Windows.

Common Methods for Detecting Memory Leaks

Memory leaks typically occur in the following scenarios:

  1. The program does not properly release allocated memory.
  2. Circular references exist, preventing the garbage collector from reclaiming memory.
  3. Third-party libraries or components contain memory leaks.

The main methods for detecting memory leaks include:

  • Static Code Analysis Tools: These can detect unfreed pointers or allocation errors but can’t detect dynamic memory allocation issues at runtime.
  • Dynamic Analysis Tools: Tools like Valgrind track memory allocation and deallocation during runtime but might affect program performance.
  • Debuggers: Tools such as WinDbg and GDB.

Advantages and Disadvantages

  • Static Code Analysis: Effective for early detection but not for runtime dynamic allocation issues.
  • Dynamic Analysis Tools: Effective during runtime but might impact performance and require significant resources for large applications. However, in resource-rich test environments, these issues are mitigated; ASan has helped us identify numerous issues.
  • Debuggers: Detect issues during runtime and offer powerful analysis tools.

Practical Analysis

Basic Principle

Using WinDbg to locate memory leaks relies on the gflags component to turn on recording of heap allocations and deallocations during program execution, along with the call stack for each operation. By taking two snapshots with the umdh component while the program runs and comparing them, we can identify memory that was allocated but not freed between the two points. If there is a memory leak, the call stack of the leak point will usually be at the top of the diff result. The key is to trigger the leak as many times as possible between the two snapshots so that it stands out. The diff will also include some normal allocations that simply haven’t been released yet, but a leaking call stack shows a far larger allocation count, which makes it easy to identify.
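To make the snapshot-and-diff idea concrete, here is a minimal C sketch. It is not how gflags or umdh are implemented; it only illustrates the principle. All names in it (snapshot_t, track_alloc, track_free, snapshot_take, snapshot_diff_top) are hypothetical, and call sites are identified by a plain string rather than a real call stack.

```c
#include <stddef.h>
#include <string.h>

#define MAX_SITES 16

/* Outstanding (allocated but not yet freed) bytes per call site. */
typedef struct {
    const char *site[MAX_SITES];   /* must point to strings that outlive the table */
    long outstanding[MAX_SITES];
    int count;
} snapshot_t;

static snapshot_t live;            /* zero-initialized; updated on every tracked call */

/* Returns the slot for a call site, or -1 if the table is full. */
static int site_slot(const char *site) {
    for (int i = 0; i < live.count; i++)
        if (strcmp(live.site[i], site) == 0) return i;
    if (live.count == MAX_SITES) return -1;
    live.site[live.count] = site;
    live.outstanding[live.count] = 0;
    return live.count++;
}

void track_alloc(const char *site, long bytes) {
    int s = site_slot(site);
    if (s >= 0) live.outstanding[s] += bytes;
}

void track_free(const char *site, long bytes) {
    int s = site_slot(site);
    if (s >= 0) live.outstanding[s] -= bytes;
}

/* A snapshot is simply a copy of the current table. */
snapshot_t snapshot_take(void) { return live; }

/* Returns the call site whose outstanding bytes grew the most between
   two snapshots, or NULL if nothing grew. The growth is stored in *growth. */
const char *snapshot_diff_top(const snapshot_t *before,
                              const snapshot_t *after, long *growth) {
    const char *top = NULL;
    long best = 0;
    for (int i = 0; i < after->count; i++) {
        long old = 0;
        for (int j = 0; j < before->count; j++)
            if (strcmp(before->site[j], after->site[i]) == 0)
                old = before->outstanding[j];
        long delta = after->outstanding[i] - old;
        if (delta > best) { best = delta; top = after->site[i]; }
    }
    if (growth != NULL) *growth = best;
    return top;
}
```

A call site that allocates and frees in balance contributes no growth between snapshots, while a site that allocates repeatedly without freeing rises to the top of the diff. This is why exercising the leaking path many times between the two snapshots makes it easy to spot.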

Problem Introduction

Taosdump encountered an error on Windows while importing data:

  1. Build and install the latest TDengine 3.0 branch on Windows.
  2. Use "taosBenchmark -I stmt -y" to create a lot of tables and data (10000 * 10000).
  3. Use "taosdump -D test -o outputFile" to dump the database out.
  4. Use "taos -s 'drop database test'" to drop the database.
  5. Use "taosdump -i inputFile" to dump the data back in.

Error log: taosd “tsem_init failed, errno: 28”

Taosdump: dumpInAvroDataImpl() LN7039 taos_stmt_execute() failed! reason: Out of Memory, timestamp: 1500000009256

Troubleshooting Process

Configuring gflags

The gflags tool should be located at: C:\Program Files (x86)\Windows Kits\10\Debuggers\x64\gflags. If it’s not available, download it from Microsoft’s official website: Debugger Download Tools.

After installation, run gflags.exe /i your_application.exe +ust from the command line to set the tracking target; the +ust flag enables the user-mode stack trace database that umdh reads. Alternatively, double-click gflags.exe, set Image File to your application, press the Tab key, and select the options you need (for leak tracking, "Create user mode stack trace database").

Steps to Locate the Leak

  1. Enable user-mode stack trace collection for your_application.exe, then start it (I need to debug taosdump.exe, so the following steps use taosdump.exe):
    "C:\Program Files (x86)\Windows Kits\10\Debuggers\x64\gflags" -i taosdump.exe +ust
  2. Copy the pdb file to the mysymbols directory. The pdb file contains the debug information of the compiled program and is generated along with the executable file.
  3. Set the symbol path:
    set _NT_SYMBOL_PATH=c:\mysymbols;srv*c:\mycache*https://msdl.microsoft.com/download/symbols
  4. Generate the first memory snapshot:
    "C:\Program Files (x86)\Windows Kits\10\Debuggers\x64\umdh" -pn:taosdump.exe -f:C:\xstest\umdhlog\taosdump11.log
  5. Generate the second memory snapshot, after the program has run long enough for the leak to accumulate:
    "C:\Program Files (x86)\Windows Kits\10\Debuggers\x64\umdh" -pn:taosdump.exe -f:C:\xstest\umdhlog\taosdump12.log
  6. Generate the snapshot comparison result (umdh):
    "C:\Program Files (x86)\Windows Kits\10\Debuggers\x64\umdh" C:\xstest\umdhlog\taosdump11.log C:\xstest\umdhlog\taosdump12.log -f:C:\xstest\umdhlog\taosdumpdiff11_12.log

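Analysis and Solution

Result File

Since taosdump does a lot of work from start to finish, a leak on a hot path accumulates quickly between the two snapshots. The numbers in the diff are hexadecimal. In the first line, 988040 - 6ecf0 is the number of bytes attributed to this call stack in the second snapshot minus the number in the first, a growth of 919350 bytes; the second line shows the allocation count growing from 1754 to 201b0. The call stack points to a clear memory leak at the sem_init call in the buildRequest function.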

          +  919350 ( 988040 - 6ecf0)  201b0 allocs        BackTrace9CB6973F
          +   1ea5c ( 201b0 -  1754)        BackTrace9CB6973F        allocations
          
                  ntdll!RtlpAllocateHeapInternal+948D5
                  taos!heap_alloc_dbg_internal+1F6 (minkernel\crts\ucrt\src\appcrt\heap\debug_heap.cpp, 359)
                  taos!heap_alloc_dbg+4D (minkernel\crts\ucrt\src\appcrt\heap\debug_heap.cpp, 450)
                  taos!_calloc_dbg+6C (minkernel\crts\ucrt\src\appcrt\heap\debug_heap.cpp, 518)
                  taos!calloc+2E (minkernel\crts\ucrt\src\appcrt\heap\calloc.cpp, 30)
                  taos!sem_init+5D (C:\workroom\TDengine\contrib\pthread\sem_init.c, 109)
                  taos!buildRequest+209 (C:\workroom\TDengine\source\client\src\clientImpl.c, 192)
                  taos!stmtCreateRequest+73 (C:\workroom\TDengine\source\client\src\clientStmt.c, 15)
                  taos!stmtSetTbName+115 (C:\workroom\TDengine\source\client\src\clientStmt.c, 588)
                  taos!taos_stmt_set_tbname+7F (C:\workroom\TDengine\source\client\src\clientMain.c, 1350)
                  taosdump!dumpInAvroDataImpl+E25 (C:\workroom\TDengine\tools\taos-tools\src\taosdump.c, 6260)
                  taosdump!dumpInOneAvroFile+3D2 (C:\workroom\TDengine\tools\taos-tools\src\taosdump.c, 7229)
                  taosdump!dumpInAvroWorkThreadFp+20B (C:\workroom\TDengine\tools\taos-tools\src\taosdump.c, 7306)
                  taosdump!ptw32_threadStart+CD (C:\workroom\TDengine\contrib\pthread\ptw32_threadStart.c, 233)
                  taosdump!thread_start<unsigned int (__cdecl*)(void *),1>+9C (minkernel\crts\ucrt\src\appcrt\startup\thread.cpp, 97)
                  KERNEL32!BaseThreadInitThunk+10
                  ntdll!RtlUserThreadStart+2B
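As a quick sanity check on the two summary lines above, the following C sketch redoes the arithmetic. It assumes the layout described in the UMDH documentation: hexadecimal byte totals on the first line and hexadecimal allocation counts on the second. The constant and function names are made up for this illustration.

```c
#include <stdio.h>

/* Values copied from the two summary lines of the diff (hexadecimal). */
enum {
    BYTES_AFTER   = 0x988040,  /* bytes for this call stack, second snapshot */
    BYTES_BEFORE  = 0x6ecf0,   /* bytes for this call stack, first snapshot  */
    ALLOCS_AFTER  = 0x201b0,   /* allocation count, second snapshot          */
    ALLOCS_BEFORE = 0x1754     /* allocation count, first snapshot           */
};

long leaked_bytes(void)  { return (long)BYTES_AFTER - (long)BYTES_BEFORE; }
long leaked_allocs(void) { return (long)ALLOCS_AFTER - (long)ALLOCS_BEFORE; }

/* Prints the growth in decimal and hexadecimal. */
void print_summary(void) {
    printf("bytes:  +%ld (0x%lx)\n", leaked_bytes(), (unsigned long)leaked_bytes());
    printf("allocs: +%ld (0x%lx)\n", leaked_allocs(), (unsigned long)leaked_allocs());
    printf("average bytes per allocation: %ld\n", leaked_bytes() / leaked_allocs());
}
```

Under that reading, the call stack gained 9,540,432 bytes (0x919350) across 125,532 additional allocations (0x1ea5c) between the two snapshots, an average of 76 bytes each. A large number of small, identically sized blocks is what one would expect from a per-request object that is created but never destroyed.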

Fixing the Leak

Next, examine and modify the code. Memory management in C is flexible but can be cumbersome. In this case, some code paths evidently missed calling tsem_destroy.
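The general shape of this kind of bug, and one common C idiom for avoiding it, can be sketched as follows. This is not the TDengine code or the actual fix. fake_sem_t, fake_sem_init, fake_sem_destroy, and the two build_request functions are hypothetical stand-ins, with a counter added so that the leak is observable.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a semaphore whose init allocates heap memory. */
typedef struct { void *impl; } fake_sem_t;

int live_sems = 0;  /* initialized but not yet destroyed semaphores */

int fake_sem_init(fake_sem_t *s) {
    s->impl = calloc(1, 64);
    if (s->impl == NULL) return -1;
    live_sems++;
    return 0;
}

void fake_sem_destroy(fake_sem_t *s) {
    free(s->impl);
    s->impl = NULL;
    live_sems--;
}

/* Leaky version: the early return on failure skips the destroy call. */
int build_request_leaky(int fail_midway) {
    fake_sem_t sem;
    if (fake_sem_init(&sem) != 0) return -1;
    if (fail_midway) return -1;      /* BUG: sem is never destroyed */
    fake_sem_destroy(&sem);
    return 0;
}

/* Fixed version: every exit after a successful init passes through
   a single cleanup label, so the destroy call cannot be skipped. */
int build_request_fixed(int fail_midway) {
    fake_sem_t sem;
    int rc = -1;
    if (fake_sem_init(&sem) != 0) return -1;
    if (fail_midway) goto cleanup;
    rc = 0;
cleanup:
    fake_sem_destroy(&sem);
    return rc;
}
```

Calling build_request_leaky(1) leaves live_sems at 1, while build_request_fixed(1) returns the same error code with live_sems back at 0. Funnelling all exits through one label keeps the release logic in a single place, which is easier to review than a destroy call repeated before each return.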

Conclusion

To do a good job, one must first sharpen one’s tools. Mastering more tools and methods enables you to handle issues more confidently. Using WinDbg to locate memory leaks is simple yet effective. However, it relies on pdb files, so remember to keep the pdb files when releasing applications. These files contain the program’s symbol information, which helps pinpoint issues accurately during debugging.

Additionally, the problematic code shows that manual memory management is prone to errors. The RAII (Resource Acquisition Is Initialization) mechanism can effectively prevent resource leaks. Although C has no built-in RAII as C++ does, it can be simulated to similar effect, and this is worth considering as a future improvement.

RAII is a resource management technique that ties resource acquisition to object lifetime. By acquiring resources in the constructor and releasing them in the destructor, it ensures that resources are released on every exit path, preventing leaks. This mechanism is widely used in C++ and other languages and has proven to be an effective resource management strategy.
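One way to approximate RAII in C is the cleanup variable attribute offered by GCC and Clang. The sketch below assumes one of those compilers. The attribute is a non-standard extension and is not available in MSVC, so it is shown only as an illustration of the idea and not as what TDengine does. res_t, res_init, res_cleanup, SCOPED, and do_work are hypothetical names.

```c
#include <stdlib.h>

/* Hypothetical resource that owns a heap allocation. */
typedef struct { void *impl; } res_t;

int live_resources = 0;  /* acquired but not yet released resources */

int res_init(res_t *r) {
    r->impl = malloc(64);
    if (r->impl == NULL) return -1;
    live_resources++;
    return 0;
}

/* Cleanup handler: the compiler calls it with a pointer to the
   variable when that variable goes out of scope. */
void res_cleanup(res_t *r) {
    if (r->impl != NULL) {
        free(r->impl);
        r->impl = NULL;
        live_resources--;
    }
}

/* GCC/Clang extension: run fn(&variable) automatically at scope exit. */
#define SCOPED(fn) __attribute__((cleanup(fn)))

int do_work(int fail_midway) {
    SCOPED(res_cleanup) res_t r = { NULL };
    if (res_init(&r) != 0) return -1;  /* handler runs, sees NULL, does nothing */
    if (fail_midway) return -1;        /* handler releases r on this path */
    return 0;                          /* and on this one */
}
```

With this pattern the release is attached to the declaration, so a newly added early return cannot forget it. The variable must be initialized to a known empty state (here NULL) so that the handler is safe even when acquisition fails.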

For the detailed code changes, see TDengine PR #19580.

Shengliang Guan

Shengliang Guan is Co-Founder and Vice President of Solution Engineering at TDengine and led the development of all iterations of TDengine 1.0, 2.0, and 3.0. He has been focusing on the field of time-series data storage, giving several keynote speeches on the topic and actively participating in open-source community activities.