enh(client): refactor logic of slow log #34205

wangmm0220 · 2026-01-07T11:48:17Z

Description

Issue(s)

Close/close/Fix/fix/Resolve/resolve: Issue Link

Checklist

Please check the items in the checklist if applicable.

Is the user manual updated?
Are the test cases passed and automated?
Is there no significant decrease in test coverage?

gemini-code-assist · 2026-01-07T11:48:39Z

Summary of Changes

Hello @wangmm0220, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on refining the client-side slow log monitoring and reporting mechanism. The primary goal is to enhance the reliability of slow log collection and transmission, particularly during client shutdown, by ensuring all buffered logs are sent and file resources are properly managed. It also includes improvements to the file reading logic to handle various data formats more robustly and expands test coverage for these critical components.

Highlights

Enhanced Slow Log Shutdown Handling: Introduced a new SLOW_LOG_READ_QUIT_SEND_ALL type and refactored the client monitor's shutdown logic to ensure all pending slow logs are processed and sent before the client fully exits, improving data reliability during termination.
Improved File Resource Management: The processFileInTheEnd function now takes a pointer to the file descriptor, allowing it to be explicitly set to NULL after closing, which helps prevent use-after-free issues and clarifies resource ownership.
Robust readFile Logic: The readFile function has been significantly enhanced to correctly handle various edge cases, including individual slow log entries exceeding the maximum send size and files containing empty or multiple null-terminated entries, ensuring accurate data extraction.
Comprehensive Error Handling and Resource Cleanup: Error handling in critical paths like sendReport and monitorPutData2MonitorQueue has been made more robust, with proper error code propagation and explicit resource freeing to prevent memory leaks and improve system stability.
Expanded Test Coverage for File Reading: New test cases have been added to clientMonitorTests.cpp to thoroughly validate the readFile function's behavior under diverse conditions, including different data lengths, single-item overflows, and various null terminator placements.
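
The readFile behaviour summarized in the highlights above can be illustrated with a small standalone sketch. This is not the PR's implementation: `pack_entries` and `max_payload` are hypothetical names, and the sketch assumes that the size cap applies to entry bytes only, that empty entries are skipped, and that a first entry larger than the cap is still sent on its own so the reader always makes progress.

```c
#include <stdlib.h>
#include <string.h>

/* Pack NUL-terminated entries from data[*offset, len) into "[e1,e2,...]".
 * Stops before an entry that would push the total entry bytes past
 * max_payload, except that the first entry is always taken.
 * Empty entries are skipped. *offset is advanced past everything consumed.
 * Returns a heap string the caller must free, or NULL on error. */
static char *pack_entries(const char *data, size_t len, size_t *offset,
                          size_t max_payload) {
  if (data == NULL || offset == NULL || *offset > len) return NULL;

  size_t remaining = len - *offset;
  /* Entry bytes plus separators never exceed `remaining`; the extra bytes
   * cover '[', ']' and the terminating NUL. */
  char *out = malloc(remaining + 4);
  if (out == NULL) return NULL;

  size_t n = 0, used = 0, pos = *offset;
  out[n++] = '[';
  while (pos < len) {
    const char *start = data + pos;
    const char *nul = memchr(start, '\0', len - pos);
    size_t elen = (nul != NULL) ? (size_t)(nul - start) : len - pos;
    size_t step = (nul != NULL) ? elen + 1 : elen; /* include terminator */

    if (elen == 0) { /* empty entry: consume the lone NUL and move on */
      pos += step;
      continue;
    }
    if (used > 0 && used + elen > max_payload) break; /* leave for next batch */
    if (used > 0) out[n++] = ',';
    memcpy(out + n, start, elen);
    n += elen;
    used += elen;
    pos += step;
  }
  out[n++] = ']';
  out[n] = '\0';
  *offset = pos;
  return out;
}
```

Calling the function repeatedly with the same `offset` variable drains the buffer batch by batch; once nothing is left it returns `"[]"`, which is the result the new empty-data tests assert on.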

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page. Here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request refactors the client monitor's slow log handling, introducing a new SLOW_LOG_READ_QUIT_SEND_ALL type and modifying file I/O operations. Key changes include updating generateWriteSlowLog to safely check for empty JSON strings, revising processFileInTheEnd to accept a pointer to TdFilePtr for proper nulling, and adjusting monitorReportAsyncCB to use TSWAP for transferring file and filename ownership to prevent leaks. The readFile function was significantly altered to handle SLOW_LOG_SEND_SIZE_MAX boundaries and empty log entries more robustly, and getFileSize now returns an error code. Several functions (sendSlowLog, monitorReadSend, monitorSendSlowLogAtBeginning, monitorSendSlowLogAtRunning, monitorSendSlowLogAtQuit, monitorSendAllSlowLogAtQuit, monitorSendAllSlowLog, monitorSendAllSlowLogFromTempDir) were updated to pass file pointers and filenames by address and include more comprehensive error handling and resource management. The monitorThreadFunc logic was refined for graceful shutdown, incorporating a new quitTime and quitCnt mechanism, and a monitorPutQuitMsg2MonitorQueue function was added. The monitorPutData2MonitorQueue function now returns a specific error code when the monitor thread is exiting and includes monitorFreeSlowLogData on failure to prevent resource leaks. Additionally, the clientMonitorTest in CMakeLists.txt was enabled, and the clientMonitorTests.cpp file was updated with new test cases for readFile scenarios, including a local redefinition of SLOW_LOG_SEND_SIZE_MAX for testing purposes.

source/client/src/clientEnv.c

source/client/src/clientMonitor.c

gemini-code-assist · 2026-01-07T11:54:09Z

source/client/src/clientMonitor.c

+    tscInfo("monitor at quit remove file:%s", pClient->path);
+    --quitCnt;
+    pClient->offset = 0;


When size <= pClient->offset, the file pClient->pFile is not explicitly closed or nulled out, and processFileInTheEnd is not called. This can lead to a file handle leak if the file is no longer needed and pClient->pFile is not managed elsewhere. The file should be properly closed and its pointer nulled.

    processFileInTheEnd(&pClient->pFile, pClient->path);
    --quitCnt;
    pClient->offset = 0;
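
The close-and-null pattern this comment asks for can be shown in isolation. The sketch below uses the standard `FILE *` as a stand-in for `TdFilePtr`, and `close_and_null` is a hypothetical helper, not a function from the PR. It only illustrates why taking the handle by address lets the callee clear the caller's copy.

```c
#include <stdio.h>

/* Close the stream through a pointer to the handle, then null the
 * caller's copy so a stale handle cannot be reused or closed twice. */
static int close_and_null(FILE **pp) {
  if (pp == NULL || *pp == NULL) {
    return 0; /* nothing to close; calling again is harmless */
  }
  int rc = fclose(*pp);
  *pp = NULL; /* cleared even if fclose reports an error */
  return rc;
}
```

Because the handle is cleared unconditionally, every exit path can call the helper without first checking whether an earlier path already closed the file.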

gemini-code-assist · 2026-01-07T11:54:09Z

source/client/src/clientMonitor.c

  } else {
-    if (taosCloseFile(&(slowLogData->pFile)) != 0) {
-      tscError("failed to close file:%p", slowLogData->pFile);
-    }
-    monitorFreeSlowLogData(slowLogData);
    taosFreeQitem(slowLogData);
  }


If taosWriteQitem(monitorQueue, slowLogData) fails, slowLogData is freed using taosFreeQitem. However, if slowLogData contains allocated resources (like slowLogData->pFile or slowLogData->fileName when type == SLOW_LOG_READ_BEGINNIG), these resources will not be freed or closed, leading to a resource leak. monitorFreeSlowLogData should be called on slowLogData before taosFreeQitem if taosWriteQitem fails.

  } else {
    monitorFreeSlowLogData(slowLogData);
    taosFreeQitem(slowLogData);
  }
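
The ownership rule behind this comment is: on a successful enqueue the queue owns the item, and on failure the producer must release what the item owns before freeing the item itself. A minimal sketch follows, assuming a toy one-slot queue; `LogItem`, `Queue`, `queue_write`, and `put_item` are all hypothetical names and do not mirror the TDengine queue API.

```c
#include <stdio.h>
#include <stdlib.h>

typedef struct {
  FILE *file;      /* owned handle, may be NULL */
  char *file_name; /* owned heap buffer, may be NULL */
} LogItem;

/* Release everything the item owns and leave it in an empty state. */
static void log_item_release(LogItem *item) {
  if (item == NULL) return;
  if (item->file != NULL) {
    fclose(item->file);
    item->file = NULL;
  }
  free(item->file_name);
  item->file_name = NULL;
}

/* Toy one-slot queue that rejects writes while it is shutting down. */
typedef struct {
  int closing;
  LogItem *slot;
} Queue;

static int queue_write(Queue *q, LogItem *item) {
  if (q->closing || q->slot != NULL) return -1;
  q->slot = item;
  return 0;
}

/* On success the queue owns the item. On failure, free the owned
 * resources first, then the item, so nothing leaks. */
static int put_item(Queue *q, LogItem *item) {
  if (queue_write(q, item) == 0) return 0;
  log_item_release(item);
  free(item);
  return -1;
}
```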

gemini-code-assist · 2026-01-07T11:54:09Z

source/client/test/clientMonitorTests.cpp

+#undef SLOW_LOG_SEND_SIZE_MAX
+#define SLOW_LOG_SEND_SIZE_MAX 15


Redefining SLOW_LOG_SEND_SIZE_MAX locally in the test file can lead to inconsistencies between the test environment and the actual production code. If the production code's SLOW_LOG_SEND_SIZE_MAX changes, the test might still pass due to this local redefinition, giving a false sense of security. It's better to ensure tests use the same definitions as the production code or clearly document the divergence and its implications.
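
One way to keep tests and production code on a single definition is to guard the limit so that a test build can override it from the compiler command line instead of redefining it in the test source. The fragment below is only a sketch of that idea: `SEND_SIZE_MAX`, its default value, and `clamp_to_send_size` are hypothetical and are not taken from the TDengine headers.

```c
#include <stddef.h>

/* Hypothetical header fragment: the default applies unless the build
 * passes an override such as -DSEND_SIZE_MAX=15 for a test target. */
#ifndef SEND_SIZE_MAX
#define SEND_SIZE_MAX (1024 * 1024)
#endif

/* Code under test and the test itself both see the same limit. */
static size_t clamp_to_send_size(size_t n) {
  return n > (size_t)SEND_SIZE_MAX ? (size_t)SEND_SIZE_MAX : n;
}
```

With this arrangement the divergence is visible in the build configuration rather than hidden inside the test file.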

Copilot

Pull request overview

This PR refactors the slow log functionality in the client module to improve the quit/cleanup logic and add comprehensive test coverage. The changes introduce a new queue message type for handling quit operations, modify memory allocation calculations, and enable previously commented-out test infrastructure with new edge case tests.

Key changes:

Introduced SLOW_LOG_READ_QUIT_SEND_ALL queue type to better coordinate slow log flushing during shutdown
Modified memory allocation from 2 * size to 4 + size pattern to reserve space for JSON array brackets
Enabled and expanded test suite with 5 new test cases covering various data scenarios (single item, multiple items, oversized items, empty data)

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 25 comments.

Show a summary per file

File	Description
include/libs/monitor/clientMonitor.h	Added new `SLOW_LOG_READ_QUIT_SEND_ALL` enum value for coordinated shutdown handling
source/client/src/clientMonitor.c	Refactored quit logic with separate quitTime tracking, changed function signatures to use double pointers, modified memory allocation pattern, improved error handling
source/client/src/clientEnv.c	Enhanced validation to check for both NULL and empty string in JSON output
source/client/test/clientMonitorTests.cpp	Enabled test suite, added 5 new edge case tests with SLOW_LOG_SEND_SIZE_MAX override for testing, updated readFile implementation
source/client/test/CMakeLists.txt	Uncommented and enabled clientMonitorTest build configuration

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-01-07T11:57:35Z

source/client/test/clientMonitorTests.cpp

+
+  TdFilePtr pFile = taosOpenFile("./tdengine-1-wewe", TD_FILE_CREATE | TD_FILE_WRITE | TD_FILE_APPEND | TD_FILE_READ | TD_FILE_TRUNC);
+  if (pFile == NULL) {
+    uError("failed to open file:./test.txt since %s", terrstr());


Misleading error message. The error message says "failed to open file:./test.txt" but the actual file being opened is "./tdengine-1-wewe". The error message should reflect the actual file path being used.

Suggested change

-    uError("failed to open file:./test.txt since %s", terrstr());
+    uError("failed to open file:./tdengine-1-wewe since %s", terrstr());

Copilot · 2026-01-07T11:57:35Z

source/client/test/clientMonitorTests.cpp

+
+  TdFilePtr pFile = taosOpenFile("./tdengine-1-wewe", TD_FILE_CREATE | TD_FILE_WRITE | TD_FILE_APPEND | TD_FILE_READ | TD_FILE_TRUNC);
+  if (pFile == NULL) {
+    uError("failed to open file:./test.txt since %s", terrstr());


Misleading error message. The error message says "failed to open file:./test.txt" but the actual file being opened is "./tdengine-1-wewe". The error message should reflect the actual file path being used.

Copilot · 2026-01-07T11:57:36Z

include/libs/monitor/clientMonitor.h

  SLOW_LOG_READ_RUNNING = 1,
  SLOW_LOG_READ_BEGINNIG = 2,
  SLOW_LOG_READ_QUIT = 3,
+  SLOW_LOG_READ_QUIT_SEND_ALL = 4,


The queueTypeStr array is now missing an entry for the new SLOW_LOG_READ_QUIT_SEND_ALL enum value. This will cause array out-of-bounds access when trying to convert this enum value to a string. Add a corresponding string entry like "SLOW_LOG_READ_QUIT_SEND_ALL" to the queueTypeStr array.
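
A common guard against this kind of drift is to tie the string table to the enum with a compile-time check and to bounds-check lookups. The sketch below uses hypothetical names (`LogType`, `kLogTypeStr`, `log_type_str`), and the entry for value 0 is an assumption, since the quoted diff only shows values 1 through 4.

```c
#include <stddef.h>

typedef enum {
  LOG_WRITE = 0, /* assumed: value 0 is not visible in the quoted diff */
  LOG_READ_RUNNING = 1,
  LOG_READ_BEGINNING = 2,
  LOG_READ_QUIT = 3,
  LOG_READ_QUIT_SEND_ALL = 4,
  LOG_TYPE_COUNT /* sentinel: keep last */
} LogType;

static const char *const kLogTypeStr[] = {
    "LOG_WRITE",
    "LOG_READ_RUNNING",
    "LOG_READ_BEGINNING",
    "LOG_READ_QUIT",
    "LOG_READ_QUIT_SEND_ALL",
};

/* Fails the build if an enum value is added without a matching string. */
_Static_assert(sizeof(kLogTypeStr) / sizeof(kLogTypeStr[0]) == LOG_TYPE_COUNT,
               "kLogTypeStr is out of sync with LogType");

/* Bounds-checked lookup: unknown values never index past the table. */
static const char *log_type_str(int type) {
  if (type < 0 || type >= LOG_TYPE_COUNT) return "UNKNOWN";
  return kLogTypeStr[type];
}
```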

Copilot · 2026-01-07T11:57:36Z

source/client/test/clientMonitorTests.cpp

+  // Create a void pointer and set it up for testing
+  void* pTransporter = NULL;
+
+  // Create an SEpSet object and set it up for testing
+  SEpSet* epSet = NULL;
+


Unused variables declared. The variables pTransporter and epSet are declared but never used in these tests. These should be removed to keep the test code clean.

Suggested change

-  // Create a void pointer and set it up for testing
-  void* pTransporter = NULL;
-
-  // Create an SEpSet object and set it up for testing
-  SEpSet* epSet = NULL;

Copilot · 2026-01-07T11:57:36Z

source/client/src/clientMonitor.c

-      return true;
-    }
+    tscInfo("monitor at quit remove file:%s", pClient->path);
+    --quitCnt;


Incorrect quitCnt decrement logic. Line 627 decrements quitCnt when size <= pClient->offset, but this count was only incremented in monitorSendAllSlowLogAtQuit (line 654) for cases where size > pClient->offset and sending succeeded. This creates a mismatch: the count tracks pending sends but is being decremented for cases that never had pending sends. This could cause quitCnt to go negative or reach 0 prematurely, affecting the quit timeout logic at line 792.

Suggested change

-    --quitCnt;

Copilot · 2026-01-07T11:57:40Z

source/client/test/clientMonitorTests.cpp

+  TdFilePtr pFile = taosOpenFile("./tdengine-1-wewe", TD_FILE_CREATE | TD_FILE_WRITE | TD_FILE_APPEND | TD_FILE_READ | TD_FILE_TRUNC);
+  if (pFile == NULL) {
+    uError("failed to open file:./test.txt since %s", terrstr());


Misleading error message. The error message says "failed to open file:./test.txt" but the actual file being opened is "./tdengine-1-wewe". The error message should reflect the actual file path being used.

Suggested change

-  TdFilePtr pFile = taosOpenFile("./tdengine-1-wewe", TD_FILE_CREATE | TD_FILE_WRITE | TD_FILE_APPEND | TD_FILE_READ | TD_FILE_TRUNC);
-  if (pFile == NULL) {
-    uError("failed to open file:./test.txt since %s", terrstr());
+  const char *filePath = "./tdengine-1-wewe";
+  TdFilePtr pFile = taosOpenFile(filePath, TD_FILE_CREATE | TD_FILE_WRITE | TD_FILE_APPEND | TD_FILE_READ | TD_FILE_TRUNC);
+  if (pFile == NULL) {
+    uError("failed to open file:%s since %s", filePath, terrstr());

Copilot · 2026-01-07T11:57:40Z

source/client/test/clientMonitorTests.cpp

  }

-  ASSERT(size > *offset);
+  ASSERT(size >= *offset);


Inconsistent comparison operators. Line 76 uses ">=" in the ASSERT, but line 439 in the production code at clientMonitor.c uses "<=" with inverted logic. The test code changed from ">" to ">=" which is correct (size can equal offset when the file is fully read), but this inconsistency between test and production code suggests the production code at line 439 should also be reviewed for consistency.

Copilot · 2026-01-07T11:57:40Z

source/client/test/clientMonitorTests.cpp

+  // Create a void pointer and set it up for testing
+  void* pTransporter = NULL;
+
+  // Create an SEpSet object and set it up for testing
+  SEpSet* epSet = NULL;
+


Unused variables declared. The variables pTransporter and epSet are declared but never used in these tests. These should be removed to keep the test code clean.

Suggested change

-  // Create a void pointer and set it up for testing
-  void* pTransporter = NULL;
-
-  // Create an SEpSet object and set it up for testing
-  SEpSet* epSet = NULL;

Copilot · 2026-01-07T11:57:40Z

source/client/test/clientMonitorTests.cpp

+    char* val = readFile(pFile, &offset, fileSize);
+    printf("offset:%lld,fileSize:%lld,val:%s\n", offset, fileSize, val);
+
+    ASSERT(strcmp(val, "[]") == 0);
+  }


Memory leak in test. The readFile function allocates memory for val (line 291), but this memory is never freed after the test completes.

Copilot · 2026-01-07T11:57:41Z

source/client/test/clientMonitorTests.cpp

+    char* val = readFile(pFile, &offset, fileSize);
+    printf("offset:%lld,fileSize:%lld,val:%s\n", offset, fileSize, val);
+
+    ASSERT(strcmp(val, "[]") == 0);
+  }


Memory leak in test. The readFile function allocates memory for val (line 333), but this memory is never freed after the test completes.

wangmm0220 · 2026-01-08T10:37:59Z

@gemini re-review

wangmm0220 · 2026-01-08T10:38:51Z

/review

wangmm0220 · 2026-01-09T01:05:42Z

/gemini review

gemini-code-assist

Code Review

This pull request is a significant and valuable refactoring of the slow log client logic. The changes centralize state management, simplify control flow, and improve robustness by handling file deletions and ensuring logs are flushed on exit. The addition and enabling of test cases for this functionality is also a great improvement. I've identified one critical bug that could lead to a crash and one high-severity memory leak. After addressing these issues, this will be a solid enhancement.

gemini-code-assist · 2026-01-09T01:08:17Z

source/client/src/clientEnv.c

+  if (value == NULL || strlen(value) == 0) {
+    tscError("failed to print json, data:%s", value);
    code = TSDB_CODE_FAILED;
    goto _end;
  }


Passing a NULL pointer to tscError for a %s format specifier will cause a crash. You should handle the NULL case separately from the empty string case for safer error handling.

  if (value == NULL) {
    tscError("failed to print json: cJSON_PrintUnformatted returned NULL");
    code = TSDB_CODE_FAILED;
    goto _end;
  } else if (strlen(value) == 0) {
    tscError("failed to print json: generated empty string");
    code = TSDB_CODE_FAILED;
    goto _end;
  }
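
Passing a null pointer for a `%s` conversion is undefined behavior in standard C, even though some C libraries happen to print a placeholder. An alternative to splitting the branches is to route the argument through a small helper that substitutes a fallback. The sketch below formats into a buffer with `snprintf` so it stays self-contained; `str_or` and `format_json_error` are hypothetical helpers, not part of the PR.

```c
#include <stdio.h>

/* Return s, or the fallback when s is NULL, so "%s" never sees NULL. */
static const char *str_or(const char *s, const char *fallback) {
  return s != NULL ? s : fallback;
}

/* Build the log message safely for both the NULL and empty-string cases.
 * Returns the snprintf result (length the full message would need). */
static int format_json_error(char *buf, size_t cap, const char *value) {
  return snprintf(buf, cap, "failed to print json, data:%s",
                  str_or(value, "(null)"));
}
```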

gemini-code-assist · 2026-01-09T01:08:17Z

source/client/src/clientMonitor.c

+    } else if (pClient->type == SLOW_LOG_READ_BEGINNIG || pClient->type == SLOW_LOG_READ_QUIT) {
+      if (taosCloseFile(&pClient->pFile) != 0) {
+        tscError("failed to close file:%s ret:%d", pClient->path, terrno);
+      }
+      pClient->pFile = NULL;
+      if (taosRemoveFile(pClient->path) != 0) {
+        tscError("failed to remove file:%s, terrno:%d", pClient->path, terrno);
+      }
+      tscInfo("monitor remove file:%s when send data out at beginning", pClient->path);
+    }


When a temporary slow log file (type SLOW_LOG_READ_BEGINNIG) is fully processed and removed, its corresponding SlowLogClient object is not removed from monitorSlowLogHashPath. This causes a memory leak of SlowLogClient objects and hash table entries. You should remove the entry from monitorSlowLogHashPath after the file is removed and free the associated memory.

    } else if (pClient->type == SLOW_LOG_READ_BEGINNIG || pClient->type == SLOW_LOG_READ_QUIT) {
      if (taosCloseFile(&pClient->pFile) != 0) {
        tscError("failed to close file:%s ret:%d", pClient->path, terrno);
      }
      pClient->pFile = NULL;
      if (taosRemoveFile(pClient->path) != 0) {
        tscError("failed to remove file:%s, terrno:%d", pClient->path, terrno);
      }
      tscInfo("monitor remove file:%s when send data out at beginning", pClient->path);
      if (pClient->type == SLOW_LOG_READ_BEGINNIG) {
        void* val = taosHashRemove(monitorSlowLogHashPath, pClient->path, strlen(pClient->path));
        if (val) {
          destroySlowLogClient(val);
        }
      }
    }

enh(client): refactor logic of slow log

9d2b8cb

Copilot AI review requested due to automatic review settings January 7, 2026 11:48

wangmm0220 requested review from dapan1121, guanshengliang and hzcheng as code owners January 7, 2026 11:48

taosdata-bot bot added the team engine label Jan 7, 2026

Copilot started reviewing on behalf of wangmm0220 January 7, 2026 11:48

gemini-code-assist bot reviewed Jan 7, 2026

Copilot AI reviewed Jan 7, 2026

enh(client): refactor logic of slow log

1808c41

enh(client): refactor logic of slow log

b87e99a

gemini-code-assist bot reviewed Jan 9, 2026
