Skip to content

Conversation

@wangmm0220
Copy link
Contributor

Description

Issue(s)

  • Close/close/Fix/fix/Resolve/resolve: Issue Link

Checklist

Please check the items in the checklist if applicable.

  • Is the user manual updated?
  • Are the test cases passed and automated?
  • Is there no significant decrease in test coverage?

Copilot AI review requested due to automatic review settings January 7, 2026 11:48
@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @wangmm0220, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on refining the client-side slow log monitoring and reporting mechanism. The primary goal is to enhance the reliability of slow log collection and transmission, particularly during client shutdown, by ensuring all buffered logs are sent and file resources are properly managed. It also includes improvements to the file reading logic to handle various data formats more robustly and expands test coverage for these critical components.

Highlights

  • Enhanced Slow Log Shutdown Handling: Introduced a new SLOW_LOG_READ_QUIT_SEND_ALL type and refactored the client monitor's shutdown logic to ensure all pending slow logs are processed and sent before the client fully exits, improving data reliability during termination.
  • Improved File Resource Management: The processFileInTheEnd function now takes a pointer to the file descriptor, allowing it to be explicitly set to NULL after closing, which helps prevent use-after-free issues and clarifies resource ownership.
  • Robust readFile Logic: The readFile function has been significantly enhanced to correctly handle various edge cases, including individual slow log entries exceeding the maximum send size and files containing empty or multiple null-terminated entries, ensuring accurate data extraction.
  • Comprehensive Error Handling and Resource Cleanup: Error handling in critical paths like sendReport and monitorPutData2MonitorQueue has been made more robust, with proper error code propagation and explicit resource freeing to prevent memory leaks and improve system stability.
  • Expanded Test Coverage for File Reading: New test cases have been added to clientMonitorTests.cpp to thoroughly validate the readFile function's behavior under diverse conditions, including different data lengths, single-item overflows, and various null terminator placements.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the client monitor's slow log handling, introducing a new SLOW_LOG_READ_QUIT_SEND_ALL type and modifying file I/O operations. Key changes include updating generateWriteSlowLog to safely check for empty JSON strings, revising processFileInTheEnd to accept a pointer to TdFilePtr for proper nulling, and adjusting monitorReportAsyncCB to use TSWAP for transferring file and filename ownership to prevent leaks. The readFile function was significantly altered to handle SLOW_LOG_SEND_SIZE_MAX boundaries and empty log entries more robustly, and getFileSize now returns an error code. Several functions (sendSlowLog, monitorReadSend, monitorSendSlowLogAtBeginning, monitorSendSlowLogAtRunning, monitorSendSlowLogAtQuit, monitorSendAllSlowLogAtQuit, monitorSendAllSlowLog, monitorSendAllSlowLogFromTempDir) were updated to pass file pointers and filenames by address and include more comprehensive error handling and resource management. The monitorThreadFunc logic was refined for graceful shutdown, incorporating a new quitTime and quitCnt mechanism, and a monitorPutQuitMsg2MonitorQueue function was added. The monitorPutData2MonitorQueue function now returns a specific error code when the monitor thread is exiting and includes monitorFreeSlowLogData on failure to prevent resource leaks. Additionally, the clientMonitorTest in CMakeLists.txt was enabled, and the clientMonitorTests.cpp file was updated with new test cases for readFile scenarios, including a local redefinition of SLOW_LOG_SEND_SIZE_MAX for testing purposes.

Comment on lines 626 to 628
tscInfo("monitor at quit remove file:%s", pClient->path);
--quitCnt;
pClient->offset = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

When size <= pClient->offset, the file pClient->pFile is not explicitly closed or nulled out, and processFileInTheEnd is not called. This can lead to a file handle leak if the file is no longer needed and pClient->pFile is not managed elsewhere. The file should be properly closed and its pointer nulled.

    processFileInTheEnd(&pClient->pFile, pClient->path);
    --quitCnt;
    pClient->offset = 0;

Comment on lines 968 to 970
} else {
if (taosCloseFile(&(slowLogData->pFile)) != 0) {
tscError("failed to close file:%p", slowLogData->pFile);
}
monitorFreeSlowLogData(slowLogData);
taosFreeQitem(slowLogData);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

If taosWriteQitem(monitorQueue, slowLogData) fails, slowLogData is freed using taosFreeQitem. However, if slowLogData contains allocated resources (like slowLogData->pFile or slowLogData->fileName when type == SLOW_LOG_READ_BEGINNIG), these resources will not be freed or closed, leading to a resource leak. monitorFreeSlowLogData should be called on slowLogData before taosFreeQitem if taosWriteQitem fails.

  } else {
    monitorFreeSlowLogData(slowLogData);
    taosFreeQitem(slowLogData);
  }

Comment on lines +67 to +68
#undef SLOW_LOG_SEND_SIZE_MAX
#define SLOW_LOG_SEND_SIZE_MAX 15
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Redefining SLOW_LOG_SEND_SIZE_MAX locally in the test file can lead to inconsistencies between the test environment and the actual production code. If the production code's SLOW_LOG_SEND_SIZE_MAX changes, the test might still pass due to this local redefinition, giving a false sense of security. It's better to ensure tests use the same definitions as the production code or clearly document the divergence and its implications.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors the slow log functionality in the client module to improve the quit/cleanup logic and add comprehensive test coverage. The changes introduce a new queue message type for handling quit operations, modify memory allocation calculations, and enable previously commented-out test infrastructure with new edge case tests.

Key changes:

  • Introduced SLOW_LOG_READ_QUIT_SEND_ALL queue type to better coordinate slow log flushing during shutdown
  • Modified memory allocation from 2 * size to 4 + size pattern to reserve space for JSON array brackets
  • Enabled and expanded test suite with 5 new test cases covering various data scenarios (single item, multiple items, oversized items, empty data)

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 25 comments.

Show a summary per file
File Description
include/libs/monitor/clientMonitor.h Added new SLOW_LOG_READ_QUIT_SEND_ALL enum value for coordinated shutdown handling
source/client/src/clientMonitor.c Refactored quit logic with separate quitTime tracking, changed function signatures to use double pointers, modified memory allocation pattern, improved error handling
source/client/src/clientEnv.c Enhanced validation to check for both NULL and empty string in JSON output
source/client/test/clientMonitorTests.cpp Enabled test suite, added 5 new edge case tests with SLOW_LOG_SEND_SIZE_MAX override for testing, updated readFile implementation
source/client/test/CMakeLists.txt Uncommented and enabled clientMonitorTest build configuration

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.


TdFilePtr pFile = taosOpenFile("./tdengine-1-wewe", TD_FILE_CREATE | TD_FILE_WRITE | TD_FILE_APPEND | TD_FILE_READ | TD_FILE_TRUNC);
if (pFile == NULL) {
uError("failed to open file:./test.txt since %s", terrstr());
Copy link

Copilot AI Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Misleading error message. The error message says "failed to open file:./test.txt" but the actual file being opened is "./tdengine-1-wewe". The error message should reflect the actual file path being used.

Suggested change
uError("failed to open file:./test.txt since %s", terrstr());
uError("failed to open file:./tdengine-1-wewe since %s", terrstr());

Copilot uses AI. Check for mistakes.

TdFilePtr pFile = taosOpenFile("./tdengine-1-wewe", TD_FILE_CREATE | TD_FILE_WRITE | TD_FILE_APPEND | TD_FILE_READ | TD_FILE_TRUNC);
if (pFile == NULL) {
uError("failed to open file:./test.txt since %s", terrstr());
Copy link

Copilot AI Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Misleading error message. The error message says "failed to open file:./test.txt" but the actual file being opened is "./tdengine-1-wewe". The error message should reflect the actual file path being used.

Copilot uses AI. Check for mistakes.
SLOW_LOG_READ_RUNNING = 1,
SLOW_LOG_READ_BEGINNIG = 2,
SLOW_LOG_READ_QUIT = 3,
SLOW_LOG_READ_QUIT_SEND_ALL = 4,
Copy link

Copilot AI Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The queueTypeStr array is now missing an entry for the new SLOW_LOG_READ_QUIT_SEND_ALL enum value. This will cause array out-of-bounds access when trying to convert this enum value to a string. Add a corresponding string entry like "SLOW_LOG_READ_QUIT_SEND_ALL" to the queueTypeStr array.

Copilot uses AI. Check for mistakes.
Comment on lines +278 to +283
// Create a void pointer and set it up for testing
void* pTransporter = NULL;

// Create an SEpSet object and set it up for testing
SEpSet* epSet = NULL;

Copy link

Copilot AI Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused variables declared. The variables pTransporter and epSet are declared but never used in these tests. These should be removed to keep the test code clean.

Suggested change
// Create a void pointer and set it up for testing
void* pTransporter = NULL;
// Create an SEpSet object and set it up for testing
SEpSet* epSet = NULL;

Copilot uses AI. Check for mistakes.
return true;
}
tscInfo("monitor at quit remove file:%s", pClient->path);
--quitCnt;
Copy link

Copilot AI Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incorrect quitCnt decrement logic. Line 627 decrements quitCnt when size <= pClient->offset, but this count was only incremented in monitorSendAllSlowLogAtQuit (line 654) for cases where size > pClient->offset and sending succeeded. This creates a mismatch: the count tracks pending sends but is being decremented for cases that never had pending sends. This could cause quitCnt to go negative or reach 0 prematurely, affecting the quit timeout logic at line 792.

Suggested change
--quitCnt;

Copilot uses AI. Check for mistakes.
Comment on lines +346 to +348
TdFilePtr pFile = taosOpenFile("./tdengine-1-wewe", TD_FILE_CREATE | TD_FILE_WRITE | TD_FILE_APPEND | TD_FILE_READ | TD_FILE_TRUNC);
if (pFile == NULL) {
uError("failed to open file:./test.txt since %s", terrstr());
Copy link

Copilot AI Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Misleading error message. The error message says "failed to open file:./test.txt" but the actual file being opened is "./tdengine-1-wewe". The error message should reflect the actual file path being used.

Suggested change
TdFilePtr pFile = taosOpenFile("./tdengine-1-wewe", TD_FILE_CREATE | TD_FILE_WRITE | TD_FILE_APPEND | TD_FILE_READ | TD_FILE_TRUNC);
if (pFile == NULL) {
uError("failed to open file:./test.txt since %s", terrstr());
const char *filePath = "./tdengine-1-wewe";
TdFilePtr pFile = taosOpenFile(filePath, TD_FILE_CREATE | TD_FILE_WRITE | TD_FILE_APPEND | TD_FILE_READ | TD_FILE_TRUNC);
if (pFile == NULL) {
uError("failed to open file:%s since %s", filePath, terrstr());

Copilot uses AI. Check for mistakes.
}

ASSERT(size > *offset);
ASSERT(size >= *offset);
Copy link

Copilot AI Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inconsistent comparison operators. Line 76 uses ">=" in the ASSERT, but line 439 in the production code at clientMonitor.c uses "<=" with inverted logic. The test code changed from ">" to ">=" which is correct (size can equal offset when the file is fully read), but this inconsistency between test and production code suggests the production code at line 439 should also be reviewed for consistency.

Copilot uses AI. Check for mistakes.
Comment on lines +363 to +368
// Create a void pointer and set it up for testing
void* pTransporter = NULL;

// Create an SEpSet object and set it up for testing
SEpSet* epSet = NULL;

Copy link

Copilot AI Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused variables declared. The variables pTransporter and epSet are declared but never used in these tests. These should be removed to keep the test code clean.

Suggested change
// Create a void pointer and set it up for testing
void* pTransporter = NULL;
// Create an SEpSet object and set it up for testing
SEpSet* epSet = NULL;

Copilot uses AI. Check for mistakes.
Comment on lines +291 to +295
char* val = readFile(pFile, &offset, fileSize);
printf("offset:%lld,fileSize:%lld,val:%s\n", offset, fileSize, val);

ASSERT(strcmp(val, "[]") == 0);
}
Copy link

Copilot AI Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Memory leak in test. The readFile function allocates memory for val (line 291), but this memory is never freed after the test completes.

Copilot uses AI. Check for mistakes.
Comment on lines +333 to +337
char* val = readFile(pFile, &offset, fileSize);
printf("offset:%lld,fileSize:%lld,val:%s\n", offset, fileSize, val);

ASSERT(strcmp(val, "[]") == 0);
}
Copy link

Copilot AI Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Memory leak in test. The readFile function allocates memory for val (line 333), but this memory is never freed after the test completes.

Copilot uses AI. Check for mistakes.
@wangmm0220
Copy link
Contributor Author

@gemini re-review

@wangmm0220
Copy link
Contributor Author

/review

@wangmm0220
Copy link
Contributor Author

/gemini review

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request is a significant and valuable refactoring of the slow log client logic. The changes centralize state management, simplify control flow, and improve robustness by handling file deletions and ensuring logs are flushed on exit. The addition and enabling of test cases for this functionality is also a great improvement. I've identified one critical bug that could lead to a crash and one high-severity memory leak. After addressing these issues, this will be a solid enhancement.

Comment on lines +201 to 205
if (value == NULL || strlen(value) == 0) {
tscError("failed to print json, data:%s", value);
code = TSDB_CODE_FAILED;
goto _end;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

Passing a NULL pointer to tscError for a %s format specifier will cause a crash. You should handle the NULL case separately from the empty string case for safer error handling.

  if (value == NULL) {
    tscError("failed to print json: cJSON_PrintUnformatted returned NULL");
    code = TSDB_CODE_FAILED;
    goto _end;
  } else if (strlen(value) == 0) {
    tscError("failed to print json: generated empty string");
    code = TSDB_CODE_FAILED;
    goto _end;
  }

Comment on lines +562 to +571
} else if (pClient->type == SLOW_LOG_READ_BEGINNIG || pClient->type == SLOW_LOG_READ_QUIT) {
if (taosCloseFile(&pClient->pFile) != 0) {
tscError("failed to close file:%s ret:%d", pClient->path, terrno);
}
pClient->pFile = NULL;
if (taosRemoveFile(pClient->path) != 0) {
tscError("failed to remove file:%s, terrno:%d", pClient->path, terrno);
}
tscInfo("monitor remove file:%s when send data out at beginning", pClient->path);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

When a temporary slow log file (type SLOW_LOG_READ_BEGINNIG) is fully processed and removed, its corresponding SlowLogClient object is not removed from monitorSlowLogHashPath. This causes a memory leak of SlowLogClient objects and hash table entries. You should remove the entry from monitorSlowLogHashPath after the file is removed and free the associated memory.

    } else if (pClient->type == SLOW_LOG_READ_BEGINNIG || pClient->type == SLOW_LOG_READ_QUIT) {
      if (taosCloseFile(&pClient->pFile) != 0) {
        tscError("failed to close file:%s ret:%d", pClient->path, terrno);
      }
      pClient->pFile = NULL;
      if (taosRemoveFile(pClient->path) != 0) {
        tscError("failed to remove file:%s, terrno:%d", pClient->path, terrno);
      }
      tscInfo("monitor remove file:%s when send data out at beginning", pClient->path);

      if (pClient->type == SLOW_LOG_READ_BEGINNIG) {
        void* val = taosHashRemove(monitorSlowLogHashPath, pClient->path, strlen(pClient->path));
        if (val) {
          destroySlowLogClient(val);
        }
      }
    }

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants