-
Notifications
You must be signed in to change notification settings - Fork 5k
enh(client): refactor logic of slow log #34205
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: 3.0
Are you sure you want to change the base?
Conversation
Summary of ChangesHello @wangmm0220, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request focuses on refining the client-side slow log monitoring and reporting mechanism. The primary goal is to enhance the reliability of slow log collection and transmission, particularly during client shutdown, by ensuring all buffered logs are sent and file resources are properly managed. It also includes improvements to the file reading logic to handle various data formats more robustly and expands test coverage for these critical components. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request refactors the client monitor's slow log handling, introducing a new SLOW_LOG_READ_QUIT_SEND_ALL type and modifying file I/O operations. Key changes include updating generateWriteSlowLog to safely check for empty JSON strings, revising processFileInTheEnd to accept a pointer to TdFilePtr for proper nulling, and adjusting monitorReportAsyncCB to use TSWAP for transferring file and filename ownership to prevent leaks. The readFile function was significantly altered to handle SLOW_LOG_SEND_SIZE_MAX boundaries and empty log entries more robustly, and getFileSize now returns an error code. Several functions (sendSlowLog, monitorReadSend, monitorSendSlowLogAtBeginning, monitorSendSlowLogAtRunning, monitorSendSlowLogAtQuit, monitorSendAllSlowLogAtQuit, monitorSendAllSlowLog, monitorSendAllSlowLogFromTempDir) were updated to pass file pointers and filenames by address and include more comprehensive error handling and resource management. The monitorThreadFunc logic was refined for graceful shutdown, incorporating a new quitTime and quitCnt mechanism, and a monitorPutQuitMsg2MonitorQueue function was added. The monitorPutData2MonitorQueue function now returns a specific error code when the monitor thread is exiting and includes monitorFreeSlowLogData on failure to prevent resource leaks. Additionally, the clientMonitorTest in CMakeLists.txt was enabled, and the clientMonitorTests.cpp file was updated with new test cases for readFile scenarios, including a local redefinition of SLOW_LOG_SEND_SIZE_MAX for testing purposes.
source/client/src/clientMonitor.c
Outdated
| tscInfo("monitor at quit remove file:%s", pClient->path); | ||
| --quitCnt; | ||
| pClient->offset = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When size <= pClient->offset, the file pClient->pFile is not explicitly closed or nulled out, and processFileInTheEnd is not called. This can lead to a file handle leak if the file is no longer needed and pClient->pFile is not managed elsewhere. The file should be properly closed and its pointer nulled.
processFileInTheEnd(&pClient->pFile, pClient->path);
--quitCnt;
pClient->offset = 0;| } else { | ||
| if (taosCloseFile(&(slowLogData->pFile)) != 0) { | ||
| tscError("failed to close file:%p", slowLogData->pFile); | ||
| } | ||
| monitorFreeSlowLogData(slowLogData); | ||
| taosFreeQitem(slowLogData); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If taosWriteQitem(monitorQueue, slowLogData) fails, slowLogData is freed using taosFreeQitem. However, if slowLogData contains allocated resources (like slowLogData->pFile or slowLogData->fileName when type == SLOW_LOG_READ_BEGINNIG), these resources will not be freed or closed, leading to a resource leak. monitorFreeSlowLogData should be called on slowLogData before taosFreeQitem if taosWriteQitem fails.
} else {
monitorFreeSlowLogData(slowLogData);
taosFreeQitem(slowLogData);
}| #undef SLOW_LOG_SEND_SIZE_MAX | ||
| #define SLOW_LOG_SEND_SIZE_MAX 15 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Redefining SLOW_LOG_SEND_SIZE_MAX locally in the test file can lead to inconsistencies between the test environment and the actual production code. If the production code's SLOW_LOG_SEND_SIZE_MAX changes, the test might still pass due to this local redefinition, giving a false sense of security. It's better to ensure tests use the same definitions as the production code or clearly document the divergence and its implications.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR refactors the slow log functionality in the client module to improve the quit/cleanup logic and add comprehensive test coverage. The changes introduce a new queue message type for handling quit operations, modify memory allocation calculations, and enable previously commented-out test infrastructure with new edge case tests.
Key changes:
- Introduced
SLOW_LOG_READ_QUIT_SEND_ALLqueue type to better coordinate slow log flushing during shutdown - Modified memory allocation from
2 * sizeto4 + sizepattern to reserve space for JSON array brackets - Enabled and expanded test suite with 5 new test cases covering various data scenarios (single item, multiple items, oversized items, empty data)
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 25 comments.
Show a summary per file
| File | Description |
|---|---|
| include/libs/monitor/clientMonitor.h | Added new SLOW_LOG_READ_QUIT_SEND_ALL enum value for coordinated shutdown handling |
| source/client/src/clientMonitor.c | Refactored quit logic with separate quitTime tracking, changed function signatures to use double pointers, modified memory allocation pattern, improved error handling |
| source/client/src/clientEnv.c | Enhanced validation to check for both NULL and empty string in JSON output |
| source/client/test/clientMonitorTests.cpp | Enabled test suite, added 5 new edge case tests with SLOW_LOG_SEND_SIZE_MAX override for testing, updated readFile implementation |
| source/client/test/CMakeLists.txt | Uncommented and enabled clientMonitorTest build configuration |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
|
||
| TdFilePtr pFile = taosOpenFile("./tdengine-1-wewe", TD_FILE_CREATE | TD_FILE_WRITE | TD_FILE_APPEND | TD_FILE_READ | TD_FILE_TRUNC); | ||
| if (pFile == NULL) { | ||
| uError("failed to open file:./test.txt since %s", terrstr()); |
Copilot
AI
Jan 7, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Misleading error message. The error message says "failed to open file:./test.txt" but the actual file being opened is "./tdengine-1-wewe". The error message should reflect the actual file path being used.
| uError("failed to open file:./test.txt since %s", terrstr()); | |
| uError("failed to open file:./tdengine-1-wewe since %s", terrstr()); |
|
|
||
| TdFilePtr pFile = taosOpenFile("./tdengine-1-wewe", TD_FILE_CREATE | TD_FILE_WRITE | TD_FILE_APPEND | TD_FILE_READ | TD_FILE_TRUNC); | ||
| if (pFile == NULL) { | ||
| uError("failed to open file:./test.txt since %s", terrstr()); |
Copilot
AI
Jan 7, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Misleading error message. The error message says "failed to open file:./test.txt" but the actual file being opened is "./tdengine-1-wewe". The error message should reflect the actual file path being used.
include/libs/monitor/clientMonitor.h
Outdated
| SLOW_LOG_READ_RUNNING = 1, | ||
| SLOW_LOG_READ_BEGINNIG = 2, | ||
| SLOW_LOG_READ_QUIT = 3, | ||
| SLOW_LOG_READ_QUIT_SEND_ALL = 4, |
Copilot
AI
Jan 7, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The queueTypeStr array is now missing an entry for the new SLOW_LOG_READ_QUIT_SEND_ALL enum value. This will cause array out-of-bounds access when trying to convert this enum value to a string. Add a corresponding string entry like "SLOW_LOG_READ_QUIT_SEND_ALL" to the queueTypeStr array.
| // Create a void pointer and set it up for testing | ||
| void* pTransporter = NULL; | ||
|
|
||
| // Create an SEpSet object and set it up for testing | ||
| SEpSet* epSet = NULL; | ||
|
|
Copilot
AI
Jan 7, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unused variables declared. The variables pTransporter and epSet are declared but never used in these tests. These should be removed to keep the test code clean.
| // Create a void pointer and set it up for testing | |
| void* pTransporter = NULL; | |
| // Create an SEpSet object and set it up for testing | |
| SEpSet* epSet = NULL; |
source/client/src/clientMonitor.c
Outdated
| return true; | ||
| } | ||
| tscInfo("monitor at quit remove file:%s", pClient->path); | ||
| --quitCnt; |
Copilot
AI
Jan 7, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Incorrect quitCnt decrement logic. Line 627 decrements quitCnt when size <= pClient->offset, but this count was only incremented in monitorSendAllSlowLogAtQuit (line 654) for cases where size > pClient->offset and sending succeeded. This creates a mismatch: the count tracks pending sends but is being decremented for cases that never had pending sends. This could cause quitCnt to go negative or reach 0 prematurely, affecting the quit timeout logic at line 792.
| --quitCnt; |
| TdFilePtr pFile = taosOpenFile("./tdengine-1-wewe", TD_FILE_CREATE | TD_FILE_WRITE | TD_FILE_APPEND | TD_FILE_READ | TD_FILE_TRUNC); | ||
| if (pFile == NULL) { | ||
| uError("failed to open file:./test.txt since %s", terrstr()); |
Copilot
AI
Jan 7, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Misleading error message. The error message says "failed to open file:./test.txt" but the actual file being opened is "./tdengine-1-wewe". The error message should reflect the actual file path being used.
| TdFilePtr pFile = taosOpenFile("./tdengine-1-wewe", TD_FILE_CREATE | TD_FILE_WRITE | TD_FILE_APPEND | TD_FILE_READ | TD_FILE_TRUNC); | |
| if (pFile == NULL) { | |
| uError("failed to open file:./test.txt since %s", terrstr()); | |
| const char *filePath = "./tdengine-1-wewe"; | |
| TdFilePtr pFile = taosOpenFile(filePath, TD_FILE_CREATE | TD_FILE_WRITE | TD_FILE_APPEND | TD_FILE_READ | TD_FILE_TRUNC); | |
| if (pFile == NULL) { | |
| uError("failed to open file:%s since %s", filePath, terrstr()); |
| } | ||
|
|
||
| ASSERT(size > *offset); | ||
| ASSERT(size >= *offset); |
Copilot
AI
Jan 7, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inconsistent comparison operators. Line 76 uses ">=" in the ASSERT, but line 439 in the production code at clientMonitor.c uses "<=" with inverted logic. The test code changed from ">" to ">=" which is correct (size can equal offset when the file is fully read), but this inconsistency between test and production code suggests the production code at line 439 should also be reviewed for consistency.
| // Create a void pointer and set it up for testing | ||
| void* pTransporter = NULL; | ||
|
|
||
| // Create an SEpSet object and set it up for testing | ||
| SEpSet* epSet = NULL; | ||
|
|
Copilot
AI
Jan 7, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unused variables declared. The variables pTransporter and epSet are declared but never used in these tests. These should be removed to keep the test code clean.
| // Create a void pointer and set it up for testing | |
| void* pTransporter = NULL; | |
| // Create an SEpSet object and set it up for testing | |
| SEpSet* epSet = NULL; |
| char* val = readFile(pFile, &offset, fileSize); | ||
| printf("offset:%lld,fileSize:%lld,val:%s\n", offset, fileSize, val); | ||
|
|
||
| ASSERT(strcmp(val, "[]") == 0); | ||
| } |
Copilot
AI
Jan 7, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Memory leak in test. The readFile function allocates memory for val (line 291), but this memory is never freed after the test completes.
| char* val = readFile(pFile, &offset, fileSize); | ||
| printf("offset:%lld,fileSize:%lld,val:%s\n", offset, fileSize, val); | ||
|
|
||
| ASSERT(strcmp(val, "[]") == 0); | ||
| } |
Copilot
AI
Jan 7, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Memory leak in test. The readFile function allocates memory for val (line 333), but this memory is never freed after the test completes.
|
@gemini re-review |
|
/review |
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request is a significant and valuable refactoring of the slow log client logic. The changes centralize state management, simplify control flow, and improve robustness by handling file deletions and ensuring logs are flushed on exit. The addition and enabling of test cases for this functionality is also a great improvement. I've identified one critical bug that could lead to a crash and one high-severity memory leak. After addressing these issues, this will be a solid enhancement.
| if (value == NULL || strlen(value) == 0) { | ||
| tscError("failed to print json, data:%s", value); | ||
| code = TSDB_CODE_FAILED; | ||
| goto _end; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Passing a NULL pointer to tscError for a %s format specifier will cause a crash. You should handle the NULL case separately from the empty string case for safer error handling.
if (value == NULL) {
tscError("failed to print json: cJSON_PrintUnformatted returned NULL");
code = TSDB_CODE_FAILED;
goto _end;
} else if (strlen(value) == 0) {
tscError("failed to print json: generated empty string");
code = TSDB_CODE_FAILED;
goto _end;
}| } else if (pClient->type == SLOW_LOG_READ_BEGINNIG || pClient->type == SLOW_LOG_READ_QUIT) { | ||
| if (taosCloseFile(&pClient->pFile) != 0) { | ||
| tscError("failed to close file:%s ret:%d", pClient->path, terrno); | ||
| } | ||
| pClient->pFile = NULL; | ||
| if (taosRemoveFile(pClient->path) != 0) { | ||
| tscError("failed to remove file:%s, terrno:%d", pClient->path, terrno); | ||
| } | ||
| tscInfo("monitor remove file:%s when send data out at beginning", pClient->path); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When a temporary slow log file (type SLOW_LOG_READ_BEGINNIG) is fully processed and removed, its corresponding SlowLogClient object is not removed from monitorSlowLogHashPath. This causes a memory leak of SlowLogClient objects and hash table entries. You should remove the entry from monitorSlowLogHashPath after the file is removed and free the associated memory.
} else if (pClient->type == SLOW_LOG_READ_BEGINNIG || pClient->type == SLOW_LOG_READ_QUIT) {
if (taosCloseFile(&pClient->pFile) != 0) {
tscError("failed to close file:%s ret:%d", pClient->path, terrno);
}
pClient->pFile = NULL;
if (taosRemoveFile(pClient->path) != 0) {
tscError("failed to remove file:%s, terrno:%d", pClient->path, terrno);
}
tscInfo("monitor remove file:%s when send data out at beginning", pClient->path);
if (pClient->type == SLOW_LOG_READ_BEGINNIG) {
void* val = taosHashRemove(monitorSlowLogHashPath, pClient->path, strlen(pClient->path));
if (val) {
destroySlowLogClient(val);
}
}
}
Description
Issue(s)
Checklist
Please check the items in the checklist if applicable.