-
Notifications
You must be signed in to change notification settings - Fork 148
feat: Add mllm-cli support for Qwen3 and update docs #493
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughThis pull request extends the MLLM framework with a complete Go-based CLI client-server infrastructure, new C API session management capabilities, Go bindings to the C SDK, and Android build configurations. It enables the C SDK binding by default and introduces HTTP-based multi-turn chat functionality with streaming responses. Changes
Sequence Diagram(s)sequenceDiagram
participant User as User
participant Client as CLI Client
participant Server as HTTP Server
participant Service as MLLM Service
participant CApi as C API Layer
User->>Client: /exit or chat input
Client->>Client: Validate input
alt Client exit command
Client->>Client: Clear & exit
else User message
Client->>Client: Add to history
Client->>Server: POST /v1/chat/completions<br/>(messages, session_id, stream=true)
activate Server
Server->>Service: GetSession(session_id)
activate Service
Service-->>Server: *Session
deactivate Service
Server->>Server: Generate request ID<br/>(UUID if missing)
Server->>CApi: SendRequest(session_id, json)
activate CApi
CApi-->>Server: Request ID
deactivate CApi
Server->>Client: Set SSE headers<br/>Start streaming
loop Poll until completion
Server->>CApi: PollResponse(session_id)
CApi-->>Server: JSON response chunk
Server->>Server: Parse chunk<br/>Check finish_reason
Server->>Client: Stream data line<br/>(SSE format)
Client->>Client: Accumulate content
Client->>User: Print streamed text
alt Finish reason = "stop"
Server->>Server: Mark complete
else Empty response
Server->>Server: Idle, retry
end
end
Server->>Client: [DONE]
deactivate Server
Client->>Client: Update history<br/>with full response
end
sequenceDiagram
participant main as main()
participant Service as StartService()
participant Session as NewSession()
participant CApi as C API Layer
participant HTTP as HTTP Server
main->>main: Parse --model-path flag
main->>main: Check required args
main->>Service: StartService(workers)
activate Service
Service->>CApi: startService()
CApi-->>Service: MllmCAny result
Service-->>main: bool (success)
deactivate Service
main->>Session: NewSession(model_path)
activate Session
Session->>CApi: createQwen3Session(model_path)
CApi-->>Session: MllmCAny session handle
Session->>Session: Set finalizer for cleanup
Session-->>main: *Session, error
deactivate Session
main->>main: Insert(session_id)
main->>main: RegisterSession(id, session)
main->>HTTP: NewServer(":8080", service)
activate HTTP
HTTP->>HTTP: Register /v1/chat/completions
HTTP-->>main: *Server
deactivate HTTP
main->>HTTP: Start()
HTTP->>HTTP: Listen & accept connections
main->>main: Wait for SIGINT/SIGTERM
main->>HTTP: Shutdown(ctx)
HTTP->>HTTP: Graceful server close
main->>Service: Shutdown()
Service->>Session: Close()
Session->>CApi: freeSession()
Service->>Service: Clear sessions map
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes
Areas requiring extra attention:
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 10
🧹 Nitpick comments (23)
tasks/build_android_mllm_server.yaml (1)
11-13: Consider parameterizing the NDK path.The hardcoded NDK path
/opt/ndk/android-ndk-r28breduces portability. WhileANDROID_NDK_HOMEis already exported, consider allowing it to be overridden from the environment if it's already set.- export ANDROID_NDK_HOME=/opt/ndk/android-ndk-r28b + export ANDROID_NDK_HOME=${ANDROID_NDK_HOME:-/opt/ndk/android-ndk-r28b}tasks/build_android_mllm_client.yaml (1)
11-13: Consider parameterizing the NDK path.Same recommendation as the server build: allow
ANDROID_NDK_HOMEto be overridden from the environment.tasks/build_android_go_dialog_test.yaml (1)
12-14: Consider parameterizing the NDK path.Allow
ANDROID_NDK_HOMEto be overridden from the environment for better portability.mllm/c_api/Runtime.cpp (3)
35-41: Add braces around single-statement if blocks.For consistency and safety, wrap single-statement if blocks in braces.
int32_t isOk(MllmCAny ret) { - if (ret.type_id == kRetCode && ret.v_return_code == 0) + if (ret.type_id == kRetCode && ret.v_return_code == 0) { return true; - if (ret.type_id == kCustomObject && ret.v_custom_ptr != nullptr) + } + if (ret.type_id == kCustomObject && ret.v_custom_ptr != nullptr) { return true; + } return false; }
165-166: Consider safer string copy alternatives.
strncpydoesn't guarantee null-termination if the source is too long. Sinceresponse.length() + 1bytes are allocated, consider usingstrcpyormemcpyfollowed by explicit null-termination.- char* c_response = new char[response.length() + 1]; - strncpy(c_response, response.c_str(), response.length() + 1); + char* c_response = new char[response.length() + 1]; + std::memcpy(c_response, response.c_str(), response.length()); + c_response[response.length()] = '\0';
172-173: Simplify null pointer check before delete.The null check is unnecessary since
delete nullptris safe in C++.void freeResponseString(const char* response_str) { - if (response_str != nullptr) { - delete[] response_str; - } + delete[] response_str; }tasks/build_android_debug.yaml (1)
12-12: Consider using a relative or configurable install prefix.The hard-coded
/root/mllm-install-android-arm64-v8apath assumes a specific user directory and won't work in environments where builds run under different users or in CI systems with different home directories.Consider using a relative path or an environment variable:
- - "-DCMAKE_INSTALL_PREFIX=/root/mllm-install-android-arm64-v8a" + - "-DCMAKE_INSTALL_PREFIX=${INSTALL_PREFIX:-./install-android-arm64-v8a}"mllm-cli/pkg/server/handlers.go (4)
23-27: Consider logging the decode error for debugging.When JSON decoding fails, the actual error is not logged, making it harder to diagnose malformed requests during troubleshooting.
Apply this diff:
var requestPayload map[string]interface{} if err := json.NewDecoder(r.Body).Decode(&requestPayload); err != nil { + log.Printf("ERROR: Failed to decode request body: %v", err) http.Error(w, "Invalid request body", http.StatusBadRequest) return }
52-55: Consider logging the reason for SendRequest failure.When
session.SendRequestreturnsfalse, no additional context is provided about why the request failed. This makes debugging difficult when requests are rejected by the underlying session.If the underlying
SendRequestimplementation can provide error details, consider enhancing the API to return an error instead of just a boolean. Otherwise, add internal logging withinSendRequestto capture failure reasons.
60-60: Add safety check for Flusher type assertion.The type assertion
w.(http.Flusher)doesn't check theokreturn value. While most HTTP ResponseWriters support Flusher, it's safer to verify.Apply this diff:
- flusher, _ := w.(http.Flusher) + flusher, ok := w.(http.Flusher) + if !ok { + http.Error(w, "Streaming not supported", http.StatusInternalServerError) + return + }
63-90: LGTM with one consideration on polling interval.The streaming logic correctly handles client disconnects, end markers, and empty responses. The implementation properly filters out empty deltas before sending to avoid unnecessary SSE messages.
One consideration: the 10ms polling interval (line 84) creates a relatively tight loop. Depending on the underlying model's response generation speed, you might consider making this configurable or slightly increasing it to reduce CPU usage during idle periods.
If desired, consider making the polling interval configurable:
+const responsePollingInterval = 50 * time.Millisecond + for { // ... existing code ... } else { - time.Sleep(10 * time.Millisecond) + time.Sleep(responsePollingInterval) } }mllm-cli/cmd/mllm-client/main.go (3)
19-19: Consider making the server URL configurable.The server URL is hard-coded to
http://localhost:8080/v1/chat/completions, which limits flexibility for connecting to different servers or ports.Make the URL configurable via a command-line flag:
+import "flag" + func main() { - serverURL := "http://localhost:8080/v1/chat/completions" + serverURL := flag.String("server-url", "http://localhost:8080/v1/chat/completions", "Server URL for chat completions") + flag.Parse() var history []api.RequestMessageThen use
*serverURLthroughout the code.
30-30: Handle potential error from ReadString.The error from
reader.ReadString('\n')is silently ignored, which could mask I/O issues.Apply this diff:
- userInput, _ := reader.ReadString('\n') + userInput, err := reader.ReadString('\n') + if err != nil { + log.Printf("ERROR: Failed to read input: %v", err) + continue + }
76-82: Consider logging JSON unmarshal errors.When
json.Unmarshalfails to parse a chunk, the error is silently ignored. This could make debugging issues with malformed server responses more difficult.Apply this diff:
var chunk api.OpenAIResponseChunk - if json.Unmarshal([]byte(jsonData), &chunk) == nil && len(chunk.Choices) > 0 { + if err := json.Unmarshal([]byte(jsonData), &chunk); err != nil { + log.Printf("WARN: Failed to unmarshal chunk: %v", err) + } else if len(chunk.Choices) > 0 { content := chunk.Choices[0].Delta.Content fmt.Print(content) fullResponse.WriteString(content) }mllm-cli/pkg/server/server.go (1)
17-30: Consider adding nil check for mllmService parameter.The
NewServerconstructor doesn't validate thatmllmServiceis non-nil, which could lead to a panic when the handler tries to access it.Add a nil check:
func NewServer(addr string, mllmService *mllm.Service) *Server { + if mllmService == nil { + log.Fatal("FATAL: mllmService cannot be nil") + } mux := http.NewServeMux()mllm-cli/pkg/mllm/service.go (1)
23-27: Consider logging when overwriting an existing session.The
RegisterSessionmethod allows overwriting an existing session without warning. This could mask bugs where the same session ID is registered multiple times.Add logging when overwriting:
func (s *Service) RegisterSession(id string, session *mllm.Session) { s.mutex.Lock() defer s.mutex.Unlock() + if _, exists := s.sessions[id]; exists { + log.Printf("WARN: Overwriting existing session with ID: %s", id) + } s.sessions[id] = session }mllm/c_api/Runtime.h (1)
51-53: Clarify memory management contract for pollResponse.The pairing of
pollResponse()returning aconst char*with a separatefreeResponseString()function creates potential for memory leaks if callers forget to free. Consider if the API could be made safer.Options to consider:
- Document the memory contract very clearly (which the documentation comment above addresses).
- Consider returning a status code with an out-parameter for the string, making memory ownership more explicit.
- Use a buffer provided by the caller to avoid heap allocation.
For now, ensure that all Go wrapper code and examples demonstrate proper usage with
deferto free responses immediately after use.mllm-cli/cmd/mllm-server/main.go (1)
30-32: Consider making log level and worker threads configurable.The log level (2) and worker thread count (4) are hard-coded. Making these configurable via flags would improve flexibility for different deployment environments.
Add command-line flags:
func main() { modelPath := flag.String("model-path", "", "Path to the MLLM model directory.") + logLevel := flag.Int("log-level", 2, "Log level (0=OFF, 1=ERROR, 2=INFO, 3=DEBUG)") + workerThreads := flag.Int("workers", 4, "Number of worker threads") flag.Parse() if *modelPath == "" { log.Fatal("FATAL: --model-path argument is required.") } if !mllm.InitializeContext() { log.Fatal("FATAL: InitializeContext failed!") } - mllm.SetLogLevel(2) - if !mllm.StartService(4) { + mllm.SetLogLevel(*logLevel) + if !mllm.StartService(*workerThreads) { log.Fatal("FATAL: StartService failed!") }mllm-cli/pkg/api/types.go (1)
14-16: Consider English comments and clarify EnableThinking vs Thinking fields.Two observations:
- The comment on line 15 is in Chinese, which may not be accessible to all contributors. Consider using English for consistency.
- The
EnableThinkingandThinkingfields appear to serve similar purposes. Having both could lead to confusion about which field clients should use.Clarify the distinction or consolidate:
type OpenAIRequest struct { Model string `json:"model"` Messages []RequestMessage `json:"messages"` Stream bool `json:"stream"` - EnableThinking bool `json:"enable_thinking,omitempty"` - Thinking bool `json:"thinking,omitempty"` // <-- 新增此行,用于接收客户端可能发送的 "thinking": true + // EnableThinking enables the model's reasoning/thinking mode + EnableThinking bool `json:"enable_thinking,omitempty"` SessionID string `json:"session_id,omitempty"` }If both fields are genuinely needed, document why they're distinct and when each should be used.
mllm-cli/mllm/c.go (4)
8-8: Link portability: consider rpath/build tags to avoid runtime loader issues.Linking against -lMllmSdkC -lMllmRT -lMllmCPUBackend assumes system search paths are configured. For CLI distribution, consider:
- Adding an rpath to the binary (e.g., -Wl,-rpath,$ORIGIN or a project-specific lib dir).
- Using build tags/conditional LDFLAGS per OS/arch if backends differ.
- Documenting LD_LIBRARY_PATH/DYLD_LIBRARY_PATH requirements.
59-74: Finalizer is a safety net, not a guarantee; consider reducing side effects and improving error context.
- Avoid relying on the finalizer for correctness; explicit Close should be the primary path.
- Printing in finalizers can surprise users; prefer a debug logger or remove the print.
- The error message should include modelPath and be English/localized consistently; include C-side error if available.
Apply this diff:
- handle := C.createQwen3Session(cModelPath) - if !isOk(handle) { - return nil, fmt.Errorf("底层C API createQwen3Session 失败") - } + handle := C.createQwen3Session(cModelPath) + if !isOk(handle) { + return nil, fmt.Errorf("createQwen3Session failed for model %q", modelPath) + } @@ - runtime.SetFinalizer(s, func(s *Session) { - fmt.Println("[Go Finalizer] Mllm Session automatically released.") - C.freeSession(s.cHandle) - }) + runtime.SetFinalizer(s, func(s *Session) { + // Finalizer is best-effort; prefer explicit Close(). + C.freeSession(s.cHandle) + })
55-57: Surface SetLogLevel errors or document valid range.If the C API validates levels, consider returning bool/error or documenting accepted range to help callers detect misconfiguration.
Would you like me to add a thin wrapper returning bool and adjust callers?
95-99: Optional: validate jsonRequest early.If empty strings are invalid, reject early for clearer errors; if JSON must be well-formed, consider basic validation before crossing the cgo boundary.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (17)
CMakeLists.txt(1 hunks)mllm-cli/cmd/mllm-client/main.go(1 hunks)mllm-cli/cmd/mllm-server/main.go(1 hunks)mllm-cli/go.mod(2 hunks)mllm-cli/mllm/c.go(2 hunks)mllm-cli/pkg/api/types.go(1 hunks)mllm-cli/pkg/mllm/service.go(1 hunks)mllm-cli/pkg/server/handlers.go(1 hunks)mllm-cli/pkg/server/server.go(1 hunks)mllm/c_api/Object.h(1 hunks)mllm/c_api/Runtime.cpp(2 hunks)mllm/c_api/Runtime.h(1 hunks)task.py(2 hunks)tasks/build_android_debug.yaml(1 hunks)tasks/build_android_go_dialog_test.yaml(1 hunks)tasks/build_android_mllm_client.yaml(1 hunks)tasks/build_android_mllm_server.yaml(1 hunks)
🧰 Additional context used
🪛 Clang (14.0.6)
mllm/c_api/Runtime.cpp
[error] 11-11: inclusion of deprecated C++ header 'string.h'; consider using 'cstring' instead
(modernize-deprecated-headers,-warnings-as-errors)
[error] 36-36: statement should be inside braces
(google-readability-braces-around-statements,readability-braces-around-statements,-warnings-as-errors)
[error] 37-37: implicit conversion bool -> 'int'
(readability-implicit-bool-conversion,-warnings-as-errors)
[error] 38-38: statement should be inside braces
(google-readability-braces-around-statements,readability-braces-around-statements,-warnings-as-errors)
[error] 39-39: implicit conversion bool -> 'int'
(readability-implicit-bool-conversion,-warnings-as-errors)
[error] 40-40: implicit conversion bool -> 'int'
(readability-implicit-bool-conversion,-warnings-as-errors)
[error] 66-66: variable 'worker_threads' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 88-88: initializing non-owner 'MllmSessionWrapper *' with a newly created 'gsl::owner<>'
(cppcoreguidelines-owning-memory,-warnings-as-errors)
[error] 104-104: variable 'session_wrapper' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 115-115: variable 'session_wrapper' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 126-126: variable 'status' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 135-135: variable 'request_id' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 136-136: variable 'response' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 144-144: variable 'j' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 144-144: variable name 'j' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 165-165: initializing non-owner 'char *' with a newly created 'gsl::owner<>'
(cppcoreguidelines-owning-memory,-warnings-as-errors)
[error] 172-172: 'if' statement is unnecessary; deleting null pointer has no effect
(readability-delete-null-pointer,-warnings-as-errors)
[error] 173-173: deleting a pointer through a type that is not marked 'gsl::owner<>'; consider using a smart pointer instead
(cppcoreguidelines-owning-memory,-warnings-as-errors)
🪛 Ruff (0.14.2)
task.py
526-526: Starting a process with a shell, possible injection detected
(S605)
🔇 Additional comments (17)
mllm/c_api/Object.h (1)
40-41: LGTM!The addition of
kCustomObjectenum value and correspondingv_custom_ptrunion member appropriately extends the C API to support custom object pointers, aligning with the new session management functionality.task.py (1)
515-526: Verify the security model for shell command execution.The use of
os.system(command_str)executes shell commands directly, which static analysis flags as a potential injection risk. While this appears acceptable since commands come from committed YAML config files rather than user input, consider whethersubprocess.run()withshell=Falsewould be more appropriate for better security posture.If untrusted input could ever reach these YAML files, apply this refactor:
+import subprocess + class ShellCommandTask(Task): def __init__(self, config): super().__init__(config) def run(self): logging.info("Generic Shell Command Task Start (re-enabled)...") command_str = self.config.get("command", "") if not command_str: logging.error("No command provided in ShellCommandTask.") return - throw_error_if_failed(os.system(command_str)) + result = subprocess.run(command_str, shell=True, check=False) + throw_error_if_failed(result.returncode)Note: This maintains current behavior while making the shell invocation explicit.
mllm/c_api/Runtime.cpp (3)
79-96: Verify memory management strategy for session wrappers.The raw
newallocation at line 88 creates an owning pointer returned via the C API. Ensure that:
- All code paths properly call
freeSessionto avoid leaks- Double-free scenarios are prevented
- The Go bindings correctly manage the lifecycle
Consider documenting the ownership transfer in a comment:
// Allocate wrapper on heap; ownership transfers to caller // Caller MUST call freeSession to deallocate auto* handle = new MllmSessionWrapper();
130-169: Review response buffer management and completion detection.The function allocates a C-string buffer that must be freed by the caller. Verify:
- All callers properly call
freeResponseString- The
finish_reason == "stop"detection correctly handles all completion scenarios- Consider returning an empty string or specific sentinel instead of
nullptrfor finished streams to distinguish from errors
65-77: LGTM!The service lifecycle functions correctly wrap the underlying C++ service API and provide appropriate error codes.
mllm-cli/go.mod (2)
5-8: LGTM!The addition of
gorilla/websocketandgoogle/uuiddependencies aligns with the new client-server functionality and streaming support introduced in this PR.
3-3: Ignore this review comment; it contains factually incorrect information.Go 1.25 was released on August 12, 2025, so Go 1.25.0 does exist. The review comment's claim that "Go 1.25.0 doesn't exist" is incorrect. The
go.modfile specifies version 1.22.2, which is a valid stable Go version, andgo mod verifyconfirms all modules are valid. No action is required.Likely an incorrect or invalid review comment.
CMakeLists.txt (1)
22-22: Verify enabling C SDK binding by default is appropriate for all build configurations.Changing
MLLM_BUILD_SDK_C_BINDINGfromOFFtoONenables the C API bindings by default for all builds. Ensure:
- All required dependencies for the C bindings are commonly available
- This doesn't break existing build workflows or CI pipelines
- The change is documented in release notes or migration guides
Users who want the previous behavior can override with:
cmake -DMLLM_BUILD_SDK_C_BINDING=OFF ...mllm-cli/cmd/mllm-client/main.go (2)
47-60: LGTM on response handling.The HTTP response handling correctly checks for errors before accessing the response, and properly defers the body close. The status code validation and history rollback on errors are well-implemented.
47-53: Fix potential nil pointer dereference on response.Body.The
defer resp.Body.Close()is called before checking iferr != nil, which meansrespcould beniland cause a panic.Apply this diff:
resp, err := http.DefaultClient.Do(req) if err != nil { log.Printf("ERROR: Request failed: %v", err) history = history[:len(history)-1] continue } defer resp.Body.Close()Move the
deferstatement after the error check:resp, err := http.DefaultClient.Do(req) if err != nil { log.Printf("ERROR: Request failed: %v", err) history = history[:len(history)-1] continue } + defer resp.Body.Close() - defer resp.Body.Close()Actually, looking at the code again, the defer is already after the error check on line 53. Let me re-read... Yes, it's correct. The defer is after the error check. This is fine.
[scratchpad_end] -->Likely an incorrect or invalid review comment.
mllm-cli/pkg/server/server.go (1)
32-42: LGTM!The
StartandShutdownmethods follow standard Go HTTP server patterns. Error handling properly distinguisheshttp.ErrServerClosed(expected during shutdown) from actual failures, and the graceful shutdown implementation correctly propagates the context.mllm-cli/pkg/mllm/service.go (2)
12-21: LGTM!The
Servicestructure and constructor are clean and correctly initialized. The mutex is appropriately included to protect concurrent access to the sessions map.
41-52: LGTM!The
Shutdownmethod correctly locks the mutex, iterates through all sessions to close them, and reinitializes the map. This ensures clean resource cleanup during server shutdown.mllm-cli/cmd/mllm-server/main.go (2)
53-55: LGTM!Starting the HTTP server in a goroutine is the correct approach for graceful shutdown, allowing the main goroutine to wait for shutdown signals.
57-72: LGTM!The graceful shutdown implementation correctly handles SIGINT and SIGTERM signals, shuts down the HTTP server with a reasonable timeout, and ensures the mllm service cleanup runs even if the HTTP shutdown fails. The shutdown ordering (HTTP server → mllm service) is appropriate.
mllm-cli/pkg/api/types.go (1)
19-33: LGTM!The response types (
ResponseDelta,ResponseChoice,OpenAIResponseChunk) are well-structured to represent streaming SSE chunks in the OpenAI format. The nesting and JSON tags are appropriate.mllm-cli/mllm/c.go (1)
8-8: Double‑free safety recap.You correctly clear the finalizer in Close and null the handle after free. However, because the finalizer path doesn’t null the handle (object is dying anyway), it’s crucial that no other goroutine can reach s.cHandle concurrently. The locking change above ensures that.
Please confirm no other packages hold aliases to s.cHandle or call C.freeSession directly.
Also applies to: 13-20, 68-71, 78-81
| requestBody, _ := json.Marshal(apiRequest) | ||
|
|
||
| req, _ := http.NewRequest("POST", serverURL, bytes.NewBuffer(requestBody)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Handle errors from json.Marshal and http.NewRequest.
Both json.Marshal and http.NewRequest can return errors, but they are silently ignored. This could lead to nil pointer dereferences or invalid requests.
Apply this diff:
- requestBody, _ := json.Marshal(apiRequest)
+ requestBody, err := json.Marshal(apiRequest)
+ if err != nil {
+ log.Printf("ERROR: Failed to marshal request: %v", err)
+ history = history[:len(history)-1]
+ continue
+ }
- req, _ := http.NewRequest("POST", serverURL, bytes.NewBuffer(requestBody))
+ req, err := http.NewRequest("POST", serverURL, bytes.NewBuffer(requestBody))
+ if err != nil {
+ log.Printf("ERROR: Failed to create request: %v", err)
+ history = history[:len(history)-1]
+ continue
+ }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| requestBody, _ := json.Marshal(apiRequest) | |
| req, _ := http.NewRequest("POST", serverURL, bytes.NewBuffer(requestBody)) | |
| requestBody, err := json.Marshal(apiRequest) | |
| if err != nil { | |
| log.Printf("ERROR: Failed to marshal request: %v", err) | |
| history = history[:len(history)-1] | |
| continue | |
| } | |
| req, err := http.NewRequest("POST", serverURL, bytes.NewBuffer(requestBody)) | |
| if err != nil { | |
| log.Printf("ERROR: Failed to create request: %v", err) | |
| history = history[:len(history)-1] | |
| continue | |
| } |
🤖 Prompt for AI Agents
In mllm-cli/cmd/mllm-client/main.go around lines 42 to 44, both json.Marshal and
http.NewRequest ignore returned errors; change the code to capture each error
into a variable, check it immediately, and handle failures (log or print a clear
error message and exit/return). Specifically, replace the blank-ignored error
for json.Marshal with "requestBody, err := json.Marshal(apiRequest)" and if err
!= nil handle it, then create the request with "req, err :=
http.NewRequest(...)" and check err != nil and handle it before using req.
| sessionID := filepath.Base(*modelPath) | ||
| if !session.Insert(sessionID) { | ||
| session.Close() | ||
| log.Fatalf("FATAL: Failed to insert session with ID '%s'", sessionID) | ||
| } | ||
| mllmService.RegisterSession(sessionID, session) | ||
| log.Printf("Session created and registered successfully with ID: %s", sessionID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Session ID derivation may cause collisions with multiple models.
Using filepath.Base(*modelPath) as the session ID could cause collisions if multiple models have the same directory name in different paths. For example, /path1/model and /path2/model would both get session ID "model".
Consider one of these approaches:
- Use the full model path or a hash:
- sessionID := filepath.Base(*modelPath)
+ // Use a hash of the full path to avoid collisions
+ sessionID := fmt.Sprintf("%x", md5.Sum([]byte(*modelPath)))[:16]- Allow explicit session ID via flag:
+ sessionID := flag.String("session-id", "", "Session ID (defaults to model directory name)")
flag.Parse()
// ...
- sessionID := filepath.Base(*modelPath)
+ sid := *sessionID
+ if sid == "" {
+ sid = filepath.Base(*modelPath)
+ }The second approach provides flexibility while maintaining the current default behavior.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In mllm-cli/cmd/mllm-server/main.go around lines 45 to 51, deriving the
sessionID with filepath.Base(*modelPath) can collide for models with identical
directory names; change to accept an explicit session ID via a CLI flag and fall
back to a deterministic unique ID when not provided (for example: use the model
path combined with a short hash or a hash-only of the full model path, or the
base name concatenated with the first 8 chars of a SHA256 hex of *modelPath) so
IDs are unique; implement: add a new flag (e.g., --session-id), if set use that
value (validate allowed chars/length), otherwise compute sessionID = baseName +
"-" + shortHash(*modelPath) (or full-hash), then use that sessionID for
session.Insert/registration and update log messages accordingly.
| static void* MllmCAny_get_v_custom_ptr(MllmCAny handle) { | ||
| return handle.v_custom_ptr; | ||
| } | ||
| static MllmCAny MllmCAny_set_v_custom_ptr_null(MllmCAny handle) { | ||
| handle.v_custom_ptr = NULL; | ||
| return handle; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid peeking into C struct internals; add stable C API instead.
Accessing MllmCAny.v_custom_ptr directly is ABI-fragile. If the C struct layout changes, this breaks at compile or (worse) runtime. Prefer an official C helper (e.g., mllmAnyIsValid, mllmAnyNull) exported by the C SDK, or provide a dedicated C API to query/free validity instead of field access from Go.
| type Session struct { | ||
| cHandle C.MllmCAny | ||
| sessionID string | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concurrent use-after-free risk; guard Session with locks and a closed flag.
Methods can race with Close(), leading to use-after-free of the C handle. Add an RWMutex and a closed flag; serialize Close vs. Insert/SendRequest/PollResponse/SessionID.
Apply this diff:
@@
-import "C"
-import "unsafe"
-import "fmt"
-import "runtime"
+import "C"
+import "unsafe"
+import "fmt"
+import "runtime"
+import "sync"
@@
type Session struct {
cHandle C.MllmCAny
sessionID string
+ mu sync.RWMutex
+ closed bool
}
@@
-func (s *Session) Close() {
- if C.MllmCAny_get_v_custom_ptr(s.cHandle) != nil {
- fmt.Println("[Go Close] Mllm Session manually closed.")
- C.freeSession(s.cHandle)
- s.cHandle = C.MllmCAny_set_v_custom_ptr_null(s.cHandle)
- runtime.SetFinalizer(s, nil)
- }
-}
+func (s *Session) Close() {
+ s.mu.Lock()
+ defer s.mu.Unlock()
+ if s.closed {
+ return
+ }
+ if C.MllmCAny_get_v_custom_ptr(s.cHandle) != nil {
+ fmt.Println("[Go Close] Mllm Session manually closed.")
+ C.freeSession(s.cHandle)
+ s.cHandle = C.MllmCAny_set_v_custom_ptr_null(s.cHandle)
+ }
+ s.closed = true
+ runtime.SetFinalizer(s, nil)
+}
@@
-func (s *Session) Insert(sessionID string) bool {
+func (s *Session) Insert(sessionID string) bool {
+ s.mu.Lock()
+ defer s.mu.Unlock()
+ if s.closed {
+ fmt.Println("[Go Insert] Error: session already closed.")
+ return false
+ }
cSessionID := C.CString(sessionID)
defer C.free(unsafe.Pointer(cSessionID))
result := C.insertSession(cSessionID, s.cHandle)
if isOk(result) {
s.sessionID = sessionID
}
return isOk(result)
}
@@
-func (s *Session) SendRequest(jsonRequest string) bool {
- if s.sessionID == "" {
+func (s *Session) SendRequest(jsonRequest string) bool {
+ s.mu.RLock()
+ defer s.mu.RUnlock()
+ if s.closed {
+ fmt.Println("[Go SendRequest] Error: session is closed.")
+ return false
+ }
+ if s.sessionID == "" {
fmt.Println("[Go SendRequest] Error: sessionID is not set on this session.")
return false
}
cSessionID := C.CString(s.sessionID)
cJsonRequest := C.CString(jsonRequest)
defer C.free(unsafe.Pointer(cSessionID))
defer C.free(unsafe.Pointer(cJsonRequest))
result := C.sendRequest(cSessionID, cJsonRequest)
return isOk(result)
}
@@
-func (s *Session) PollResponse(requestID string) string {
- if requestID == "" {
+func (s *Session) PollResponse(requestID string) string {
+ s.mu.RLock()
+ defer s.mu.RUnlock()
+ if s.closed {
+ fmt.Println("[Go PollResponse] Error: session is closed.")
+ return ""
+ }
+ if requestID == "" {
fmt.Println("[Go PollResponse] Error: requestID cannot be empty.")
return ""
}
cRequestID := C.CString(requestID)
defer C.free(unsafe.Pointer(cRequestID))
cResponse := C.pollResponse(cRequestID)
if cResponse == nil {
return ""
}
defer C.freeResponseString(cResponse)
return C.GoString(cResponse)
}
@@
-func (s *Session) SessionID() string {
- return s.sessionID
-}
+func (s *Session) SessionID() string {
+ s.mu.RLock()
+ defer s.mu.RUnlock()
+ return s.sessionID
+}Also applies to: 76-83, 85-93, 95-107, 109-124, 126-128
🤖 Prompt for AI Agents
In mllm-cli/mllm/c.go around lines 28-31 (and similarly for ranges 76-83, 85-93,
95-107, 109-124, 126-128), the Session struct and its methods are vulnerable to
concurrent use-after-free when Close() runs while other methods access cHandle;
add a sync.RWMutex field (e.g., mu) and a boolean closed flag to Session;
acquire mu.RLock() at the start of Insert/SendRequest/PollResponse/SessionID and
return an error if closed, release RLock before returning; make Close() acquire
mu.Lock(), check and set closed=true, safely free the C handle and set it to nil
under the write lock, then release the lock; ensure all accesses to cHandle and
closed are done under the appropriate lock to serialize Close vs. other
operations.
| func StartService(workerThreads int) bool { | ||
| result := C.startService(C.size_t(workerThreads)) | ||
| return isOk(result) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Validate workerThreads to prevent size_t underflow on negative input.
Casting a negative int to C.size_t yields a huge value. Guard against workerThreads < 0.
Apply this diff:
func StartService(workerThreads int) bool {
- result := C.startService(C.size_t(workerThreads))
+ if workerThreads < 0 {
+ fmt.Println("[Go StartService] Error: workerThreads cannot be negative.")
+ return false
+ }
+ result := C.startService(C.size_t(workerThreads))
return isOk(result)
}🤖 Prompt for AI Agents
In mllm-cli/mllm/c.go around lines 45 to 48, the StartService function casts
workerThreads directly to C.size_t which underflows for negative ints; validate
that workerThreads is non-negative before casting, returning false (or an error)
immediately if workerThreads < 0, otherwise cast to C.size_t and call
C.startService as before. Ensure the validation is explicit and documented with
a short comment so callers can see negative values are rejected.
| #include <string> | ||
| #include <vector> | ||
| #include <map> | ||
| #include <string.h> // for strncpy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replace deprecated C header with C++ equivalent.
Use <cstring> instead of <string.h> for C++ code.
-#include <string.h> // for strncpy
+#include <cstring> // for strncpy📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| #include <string.h> // for strncpy | |
| #include <cstring> // for strncpy |
🧰 Tools
🪛 Clang (14.0.6)
[error] 11-11: inclusion of deprecated C++ header 'string.h'; consider using 'cstring' instead
(modernize-deprecated-headers,-warnings-as-errors)
🤖 Prompt for AI Agents
In mllm/c_api/Runtime.cpp around line 11, replace the deprecated C header
include with the C++ equivalent by changing #include <string.h> to #include
<cstring>, and update any direct calls to strncpy to use the std:: namespace
(std::strncpy) or ensure the symbol is available in the current scope.
| //===----------------------------------------------------------------------===// | ||
| // Mllm service functions | ||
| //===----------------------------------------------------------------------===// | ||
| MllmCAny startService(size_t worker_threads); | ||
|
|
||
| MllmCAny stopService(); | ||
|
|
||
| void setLogLevel(int level); | ||
|
|
||
| MllmCAny createQwen3Session(const char* model_path); | ||
|
|
||
| MllmCAny insertSession(const char* session_id, MllmCAny handle); | ||
|
|
||
| MllmCAny freeSession(MllmCAny handle); | ||
|
|
||
| MllmCAny sendRequest(const char* session_id, const char* json_request); | ||
|
|
||
| const char* pollResponse(const char* session_id); | ||
|
|
||
| void freeResponseString(const char* response_str); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion | 🟠 Major
Consider adding documentation comments for the new API surface.
The new session management and service functions lack documentation comments explaining:
- The lifecycle and ownership of sessions
- Memory management rules (e.g., who owns the string returned by
pollResponseand when to callfreeResponseString) - Thread-safety guarantees
- Expected behavior and return values
Add documentation comments in Doxygen or similar format:
//===----------------------------------------------------------------------===//
// Mllm service functions
//===----------------------------------------------------------------------===//
+/**
+ * @brief Starts the MLLM service with the specified number of worker threads.
+ * @param worker_threads Number of worker threads to use.
+ * @return MllmCAny Result status. Check with isOk().
+ */
MllmCAny startService(size_t worker_threads);
+/**
+ * @brief Stops the MLLM service and cleans up resources.
+ * @return MllmCAny Result status. Check with isOk().
+ */
MllmCAny stopService();
+/**
+ * @brief Sets the log level for MLLM logging.
+ * @param level Log level (e.g., 0=OFF, 1=ERROR, 2=INFO, 3=DEBUG).
+ */
void setLogLevel(int level);
+/**
+ * @brief Creates a new Qwen3 session from the specified model path.
+ * @param model_path Path to the model directory.
+ * @return MllmCAny Session handle. Check with isOk(). Must be freed with freeSession().
+ */
MllmCAny createQwen3Session(const char* model_path);
+/**
+ * @brief Inserts a session into the service registry with the given ID.
+ * @param session_id Unique identifier for the session.
+ * @param handle Session handle from createQwen3Session().
+ * @return MllmCAny Result status. Check with isOk().
+ */
MllmCAny insertSession(const char* session_id, MllmCAny handle);
+/**
+ * @brief Frees a session handle and its associated resources.
+ * @param handle Session handle to free.
+ * @return MllmCAny Result status. Check with isOk().
+ */
MllmCAny freeSession(MllmCAny handle);
+/**
+ * @brief Sends a JSON request to the specified session.
+ * @param session_id Session identifier.
+ * @param json_request JSON-formatted request string.
+ * @return MllmCAny Result status. Check with isOk().
+ */
MllmCAny sendRequest(const char* session_id, const char* json_request);
+/**
+ * @brief Polls for a response from the specified session.
+ * @param session_id Session identifier.
+ * @return const char* Response string or NULL if no response available.
+ * Caller must free the returned string using freeResponseString().
+ */
const char* pollResponse(const char* session_id);
+/**
+ * @brief Frees a response string returned by pollResponse().
+ * @param response_str Response string to free. Can be NULL.
+ */
void freeResponseString(const char* response_str);🤖 Prompt for AI Agents
In mllm/c_api/Runtime.h around lines 34 to 53, the new public C API declarations
lack documentation: add Doxygen-style comments above each function and a brief
header for the section that describe each function's purpose, parameter
semantics, ownership/lifetime rules (who allocates/frees session handles and
response strings), thread-safety guarantees, expected return values and error
conditions, and example usage (e.g., call startService before
createQwen3Session, call freeResponseString after consuming pollResponse). For
pointer-returning APIs (pollResponse, createQwen3Session,
insertSession/freeSession), explicitly state ownership and when to call
freeResponseString or freeSession; for functions returning MllmCAny explain what
the opaque type represents and how to check for failure; and document
concurrency expectations (reentrant, per-session single-threaded, or protected
by caller). Ensure comments are concise, follow Doxygen format (/** ... */) and
placed immediately above each declaration.
| export CGO_LDFLAGS="-L/root/zty_workspace/mllm_zty/build-android-arm64-v8a/bin" | ||
| export CGO_CFLAGS="-I/root/zty_workspace/mllm_zty" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replace hardcoded absolute paths with environment variables or relative paths.
Same portability issue present in the other Android build tasks. The hardcoded path /root/zty_workspace/mllm_zty should be parameterized.
- export CGO_LDFLAGS="-L/root/zty_workspace/mllm_zty/build-android-arm64-v8a/bin"
- export CGO_CFLAGS="-I/root/zty_workspace/mllm_zty"
+ export CGO_LDFLAGS="-L${MLLM_ROOT}/build-android-arm64-v8a/bin"
+ export CGO_CFLAGS="-I${MLLM_ROOT}"🤖 Prompt for AI Agents
In tasks/build_android_go_dialog_test.yaml around lines 18 to 19, the
CGO_LDFLAGS and CGO_CFLAGS use a hardcoded absolute path
(/root/zty_workspace/mllm_zty); replace these with either a provided environment
variable (e.g., $WORKSPACE or $PROJECT_ROOT) or a relative path (e.g.,
${CI_PROJECT_DIR} or ./) so the build is portable across environments — update
the exports to reference that variable/relative path and document or ensure the
variable is set in the pipeline.
| export CGO_LDFLAGS="-L/root/zty_workspace/mllm_zty/build-android-arm64-v8a/bin" | ||
| export CGO_CFLAGS="-I/root/zty_workspace/mllm_zty" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replace hardcoded absolute paths with environment variables or relative paths.
Same issue as in build_android_mllm_server.yaml: the hardcoded path /root/zty_workspace/mllm_zty will break builds on other machines.
- export CGO_LDFLAGS="-L/root/zty_workspace/mllm_zty/build-android-arm64-v8a/bin"
- export CGO_CFLAGS="-I/root/zty_workspace/mllm_zty"
+ export CGO_LDFLAGS="-L${MLLM_ROOT}/build-android-arm64-v8a/bin"
+ export CGO_CFLAGS="-I${MLLM_ROOT}"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| export CGO_LDFLAGS="-L/root/zty_workspace/mllm_zty/build-android-arm64-v8a/bin" | |
| export CGO_CFLAGS="-I/root/zty_workspace/mllm_zty" | |
| export CGO_LDFLAGS="-L${MLLM_ROOT}/build-android-arm64-v8a/bin" | |
| export CGO_CFLAGS="-I${MLLM_ROOT}" |
🤖 Prompt for AI Agents
In tasks/build_android_mllm_client.yaml around lines 16-17, the CGO_LDFLAGS and
CGO_CFLAGS use a hardcoded absolute path (/root/zty_workspace/mllm_zty); replace
those with either a configurable environment variable (e.g. ${WORKSPACE_ROOT} or
${GITHUB_WORKSPACE}) or a relative path from the repo root (e.g.
${PWD}/build-android-arm64-v8a/bin and ${PWD}) so the CI/build works on other
machines; ensure the chosen env var is set earlier in the workflow or document
it, and update the two export lines to reference that variable instead of the
hardcoded path.
| export CGO_LDFLAGS="-L/root/zty_workspace/mllm_zty/build-android-arm64-v8a/bin" | ||
| export CGO_CFLAGS="-I/root/zty_workspace/mllm_zty" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replace hardcoded absolute paths with environment variables or relative paths.
The hardcoded absolute path /root/zty_workspace/mllm_zty will break builds on other machines. Consider using ${PROJECT_ROOT} or an environment variable like ${MLLM_ROOT}.
Apply this pattern:
- export CGO_LDFLAGS="-L/root/zty_workspace/mllm_zty/build-android-arm64-v8a/bin"
- export CGO_CFLAGS="-I/root/zty_workspace/mllm_zty"
+ export CGO_LDFLAGS="-L${MLLM_ROOT}/build-android-arm64-v8a/bin"
+ export CGO_CFLAGS="-I${MLLM_ROOT}"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| export CGO_LDFLAGS="-L/root/zty_workspace/mllm_zty/build-android-arm64-v8a/bin" | |
| export CGO_CFLAGS="-I/root/zty_workspace/mllm_zty" | |
| export CGO_LDFLAGS="-L${MLLM_ROOT}/build-android-arm64-v8a/bin" | |
| export CGO_CFLAGS="-I${MLLM_ROOT}" |
🤖 Prompt for AI Agents
In tasks/build_android_mllm_server.yaml around lines 16-17, the CGO_LDFLAGS and
CGO_CFLAGS use a hardcoded absolute path (/root/zty_workspace/mllm_zty) which
will break on other machines; change them to reference an environment variable
(e.g. ${MLLM_ROOT} or ${PROJECT_ROOT}) or a relative path, e.g. export
MLLM_ROOT=${MLLM_ROOT:-$(pwd)} at the top of the script and then use export
CGO_LDFLAGS="-L${MLLM_ROOT}/build-android-arm64-v8a/bin" and export
CGO_CFLAGS="-I${MLLM_ROOT}" so builds work across environments and CI.
|
I've just rebased this branch onto the latest upstream/v2. It's ready for review again. |
|
LGTM |
This PR implements core functionality for
mllm-cli, including Go bindings for the C/C++ backend and an OpenAI API-compatible client-server architecture for interacting with the Qwen3 model. Corresponding documentation for the CLI client and its services has also been added and updated.