feat: Add mllm-cli support for Qwen3 and update docs #493

yuerqiqi · 2025-10-29T11:00:38Z

This PR implements core functionality for mllm-cli, including Go bindings for the C/C++ backend and an OpenAI API-compatible client-server architecture for interacting with the Qwen3 model. Corresponding documentation for the CLI client and its services has also been added and updated.

coderabbitai · 2025-10-29T11:00:53Z

Walkthrough

This pull request extends the MLLM framework with a complete Go-based CLI client-server infrastructure, new C API session management capabilities, Go bindings to the C SDK, and Android build configurations. It enables the C SDK binding by default and introduces HTTP-based multi-turn chat functionality with streaming responses.

Changes

Cohort / File(s)	Summary
C API Extensions `mllm/c_api/Object.h`, `mllm/c_api/Runtime.h`, `mllm/c_api/Runtime.cpp`	Introduces new C API surface for session management (startService, stopService, createQwen3Session, insertSession, freeSession) and request/response handling (sendRequest, pollResponse). Adds kCustomObject enum value and v_custom_ptr union member to support custom session objects. Implements session lifecycle wrapper and asynchronous response polling.
Build Configuration `CMakeLists.txt`	Enables MLLM C SDK binding by default by changing MLLM_BUILD_SDK_C_BINDING option from OFF to ON.
Go SDK Bindings `mllm-cli/mllm/c.go`	Exposes Go API wrapping C SDK functions with Session type, lifecycle management (StartService, StopService, NewSession), and request-response handling (SendRequest, PollResponse). Includes finalizers for automatic resource cleanup and C helper wrappers for custom pointer management.
Go Service Layer `mllm-cli/pkg/mllm/service.go`	Introduces thread-safe Service component for session registration and retrieval with mutex synchronization. Provides NewService, RegisterSession, GetSession, and graceful Shutdown.
Go HTTP Server `mllm-cli/pkg/server/server.go`, `mllm-cli/pkg/server/handlers.go`	Implements HTTP server wrapper with /v1/chat/completions endpoint. Includes chatCompletionsHandler that decodes OpenAI-like requests, manages sessions, polls responses, and streams server-sent events (SSE) with UUID request tracking.
Go API Types `mllm-cli/pkg/api/types.go`	Defines OpenAI-compatible request/response types: RequestMessage, OpenAIRequest (with session ID support), ResponseDelta, ResponseChoice, and OpenAIResponseChunk for streaming SSE protocol.
Go CLI Client `mllm-cli/cmd/mllm-client/main.go`	Interactive CLI client maintaining multi-turn chat history with streaming support. Communicates with server via HTTP, handles X-Session-ID header updates, supports /exit and /quit commands, and progressively prints streamed assistant responses.
Go CLI Server `mllm-cli/cmd/mllm-server/main.go`	Server entrypoint accepting required --model-path flag. Initializes MLLM service, creates and registers session, instantiates HTTP server on port 8080, and performs graceful shutdown on SIGINT/SIGTERM.
Go Dependencies `mllm-cli/go.mod`	Updates Go toolchain to 1.22.2, restructures dependencies under require block with bubbles v0.21.0 and gorilla/websocket v1.5.3, and adds indirect google/uuid v1.6.0.
Build Tasks & Configuration `task.py`, `tasks/build_android_debug.yaml`, `tasks/build_android_go_dialog_test.yaml`, `tasks/build_android_mllm_client.yaml`, `tasks/build_android_mllm_server.yaml`	Introduces ShellCommandTask class for executing shell commands via task framework. Adds Android NDK cross-compilation configurations for debug builds and Go binaries (dialog test, WebSocket client, web server) targeting arm64 with mobile tags and CGO linking.

Sequence Diagram(s)

sequenceDiagram
    participant User as User
    participant Client as CLI Client
    participant Server as HTTP Server
    participant Service as MLLM Service
    participant CApi as C API Layer

    User->>Client: /exit or chat input
    Client->>Client: Validate input
    alt Client exit command
        Client->>Client: Clear & exit
    else User message
        Client->>Client: Add to history
        Client->>Server: POST /v1/chat/completions<br/>(messages, session_id, stream=true)
        activate Server
        Server->>Service: GetSession(session_id)
        activate Service
        Service-->>Server: *Session
        deactivate Service
        Server->>Server: Generate request ID<br/>(UUID if missing)
        Server->>CApi: SendRequest(session_id, json)
        activate CApi
        CApi-->>Server: Request ID
        deactivate CApi
        
        Server->>Client: Set SSE headers<br/>Start streaming
        
        loop Poll until completion
            Server->>CApi: PollResponse(session_id)
            CApi-->>Server: JSON response chunk
            Server->>Server: Parse chunk<br/>Check finish_reason
            Server->>Client: Stream data line<br/>(SSE format)
            Client->>Client: Accumulate content
            Client->>User: Print streamed text
            
            alt Finish reason = "stop"
                Server->>Server: Mark complete
            else Empty response
                Server->>Server: Idle, retry
            end
        end
        
        Server->>Client: [DONE]
        deactivate Server
        Client->>Client: Update history<br/>with full response
    end
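
The polling half of this flow can be sketched in isolation. This is a minimal sketch, not the PR's handler: the `poller` interface, `fakeSession`, and the `done` flag are hypothetical stand-ins for the session's PollResponse call and the `finish_reason` check, and the chunk payloads are plain strings rather than real response JSON.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// poller is a hypothetical stand-in for the session's PollResponse call.
// It returns the next chunk ("" when nothing is ready) and whether the
// response has finished.
type poller interface {
	Poll() (chunk string, done bool)
}

// fakeSession replays canned chunks. Its first poll returns an empty chunk
// so the idle-and-retry branch is exercised.
type fakeSession struct {
	chunks []string
	idled  bool
}

func (f *fakeSession) Poll() (string, bool) {
	if !f.idled {
		f.idled = true
		return "", false
	}
	if len(f.chunks) == 0 {
		return "", true
	}
	c := f.chunks[0]
	f.chunks = f.chunks[1:]
	return c, false
}

// streamSSE polls p until it reports completion or ctx is cancelled,
// writing each non-empty chunk as one SSE data line.
func streamSSE(ctx context.Context, w http.ResponseWriter, p poller, interval time.Duration) {
	flusher, ok := w.(http.Flusher)
	if !ok {
		http.Error(w, "streaming not supported", http.StatusInternalServerError)
		return
	}
	w.Header().Set("Content-Type", "text/event-stream")
	w.Header().Set("Cache-Control", "no-cache")
	for {
		chunk, done := p.Poll()
		if done {
			fmt.Fprint(w, "data: [DONE]\n\n")
			flusher.Flush()
			return
		}
		if chunk != "" {
			fmt.Fprintf(w, "data: %s\n\n", chunk)
			flusher.Flush()
			continue
		}
		// Nothing ready: wait, but give up if the client disconnects.
		select {
		case <-ctx.Done():
			return
		case <-time.After(interval):
		}
	}
}

// render runs streamSSE against an in-memory recorder and returns the body.
func render(chunks ...string) string {
	rec := httptest.NewRecorder()
	streamSSE(context.Background(), rec, &fakeSession{chunks: chunks}, time.Millisecond)
	return rec.Body.String()
}

func main() {
	fmt.Print(render(`{"delta":"Hel"}`, `{"delta":"lo"}`))
}
```

In a real handler the context would come from `r.Context()`, so a dropped connection ends the loop instead of polling until the model finishes.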

sequenceDiagram
    participant main as main()
    participant Service as StartService()
    participant Session as NewSession()
    participant CApi as C API Layer
    participant HTTP as HTTP Server

    main->>main: Parse --model-path flag
    main->>main: Check required args
    
    main->>Service: StartService(workers)
    activate Service
    Service->>CApi: startService()
    CApi-->>Service: MllmCAny result
    Service-->>main: bool (success)
    deactivate Service
    
    main->>Session: NewSession(model_path)
    activate Session
    Session->>CApi: createQwen3Session(model_path)
    CApi-->>Session: MllmCAny session handle
    Session->>Session: Set finalizer for cleanup
    Session-->>main: *Session, error
    deactivate Session
    
    main->>main: Insert(session_id)
    main->>main: RegisterSession(id, session)
    
    main->>HTTP: NewServer(":8080", service)
    activate HTTP
    HTTP->>HTTP: Register /v1/chat/completions
    HTTP-->>main: *Server
    deactivate HTTP
    
    main->>HTTP: Start()
    HTTP->>HTTP: Listen & accept connections
    
    main->>main: Wait for SIGINT/SIGTERM
    
    main->>HTTP: Shutdown(ctx)
    HTTP->>HTTP: Graceful server close
    
    main->>Service: Shutdown()
    Service->>Session: Close()
    Session->>CApi: freeSession()
    Service->>Service: Clear sessions map
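
The shutdown ordering in this diagram (stop HTTP first, then release backend resources) can be sketched with the standard library alone. This is a sketch under assumptions: `run` and its `cleanup` callback are hypothetical names, the handler is a placeholder, and the address uses port 0 so the example does not collide with a running server.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"
)

// run serves on addr until stop is closed, then drains in-flight requests
// and calls cleanup, where a real server would free its sessions.
func run(addr string, stop <-chan struct{}, cleanup func()) error {
	ln, err := net.Listen("tcp", addr)
	if err != nil {
		return err
	}
	defer ln.Close() // harmless if Serve already closed it

	mux := http.NewServeMux()
	mux.HandleFunc("/v1/chat/completions", func(w http.ResponseWriter, r *http.Request) {
		http.Error(w, "not implemented in this sketch", http.StatusNotImplemented)
	})
	srv := &http.Server{Handler: mux}

	serveErr := make(chan error, 1)
	go func() { serveErr <- srv.Serve(ln) }()

	select {
	case err := <-serveErr:
		return err // the listener failed before any stop request
	case <-stop:
	}

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	shutdownErr := srv.Shutdown(ctx)
	cleanup() // HTTP first, then backend resources, as in the diagram
	if err := <-serveErr; !errors.Is(err, http.ErrServerClosed) {
		return err
	}
	return shutdownErr
}

func main() {
	// signal.NotifyContext cancels ctx on SIGINT or SIGTERM.
	ctx, cancel := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
	defer cancel()
	// Demo only: the timeout lets this sketch exit on its own.
	ctx, stopTimer := context.WithTimeout(ctx, 200*time.Millisecond)
	defer stopTimer()

	err := run("127.0.0.1:0", ctx.Done(), func() { fmt.Println("sessions released") })
	if err != nil {
		fmt.Println("server error:", err)
	}
}
```

Passing the stop condition in as a channel keeps the lifecycle testable without sending real signals.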

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

C API layer additions require careful review of memory management, session wrapper lifecycle, and new public function signatures (startService, stopService, session management, request/response handling)
Go/C FFI bindings need validation of finalizer correctness, cHandle management, and error translation patterns
HTTP server and streaming handler demand attention to SSE protocol correctness, session resolution logic, and UUID generation/tracking
Multi-turn client state management should be verified for correct session persistence and error recovery (message rollback on failure)
Thread-safety in Service requires mutex usage validation
Build infrastructure (multiple Android configurations) involves verifying cross-compilation flags and CGO setup across different targets

Areas requiring extra attention:

Memory leaks in C session lifecycle (allocation via createQwen3Session vs. deallocation via freeSession/finalizer)
Go finalizer behavior with shared C pointers in concurrent scenarios
SSE streaming completion detection and [DONE] marker correctness
Session ID generation and header synchronization between client and server
Error paths in session lookup and request forwarding

Poem

🐰 Hop and bind, a bridge we weave,
From Go to C, we interleave,
Sessions stream on SSE's flow,
Multi-turn chats steal the show!
Android builds now cross compile,
MLLM's CLI with style! ✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 7.14% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The PR title 'feat: Add mllm-cli support for Qwen3 and update docs' is clearly related to the main changes, which include extensive MLLM CLI infrastructure (client/server), Qwen3 session support in C API, and build configurations. It captures the primary intent of adding CLI support and Qwen3 integration.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

chenghuaWang · 2025-10-29T11:14:11Z

@coderabbitai review

coderabbitai · 2025-10-29T11:14:39Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 10

🧹 Nitpick comments (23)

tasks/build_android_mllm_server.yaml (1)
11-13: Consider parameterizing the NDK path.

The hardcoded NDK path /opt/ndk/android-ndk-r28b reduces portability. The task exports ANDROID_NDK_HOME unconditionally, which overwrites any value already set in the environment. Consider keeping an existing value and falling back to the default only when the variable is unset.
-      export ANDROID_NDK_HOME=/opt/ndk/android-ndk-r28b
+      export ANDROID_NDK_HOME=${ANDROID_NDK_HOME:-/opt/ndk/android-ndk-r28b}
tasks/build_android_mllm_client.yaml (1)

11-13: Consider parameterizing the NDK path.

Same recommendation as the server build: allow ANDROID_NDK_HOME to be overridden from the environment.

tasks/build_android_go_dialog_test.yaml (1)

12-14: Consider parameterizing the NDK path.

Allow ANDROID_NDK_HOME to be overridden from the environment for better portability.
mllm/c_api/Runtime.cpp (3)
35-41: Add braces around single-statement if blocks.

For consistency and safety, wrap single-statement if blocks in braces.
 int32_t isOk(MllmCAny ret) {
-  if (ret.type_id == kRetCode && ret.v_return_code == 0)
+  if (ret.type_id == kRetCode && ret.v_return_code == 0) {
       return true;
-  if (ret.type_id == kCustomObject && ret.v_custom_ptr != nullptr)
+  }
+  if (ret.type_id == kCustomObject && ret.v_custom_ptr != nullptr) {
       return true;
+  }
   return false;
 }
165-166: Consider safer string copy alternatives.

strncpy does not null-terminate the destination when the source is at least as long as the count. This call is safe only because the buffer size and the count are both response.length() + 1. Consider memcpy followed by explicit null-termination, which states that intent directly.
-    char* c_response = new char[response.length() + 1];
-    strncpy(c_response, response.c_str(), response.length() + 1);
+    char* c_response = new char[response.length() + 1];
+    std::memcpy(c_response, response.c_str(), response.length());
+    c_response[response.length()] = '\0';
172-173: Simplify null pointer check before delete.

The null check is unnecessary since delete nullptr is safe in C++.
 void freeResponseString(const char* response_str) {
-    if (response_str != nullptr) {
-        delete[] response_str;
-    }
+    delete[] response_str;
 }
tasks/build_android_debug.yaml (1)
12-12: Consider using a relative or configurable install prefix.

The hard-coded /root/mllm-install-android-arm64-v8a path assumes a specific user directory and won't work in environments where builds run under different users or in CI systems with different home directories.

Consider using a relative path or an environment variable:
-        - "-DCMAKE_INSTALL_PREFIX=/root/mllm-install-android-arm64-v8a"
+        - "-DCMAKE_INSTALL_PREFIX=${INSTALL_PREFIX:-./install-android-arm64-v8a}"
mllm-cli/pkg/server/handlers.go (4)
23-27: Consider logging the decode error for debugging.

When JSON decoding fails, the actual error is not logged, making it harder to diagnose malformed requests during troubleshooting.

Apply this diff:
 		var requestPayload map[string]interface{}
 		if err := json.NewDecoder(r.Body).Decode(&requestPayload); err != nil {
+			log.Printf("ERROR: Failed to decode request body: %v", err)
 			http.Error(w, "Invalid request body", http.StatusBadRequest)
 			return
 		}
52-55: Consider logging the reason for SendRequest failure.

When session.SendRequest returns false, no additional context is provided about why the request failed. This makes debugging difficult when requests are rejected by the underlying session.

If the underlying SendRequest implementation can provide error details, consider enhancing the API to return an error instead of just a boolean. Otherwise, add internal logging within SendRequest to capture failure reasons.

60-60: Add safety check for Flusher type assertion.

The type assertion w.(http.Flusher) doesn't check the ok return value. While most HTTP ResponseWriters support Flusher, it's safer to verify.

Apply this diff:
-		flusher, _ := w.(http.Flusher)
+		flusher, ok := w.(http.Flusher)
+		if !ok {
+			http.Error(w, "Streaming not supported", http.StatusInternalServerError)
+			return
+		}
63-90: LGTM with one consideration on polling interval.

The streaming logic correctly handles client disconnects, end markers, and empty responses. The implementation properly filters out empty deltas before sending to avoid unnecessary SSE messages.

One consideration: the 10ms polling interval (line 84) creates a relatively tight loop. Depending on the underlying model's response generation speed, you might consider making this configurable or slightly increasing it to reduce CPU usage during idle periods.

If desired, consider making the polling interval configurable:
+const responsePollingInterval = 50 * time.Millisecond
+
 		for {
 			// ... existing code ...
 			} else {
-				time.Sleep(10 * time.Millisecond)
+				time.Sleep(responsePollingInterval)
 			}
 		}
mllm-cli/cmd/mllm-client/main.go (3)
19-19: Consider making the server URL configurable.

The server URL is hard-coded to http://localhost:8080/v1/chat/completions, which limits flexibility for connecting to different servers or ports.

Make the URL configurable via a command-line flag:
+import "flag"
+
 func main() {
-	serverURL := "http://localhost:8080/v1/chat/completions"
+	serverURL := flag.String("server-url", "http://localhost:8080/v1/chat/completions", "Server URL for chat completions")
+	flag.Parse()
 	var history []api.RequestMessage
Then use *serverURL throughout the code.

30-30: Handle potential error from ReadString.

The error from reader.ReadString('\n') is silently ignored, which could mask I/O issues.

Apply this diff:
-		userInput, _ := reader.ReadString('\n')
+		userInput, err := reader.ReadString('\n')
+		if err != nil {
+			// EOF or a failed stdin will not recover, so stop instead of retrying.
+			log.Printf("ERROR: Failed to read input: %v", err)
+			return
+		}
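
For context on the error handling, here is a minimal sketch of an input loop that separates end of input from a real I/O failure. `readInputs` and `readInputsFromString` are hypothetical helpers, not functions from the PR. The /exit and /quit commands are the ones the client supports.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"io"
	"strings"
)

// readInputs collects user messages until /exit, /quit, or end of input.
func readInputs(r io.Reader) ([]string, error) {
	reader := bufio.NewReader(r)
	var inputs []string
	for {
		// ReadString returns whatever it read before an error, so a final
		// line without a trailing newline is still processed.
		line, err := reader.ReadString('\n')
		text := strings.TrimSpace(line)
		if text == "/exit" || text == "/quit" {
			return inputs, nil
		}
		if text != "" {
			inputs = append(inputs, text)
		}
		if errors.Is(err, io.EOF) {
			return inputs, nil // stdin closed: finish cleanly
		}
		if err != nil {
			return inputs, err // a real I/O failure; retrying would spin
		}
	}
}

// readInputsFromString is a convenience wrapper for trying the loop out.
func readInputsFromString(s string) []string {
	inputs, _ := readInputs(strings.NewReader(s))
	return inputs
}

func main() {
	fmt.Println(readInputsFromString("hello\nhow are you\n/exit\nignored\n"))
}
```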
76-82: Consider logging JSON unmarshal errors.

When json.Unmarshal fails to parse a chunk, the error is silently ignored. This could make debugging issues with malformed server responses more difficult.

Apply this diff:
 				var chunk api.OpenAIResponseChunk
-				if json.Unmarshal([]byte(jsonData), &chunk) == nil && len(chunk.Choices) > 0 {
+				if err := json.Unmarshal([]byte(jsonData), &chunk); err != nil {
+					log.Printf("WARN: Failed to unmarshal chunk: %v", err)
+				} else if len(chunk.Choices) > 0 {
 					content := chunk.Choices[0].Delta.Content
 					fmt.Print(content)
 					fullResponse.WriteString(content)
 				}
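
A self-contained sketch of the client-side parsing, for comparison. The trimmed `chunk` types and their JSON tags are assumptions modeled on the OpenAI streaming format. Only the `Choices[0].Delta.Content` access path comes from the code above. Malformed data lines are counted rather than dropped silently.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// Chunk types trimmed to the fields the client reads.
// The JSON tags are assumptions, not taken from the PR.
type delta struct {
	Content string `json:"content"`
}

type choice struct {
	Delta delta `json:"delta"`
}

type chunk struct {
	Choices []choice `json:"choices"`
}

// collect reads an SSE body and returns the concatenated content plus the
// number of data lines that failed to parse as JSON.
func collect(body string) (string, int) {
	var full strings.Builder
	bad := 0
	sc := bufio.NewScanner(strings.NewReader(body))
	for sc.Scan() {
		line := sc.Text()
		if !strings.HasPrefix(line, "data: ") {
			continue // blank separators and other SSE fields
		}
		data := strings.TrimPrefix(line, "data: ")
		if data == "[DONE]" {
			break
		}
		var c chunk
		if err := json.Unmarshal([]byte(data), &c); err != nil {
			bad++ // a real client would log this
			continue
		}
		if len(c.Choices) > 0 {
			full.WriteString(c.Choices[0].Delta.Content)
		}
	}
	return full.String(), bad
}

func main() {
	body := "data: {\"choices\":[{\"delta\":{\"content\":\"Hel\"}}]}\n\n" +
		"data: {\"choices\":[{\"delta\":{\"content\":\"lo\"}}]}\n\n" +
		"data: [DONE]\n\n"
	text, bad := collect(body)
	fmt.Println(text, bad)
}
```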
mllm-cli/pkg/server/server.go (1)
17-30: Consider adding nil check for mllmService parameter.

The NewServer constructor doesn't validate that mllmService is non-nil, which could lead to a panic when the handler tries to access it.

Add a nil check:
 func NewServer(addr string, mllmService *mllm.Service) *Server {
+	if mllmService == nil {
+		log.Fatal("FATAL: mllmService cannot be nil")
+	}
 	mux := http.NewServeMux()
mllm-cli/pkg/mllm/service.go (1)
23-27: Consider logging when overwriting an existing session.

The RegisterSession method allows overwriting an existing session without warning. This could mask bugs where the same session ID is registered multiple times.

Add logging when overwriting:
 func (s *Service) RegisterSession(id string, session *mllm.Session) {
 	s.mutex.Lock()
 	defer s.mutex.Unlock()
+	if _, exists := s.sessions[id]; exists {
+		log.Printf("WARN: Overwriting existing session with ID: %s", id)
+	}
 	s.sessions[id] = session
 }
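
An alternative to logging inside the method is to return the replaced session and let the caller decide whether to log, close, or reject it. A minimal sketch, assuming a placeholder `Session` type in place of the cgo-backed one and a read/write mutex:

```go
package main

import (
	"fmt"
	"sync"
)

// Session is a placeholder for the cgo-backed session type.
type Session struct{ Name string }

// Service is a mutex-guarded session registry.
type Service struct {
	mu       sync.RWMutex
	sessions map[string]*Session
}

func NewService() *Service {
	return &Service{sessions: make(map[string]*Session)}
}

// RegisterSession stores sess under id and returns the session it
// replaced, or nil if the id was free.
func (s *Service) RegisterSession(id string, sess *Session) (replaced *Session) {
	s.mu.Lock()
	defer s.mu.Unlock()
	replaced = s.sessions[id]
	s.sessions[id] = sess
	return replaced
}

// GetSession looks up a session under a read lock.
func (s *Service) GetSession(id string) (*Session, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	sess, ok := s.sessions[id]
	return sess, ok
}

func main() {
	svc := NewService()
	svc.RegisterSession("default", &Session{Name: "first"})
	if old := svc.RegisterSession("default", &Session{Name: "second"}); old != nil {
		fmt.Println("replaced session:", old.Name)
	}
}
```

Returning the old session also gives the caller a chance to close it, which matters when each session owns native memory.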
mllm/c_api/Runtime.h (1)

51-53: Clarify memory management contract for pollResponse.

The pairing of pollResponse() returning a const char* with a separate freeResponseString() function creates potential for memory leaks if callers forget to free. Consider if the API could be made safer.

Options to consider:

Document the memory contract very clearly (which the documentation comment above addresses).

Consider returning a status code with an out-parameter for the string, making memory ownership more explicit.

Use a buffer provided by the caller to avoid heap allocation.

For now, ensure that all Go wrapper code and examples demonstrate proper usage with defer to free responses immediately after use.
mllm-cli/cmd/mllm-server/main.go (1)
30-32: Consider making log level and worker threads configurable.

The log level (2) and worker thread count (4) are hard-coded. Making these configurable via flags would improve flexibility for different deployment environments.

Add command-line flags:
 func main() {
 	modelPath := flag.String("model-path", "", "Path to the MLLM model directory.")
+	logLevel := flag.Int("log-level", 2, "Log level (0=OFF, 1=ERROR, 2=INFO, 3=DEBUG)")
+	workerThreads := flag.Int("workers", 4, "Number of worker threads")
 	flag.Parse()
 
 	if *modelPath == "" {
 		log.Fatal("FATAL: --model-path argument is required.")
 	}
 
 	if !mllm.InitializeContext() {
 		log.Fatal("FATAL: InitializeContext failed!")
 	}
-	mllm.SetLogLevel(2)
-	if !mllm.StartService(4) {
+	mllm.SetLogLevel(*logLevel)
+	if !mllm.StartService(*workerThreads) {
 		log.Fatal("FATAL: StartService failed!")
 	}
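
Flag parsing is easier to test when it is separated from `main`. The sketch below assumes the flag names from the diff above. The `config` type, `parseConfig`, and the minimum-workers check are hypothetical additions.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"io"
)

// config holds the server settings taken from the command line.
type config struct {
	ModelPath string
	LogLevel  int
	Workers   int
}

// parseConfig parses args (without the program name) and validates them.
func parseConfig(args []string) (config, error) {
	var c config
	fs := flag.NewFlagSet("mllm-server", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // the caller decides how to report errors
	fs.StringVar(&c.ModelPath, "model-path", "", "path to the model directory")
	fs.IntVar(&c.LogLevel, "log-level", 2, "log verbosity")
	fs.IntVar(&c.Workers, "workers", 4, "number of worker threads")
	if err := fs.Parse(args); err != nil {
		return c, err
	}
	if c.ModelPath == "" {
		return c, errors.New("--model-path is required")
	}
	if c.Workers < 1 {
		return c, fmt.Errorf("--workers must be at least 1, got %d", c.Workers)
	}
	return c, nil
}

func main() {
	// Fixed arguments keep the sketch runnable; a server would pass os.Args[1:].
	cfg, err := parseConfig([]string{"--model-path", "/models/qwen3", "--workers", "8"})
	if err != nil {
		fmt.Println("invalid arguments:", err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```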
mllm-cli/pkg/api/types.go (1)
14-16: Consider English comments and clarify EnableThinking vs Thinking fields.

Two observations:

The comment on line 15 is in Chinese, which may not be accessible to all contributors. Consider using English for consistency.

The EnableThinking and Thinking fields appear to serve similar purposes. Having both could lead to confusion about which field clients should use.

Clarify the distinction or consolidate:
 type OpenAIRequest struct {
 	Model          string           `json:"model"`
 	Messages       []RequestMessage `json:"messages"`
 	Stream         bool             `json:"stream"`
-	EnableThinking bool             `json:"enable_thinking,omitempty"` 
-	Thinking       bool             `json:"thinking,omitempty"`       // <-- 新增此行，用于接收客户端可能发送的 "thinking": true
+	// EnableThinking enables the model's reasoning/thinking mode
+	EnableThinking bool             `json:"enable_thinking,omitempty"` 
 	SessionID      string           `json:"session_id,omitempty"`     
 }
If both fields are genuinely needed, document why they're distinct and when each should be used.
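
If both spellings have to stay for client compatibility, one option is to keep both fields and read them through a single accessor. A minimal sketch: the `OpenAIRequest` fields match the struct above, while the `RequestMessage` fields, `ThinkingRequested`, and `decode` are assumptions added for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// RequestMessage fields are assumed to follow the OpenAI chat format.
type RequestMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

// OpenAIRequest mirrors the struct under review, keeping both flags.
type OpenAIRequest struct {
	Model    string           `json:"model"`
	Messages []RequestMessage `json:"messages"`
	Stream   bool             `json:"stream"`
	// EnableThinking is the preferred flag for the model's thinking mode.
	EnableThinking bool `json:"enable_thinking,omitempty"`
	// Thinking accepts the alternate "thinking": true spelling.
	Thinking  bool   `json:"thinking,omitempty"`
	SessionID string `json:"session_id,omitempty"`
}

// ThinkingRequested reports whether either spelling of the flag was set,
// so the rest of the server never inspects the two fields directly.
func (r OpenAIRequest) ThinkingRequested() bool {
	return r.EnableThinking || r.Thinking
}

// decode parses a request body into an OpenAIRequest.
func decode(body string) (OpenAIRequest, error) {
	var req OpenAIRequest
	err := json.Unmarshal([]byte(body), &req)
	return req, err
}

func main() {
	req, err := decode(`{"model":"qwen3","thinking":true}`)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(req.Model, req.ThinkingRequested())
}
```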
mllm-cli/mllm/c.go (4)
8-8: Link portability: consider rpath/build tags to avoid runtime loader issues.

Linking against -lMllmSdkC -lMllmRT -lMllmCPUBackend assumes system search paths are configured. For CLI distribution, consider:

Adding an rpath to the binary (e.g., -Wl,-rpath,$ORIGIN or a project-specific lib dir).

Using build tags/conditional LDFLAGS per OS/arch if backends differ.

Documenting LD_LIBRARY_PATH/DYLD_LIBRARY_PATH requirements.

59-74: Finalizer is a safety net, not a guarantee; consider reducing side effects and improving error context.

Avoid relying on the finalizer for correctness; explicit Close should be the primary path.

Printing in finalizers can surprise users; prefer a debug logger or remove the print.

The error message should include modelPath and be written in English, consistent with the other messages in the package; include the C-side error if one is available.

Apply this diff:
-    handle := C.createQwen3Session(cModelPath)
-    if !isOk(handle) {
-        return nil, fmt.Errorf("底层C API createQwen3Session 失败")
-    }
+    handle := C.createQwen3Session(cModelPath)
+    if !isOk(handle) {
+        return nil, fmt.Errorf("createQwen3Session failed for model %q", modelPath)
+    }
@@
-    runtime.SetFinalizer(s, func(s *Session) {
-        fmt.Println("[Go Finalizer] Mllm Session automatically released.") 
-        C.freeSession(s.cHandle)
-    })
+    runtime.SetFinalizer(s, func(s *Session) {
+        // Finalizer is best-effort; prefer explicit Close().
+        C.freeSession(s.cHandle)
+    })
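
One way to make explicit Close the primary path while keeping the finalizer as a fallback is to route both through a `sync.Once`. This sketch avoids cgo so it runs anywhere: the integer `handle` and the `free` callback are hypothetical stand-ins for the C handle and `C.freeSession`.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// Session owns a native resource. handle and free stand in for the
// C handle and C.freeSession.
type Session struct {
	handle int
	free   func(int)
	once   sync.Once
}

// NewSession wraps a handle and installs a best-effort finalizer.
func NewSession(handle int, free func(int)) *Session {
	s := &Session{handle: handle, free: free}
	runtime.SetFinalizer(s, (*Session).release)
	return s
}

// release frees the resource at most once, whoever calls it first.
func (s *Session) release() {
	s.once.Do(func() { s.free(s.handle) })
}

// Close is the primary cleanup path. It clears the finalizer so the
// garbage collector has nothing left to run, then releases the handle.
func (s *Session) Close() {
	runtime.SetFinalizer(s, nil)
	s.release()
}

func main() {
	s := NewSession(42, func(h int) { fmt.Println("freed handle", h) })
	s.Close()
	s.Close() // safe: the second call is a no-op
}
```

The same shape guards against a double free if both Close and the finalizer ever reach the native side.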
55-57: Surface SetLogLevel errors or document valid range.

If the C API validates levels, consider returning bool/error or documenting accepted range to help callers detect misconfiguration.

Would you like me to add a thin wrapper returning bool and adjust callers?

95-99: Optional: validate jsonRequest early.

If empty strings are invalid, reject early for clearer errors; if JSON must be well-formed, consider basic validation before crossing the cgo boundary.
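
A basic check of this kind needs only `encoding/json`. The sketch below assumes empty input should be rejected. `validateRequest` is a hypothetical helper that would run before the string is handed to the C side.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// validateRequest rejects empty or malformed JSON before it crosses
// the cgo boundary.
func validateRequest(jsonRequest string) error {
	if strings.TrimSpace(jsonRequest) == "" {
		return errors.New("request is empty")
	}
	if !json.Valid([]byte(jsonRequest)) {
		return errors.New("request is not well-formed JSON")
	}
	return nil
}

func main() {
	for _, sample := range []string{"", "{", `{"messages":[]}`} {
		fmt.Printf("%q -> %v\n", sample, validateRequest(sample))
	}
}
```

`json.Valid` checks syntax only, so schema problems such as a missing messages array would still surface later.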

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f8eec9a and e4b2497.

📒 Files selected for processing (17)

CMakeLists.txt (1 hunks)
mllm-cli/cmd/mllm-client/main.go (1 hunks)
mllm-cli/cmd/mllm-server/main.go (1 hunks)
mllm-cli/go.mod (2 hunks)
mllm-cli/mllm/c.go (2 hunks)
mllm-cli/pkg/api/types.go (1 hunks)
mllm-cli/pkg/mllm/service.go (1 hunks)
mllm-cli/pkg/server/handlers.go (1 hunks)
mllm-cli/pkg/server/server.go (1 hunks)
mllm/c_api/Object.h (1 hunks)
mllm/c_api/Runtime.cpp (2 hunks)
mllm/c_api/Runtime.h (1 hunks)
task.py (2 hunks)
tasks/build_android_debug.yaml (1 hunks)
tasks/build_android_go_dialog_test.yaml (1 hunks)
tasks/build_android_mllm_client.yaml (1 hunks)
tasks/build_android_mllm_server.yaml (1 hunks)

🧰 Additional context used

🪛 Clang (14.0.6)

mllm/c_api/Runtime.cpp

[error] 11-11: inclusion of deprecated C++ header 'string.h'; consider using 'cstring' instead

(modernize-deprecated-headers,-warnings-as-errors)

[error] 36-36: statement should be inside braces

(google-readability-braces-around-statements,readability-braces-around-statements,-warnings-as-errors)

[error] 37-37: implicit conversion bool -> 'int'

(readability-implicit-bool-conversion,-warnings-as-errors)

[error] 38-38: statement should be inside braces

(google-readability-braces-around-statements,readability-braces-around-statements,-warnings-as-errors)

[error] 39-39: implicit conversion bool -> 'int'

(readability-implicit-bool-conversion,-warnings-as-errors)

[error] 40-40: implicit conversion bool -> 'int'

(readability-implicit-bool-conversion,-warnings-as-errors)

[error] 66-66: variable 'worker_threads' is not initialized