
Conversation

@davidkoski (Collaborator) commented Dec 18, 2025:

Note that this requires changes in mlx-swift (so likely a new tag there):

Proposed changes

Please include a description of the problem or feature this PR is addressing. If there is a corresponding issue, include the issue #.

Checklist

Put an x in the boxes that apply.

  • I have read the CONTRIBUTING document
  • I have run pre-commit run --all-files to format my code / installed pre-commit prior to committing changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the necessary documentation (if needed)

dims: headDim, base: config.ropeTheta, traditional: false,
scalingConfig: config.ropeScaling,
maxPositionEmbeddings: config.maxPositionEmbeddings)
}
@davidkoski (Collaborator, Author) commented:

Picking up changes post initial port: ml-explore/mlx-lm@714157b...main

return suScaledRope(x, offset: offset)
}
return x
}
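The snippet above dispatches to a su-scaled RoPE only in one branch and otherwise returns the input unchanged. A minimal sketch of that selection pattern, with invented names (not the actual mlx-swift-examples API), is:

```swift
// Illustrative sketch: models with SuScaled/LongRoPE scaling switch to
// the scaled rope variant once positions exceed the original training
// context; otherwise standard rope applies. All names here are assumed.
struct RopeSelector {
    let originalContextLength: Int

    // Decide whether the scaled variant should be used for this window.
    func useScaledRope(sequenceLength: Int, offset: Int) -> Bool {
        // Past the original training context, switch to scaled rope.
        offset + sequenceLength > originalContextLength
    }
}

let selector = RopeSelector(originalContextLength: 4096)
print(selector.useScaledRope(sequenceLength: 512, offset: 4000))  // true
print(selector.useScaledRope(sequenceLength: 512, offset: 0))     // false
```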
@davidkoski (Collaborator, Author) commented:

} else {
let (cachedKeys, cachedValues) = cache.update(keys: keys, values: values)
// TODO dkoski
// print("\(cachedKeys.shape) \(cachedValues.shape) \(queries.shape), \(mask.masks?[0].shape ?? [])")
@davidkoski (Collaborator, Author) commented:

WIP debug stuff :-)

_ action: @Sendable (isolated ModelContainer) async throws -> sending R
) async rethrows -> sending R {
try await action(self)
}
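The `perform` signature above uses an `isolated` closure parameter and `sending` return so a caller-supplied closure runs inside the actor's isolation and its result can move out safely. A self-contained sketch of that pattern (with a stand-in actor, not the real ModelContainer) looks like:

```swift
// Sketch (assumed names) of the perform-style pattern: the actor exposes
// one entry point that runs a closure inside its isolation, so all
// mutable state is only touched on the actor. `sending R` lets a
// non-Sendable result transfer out of the isolation safely.
actor ModelBox {
    private var generationCount = 0  // actor-protected mutable state

    func perform<R>(
        _ action: @Sendable (isolated ModelBox) async throws -> sending R
    ) async rethrows -> sending R {
        try await action(self)
    }

    // Synchronous within the actor's isolation.
    func noteGeneration() -> Int {
        generationCount += 1
        return generationCount
    }
}

// Usage: inside the closure the parameter is isolated, so calling
// noteGeneration() needs no await.
func example(box: ModelBox) async {
    let n = await box.perform { box in
        box.noteGeneration()
    }
    print("generation #\(n)")
}
```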
@davidkoski (Collaborator, Author) commented:

@DePasqualeOrg FYI, trying some different things out re your recent cleanups around Sendable and thread safety. I have some tests that repro some threading issues (based on the LLMBasic example I made).

import XCTest

/// Tests for the streamlined API using real models
public class ChatSessionTests: XCTestCase {
@davidkoski (Collaborator, Author) commented:

@DePasqualeOrg FYI moved this into an IntegrationTests directory -- I am not sure this should run on CI as these are rather large, but I think the tests are valuable to run locally.

A contributor commented:

That makes sense. I thought about that when I modified this test, but I didn't realize that it could be excluded from CI.
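One way to keep large integration tests out of the default CI run is a separate test target that CI selects around. A hypothetical Package.swift excerpt (target and path names are illustrative, not the repository's actual manifest):

```swift
// swift-tools-version:5.9
// Hypothetical manifest excerpt: a dedicated IntegrationTests target
// lets CI run only the fast unit tests, while developers run the
// heavy, model-downloading tests locally with
//   swift test --filter IntegrationTests
import PackageDescription

let package = Package(
    name: "MLXLMCommon",
    targets: [
        .target(name: "MLXLMCommon"),
        // Fast unit tests: run everywhere, including CI.
        .testTarget(
            name: "MLXLMTests",
            dependencies: ["MLXLMCommon"]
        ),
        // Large integration tests: excluded from the CI invocation.
        .testTarget(
            name: "IntegrationTests",
            dependencies: ["MLXLMCommon"],
            path: "Tests/IntegrationTests"
        ),
    ]
)
```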

let result = try await session.respond(to: "What is 2+2? Reply with just the number.")
print("One-shot result:", result)
XCTAssertTrue(result.contains("4") || result.lowercased().contains("four"))
func testChatSessionAsyncInterrupt() async throws {
@davidkoski (Collaborator, Author) commented:

@DePasqualeOrg FYI an example of some concurrency issues related to the issues you were working on.

This triggers a variety of crashes:

and a couple of others where no crash occurs but the streaming response is still running for a short time after the loop terminates early, and we end up with concurrent modification of the KVCache.

I will use this to test actual fixes.
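The failure mode described above — a consumer breaking out of a stream early while the producer keeps mutating shared state — can be sketched as follows, with invented names standing in for the real generation and KVCache types. The key points are that all cache mutation goes through an actor and that early termination is propagated back to the producer task via cancellation:

```swift
// Sketch (assumed names) of a cancellation-safe streaming setup: if the
// consumer stops iterating early, onTermination cancels the producer
// task, so the cache is not mutated concurrently with the next request.
import Foundation

actor KVCacheBox {
    private var entries: [Int] = []
    func append(_ token: Int) { entries.append(token) }
    var count: Int { entries.count }
}

func streamTokens(into cache: KVCacheBox) -> AsyncStream<Int> {
    AsyncStream { continuation in
        let task = Task {
            for token in 0..<1_000 {
                try Task.checkCancellation()
                await cache.append(token)   // all mutation on the actor
                continuation.yield(token)
            }
            continuation.finish()
        }
        // Propagate early termination (break, deallocation) back to
        // the producer instead of letting it run on in the background.
        continuation.onTermination = { _ in task.cancel() }
    }
}
```

Without the `onTermination` hook, a `break` in the consuming `for await` loop leaves the producer running briefly, which is exactly the window where concurrent modification can occur.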

Self.llmContainer, instructions: "You are a helpful assistant. Keep responses brief.")
@MainActor
func testViewModel() async throws {
let model = ChatModel(model: model())
@davidkoski (Collaborator, Author) commented:

And this one simulates the activity from LLMBasic which also causes thread safety issues.
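The LLMBasic-style activity being simulated can be sketched as a `@MainActor` view model that forwards prompts to an actor-isolated session and publishes output on the main actor, cancelling any in-flight generation first. All names here are invented stand-ins, not the actual LLMBasic types:

```swift
// Sketch (assumed names): UI state lives on the main actor; generation
// runs through an actor so model state is never touched concurrently.
import Observation

actor Session {
    func respond(to prompt: String) async -> String {
        "echo: \(prompt)"  // stand-in for real generation
    }
}

@MainActor
@Observable
final class ChatModel {
    var output = ""
    var isGenerating = false
    private let session = Session()
    private var generation: Task<Void, Never>?

    func send(_ prompt: String) {
        generation?.cancel()          // stop any in-flight generation
        isGenerating = true
        generation = Task {           // inherits the @MainActor context
            output = await session.respond(to: prompt)
            isGenerating = false
        }
    }
}
```

Rapidly calling `send(_:)` from a UI, as LLMBasic does, is what exercises the cancel-then-restart path where the thread-safety issues show up.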

- ml-explore/mlx-swift-examples#454

- fixes #27
- move ChatSession integration tests into new test target so we can more easily control when it runs
- make a ChatSession _unit_ (more or less) test
- fix Sendable / thread safety issues uncovered by LLMBasic

- collect TestTokenizer and friends into its own file; fix warnings in tests
Development

Successfully merging this pull request may close these issues.

[BUG] gemma3text crashes if the attention mask is used

3 participants