support for LLMBasic (mlx-swift-examples) #29
| dims: headDim, base: config.ropeTheta, traditional: false, | ||
| scalingConfig: config.ropeScaling, | ||
| maxPositionEmbeddings: config.maxPositionEmbeddings) | ||
| } |
Picking up changes post initial port: ml-explore/mlx-lm@714157b...main
| return suScaledRope(x, offset: offset) | ||
| } | ||
| return x | ||
| } |
| } else { | ||
| let (cachedKeys, cachedValues) = cache.update(keys: keys, values: values) | ||
| // TODO dkoski | ||
| // print("\(cachedKeys.shape) \(cachedValues.shape) \(queries.shape), \(mask.masks?[0].shape ?? [])") |
WIP debug stuff :-)
| _ action: @Sendable (isolated ModelContainer) async throws -> sending R | ||
| ) async rethrows -> sending R { | ||
| try await action(self) | ||
| } |
@DePasqualeOrg FYI, trying some different things out re your recent cleanups around Sendable and thread safety. I have some tests that repro some threading issues (based on the LLMBasic example I made).
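For context on the `perform` signature in the diff above, here is a minimal sketch of the isolated-closure pattern. `ToyContainer`, `nextGeneration`, and `demo` are hypothetical names invented for illustration, not the MLX `ModelContainer` API, and the sketch uses `R: Sendable` rather than the `sending` results in the diff.

```swift
// Hypothetical stand-in for a model container; not the MLX ModelContainer API.
actor ToyContainer {
    private var generation = 0

    /// Runs `action` isolated to this actor, so the closure can use
    /// actor state without extra `await` hops.
    func perform<R: Sendable>(
        _ action: @Sendable (isolated ToyContainer) async throws -> R
    ) async rethrows -> R {
        try await action(self)
    }

    /// Synchronous when called from code already isolated to the actor.
    func nextGeneration() -> Int {
        generation += 1
        return generation
    }
}

func demo() async -> Int {
    let container = ToyContainer()
    // `c` is isolated inside the closure, so `nextGeneration()` needs no `await`.
    _ = await container.perform { c in c.nextGeneration() }
    return await container.perform { c in c.nextGeneration() }
}
```

Because the closure body runs on the actor, a caller cannot interleave other actor work in the middle of it unless the closure itself suspends.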
| import XCTest | ||
|
|
||
| /// Tests for the streamlined API using real models | ||
| public class ChatSessionTests: XCTestCase { |
@DePasqualeOrg FYI moved this into an IntegrationTests directory -- I am not sure this should run on CI as these are rather large, but I think the tests are valuable to run locally.
That makes sense. I thought about that when I modified this test, but I didn't realize that it could be excluded from CI.
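One way to keep large-model tests out of CI is a separate test target, roughly as sketched below. The package and target names (`Example`, `Core`, `CoreTests`, `IntegrationTests`) are placeholders, not the names used in this repository.

```swift
// swift-tools-version:5.9
import PackageDescription

// Hypothetical manifest: unit tests and integration tests live in separate
// targets (Tests/CoreTests and Tests/IntegrationTests by default).
let package = Package(
    name: "Example",
    targets: [
        .target(name: "Core"),
        // Small, fast tests that are safe to run on CI.
        .testTarget(name: "CoreTests", dependencies: ["Core"]),
        // Tests that download and run real models; intended for local runs.
        .testTarget(name: "IntegrationTests", dependencies: ["Core"]),
    ]
)
```

With a layout like this, CI could run `swift test --filter CoreTests`, while a local run could use `swift test --filter IntegrationTests`.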
| let result = try await session.respond(to: "What is 2+2? Reply with just the number.") | ||
| print("One-shot result:", result) | ||
| XCTAssertTrue(result.contains("4") || result.lowercased().contains("four")) | ||
| func testChatSessionAsyncInterrupt() async throws { |
@DePasqualeOrg FYI an example of some concurrency issues related to the issues you were working on.
This triggers a variety of crashes:
- thread safety -- hold lock while calling stream sync mlx-swift#323
- [BUG] gemma3text crashes if the attention mask is used #27
- a couple of others that have no filed issue yet, where the streaming response keeps running for a short time after the loop terminates early, so the KVCache is modified concurrently.
I will use this to test actual fixes.
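The early-termination case can be sketched in isolation, assuming a toy producer rather than the real generation loop. `ToyCache`, `generate`, and `consumeEarly` are hypothetical names; the point is only that the consumer cancels the producer and waits for it to finish before touching shared state again.

```swift
// Hypothetical stand-in for a KV cache; the actor serializes access.
actor ToyCache {
    private var entries: [Int] = []
    func append(_ value: Int) { entries.append(value) }
    func count() -> Int { entries.count }
}

/// Starts a producer task that writes to the cache and yields tokens.
/// The task is returned so the consumer can cancel it and await its completion.
func generate(
    into cache: ToyCache, limit: Int
) -> (stream: AsyncStream<Int>, producer: Task<Void, Never>) {
    let (stream, continuation) = AsyncStream<Int>.makeStream()
    let producer = Task {
        for token in 0..<limit {
            if Task.isCancelled { break }
            await cache.append(token)   // cache is updated before the token is yielded
            continuation.yield(token)
        }
        continuation.finish()
    }
    return (stream, producer)
}

func consumeEarly() async -> Int {
    let cache = ToyCache()
    let (stream, producer) = generate(into: cache, limit: 1_000)
    for await token in stream {
        if token >= 3 { break }   // leave the loop early, as in the failing test
    }
    // Without these two lines the producer may still be writing to the cache.
    producer.cancel()
    await producer.value
    return await cache.count()
}
```

The actor alone already prevents a data race in this toy version; the cancel-and-await step addresses the separate problem of work that outlives the loop.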
| Self.llmContainer, instructions: "You are a helpful assistant. Keep responses brief.") | ||
| @MainActor | ||
| func testViewModel() async throws { | ||
| let model = ChatModel(model: model()) |
And this one simulates the activity from LLMBasic which also causes thread safety issues.
8a33925 to 1d94ca6
- ml-explore/mlx-swift-examples#454
- fixes #27
- move ChatSession integration tests into new test target so we can more easily control when it runs
- make a ChatSession _unit_ (more or less) test
- fix Sendable / thread safety issues uncovered by LLMBasic
- collect TestTokenizer and friends in its own file
- fix warnings in tests
1d94ca6 to 9063912
add a minimal LLM chat example mlx-swift-examples#454
fixes [BUG] gemma3text crashes if the attention mask is used #27
move ChatSession integration tests into new test target so we can more easily control when it runs
make a ChatSession unit (more or less) test
fix Sendable / thread safety issues uncovered by LLMBasic
Note that this requires changes in mlx-swift (so likely a new tag there):
Proposed changes
Please include a description of the problem or feature this PR is addressing. If there is a corresponding issue, include the issue #.
Checklist
Put an x in the boxes that apply.
- pre-commit run --all-files to format my code / installed pre-commit prior to committing changes