-
Notifications
You must be signed in to change notification settings - Fork 706
feat: add LoRA common APIs and implementation for lora management #4464
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
WalkthroughIntroduces LoRA (Low-Rank Adaptation) management infrastructure combining Rust core with Python wrapper. Rust implementation provides caching, multi-source downloading, and a pluggable source trait supporting local filesystem and S3 backends. Python layer wraps Rust components with custom source registration points. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Key areas requiring attention:
Poem
Pre-merge checks❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
Tip 📝 Customizable high-level summaries are now available in beta!You can now customize how CodeRabbit generates the high-level summary in your pull requests — including its content, structure, tone, and formatting.
Example instruction:
Note: This feature is currently in beta for Pro-tier users, and pricing will be announced later. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (8)
components/src/dynamo/common/lora/lora/manager.py (1)
97-129: Alignis_cachedsemantics with Rust cache validation (or document current behavior)
is_cachedcurrently does:cache_key = self._uri_to_cache_key(lora_uri) return self.cache.is_cached(cache_key)Given the Rust downloader checks both existence and validity:
if self.cache.is_cached(&cache_key) && self.cache.validate_cached(&cache_key)? { ... }
is_cached()in Python will returnTruefor directories that exist but are missing required files, while a later download might re‑fetch the LoRA becausevalidate_cachedfails. That divergence can be surprising to callers.Consider either:
- Mirroring the Rust semantics by incorporating validation (if exposed in
PyLoRACache), or- Keeping the cheaper “directory exists” check but documenting that
is_cacheddoes not guarantee a fully valid cache entry.
_uri_to_cache_keyitself looks consistent with the tests (s3/gs/http/file) and the Rust logic, so no issues there.lib/llm/Cargo.toml (1)
90-92: Confirmobject_storefeature set matches intended LoRA backendsThe
object_storedependency with["aws", "gcp", "azure", "http"]enables all major cloud backends plus generic HTTP. That’s convenient but increases compile time and dependency surface.If you only plan to support a subset (e.g., S3 + HTTP initially), you may want to trim unused features to keep the binary smaller and dependency tree tighter. Otherwise this is fine as a starting point.
components/src/dynamo/common/lora/lora/test_manager.py (1)
16-149: Solid test coverage; minor robustness and lint nitsOverall this is a nice, focused test suite. A few small points:
MockLoRASource.download/existsdon’t uselora_uri, which Ruff flags (ARG002). If you care about a clean lint run, you can simply acknowledge the parameter:- async def download(self, lora_uri: str, dest_path: Path) -> Path: - self.download_called = True + async def download(self, lora_uri: str, dest_path: Path) -> Path: + _ = lora_uri + self.download_called = True ... - async def exists(self, lora_uri: str) -> bool: - return self.should_exist + async def exists(self, lora_uri: str) -> bool: + _ = lora_uri + return self.should_exist
test_manager_custom_source_not_found: currently only checks that"status"is present, so it doesn’t really constrain behavior. Once the manager’s behavior aroundexists()is finalized, you may want to assert on a specificstatusvalue (e.g.,"error") or on the presence of an error message to catch regressions.test_manager_is_cached: by just assertingisinstance(is_cached, bool)it only verifies that the call doesn’t crash. You could consider asserting the expected boolean once you lock in the cache-key semantics (s3 URI →"path/to/cached-lora"), to ensure the Python wrapper stays aligned with the Rust cache behavior.All of these are optional polish; functionally the tests exercise the key flows.
lib/llm/src/lora/cache.rs (2)
7-59: LoRACache implementation is correct; consider API shape and portabilityThe cache logic looks sound:
get_cache_pathis a simple join,is_cachedchecks existence, andvalidate_cachedenforcesadapter_config.jsonplus at least one known weight file.Two minor considerations:
from_envcurrently returnsResult<Self>but cannot fail (it always falls back toDEFAULT_LORA_CACHE_DIR). You could simplify the API to returnSelfdirectly, or, if you expect to add fallible behavior later, leave it as-is but add a comment that it’s intentionally infallible today.DEFAULT_LORA_CACHE_DIRis hard-coded to"/tmp/dynamo_loras", which is fine for Linux-like environments but not ideal on Windows. If this crate is ever used cross‑platform, consider deriving a per‑platform temp/cache directory viastd::env::temp_dir()or a similar mechanism, withDYN_LORA_PATHstill taking precedence.Functionally this is fine; these are API/portability nits.
61-115: Tests cover core behavior; could add env/negative validation casesThe tests exercise construction, path calculation,
is_cached(directory existence), andvalidate_cachedfor valid vs. missing-weights cases, which is good.If you want stronger guarantees around future changes, consider adding:
- A
from_envtest that setsDYN_LORA_PATHto a custom directory and verifies it’s honored.- Additional
validate_cachednegatives for:
- Missing
adapter_config.jsonbut present weight file.- No directory at all for the given
lora_id(should returnOk(false)).Not strictly necessary, but they’d lock in the intended semantics more clearly.
lib/llm/src/lora/downloader.rs (2)
18-35: Clarify and potentially relax failure behavior across multiple sources indownload_if_neededRight now:
existserrors are silently ignored (if let Ok(exists) = ... && exists), which is fine for skipping incompatible sources.- But any error from
source.download(..).await?immediately aborts the entiredownload_if_neededcall instead of trying the remaining sources.- Validation failures (
validate_cachedreturning false) do log a warning and continue, which is more in line with a “try sources in order” approach.If the intent is to use
sourcesas an ordered fallback chain (e.g., S3 primary + secondary mirror, or different backends), you may wantdownloadfailures to be downgraded to a warning and then continue to the next source rather than bubbling the error and skipping remaining options.Example refactor shape:
- let downloaded_path = source.download(lora_uri, &dest_path).await?; + let downloaded_path = match source.download(lora_uri, &dest_path).await { + Ok(path) => path, + Err(e) => { + tracing::warn!( + "LoRA download failed from source: {} (uri: {}), trying next source: {e:?}", + std::any::type_name::<_>(), + lora_uri, + ); + continue; + } + };This keeps the overall API the same but improves resiliency when multiple sources are configured. If you intentionally want a hard failure on the first reachable-but-failing source, calling that out in a doc comment would help set expectations.
Also applies to: 37-63
68-71: Make cache key generation more robust across platforms and URI shapes
uri_to_cache_keycurrently does simple string replacement:fn uri_to_cache_key(&self, uri: &str) -> String { uri.replace("://", "_").replace(['/', '\\'], "_") }This is readable but:
- Leaves other potentially problematic characters (
?,*,<,>,:,|, etc.) in the filename, which can be illegal on some filesystems (notably Windows).- Risks extremely long filenames for long URIs.
Consider either:
- A stricter sanitization (whitelisting
[A-Za-z0-9._-]and mapping everything else to_), or- Using a hash of the URI (e.g., SHA-256) as the cache key, optionally with a short human-readable prefix.
For example, a simple hash-based approach:
-fn uri_to_cache_key(&self, uri: &str) -> String { - uri.replace("://", "_").replace(['/', '\\'], "_") -} +fn uri_to_cache_key(&self, uri: &str) -> String { + use sha2::{Digest, Sha256}; + let mut hasher = Sha256::new(); + hasher.update(uri.as_bytes()); + format!("{:x}", hasher.finalize()) +}This avoids portability issues and collisions at the cost of some readability.
lib/llm/src/lora/source.rs (1)
83-115: Avoid blocking filesystem traversal inside asyncmetadata
LocalLoRASource::metadataperforms a recursive directory walk usingstd::fs::read_dirandentry.metadata()directly inside an async fn:fn visit_dir(path: &Path, count: &mut usize, size: &mut u64) -> Result<()> { for entry in std::fs::read_dir(path)? { let entry = entry?; let path = entry.path(); if path.is_file() { *count += 1; *size += entry.metadata()?.len(); } else if path.is_dir() { visit_dir(&path, count, size)?; } } Ok(()) } visit_dir(&source_path, &mut file_count, &mut total_size)?;On large LoRA directories this can block the async runtime thread for noticeable time.
Consider offloading the traversal to a blocking thread (e.g.,
tokio::task::spawn_blocking) and awaiting its result, while keeping the public API the same. That keeps the async surface but avoids tying up the executor during heavy I/O.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (2)
Cargo.lockis excluded by!**/*.locklib/bindings/python/Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (9)
components/src/dynamo/common/lora/lora/__init__.py(1 hunks)components/src/dynamo/common/lora/lora/manager.py(1 hunks)components/src/dynamo/common/lora/lora/test_manager.py(1 hunks)lib/llm/Cargo.toml(1 hunks)lib/llm/src/lib.rs(1 hunks)lib/llm/src/lora/cache.rs(1 hunks)lib/llm/src/lora/downloader.rs(1 hunks)lib/llm/src/lora/mod.rs(1 hunks)lib/llm/src/lora/source.rs(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-02T16:46:54.015Z
Learnt from: GuanLuo
Repo: ai-dynamo/dynamo PR: 2714
File: lib/llm/src/discovery/model_entry.rs:38-42
Timestamp: 2025-09-02T16:46:54.015Z
Learning: In lib/llm/src/discovery/model_entry.rs, GuanLuo prefers not to add serde defaults for model_type and model_input fields to keep the specification explicit and avoid user errors, relying on atomic deployment strategy to avoid backward compatibility issues.
Applied to files:
lib/llm/src/lib.rslib/llm/src/lora/cache.rs
🧬 Code graph analysis (6)
lib/llm/src/lora/downloader.rs (1)
lib/llm/src/lora/source.rs (3)
exists(23-23)exists(78-81)exists(251-261)
components/src/dynamo/common/lora/lora/__init__.py (1)
components/src/dynamo/common/lora/lora/manager.py (2)
LoRAManager(29-130)LoRASourceProtocol(14-26)
lib/llm/src/lora/cache.rs (1)
lib/llm/src/lora/downloader.rs (1)
new(14-16)
components/src/dynamo/common/lora/lora/manager.py (3)
components/src/dynamo/common/lora/lora/test_manager.py (2)
download(23-29)exists(31-32)lib/llm/src/lora/cache.rs (1)
get_cache_path(28-30)lib/llm/src/lora/downloader.rs (1)
download_if_needed(22-66)
components/src/dynamo/common/lora/lora/test_manager.py (1)
components/src/dynamo/common/lora/lora/manager.py (7)
LoRAManager(29-130)download(20-22)exists(24-26)register_custom_source(56-64)download_lora(66-95)is_cached(97-108)_uri_to_cache_key(110-130)
lib/llm/src/lora/source.rs (2)
lib/llm/src/lora/cache.rs (2)
new(16-18)from_env(21-25)lib/llm/src/lora/downloader.rs (1)
new(14-16)
🪛 Ruff (0.14.5)
components/src/dynamo/common/lora/lora/manager.py
94-94: Do not catch blind exception: Exception
(BLE001)
components/src/dynamo/common/lora/lora/test_manager.py
23-23: Unused method argument: lora_uri
(ARG002)
31-31: Unused method argument: lora_uri
(ARG002)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: trtllm (amd64)
- GitHub Check: operator (amd64)
- GitHub Check: vllm (amd64)
- GitHub Check: Build and Test - dynamo
- GitHub Check: clippy (lib/bindings/python)
- GitHub Check: clippy (lib/runtime/examples)
- GitHub Check: tests (launch/dynamo-run)
- GitHub Check: clippy (launch/dynamo-run)
- GitHub Check: tests (lib/runtime/examples)
- GitHub Check: tests (lib/bindings/python)
- GitHub Check: tests (.)
- GitHub Check: clippy (.)
🔇 Additional comments (4)
components/src/dynamo/common/lora/lora/manager.py (1)
29-65: LoRAManager construction and extension points look solidThe core design (wrapping
PyLoRACache/PyLoRADownloaderand exposing a simpleregister_custom_sourceregistry) is clean and minimal, with good type hints and docstrings. No functional issues here from my side.lib/llm/src/lib.rs (1)
26-26: Publicly exportingloramodule is appropriateAdding
pub mod lora;cleanly wires the new LoRA cache/downloader/source stack into the crate’s public API without affecting existing modules. No issues here.components/src/dynamo/common/lora/lora/__init__.py (1)
4-10: Clean public surface for the LoRA Python packageRe‑exporting
LoRAManagerandLoRASourceProtocoland locking them into__all__gives a clear, minimal public API fordynamo.common.lora. Looks good.lib/llm/src/lora/mod.rs (1)
1-15: LoRA module facade and re-exports look goodThe module-level docs plus
pub useofLoRACache,LoRADownloader, and the source traits/types create a clean, central entry point for the LoRA infra. No issues spotted.
417208e to
5c000ce
Compare
5c000ce to
4c5158f
Compare
tedzhouhk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- maybe we should let ModelExpress handle lora
- do we need eviction policy for lora cache on disk?
keivenchang
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice clean implementation of the LoRA downloading and caching layer! The trait-based design with LoRASourceTrait makes it nicely extensible for different storage backends.
I have 2 comments, which you can either do it later (add TODO), or do it at your leisure.
Thanks for the review @keivenchang! |
|
Overview:
Adds common LoRA management/download system.
Details:
For example ~
This keeps the LoRA source open to extension to any new source location.
for example NIM usage ~
Where should the reviewer start?
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)