
@biswapanda
Contributor

@biswapanda biswapanda commented Nov 19, 2025

Overview:

Adds a common LoRA management and download system.

  • Pluggable source architecture (local filesystem, extensible for S3/GCS) and URI-based addressing (file://, s3://, etc.)
  • Smart caching - avoids re-downloading existing adapters on disk
  • Async Python API with proper error handling
  • Cache validation and key management
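As a rough illustration of the URI-based addressing described above, the sketch below splits a LoRA URI into scheme, location, and an optional revision using only the standard library. The helper name `parse_lora_uri` and the trailing `@revision` convention (taken from the `my-lora-v1@lora-sha` example in this description) are assumptions, not the PR's actual parsing code.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit


def parse_lora_uri(uri: str) -> Tuple[str, str, Optional[str]]:
    """Split a LoRA URI into (scheme, location, revision).

    Hypothetical helper: the trailing "@revision" convention is assumed
    from the "my-lora-v1@lora-sha" example, not taken from the PR's code.
    """
    parts = urlsplit(uri)
    if not parts.scheme:
        raise ValueError(f"LoRA URI has no scheme: {uri!r}")
    # netloc is the bucket/host (empty for file:///abs/path); path is the rest
    location = parts.netloc + parts.path
    revision = None
    if "@" in location:
        location, revision = location.rsplit("@", 1)
    return parts.scheme, location, revision


print(parse_lora_uri("s3://bucket/loras/my-lora"))      # ('s3', 'bucket/loras/my-lora', None)
print(parse_lora_uri("internal://my-lora-v1@lora-sha"))  # ('internal', 'my-lora-v1', 'lora-sha')
```

The scheme is what a manager would use to pick a source; the location and revision are passed through to that source unchanged.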

Details:

  1. The S3 downloader is built in and handled in the Rust layer.
  2. Users can bring their own LoRA downloader/converter (like NIM).

For example:

import io
import tarfile
from pathlib import Path

import aiohttp

class MyCustomDownloader:
    async def download(self, lora_id: str, dest_path: Path) -> Path:
        # Fetch the adapter tarball from an internal artifact server
        async with aiohttp.ClientSession() as session:
            async with session.get(f"https://internal.com/loras/{lora_id}") as resp:
                resp.raise_for_status()  # fail on 4xx/5xx instead of untarring an error page
                tarball = await resp.read()
        # Unpack the adapter files into the destination directory
        with tarfile.open(fileobj=io.BytesIO(tarball)) as tar:
            tar.extractall(dest_path)
        return dest_path

# Register custom downloader under the "internal" URI scheme
manager = LoRAManager()
manager.register_custom_source('internal', MyCustomDownloader())

This keeps LoRA sources open to extension for any new source location.
For example, NIM usage:

# e.g. lora_id="hf://my-lora-v1@lora-sha"
result = await manager.load_lora(
    lora_name="my-lora",
    lora_id="internal://my-lora-v1@lora-sha",  # Custom URL scheme
)
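To make the scheme-based routing concrete, here is a minimal, self-contained sketch of how a registry could dispatch `internal://...` URIs to a registered source. `ToyLoRAManager`, `SourceLike`, and `FakeInternalSource` are hypothetical stand-ins written for illustration; they model only the `download`/`exists` shape of `LoRASourceProtocol`, and the real `LoRAManager` (backed by the Rust cache and downloader) may order or handle these calls differently.

```python
import asyncio
import tempfile
from pathlib import Path
from typing import Dict, Protocol
from urllib.parse import urlsplit


class SourceLike(Protocol):
    async def exists(self, lora_uri: str) -> bool: ...
    async def download(self, lora_uri: str, dest_path: Path) -> Path: ...


class ToyLoRAManager:
    """Hypothetical scheme -> source registry; not the PR's LoRAManager."""

    def __init__(self, cache_root: Path) -> None:
        self.cache_root = cache_root
        self._sources: Dict[str, SourceLike] = {}

    def register_custom_source(self, scheme: str, source: SourceLike) -> None:
        self._sources[scheme] = source

    async def download_lora(self, lora_uri: str) -> Path:
        scheme = urlsplit(lora_uri).scheme
        source = self._sources.get(scheme)
        if source is None:
            raise ValueError(f"no source registered for scheme {scheme!r}")
        if not await source.exists(lora_uri):
            raise FileNotFoundError(lora_uri)
        # Illustrative cache key only (not collision-safe)
        dest = self.cache_root / lora_uri.replace("://", "_").replace("/", "_")
        dest.mkdir(parents=True, exist_ok=True)
        return await source.download(lora_uri, dest)


class FakeInternalSource:
    """Pretends every internal:// adapter exists and writes a stub config."""

    async def exists(self, lora_uri: str) -> bool:
        return lora_uri.startswith("internal://")

    async def download(self, lora_uri: str, dest_path: Path) -> Path:
        (dest_path / "adapter_config.json").write_text("{}")
        return dest_path


async def demo() -> None:
    with tempfile.TemporaryDirectory() as tmp:
        manager = ToyLoRAManager(Path(tmp))
        manager.register_custom_source("internal", FakeInternalSource())
        path = await manager.download_lora("internal://my-lora-v1@lora-sha")
        print(path.name, (path / "adapter_config.json").exists())


if __name__ == "__main__":
    asyncio.run(demo())  # prints: internal_my-lora-v1@lora-sha True
```

Raising on an unregistered scheme keeps a misconfigured URI from silently falling through to some other backend.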

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

@biswapanda biswapanda requested review from a team as code owners November 19, 2025 12:32
@biswapanda biswapanda changed the title Bis/lora common feat: add LoRA common Nov 19, 2025
@github-actions github-actions bot added the feat label Nov 19, 2025
@biswapanda biswapanda changed the title feat: add LoRA common feat: add LoRA common APIs and implementation for lora management Nov 19, 2025
@coderabbitai
Contributor

coderabbitai bot commented Nov 19, 2025

Walkthrough

Introduces LoRA (Low-Rank Adaptation) management infrastructure combining a Rust core with a Python wrapper. The Rust implementation provides caching, multi-source downloading, and a pluggable source trait supporting local filesystem and S3 backends. The Python layer wraps the Rust components and adds custom source registration points.

Changes

Cohort / File(s) Summary
Python LoRA Management
components/src/dynamo/common/lora/lora/__init__.py, components/src/dynamo/common/lora/lora/manager.py
Exposes LoRAManager and LoRASourceProtocol as package exports. Manager class integrates Rust-backed cache and downloader with Python extension points; enables custom source registration, LoRA downloads with source hints, cache checks, and URI-to-cache-key conversion.
Python LoRA Tests
components/src/dynamo/common/lora/lora/test_manager.py
Comprehensive unit tests covering custom source registration and download, local file sources, error handling, cache validation, and URI normalization with mock sources and temporary cache fixtures.
Rust Build Configuration
lib/llm/Cargo.toml
Adds object_store 0.12.4 dependency with cloud storage features (aws, gcp, azure, http) to support remote LoRA sources.
Rust Module Registration
lib/llm/src/lib.rs
Exposes new pub mod lora to the crate root, establishing public API entry point for LoRA functionality.
Rust LoRA Core Infrastructure
lib/llm/src/lora/cache.rs, lib/llm/src/lora/downloader.rs, lib/llm/src/lora/source.rs, lib/llm/src/lora/mod.rs
Implements multi-layered LoRA management: LoRACache handles disk storage with validation; LoRADownloader coordinates multi-source downloads with caching logic; LoRASourceTrait defines pluggable abstraction with LocalLoRASource and S3LoRASource implementations; mod.rs re-exports public API.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Key areas requiring attention:

  • S3 credential handling in source.rs: environment variable parsing, endpoint configuration, and virtual-host vs. path-style URL handling
  • URI-to-cache-key conversion consistency: verify matching logic between Python (_uri_to_cache_key) and Rust (uri_to_cache_key) implementations for s3://, gs://, http(s)://, and file:// schemes
  • Cache validation logic: confirm adapter_config.json presence and weight file format detection covers all supported formats
  • Async trait implementations: review LoRASourceTrait async methods and download_if_needed control flow for error propagation and fallback behavior
  • Local file handling: verify file:// URI parsing and directory-only semantics across both Python and Rust layers
  • Test coverage: assess adequacy of edge cases (missing credentials, malformed URIs, partial S3 downloads)

Poem

🐰 A hop, skip, and cache jump!
Rust and Python unite with a thump,
S3 sources and files now flow,
LoRAs descend where cache winds blow,
Adapters assembled—let models bump! 🎀

Pre-merge checks

❌ Failed checks (1 inconclusive)
Check name Status Explanation Resolution
Description check ❓ Inconclusive The PR description contains placeholder content and lacks clarity on the implementation approach and specific changes. Expand the 'Where should the reviewer start?' section with specific file recommendations, and replace the placeholder 'closes GitHub issue: #xxx' with the actual issue number (DEP-624 is mentioned but not properly formatted).
✅ Passed checks (2 passed)
Check name Status Explanation
Docstring Coverage ✅ Passed Docstring coverage is 91.49% which is sufficient. The required threshold is 80.00%.
Title check ✅ Passed The title 'feat: add LoRA common APIs and implementation for lora management' accurately summarizes the main change - introducing a LoRA management system with APIs and implementations.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (8)
components/src/dynamo/common/lora/lora/manager.py (1)

97-129: Align is_cached semantics with Rust cache validation (or document current behavior)

is_cached currently does:

cache_key = self._uri_to_cache_key(lora_uri)
return self.cache.is_cached(cache_key)

Given the Rust downloader checks both existence and validity:

if self.cache.is_cached(&cache_key) && self.cache.validate_cached(&cache_key)? {
    ...
}

is_cached() in Python will return True for directories that exist but are missing required files, while a later download might re‑fetch the LoRA because validate_cached fails. That divergence can be surprising to callers.

Consider either:

  • Mirroring the Rust semantics by incorporating validation (if exposed in PyLoRACache), or
  • Keeping the cheaper “directory exists” check but documenting that is_cached does not guarantee a fully valid cache entry.

_uri_to_cache_key itself looks consistent with the tests (s3/gs/http/file) and the Rust logic, so no issues there.
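A Python-side check that mirrors the Rust `is_cached && validate_cached` condition could look like the following sketch. The weight-file names are an assumption (common PEFT adapter file names); the list that `validate_cached` in `cache.rs` actually checks may differ, so treat this as illustrative only.

```python
from pathlib import Path

# Assumed weight-file names (common PEFT adapter outputs); the list that
# validate_cached in cache.rs actually checks may be different.
WEIGHT_FILES = ("adapter_model.safetensors", "adapter_model.bin")


def is_valid_cache_entry(entry: Path) -> bool:
    """True only if the entry exists AND looks like a complete adapter.

    A bare "directory exists" check would return True for an empty or
    partially written directory.
    """
    if not entry.is_dir():
        return False
    if not (entry / "adapter_config.json").is_file():
        return False
    return any((entry / name).is_file() for name in WEIGHT_FILES)
```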

lib/llm/Cargo.toml (1)

90-92: Confirm object_store feature set matches intended LoRA backends

The object_store dependency with ["aws", "gcp", "azure", "http"] enables all major cloud backends plus generic HTTP. That’s convenient but increases compile time and dependency surface.

If you only plan to support a subset (e.g., S3 + HTTP initially), you may want to trim unused features to keep the binary smaller and dependency tree tighter. Otherwise this is fine as a starting point.

components/src/dynamo/common/lora/lora/test_manager.py (1)

16-149: Solid test coverage; minor robustness and lint nits

Overall this is a nice, focused test suite. A few small points:

  • MockLoRASource.download/exists don’t use lora_uri, which Ruff flags (ARG002). If you care about a clean lint run, you can simply acknowledge the parameter:
-    async def download(self, lora_uri: str, dest_path: Path) -> Path:
-        self.download_called = True
+    async def download(self, lora_uri: str, dest_path: Path) -> Path:
+        _ = lora_uri
+        self.download_called = True
 ...
-    async def exists(self, lora_uri: str) -> bool:
-        return self.should_exist
+    async def exists(self, lora_uri: str) -> bool:
+        _ = lora_uri
+        return self.should_exist
  • test_manager_custom_source_not_found: currently only checks that "status" is present, so it doesn’t really constrain behavior. Once the manager’s behavior around exists() is finalized, you may want to assert on a specific status value (e.g., "error") or on the presence of an error message to catch regressions.
  • test_manager_is_cached: by just asserting isinstance(is_cached, bool) it only verifies that the call doesn’t crash. You could consider asserting the expected boolean once you lock in the cache-key semantics (s3 URI → "path/to/cached-lora"), to ensure the Python wrapper stays aligned with the Rust cache behavior.

All of these are optional polish; functionally the tests exercise the key flows.
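One alternative to discarding the argument with `_ = lora_uri` is to record it, which also gives the tests something to assert on (for example, that the manager forwarded the exact URI). A minimal sketch of such a test double, with hypothetical attribute names:

```python
from pathlib import Path
from typing import List


class RecordingLoRASource:
    """Test double that records every URI it is asked about.

    Using lora_uri (instead of `_ = lora_uri`) silences Ruff ARG002 and
    lets tests assert on exactly what the manager passed in.
    """

    def __init__(self, should_exist: bool = True) -> None:
        self.should_exist = should_exist
        self.download_calls: List[str] = []
        self.exists_calls: List[str] = []

    async def download(self, lora_uri: str, dest_path: Path) -> Path:
        self.download_calls.append(lora_uri)
        dest_path.mkdir(parents=True, exist_ok=True)
        return dest_path

    async def exists(self, lora_uri: str) -> bool:
        self.exists_calls.append(lora_uri)
        return self.should_exist
```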

lib/llm/src/lora/cache.rs (2)

7-59: LoRACache implementation is correct; consider API shape and portability

The cache logic looks sound: get_cache_path is a simple join, is_cached checks existence, and validate_cached enforces adapter_config.json plus at least one known weight file.

Two minor considerations:

  • from_env currently returns Result<Self> but cannot fail (it always falls back to DEFAULT_LORA_CACHE_DIR). You could simplify the API to return Self directly, or, if you expect to add fallible behavior later, leave it as-is but add a comment that it’s intentionally infallible today.
  • DEFAULT_LORA_CACHE_DIR is hard-coded to "/tmp/dynamo_loras", which is fine for Linux-like environments but not ideal on Windows. If this crate is ever used cross‑platform, consider deriving a per‑platform temp/cache directory via std::env::temp_dir() or a similar mechanism, with DYN_LORA_PATH still taking precedence.

Functionally this is fine; these are API/portability nits.
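For the portability point, a Python analogue of the suggested resolution order (honor `DYN_LORA_PATH`, otherwise use the platform temp directory rather than a hard-coded `/tmp`) might look like this. The function name is hypothetical; only `DYN_LORA_PATH` and the `dynamo_loras` directory name come from the review.

```python
import os
import tempfile
from pathlib import Path

ENV_VAR = "DYN_LORA_PATH"


def resolve_lora_cache_dir() -> Path:
    """DYN_LORA_PATH wins; otherwise use a per-platform temp dir.

    tempfile.gettempdir() plays the role of std::env::temp_dir() here.
    An empty DYN_LORA_PATH is treated the same as an unset one.
    """
    override = os.environ.get(ENV_VAR)
    if override:
        return Path(override)
    return Path(tempfile.gettempdir()) / "dynamo_loras"
```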


61-115: Tests cover core behavior; could add env/negative validation cases

The tests exercise construction, path calculation, is_cached (directory existence), and validate_cached for valid vs. missing-weights cases, which is good.

If you want stronger guarantees around future changes, consider adding:

  • A from_env test that sets DYN_LORA_PATH to a custom directory and verifies it’s honored.
  • Additional validate_cached negatives for:
    • Missing adapter_config.json but present weight file.
    • No directory at all for the given lora_id (should return Ok(false)).

Not strictly necessary, but they’d lock in the intended semantics more clearly.

lib/llm/src/lora/downloader.rs (2)

18-35: Clarify and potentially relax failure behavior across multiple sources in download_if_needed

Right now:

  • exists errors are silently ignored (if let Ok(exists) = ... && exists), which is fine for skipping incompatible sources.
  • But any error from source.download(..).await? immediately aborts the entire download_if_needed call instead of trying the remaining sources.
  • Validation failures (validate_cached returning false) do log a warning and continue, which is more in line with a “try sources in order” approach.

If the intent is to use sources as an ordered fallback chain (e.g., S3 primary + secondary mirror, or different backends), you may want download failures to be downgraded to a warning and then continue to the next source rather than bubbling the error and skipping remaining options.

Example refactor shape:

-                let downloaded_path = source.download(lora_uri, &dest_path).await?;
+                let downloaded_path = match source.download(lora_uri, &dest_path).await {
+                    Ok(path) => path,
+                    Err(e) => {
+                        tracing::warn!(
+                            "LoRA download failed (uri: {}), trying next source: {e:?}",
+                            lora_uri,
+                        );
+                        continue;
+                    }
+                };

This keeps the overall API the same but improves resiliency when multiple sources are configured. If you intentionally want a hard failure on the first reachable-but-failing source, calling that out in a doc comment would help set expectations.

Also applies to: 37-63
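The ordered-fallback behavior suggested above can be sketched in Python with asyncio. This illustrates the control flow only and is not a port of `downloader.rs`: the `Source` protocol, the `validate` callback, and the broad `except Exception` clauses (deliberately broad for the sketch, which is exactly what Ruff BLE001 flags in production code) are assumptions.

```python
import asyncio
import logging
from pathlib import Path
from typing import Callable, Protocol, Sequence

log = logging.getLogger("lora")


class Source(Protocol):
    async def exists(self, lora_uri: str) -> bool: ...
    async def download(self, lora_uri: str, dest_path: Path) -> Path: ...


async def download_with_fallback(
    sources: Sequence[Source],
    lora_uri: str,
    dest_path: Path,
    validate: Callable[[Path], bool],
) -> Path:
    """Try sources in order; a failing or invalid source is logged and skipped."""
    for source in sources:
        try:
            if not await source.exists(lora_uri):
                continue
        except Exception as e:  # incompatible source: skip it, like `if let Ok(..)`
            log.debug("exists() failed on %r: %s", source, e)
            continue
        try:
            path = await source.download(lora_uri, dest_path)
        except Exception as e:  # downgrade to a warning and try the next source
            log.warning("download from %r failed for %s, trying next: %s", source, lora_uri, e)
            continue
        if validate(path):
            return path
        log.warning("LoRA at %s failed validation, trying next source", path)
    raise RuntimeError(f"no source could provide a valid LoRA for {lora_uri}")
```

Only when every source has been skipped or has failed does the caller see an error, which matches the "ordered fallback chain" intent.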


68-71: Make cache key generation more robust across platforms and URI shapes

uri_to_cache_key currently does simple string replacement:

fn uri_to_cache_key(&self, uri: &str) -> String {
    uri.replace("://", "_").replace(['/', '\\'], "_")
}

This is readable but:

  • Leaves other potentially problematic characters (?, *, <, >, :, |, etc.) in the filename, which can be illegal on some filesystems (notably Windows).
  • Risks extremely long filenames for long URIs.

Consider either:

  • A stricter sanitization (whitelisting [A-Za-z0-9._-] and mapping everything else to _), or
  • Using a hash of the URI (e.g., SHA-256) as the cache key, optionally with a short human-readable prefix.

For example, a simple hash-based approach:

-fn uri_to_cache_key(&self, uri: &str) -> String {
-    uri.replace("://", "_").replace(['/', '\\'], "_")
-}
+fn uri_to_cache_key(&self, uri: &str) -> String {
+    use sha2::{Digest, Sha256};
+    let mut hasher = Sha256::new();
+    hasher.update(uri.as_bytes());
+    format!("{:x}", hasher.finalize())
+}

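This avoids portability issues and the collisions the replacement scheme can produce (for example, s3://a/b_c and s3://a_b/c both map to s3_a_b_c), at the cost of some readability.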
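The collision risk and the hashed alternative can be demonstrated quickly in Python. `naive_cache_key` reproduces the shape of the current Rust replacement logic; `hashed_cache_key` is a hypothetical variant combining both suggestions (a sanitized, truncated readable prefix plus a full SHA-256 hex digest).

```python
import hashlib
import re


def naive_cache_key(uri: str) -> str:
    # Same shape as the Rust uri_to_cache_key: "://", "/" and "\" become "_"
    return uri.replace("://", "_").replace("/", "_").replace("\\", "_")


def hashed_cache_key(uri: str, prefix_len: int = 32) -> str:
    """Filesystem-safe readable prefix plus a SHA-256 digest of the full URI."""
    # Whitelist [A-Za-z0-9._-]; everything else becomes "_"
    readable = re.sub(r"[^A-Za-z0-9._-]", "_", uri)[:prefix_len]
    digest = hashlib.sha256(uri.encode("utf-8")).hexdigest()
    return f"{readable}-{digest}"


print(naive_cache_key("s3://a/b_c"), naive_cache_key("s3://a_b/c"))  # s3_a_b_c s3_a_b_c
```

The prefix keeps cache directories browsable by hand, while the digest makes accidental collisions practically impossible and caps the key length at 32 + 1 + 64 characters.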

lib/llm/src/lora/source.rs (1)

83-115: Avoid blocking filesystem traversal inside async metadata

LocalLoRASource::metadata performs a recursive directory walk using std::fs::read_dir and entry.metadata() directly inside an async fn:

fn visit_dir(path: &Path, count: &mut usize, size: &mut u64) -> Result<()> {
    for entry in std::fs::read_dir(path)? {
        let entry = entry?;
        let path = entry.path();
        if path.is_file() {
            *count += 1;
            *size += entry.metadata()?.len();
        } else if path.is_dir() {
            visit_dir(&path, count, size)?;
        }
    }
    Ok(())
}

visit_dir(&source_path, &mut file_count, &mut total_size)?;

On large LoRA directories this can block the async runtime thread for noticeable time.

Consider offloading the traversal to a blocking thread (e.g., tokio::task::spawn_blocking) and awaiting its result, while keeping the public API the same. That keeps the async surface but avoids tying up the executor during heavy I/O.
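A Python analogue of the suggested fix: keep the coroutine signature, but push the blocking directory walk onto a worker thread with `asyncio.to_thread` (Python 3.9+), much as `tokio::task::spawn_blocking` would in Rust. The function names are hypothetical.

```python
import asyncio
import os
from pathlib import Path
from typing import Tuple


def _walk_dir(root: Path) -> Tuple[int, int]:
    """Blocking recursive walk returning (file_count, total_bytes)."""
    count = 0
    size = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            count += 1
            size += os.path.getsize(os.path.join(dirpath, name))
    return count, size


async def lora_dir_metadata(root: Path) -> Tuple[int, int]:
    # Run the blocking walk on a worker thread so the event loop stays free
    return await asyncio.to_thread(_walk_dir, root)
```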

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d01b6d2 and 417208e.

⛔ Files ignored due to path filters (2)
  • Cargo.lock is excluded by !**/*.lock
  • lib/bindings/python/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (9)
  • components/src/dynamo/common/lora/lora/__init__.py (1 hunks)
  • components/src/dynamo/common/lora/lora/manager.py (1 hunks)
  • components/src/dynamo/common/lora/lora/test_manager.py (1 hunks)
  • lib/llm/Cargo.toml (1 hunks)
  • lib/llm/src/lib.rs (1 hunks)
  • lib/llm/src/lora/cache.rs (1 hunks)
  • lib/llm/src/lora/downloader.rs (1 hunks)
  • lib/llm/src/lora/mod.rs (1 hunks)
  • lib/llm/src/lora/source.rs (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-02T16:46:54.015Z
Learnt from: GuanLuo
Repo: ai-dynamo/dynamo PR: 2714
File: lib/llm/src/discovery/model_entry.rs:38-42
Timestamp: 2025-09-02T16:46:54.015Z
Learning: In lib/llm/src/discovery/model_entry.rs, GuanLuo prefers not to add serde defaults for model_type and model_input fields to keep the specification explicit and avoid user errors, relying on atomic deployment strategy to avoid backward compatibility issues.

Applied to files:

  • lib/llm/src/lib.rs
  • lib/llm/src/lora/cache.rs
🧬 Code graph analysis (6)
lib/llm/src/lora/downloader.rs (1)
lib/llm/src/lora/source.rs (3)
  • exists (23-23)
  • exists (78-81)
  • exists (251-261)
components/src/dynamo/common/lora/lora/__init__.py (1)
components/src/dynamo/common/lora/lora/manager.py (2)
  • LoRAManager (29-130)
  • LoRASourceProtocol (14-26)
lib/llm/src/lora/cache.rs (1)
lib/llm/src/lora/downloader.rs (1)
  • new (14-16)
components/src/dynamo/common/lora/lora/manager.py (3)
components/src/dynamo/common/lora/lora/test_manager.py (2)
  • download (23-29)
  • exists (31-32)
lib/llm/src/lora/cache.rs (1)
  • get_cache_path (28-30)
lib/llm/src/lora/downloader.rs (1)
  • download_if_needed (22-66)
components/src/dynamo/common/lora/lora/test_manager.py (1)
components/src/dynamo/common/lora/lora/manager.py (7)
  • LoRAManager (29-130)
  • download (20-22)
  • exists (24-26)
  • register_custom_source (56-64)
  • download_lora (66-95)
  • is_cached (97-108)
  • _uri_to_cache_key (110-130)
lib/llm/src/lora/source.rs (2)
lib/llm/src/lora/cache.rs (2)
  • new (16-18)
  • from_env (21-25)
lib/llm/src/lora/downloader.rs (1)
  • new (14-16)
🪛 Ruff (0.14.5)
components/src/dynamo/common/lora/lora/manager.py

94-94: Do not catch blind exception: Exception

(BLE001)

components/src/dynamo/common/lora/lora/test_manager.py

23-23: Unused method argument: lora_uri

(ARG002)


31-31: Unused method argument: lora_uri

(ARG002)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: trtllm (amd64)
  • GitHub Check: operator (amd64)
  • GitHub Check: vllm (amd64)
  • GitHub Check: Build and Test - dynamo
  • GitHub Check: clippy (lib/bindings/python)
  • GitHub Check: clippy (lib/runtime/examples)
  • GitHub Check: tests (launch/dynamo-run)
  • GitHub Check: clippy (launch/dynamo-run)
  • GitHub Check: tests (lib/runtime/examples)
  • GitHub Check: tests (lib/bindings/python)
  • GitHub Check: tests (.)
  • GitHub Check: clippy (.)
🔇 Additional comments (4)
components/src/dynamo/common/lora/lora/manager.py (1)

29-65: LoRAManager construction and extension points look solid

The core design (wrapping PyLoRACache/PyLoRADownloader and exposing a simple register_custom_source registry) is clean and minimal, with good type hints and docstrings. No functional issues here from my side.

lib/llm/src/lib.rs (1)

26-26: Publicly exporting lora module is appropriate

Adding pub mod lora; cleanly wires the new LoRA cache/downloader/source stack into the crate’s public API without affecting existing modules. No issues here.

components/src/dynamo/common/lora/lora/__init__.py (1)

4-10: Clean public surface for the LoRA Python package

Re‑exporting LoRAManager and LoRASourceProtocol and locking them into __all__ gives a clear, minimal public API for dynamo.common.lora. Looks good.

lib/llm/src/lora/mod.rs (1)

1-15: LoRA module facade and re-exports look good

The module-level docs plus pub use of LoRACache, LoRADownloader, and the source traits/types create a clean, central entry point for the LoRA infra. No issues spotted.

@copy-pr-bot

copy-pr-bot bot commented Nov 22, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Contributor

@tedzhouhk tedzhouhk left a comment


  1. maybe we should let ModelExpress handle lora
  2. do we need eviction policy for lora cache on disk?

Contributor

@keivenchang keivenchang left a comment


Nice clean implementation of the LoRA downloading and caching layer! The trait-based design with LoRASourceTrait makes it nicely extensible for different storage backends.

I have 2 comments, which you can either address later (add a TODO) or handle at your leisure.

@biswapanda
Contributor Author


Thanks for the review @keivenchang!
I have addressed your comments.

@biswapanda
Contributor Author

  1. maybe we should let ModelExpress handle lora
  2. do we need eviction policy for lora cache on disk?
  1. Good point: the plan is to consolidate ModelExpress (MEX) and cloud-storage-based LoRA downloading. I created a Linear ticket for this.
  2. Eviction policy for the on-disk LoRA cache: currently there is no need for eviction, because re-downloading (a couple of seconds from S3) costs more than keeping ~100 MB on disk.
    The unload_lora endpoint should delete the LoRA from local disk. Disks have very large capacity, and keeping adapters cached avoids re-downloading (a typical LoRA is ~10-100 MB).
    Even at a scale of ~1,000 LoRAs, disk usage is about 100 GB, so we probably don't need a disk eviction policy yet.
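The back-of-the-envelope disk estimate in the reply can be reproduced as follows. The adapter count and sizes are the reply's own figures; the helper is purely illustrative and uses decimal units (1 GB = 1000 MB).

```python
def disk_usage_gb(num_loras: int, mb_per_lora: float) -> float:
    """Worst-case on-disk cache size in GB, assuming every adapter stays cached."""
    return num_loras * mb_per_lora / 1000


low = disk_usage_gb(1000, 10)    # 10.0 GB with small (~10 MB) adapters
high = disk_usage_gb(1000, 100)  # 100.0 GB with large (~100 MB) adapters
```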

@biswapanda biswapanda enabled auto-merge (squash) November 25, 2025 02:50

5 participants