@danieldk
Member

This change allows Python projects that use kernels to lock the
kernel revisions on a per-project basis. For this to work, the user
only has to include `hf-kernels` as a build dependency. During
the build, a lock file is written to the package's pkg-info.
At runtime, we read it back and use the locked revision.
When a kernel is not locked, the revision that is provided
as an argument is used.
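
A minimal sketch of the runtime lookup described above. It assumes the lock file is a JSON object mapping kernel repo IDs to revisions and is stored in the distribution's metadata directory under the hypothetical name `kernels.lock`; the PR text does not specify the file name or format, and `read_locked_revisions` and `locked_or_default` are illustrative names, not the PR's API. `Distribution.read_text` from `importlib.metadata` returns `None` when the file is absent.

```python
import json
from importlib.metadata import PackageNotFoundError, distribution
from typing import Dict

# Hypothetical lock file name; the real name and format may differ.
LOCK_FILENAME = "kernels.lock"


def read_locked_revisions(dist_name: str) -> Dict[str, str]:
    """Return the repo_id -> revision mapping from dist_name's metadata, or {}."""
    try:
        text = distribution(dist_name).read_text(LOCK_FILENAME)
    except PackageNotFoundError:
        # The calling project is not installed as a distribution.
        return {}
    # read_text() returns None if the file is not in the metadata directory.
    return json.loads(text) if text else {}


def locked_or_default(repo_id: str, revision: str, locked: Dict[str, str]) -> str:
    """Use the locked revision if the kernel is locked, else the argument."""
    return locked.get(repo_id, revision)


# Made-up example data standing in for a parsed lock file.
locked = {"kernels-community/activation": "0123abc"}
print(locked_or_default("kernels-community/activation", "main", locked))  # 0123abc
print(locked_or_default("kernels-community/other", "main", locked))  # main
```

The silent fallback to the argument in `locked_or_default` is exactly the behavior the review below questions.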

```python
with open(pyproject, "rb") as f:
    data = tomllib.load(f)

kernel_versions = _get_nested_attr(data, ["tool", "kernels", "dependencies"])
```

Contributor

NIT: I don't like those helper functions (they are super easy to screw up).

```python
data.get("tool", {}).get("kernels", {}).get("dependencies", None)
```

This is more readable to me than delegating to another function.

Comment on lines +59 to +61
```python
locked_revision = _get_caller_locked_kernel(repo_id)
if locked_revision is not None:
    revision = locked_revision
```

Contributor

This clashes with the `revision` argument.

IMHO, we should use EITHER source for the revision, but never both:
if we can have a proper lock file, then we should ALWAYS use the lock file.
If there are cases where we may end up without a lock file, we should ALWAYS ask for the version.

Making it explicit where the information comes from makes things much easier to debug afterwards.

Member Author

Yeah, agreed. How should we go about this? Something like `revision=None` means the lock file, otherwise use the specified revision? (If only Python had sum types 😁)
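
A sketch of the `revision=None` convention proposed here, with a plain dict standing in for the parsed lock file; `select_revision` is a hypothetical name. Each call resolves the revision from exactly one source and never falls back silently, which is the property asked for in the previous comment.

```python
from typing import Dict, Optional


def select_revision(repo_id: str, revision: Optional[str], locked: Dict[str, str]) -> str:
    """revision=None means "use the lock file"; anything else is used verbatim."""
    if revision is not None:
        # Explicit revision: the lock file is never consulted.
        return revision
    if repo_id not in locked:
        # No silent fallback: the caller must either lock the kernel or pass a revision.
        raise ValueError(f"{repo_id!r} is not locked and no revision was given")
    return locked[repo_id]
```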

Contributor

I suggest we just support one.

I'm fine with always looking for the lock file if that works (and if people don't use setuptools, we crash first and figure it out later).
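
Under this lock-file-only alternative, the `revision` argument disappears entirely. A minimal sketch, assuming the lock file text has already been read from the package metadata (`None` when it is absent, e.g. after a non-setuptools build) and that it is JSON; both the format and the name `require_locked_revision` are assumptions for illustration.

```python
import json
from typing import Optional


def require_locked_revision(repo_id: str, lock_text: Optional[str]) -> str:
    """Lock-file-only policy: there is no revision argument, so fail loudly."""
    if lock_text is None:
        # e.g. the project was built with a backend that did not write the lock file
        raise RuntimeError("no kernel lock file found in the package metadata")
    locked = json.loads(lock_text)
    if repo_id not in locked:
        raise RuntimeError(f"kernel {repo_id!r} is not in the lock file")
    return locked[repo_id]
```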

```python
    revision = locked_revision

filename = hf_hub_download(
    repo_id, "build.toml", local_files_only=True, revision=revision
```

Contributor

Maybe we should just use `"main"` all the time, since we're not going to use `revision` as a versioning system.

```python
    return [importlib.metadata.distribution(dist_name) for dist_name in dist_names]


def _get_caller_module() -> Optional[ModuleType]:
```

Contributor

Pretty tricky.

I can see some potential screw-ups in the resolution there (e.g. dependency injection, callback feeding, etc.).
But in general I think it's good enough to get started.

Worst case, we simply inspect the entire call stack; we're bound to find our caller somewhere.

@danieldk
Member Author

Abandoning this PR for one with real lock files.

@danieldk closed this Jan 20, 2025
@danieldk deleted the kernel-revision-locking branch January 20, 2025 13:43