docs: update python documentation #9592
📚 Documentation preview at https://pr-9592.docs-lakefs-preview.io/ (Updated: 11/11/2025, 11:26:07 AM - Commit: 7272fee)
talSofer
left a comment
Thanks for improving this part of our docs! It is 100x better than what we had before!
I added some comments. None of them are blocking, but I would appreciate it if you resolved them.
docs/src/integrations/python.md
Outdated
| for diff in main.diff(other_ref=branch1):
|     print(diff)
| ```
| ## Python Integration Options
This duplicates the table content. The table is great; I would extend it to include the important info from here and remove this section.
Rewrote the page to integrate both parts.
docs/src/integrations/python.md
Outdated
| | **Boto3** | Medium | S3-compatible operations, existing S3 workflows, direct gateway access | `pip install boto3` | Low |
|
| #### Merging changes from a branch into main
| ## Quick Start
Do we need a general Python quickstart? Why?
Removed. It was left over from an incremental change: I first planned to introduce the SDK through a quick start, and later added the list of entry points for each SDK or library you might want to work with.
| #### Get object metadata
|
| Get object metadata using branch and path:
| ## References & Resources
This can also be part of the table above
Will pull up the relevant references and rewrite the list of Python integrations.
|
| References, commits, and commit metadata are fundamental to understanding and auditing changes in lakeFS. This guide covers navigating commit history, working with references, and using metadata for tracking and lineage.
|
| ## Prerequisites
IMO this is a duplicate
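The quoted introduction above mentions using commit metadata for tracking and lineage. A minimal sketch of that idea follows, assuming the reference object behaves like a high-level SDK branch whose `log(max_amount=...)` yields commits carrying a `metadata` mapping. The helper name `commits_with_metadata` and the `pipeline` key in the usage note are hypothetical, introduced only for illustration.

```python
from typing import Any, Iterator


def commits_with_metadata(
    ref: Any, key: str, value: str, max_amount: int = 100
) -> Iterator[Any]:
    """Yield commits reachable from `ref` whose metadata has key == value.

    Assumption: `ref` behaves like a lakefs branch or reference, that is,
    ref.log(max_amount=...) yields commits with a `metadata` mapping
    (which may be None or empty).
    """
    for commit in ref.log(max_amount=max_amount):
        metadata = commit.metadata or {}
        if metadata.get(key) == value:
            yield commit
```

With the SDK installed and configured, this might be called as `commits_with_metadata(lakefs.repository("my-repo").branch("main"), "pipeline", "daily-etl")`. Taking the reference as a parameter keeps the helper easy to exercise with a stub when no lakeFS server is available.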
|
| This guide covers object operations in lakeFS, including uploading, downloading, batch operations, and metadata management.
|
| ## Prerequisites
I would remove this.
| ## Understanding Objects & Files
|
| ### What are Objects & Files?
|
| Objects are the files stored in lakeFS. Upload, download, and manage them through branches:
|
| ```python
| import lakefs
|
| branch = lakefs.repository("my-repo").branch("main")
|
| # Upload a file
| branch.object("data/dataset.csv").upload(
|     data=b"id,name\n1,Alice\n2,Bob"
| )
|
| # Read a file
| with branch.object("data/dataset.csv").reader() as f:
|     print(f.read())
| ```
|
| Objects in lakeFS allow you to:
|
| - **Store files** at any path with support for large files
| - **Version files** across branches and commits
| - **Manage metadata** about file checksums, sizes, and types
| - **Track changes** across data versions using diffs
| - **Organize data** with hierarchical paths and prefixes
I think that we can skip explaining objects. I would delete this part.
deleted
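For readers following the quoted list, the points about object metadata and prefix-based organization can be sketched as below. This is an illustration under stated assumptions, not the text of the PR: it assumes the reference behaves like a high-level SDK branch whose `objects(prefix=...)` yields entries with `path` and `size_bytes`, and the helper name `summarize_prefix` is hypothetical.

```python
from collections import Counter
from typing import Any, Dict


def summarize_prefix(ref: Any, prefix: str) -> Dict[str, Any]:
    """Count objects and total bytes under `prefix` on a lakeFS reference.

    Assumption: `ref` behaves like a lakefs branch or reference, that is,
    ref.objects(prefix=...) yields entries exposing `path` and `size_bytes`.
    """
    count = 0
    total_bytes = 0
    by_extension: Counter = Counter()
    for entry in ref.objects(prefix=prefix):
        count += 1
        total_bytes += entry.size_bytes
        # Extension is the text after the last dot of the file name,
        # or "" when the name contains no dot.
        name = entry.path.rsplit("/", 1)[-1]
        ext = name.rsplit(".", 1)[-1] if "." in name else ""
        by_extension[ext] += 1
    return {
        "count": count,
        "total_bytes": total_bytes,
        "by_extension": dict(by_extension),
    }
```

With the SDK installed and configured, a call might look like `summarize_prefix(lakefs.repository("my-repo").branch("main"), "data/")`. No delimiter is passed, so the listing is assumed to return only object entries rather than common prefixes.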
| ## Data Pipeline Workflow
|
| ### Creating a Complete Data Pipeline
I think that this can be part of real-world workflows
Removed some of the examples and, as you suggested, merged the pipeline and streaming sections into real-world workflows.
docs/src/integrations/python-sdk.md
Outdated
| - Prefer **direct API interaction** patterns
| - Need to **access all API endpoints** programmatically
|
| For most common lakeFS operations (branches, tags, commits, objects), the **[High-Level SDK](./python.md)** is recommended as it provides a more Pythonic interface.
This could go in a tip box.
Co-authored-by: talSofer <tal.sofer@treeverse.io>
No description provided.