Conversation

@Ronxvier
Contributor

Adds batching support to fetch_df_by_partition via two new optional parameters, use_batching and batch_size. When batching is enabled, files are processed in smaller batches instead of being loaded into memory all at once, which should reduce peak memory usage for partitions with extremely large file counts. The batched path reuses the existing fetch_dfs_by_paths_batching function. Backward compatibility is maintained: existing code works unchanged.
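
For concreteness, here is a minimal sketch of how the two parameters could thread through the function, assuming a pandas-based reader. The bodies of fetch_dfs_by_paths_batching and the list_partition_paths helper, as well as the default batch_size value, are assumptions for illustration only, not the repository's actual implementation.

```python
from typing import List
import glob
import os

import pandas as pd


def list_partition_paths(partition: str) -> List[str]:
    # Hypothetical helper: in the real codebase the paths would come from the
    # table's storage layout; here we simply glob a local directory.
    return sorted(glob.glob(os.path.join(partition, "*.parquet")))


def fetch_dfs_by_paths_batching(paths: List[str], batch_size: int) -> List[pd.DataFrame]:
    # Placeholder for the existing helper: read `paths` in chunks of
    # `batch_size` so only one batch of files is opened at a time.
    frames: List[pd.DataFrame] = []
    for start in range(0, len(paths), batch_size):
        batch = paths[start:start + batch_size]
        frames.extend(pd.read_parquet(path) for path in batch)
    return frames


def fetch_df_by_partition(
    partition: str,
    use_batching: bool = False,
    batch_size: int = 100,  # assumed default; not specified in the PR
) -> pd.DataFrame:
    """Fetch every file under `partition` as a single DataFrame.

    When `use_batching` is True, files are read in batches of `batch_size`
    via fetch_dfs_by_paths_batching, which should lower peak memory usage for
    partitions with very large file counts. The defaults preserve the original
    non-batched behavior, so existing callers are unaffected.
    """
    paths = list_partition_paths(partition)
    if not paths:
        return pd.DataFrame()
    if use_batching:
        frames = fetch_dfs_by_paths_batching(paths, batch_size)
    else:
        frames = [pd.read_parquet(path) for path in paths]
    return pd.concat(frames, ignore_index=True)
```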

@houqp houqp requested review from asura-io and Copilot August 17, 2025 23:19

Copilot AI left a comment

Pull Request Overview

This PR adds optional batching functionality to the fetch_df_by_partition function to improve memory efficiency when processing large numbers of files. The change introduces two new optional parameters to control batching behavior while maintaining full backward compatibility.

  • Added use_batching and batch_size parameters to fetch_df_by_partition
  • Implemented fetch_dfs_by_paths_batching function to handle batched file processing
  • Enhanced function documentation with proper parameter descriptions
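
For illustration, opting in at a call site might look like the snippet below; the parameter names come from the PR, while the partition string and batch size are made up for the example.

```python
# Existing call sites keep working exactly as before (no batching):
df = fetch_df_by_partition("dt=2025-08-17")

# Partitions with very large file counts can opt in to batching:
df_batched = fetch_df_by_partition("dt=2025-08-17", use_batching=True, batch_size=500)
```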

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@houqp houqp requested a review from PeterKeDer August 18, 2025 04:49