Fix pipeline callbacks that fail under multiprocessing pickling #26

Merged
Ulthran merged 1 commit into main from codex/fix-test-errors-related-to-pickling on Feb 5, 2026
Ulthran (Contributor) commented Feb 5, 2026

Motivation

  • Tests were failing because callbacks (local functions and lambdas) passed to the pipelines could not be pickled when process-based pools were used, which raised AttributeError: Can't get local object during multiprocessing.
  • The pipeline needs to support local callback functions while preserving chunked processing and counter aggregation.
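
The failure mode described above can be reproduced without any pool at all, because a process-based pool has to pickle each callback before sending it to a worker. The following is a minimal sketch, not code from this repository: make_callback and can_pickle are hypothetical names used only for illustration, and the exact exception type and message depend on the Python version.

```python
import pickle


def make_callback(min_length):
    # A local (nested) function: pickle cannot look it up by a
    # module-level name, so it cannot be serialized by reference.
    def long_enough(read):
        return len(read) >= min_length

    return long_enough


def can_pickle(obj):
    """Return True if pickle.dumps(obj) succeeds, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (AttributeError, pickle.PicklingError):
        # Which of the two is raised varies across Python versions
        # and between local functions and lambdas.
        return False


if __name__ == "__main__":
    print(can_pickle(make_callback(5)))   # local function
    print(can_pickle(lambda read: read))  # lambda
    print(can_pickle(len))                # importable builtin
```

Functions that pickle can find by their qualified name (such as builtins or module-level functions in an importable module) serialize fine, which is why only local callbacks and lambdas triggered the test failures.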

Description

  • Replaced the process-based pool in src/heyfastqlib/pipelines.py with ThreadPool, imported from multiprocessing.pool. Worker threads share the parent process's memory, so callbacks are passed by reference and never pickled.
  • Added a fast path for single-threaded execution (threads == 1) that calls _filter_worker / _map_worker inline, avoiding pool overhead entirely.
  • Kept existing chunking via _chunk_reads and counter aggregation via _merge_counters so read/base accounting remains unchanged.
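
The three changes above can be sketched together as follows. This is an illustration under stated assumptions, not the code in src/heyfastqlib/pipelines.py: filter_reads, chunk_reads, filter_worker, the chunk_size parameter, and the counter keys are hypothetical stand-ins, and reads are modeled as plain strings.

```python
from collections import Counter
from multiprocessing.pool import ThreadPool


def chunk_reads(reads, chunk_size):
    """Yield lists of at most chunk_size reads (hypothetical helper)."""
    chunk = []
    for read in reads:
        chunk.append(read)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def filter_worker(task):
    """Filter one chunk and count reads and bases in and out."""
    chunk, keep = task
    kept = [read for read in chunk if keep(read)]
    counts = Counter(
        input_reads=len(chunk),
        input_bases=sum(len(read) for read in chunk),
        output_reads=len(kept),
        output_bases=sum(len(read) for read in kept),
    )
    return kept, counts


def filter_reads(reads, keep, threads=1, chunk_size=2):
    """Filter reads chunk by chunk; return kept reads and merged counters."""
    tasks = [(chunk, keep) for chunk in chunk_reads(reads, chunk_size)]
    if threads == 1:
        # Fast path: run the worker inline, no pool involved.
        results = [filter_worker(task) for task in tasks]
    else:
        # Threads share memory, so `keep` is never pickled.
        with ThreadPool(threads) as pool:
            results = pool.map(filter_worker, tasks)
    kept_all, total = [], Counter()
    for kept, counts in results:
        kept_all.extend(kept)
        total.update(counts)  # update() adds counts, keeping zero entries
    return kept_all, total


if __name__ == "__main__":
    min_length = 3
    reads = ["ACGT", "AC", "ACGTAC", "A"]
    print(filter_reads(reads, lambda read: len(read) >= min_length, threads=3))
```

Because pool.map returns results in task order, the threaded path and the inline path produce the same output order and the same merged counters in this sketch.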

Testing

  • Ran black . to format the code; it completed successfully.
  • Ran pytest tests; all 36 tests passed.

Copilot AI review requested due to automatic review settings February 5, 2026 20:56
@Ulthran Ulthran merged commit 801b769 into main Feb 5, 2026
8 checks passed
@Ulthran Ulthran deleted the codex/fix-test-errors-related-to-pickling branch February 5, 2026 20:57
Copilot AI left a comment

Pull request overview

This PR fixes multiprocessing pickling failures that occur when local functions or lambdas are passed as callbacks to pipeline functions. The previous implementation used the process-based Pool from multiprocessing, which must serialize (pickle) every argument sent to a worker, including callback functions.

Changes:

  • Switched from process-based Pool to ThreadPool to avoid pickling requirements for callback functions
  • Added single-threaded fast-path optimization when threads == 1 to bypass pool overhead
  • Preserved existing chunking and counter aggregation behavior
