Skip to content

sync with open source how#118

Draft
lesterhaynes wants to merge 8430 commits intolinkedin:li_trunkfrom
apache:master
Draft

sync with open source how#118
lesterhaynes wants to merge 8430 commits intolinkedin:li_trunkfrom
apache:master

Conversation

@lesterhaynes
Copy link

Please add a meaningful description for your change here


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Mention the appropriate issue in your description (for example: addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment fixes #<ISSUE NUMBER> instead.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests
Go tests

See CI.md for more information about GitHub Actions CI.

benfei and others added 21 commits January 26, 2026 14:31
* Simplify encoding kw_only args and test init=False args.

1. Replace the helper function for determining whether to use kw_only with a simpler to follow condition.
2. Add test coverage for the init=False case.

* Fix pre-commit.
#36667)

the change stream partition mode

For MUTABLE_KEY_RANGE change stream, use read_proto_bytes_, else use read_json_
Bumps [python-multipart](https://github.com/Kludex/python-multipart) from 0.0.21 to 0.0.22.
- [Release notes](https://github.com/Kludex/python-multipart/releases)
- [Changelog](https://github.com/Kludex/python-multipart/blob/master/CHANGELOG.md)
- [Commits](Kludex/python-multipart@0.0.21...0.0.22)

---
updated-dependencies:
- dependency-name: python-multipart
  dependency-version: 0.0.22
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Enable some tests in runner v2 batch mode.

* Skip the failed tests after re-enabling some categories.
… only claim timestamps that have been fully processed from the restriction tracker. (#37326)
* Update yaml colab notebook

* add runners

* remove old copy

* update yaml version

* change to master

* fix error in patch and update yaml example

* correct path
…#37439)

Bumps [github.com/nats-io/nats-server/v2](https://github.com/nats-io/nats-server) from 2.12.3 to 2.12.4.
- [Release notes](https://github.com/nats-io/nats-server/releases)
- [Commits](nats-io/nats-server@v2.12.3...v2.12.4)

---
updated-dependencies:
- dependency-name: github.com/nats-io/nats-server/v2
  dependency-version: 2.12.4
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Add exception chaining to preserve error context

- Add 'from e' to exception re-raises in CloudSQLEnrichmentHandler
- Add exception chaining in processes.py for OSError and CalledProcessError
- Improve logging in core.py to preserve traceback context

This improves debuggability by preserving the full exception chain,
following Python PEP 3134 best practices.

Fixes #37422

* Fix yapf formatting for logging.warning statement

* Fix yapf formatting: put logging arguments on single line
…e cache directory (#37360)

* Fix cached wheels used in future runs

* address review comments

* run post tests

* add .github/trigger_files/beam_PostCommit_Python_Examples_Dataflow.json
* Fix python postcommit

* Trigger postcommit
* remove pubsublite from java sdk

* revert builde file change impacting pubsub

* remove unused dependencies
Bumps [github.com/aws/aws-sdk-go-v2/feature/s3/manager](https://github.com/aws/aws-sdk-go-v2) from 1.21.0 to 1.21.1.
- [Release notes](https://github.com/aws/aws-sdk-go-v2/releases)
- [Changelog](https://github.com/aws/aws-sdk-go-v2/blob/main/changelog-template.json)
- [Commits](aws/aws-sdk-go-v2@v1.21.0...v1.21.1)

---
updated-dependencies:
- dependency-name: github.com/aws/aws-sdk-go-v2/feature/s3/manager
  dependency-version: 1.21.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* remove groovy pubsublite dependencies

* remove checkstyle suppressions

* remove role config for pubsublite service and update role files
claudevdm and others added 30 commits March 6, 2026 11:58
* add more parquet options

* comments

* more tests and use default
…nd up in the same shard as other records as it is random. Simplify the test to use Iterables instead of arrays.
#37510)

* If the partition count or kafka IO size is large, then skip committing offsets that are not changed. Reduce kafka commit load

* Address PR review feedback for idle partition optimization

- Refactor commitCheckpointMark to use Java streams (per @johnjcasey)
  Changed from explicit for-loop to streams-based filtering for better
  code consistency with existing patterns

- Add debug logging for idle partitions (per @tomstepp)
  Log the count of idle partitions skipped during each commit to aid
  in monitoring and debugging the optimization

- Implement time-based periodic commits (per @tomstepp)
  Track last commit time per partition and ensure commits happen at
  least every 10 minutes even for idle partitions. This supports time
  lag monitoring use cases where customers track time since last commit.

- Add unit test for idle partition behavior (per @tomstepp)
  New test KafkaUnboundedReaderIdlePartitionTest verifies that:
  * Idle partitions are not committed repeatedly
  * Active partitions trigger commits correctly
  * Uses mock consumer to track commit calls

All changes maintain backward compatibility and follow Apache Beam
coding standards (spotless formatting applied).

* Fix test to follow Beam patterns for MockConsumer initialization

Rewrote KafkaUnboundedReaderIdlePartitionTest to follow the exact
pattern used in KafkaIOTest.java:
- Proper MockConsumer initialization with partition metadata
- Correct setup of beginning/end offsets
- Consumer records with proper offsets and timestamps
- schedulePollTask for record enqueueing based on position
- Override commitSync to track commit calls
- Use reader.start() before reader.advance()

This ensures the test properly initializes the Kafka consumer and
doesn't fail with IllegalStateException during source.split().

---------

Co-authored-by: Kishore Pola <kpola@paloaltonetworks.com>
This message is printed on every import of apache-beam, which is unnecessarily verbose.
* Move FileIO close from RecordWriter to RecordWriterManager

* fix

* clarify FileIO ownership comments and verify close
…7760)

* Fix BadImport ErrorProne violations across multiple modules

* spotless
* ITs for RESTCatalog using BLMS

* update rest catalog config

* use top-level gcs bucket for warehouse
…t cannot be decoded successfully (#37762)

Such messages will log an error but are otherwise discarded.
Update PaneInfoCoder to throw a CoderException instead of ArrayOutOfBoundsException
Fix PreCommit Python ML tests with ML deps installed
Bumps [minimatch](https://github.com/isaacs/minimatch) from 3.1.2 to 3.1.5.
- [Changelog](https://github.com/isaacs/minimatch/blob/main/changelog.md)
- [Commits](isaacs/minimatch@v3.1.2...v3.1.5)

---
updated-dependencies:
- dependency-name: minimatch
  dependency-version: 3.1.5
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…37795)

* Fix #37738: handle Debezium DELETE records without valueSchema

* refactor: replace fully qualified class names with imports in KafkaConnectSchemaTest.
* Loosen GRPC requirements.

* Change link to prevent throttling.

* Change link to prevent throttling.
* Support inferring schemas from Python dataclasses

* Address comments; Revert native_type_compatibility _TypeMapEntry change

* Add unit test for named tuple and dataclasses encoded by RowCoder and passing through GBK

* Fix lint
* Pin cloudml benchmark deps to avoid pip resolution-too-deep on Dataflow

* Reduce Dataflow inactivity timeout risk for TFT CloudML benchmark

* Tighten CloudML TFT benchmark requirements

* focus fix on dependency bounds only
… InvalidLink checks (#37773)

* Fix InvalidInlineTag, InvalidParam, InvalidBlockTag and InvalidLink javadocs

* Fix JdbcUtil after merge

* spotless

* changes

* leave ignore block

* Fix InvalidLink and restore InvalidBlockTag to disabledChecks

* Remove duplicate entry
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.