sync with open source how by lesterhaynes · Pull Request #118 · linkedin/beam

lesterhaynes · 2024-03-14T08:15:08Z

Please add a meaningful description for your change here

Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

Mention the appropriate issue in your description (for example: addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment fixes #<ISSUE NUMBER> instead.
Update CHANGES.md with noteworthy changes.
If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md

GitHub Actions Tests Status (on master branch)

See CI.md for more information about GitHub Actions CI.

* Simplify encoding kw_only args and test init=False args. 1. Replace the helper function for determining whether to use kw_only with a simpler to follow condition. 2. Add test coverage for the init=False case. * Fix pre-commit.

Bumps [python-multipart](https://github.com/Kludex/python-multipart) from 0.0.21 to 0.0.22.
- [Release notes](https://github.com/Kludex/python-multipart/releases)
- [Changelog](https://github.com/Kludex/python-multipart/blob/master/CHANGELOG.md)
- [Commits](Kludex/python-multipart@0.0.21...0.0.22)

---
updated-dependencies:
- dependency-name: python-multipart
  dependency-version: 0.0.22
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

…#37439)

Bumps [github.com/nats-io/nats-server/v2](https://github.com/nats-io/nats-server) from 2.12.3 to 2.12.4.
- [Release notes](https://github.com/nats-io/nats-server/releases)
- [Commits](nats-io/nats-server@v2.12.3...v2.12.4)

---
updated-dependencies:
- dependency-name: github.com/nats-io/nats-server/v2
  dependency-version: 2.12.4
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Add exception chaining to preserve error context
  - Add 'from e' to exception re-raises in CloudSQLEnrichmentHandler
  - Add exception chaining in processes.py for OSError and CalledProcessError
  - Improve logging in core.py to preserve traceback context
  This improves debuggability by preserving the full exception chain, following Python PEP 3134 best practices. Fixes #37422
* Fix yapf formatting for logging.warning statement
* Fix yapf formatting: put logging arguments on single line
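For readers unfamiliar with the 'from e' form mentioned in this commit message, here is a minimal sketch of PEP 3134 exception chaining around a subprocess call. It is not the code from processes.py or CloudSQLEnrichmentHandler; `CommandFailedError` and `run_command` are hypothetical names used only for illustration.

```python
import subprocess
import sys


class CommandFailedError(RuntimeError):
    """Hypothetical wrapper error for this sketch; not a Beam class."""


def run_command(args):
    """Run a command and re-raise failures with the original error chained."""
    try:
        result = subprocess.run(
            args, check=True, capture_output=True, text=True)
        return result.stdout
    except OSError as e:
        # 'from e' stores the original error in __cause__, so the traceback
        # shows both the wrapper and the underlying failure.
        raise CommandFailedError(f"could not start {args!r}") from e
    except subprocess.CalledProcessError as e:
        raise CommandFailedError(
            f"{args!r} exited with code {e.returncode}") from e


if __name__ == "__main__":
    try:
        run_command([sys.executable, "-c", "import sys; sys.exit(3)"])
    except CommandFailedError as err:
        # The chained exception remains reachable for logging or inspection.
        print(repr(err.__cause__))
```

Without `from e`, the wrapper would still carry the original error in `__context__`, but the explicit form marks it as the direct cause in the printed traceback.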

Bumps [github.com/aws/aws-sdk-go-v2/feature/s3/manager](https://github.com/aws/aws-sdk-go-v2) from 1.21.0 to 1.21.1.
- [Release notes](https://github.com/aws/aws-sdk-go-v2/releases)
- [Changelog](https://github.com/aws/aws-sdk-go-v2/blob/main/changelog-template.json)
- [Commits](aws/aws-sdk-go-v2@v1.21.0...v1.21.1)

---
updated-dependencies:
- dependency-name: github.com/aws/aws-sdk-go-v2/feature/s3/manager
  dependency-version: 1.21.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

@johnjcasey

#37510)

* If the partition count or kafka IO size is large, then skip committing offsets that are not changed. Reduce kafka commit load
* Address PR review feedback for idle partition optimization
  - Refactor commitCheckpointMark to use Java streams (per @johnjcasey)
    Changed from explicit for-loop to streams-based filtering for better code consistency with existing patterns
  - Add debug logging for idle partitions (per @tomstepp)
    Log the count of idle partitions skipped during each commit to aid in monitoring and debugging the optimization
  - Implement time-based periodic commits (per @tomstepp)
    Track last commit time per partition and ensure commits happen at least every 10 minutes even for idle partitions. This supports time lag monitoring use cases where customers track time since last commit.
  - Add unit test for idle partition behavior (per @tomstepp)
    New test KafkaUnboundedReaderIdlePartitionTest verifies that:
    * Idle partitions are not committed repeatedly
    * Active partitions trigger commits correctly
    * Uses mock consumer to track commit calls
  All changes maintain backward compatibility and follow Apache Beam coding standards (spotless formatting applied).
* Fix test to follow Beam patterns for MockConsumer initialization
  Rewrote KafkaUnboundedReaderIdlePartitionTest to follow the exact pattern used in KafkaIOTest.java:
  - Proper MockConsumer initialization with partition metadata
  - Correct setup of beginning/end offsets
  - Consumer records with proper offsets and timestamps
  - schedulePollTask for record enqueueing based on position
  - Override commitSync to track commit calls
  - Use reader.start() before reader.advance()
  This ensures the test properly initializes the Kafka consumer and doesn't fail with IllegalStateException during source.split().

---------

Co-authored-by: Kishore Pola <kpola@paloaltonetworks.com>
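The idle-partition rule described in this commit message (skip unchanged offsets, but still commit at least every 10 minutes) can be illustrated with a small sketch. This is not the Java code from the change itself; the function name, argument shapes, and constant name below are assumptions made for illustration, and Python is used here only for brevity.

```python
# Assumed constant name; the 10-minute value comes from the commit message.
MAX_IDLE_COMMIT_INTERVAL_SECONDS = 10 * 60


def select_offsets_to_commit(current, last_committed, last_commit_time, now):
    """Pick the partitions whose offsets should be committed this round.

    current: dict mapping partition id to the latest consumed offset.
    last_committed: dict mapping partition id to the last committed offset.
    last_commit_time: dict mapping partition id to the last commit time (seconds).
    now: current time in seconds.
    """
    to_commit = {}
    for partition, offset in current.items():
        changed = last_committed.get(partition) != offset
        # A partition never committed before counts as overdue.
        elapsed = now - last_commit_time.get(partition, float("-inf"))
        overdue = elapsed >= MAX_IDLE_COMMIT_INTERVAL_SECONDS
        if changed or overdue:
            to_commit[partition] = offset
    return to_commit
```

In this sketch an idle partition is skipped until its last commit is 10 minutes old, which keeps the periodic commit that the message says time-lag monitoring relies on.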

Bumps [minimatch](https://github.com/isaacs/minimatch) from 3.1.2 to 3.1.5.
- [Changelog](https://github.com/isaacs/minimatch/blob/main/changelog.md)
- [Commits](isaacs/minimatch@v3.1.2...v3.1.5)

---
updated-dependencies:
- dependency-name: minimatch
  dependency-version: 3.1.5
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Support inferring schemas from Python dataclasses * Address comments; Revert native_type_compatibility _TypeMapEntry change * Add unit test for named tuple and dataclasses encoded by RowCoder and passing through GBK * Fix lint

… InvalidLink checks (#37773) * Fix InvalidInlineTag, InvalidParam, InvalidBlockTag and InvalidLink javadocs * Fix JdbcUtil after merge * spotless * changes * leave ignore block * Fix InvalidLink and restore InvalidBlockTag to disabledChecks * Remove duplicate entry

github-actions bot added build java python examples go infra kotlin learning model labels Mar 14, 2024

benfei and others added 21 commits January 26, 2026 14:31

Update ChangeStreamDao to query different TVF for postgresSQL based on (

6a399df

#36667) the change stream partition mode For MUTABLE_KEY_RANGE change stream, use read_proto_bytes_, else use read_json_

Fix str(WindowedValueCoder) crash when underlying coder isn't KV coder (

f8013e6

#37406)

Merge pull request #37376: Adds a new SchemaCoderPayload proto to use…

e12436a

… for portable SchemaCoders.

fix key error for huggingface notebook (#37435)

52b9910

Enable some more tests in runner v2 batch mode. (#37363)

672b888

* Enable some tests in runner v2 batch mode. * Skip the failed tests after re-enabling some categories.

[Spanner Change Streams] Fix potential data loss issue by ensuring to…

47df6ae

… only claim timestamps that have been fully processed from the restriction tracker. (#37326)

Update yaml notebook (#37369)

0b32d24

* Update yaml colab notebook * add runners * remove old copy * update yaml version * change to master * fix error in patch and update yaml example * correct path

Disable Beam Metrics Report workflow (#37440)

f7784d8

better formatting output (#37441)

e26b76b

Bump cloud.google.com/go/storage from 1.59.1 to 1.59.2 in /sdks (#37443)

61fe950

Bump go.mongodb.org/mongo-driver from 1.17.6 to 1.17.7 in /sdks (#37381)

98c980a

remove website, typescript, etc - pubsublite uses (#37412)

18493cf

Only stage required wheel packages from requirements cache, not entir…

0d72265

…e cache directory (#37360) * Fix cached wheels used in future runs * address review comments * run post tests * add .github/trigger_files/beam_PostCommit_Python_Examples_Dataflow.json

Fix python postcommit (#37447)

9fcb174

* Fix python postcommit * Trigger postcommit

remove pubsublite from java sdk (#37448)

96edda4

* remove pubsublite from java sdk * revert build file change impacting pubsub * remove unused dependencies

Remove pubsublite - infra, groovy, checkstyles (#37450)

d4015eb

* remove groovy pubsublite dependencies * remove checkstyle suppressions * remove role config for pubsublite service and update role files

claudevdm and others added 30 commits March 6, 2026 11:58

Add more ParquetIo write options (#37740)

8ee8b10

* add more parquet options * comments * more tests and use default

Merge pull request #37715: Disable combiner lifting only for count tr…

b6bc904

…iggers

fix: Correct malformed Javadoc tags and update Error Prone configurat…

927ee2c

…ion. (#37755)

Fix flaky TextIOWriteTest by loosening the shard count. Records may e…

4d9e7fc

…nd up in the same shard as other records as it is random. Simplify the test to use Iterables instead of arrays.

Merge pull request #37798: Fix flaky TextIOWriteTest by loosening the…

95d8481

… shard count

Merge pull request #37725 from aIbrahiim/fix-python-postcommit-depend…

ea473d2

…ency-1 Fix Python PostCommit Dependency

Remove logging in tfrecordio.py (#37794)

9b915fd

This message is printed on every import of apache-beam, which is unnecessarily verbose.

Move FileIO close from RecordWriter to RecordWriterManager (#37782)

b203f53

* Move FileIO close from RecordWriter to RecordWriterManager * fix * clarify FileIO ownership comments and verify close

[ErrorProne] Enable BadImport ErrorProne check and fix violations (#3…

6c42cc1

…7760) * Fix BadImport ErrorProne violations across multiple modules * spotless

Use p310_ml_test

78b22bc

add py313 dep (#37799)

d10374b

[IcebergIO] Add ITs for RESTCatalog using BLMS (#35360)

d27dc82

* ITs for RESTCatalog using BLMS * update rest catalog config * use top-level gcs bucket for warehouse

[Dataflow Streaming] Remove nullness suppression of StreamingDataflow…

5034e40

…Worker (#37797)

[Dataflow Streaming] Add a pipeline option to skip input elements tha…

d6759cf

…t cannot be decoded successfully (#37762) Such messages will log an error but are otherwise discarded. Update PaneInfoCoder to throw a CoderException instead of ArrayOutOfBoundsException

Merge pull request #37800 from apache/fix-python-ml-3-10

eccfdbc

Fix PreCommit Python ML tests with ML deps installed

Revert "fix(python): Register all output pcollections of a transform …

084c4da

…rather t…" (#37801) This reverts commit 6a1618e.

[Dataflow Java Runner] Add support for sending logs directly to Cloud…

57ab2b9

… Logging (#37662)

update container version (#37811)

7d756c2

[Java][Debezium] Fix NPE in debeziumRecordInstant for DELETE events (#…

e6fcdd7

…37795) * Fix #37738: handle Debezium DELETE records without valueSchema * refactor: replace fully qualified class names with imports in KafkaConnectSchemaTest.

Loosen GRPC requirements. (#37817)

8e0736a

* Loosen GRPC requirements. * Change link to prevent throttling. * Change link to prevent throttling.

Merge pull request #37792: [ErrorProne] Fix AutoValueBoxedValues warn…

9829d6d

…ings across the codebase

update python container tag (#37812)

edaeae9

fix non-breaking vulnerabilities (#37826)

2ebe33d

Only pull license for selected tests and publishing container (#37827)

9d73d76

Fix 36181 cloudml benchmarks job (#37803)

cf536ea

* Pin cloudml benchmark deps to avoid pip resolution-too-deep on Dataflow * Reduce Dataflow inactivity timeout risk for TFT CloudML benchmark * Tighten CloudML TFT benchmark requirements * focus fix on dependency bounds only

Skip intermediate python wheels on pull request trigger (#37832)

d6bc507

lesterhaynes wants to merge 8430 commits into linkedin:li_trunk from apache:master

20 participants
