feat(sinks): add new databricks_zerobus for Databricks ingestion #24840

Open
flaviofcruz wants to merge 2 commits into vectordotdev:master from flaviofcruz:upstream-databricks-zerobus

@flaviofcruz flaviofcruz commented Mar 3, 2026

Summary

Databricks provides a Zerobus ingest connector [1], a push-based API that writes data directly into Unity Catalog Delta tables. This PR introduces a new Vector sink that uses this API to push data into Databricks. The sink is implemented with the Databricks-provided SDK [2].

We currently support two main APIs:

  • Row-based ingestion: we ingest protocol buffers.
  • Column-based ingestion: we ingest Arrow batches. This is similar to what the clickhouse sink does.

For row-based ingestion, we extended BatchSerializerConfig with a batch serializer that produces vectors of protocol buffer bytes. This makes it the second batch serialization option, alongside Arrow batches.
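To illustrate what "vectors of protocol buffer bytes" means, here is a minimal sketch that uses only the standard library. It is not the serializer from this PR: LogRow, encode_row, and serialize_batch are hypothetical names, and the message is assumed to have a single string field with field number 1. The only part taken from the protobuf specification is the wire format (tag byte, varint length, payload).

```rust
/// Encode an unsigned integer as a protobuf base-128 varint.
fn encode_varint(mut value: u64, out: &mut Vec<u8>) {
    while value >= 0x80 {
        // Low 7 bits with the continuation bit set.
        out.push((value as u8 & 0x7F) | 0x80);
        value >>= 7;
    }
    out.push(value as u8);
}

/// Hypothetical row type standing in for a Vector event.
struct LogRow {
    message: String,
}

/// Encode one row as a protobuf message with a single string field.
/// Assumption: the field number is 1, so the tag byte is (1 << 3) | 2 = 0x0A.
fn encode_row(row: &LogRow) -> Vec<u8> {
    let mut buf = Vec::with_capacity(row.message.len() + 2);
    buf.push(0x0A); // field 1, wire type 2 (length-delimited)
    encode_varint(row.message.len() as u64, &mut buf);
    buf.extend_from_slice(row.message.as_bytes());
    buf
}

/// Serialize a batch into one encoded byte buffer per row.
fn serialize_batch(rows: &[LogRow]) -> Vec<Vec<u8>> {
    rows.iter().map(encode_row).collect()
}
```

Each inner buffer is an independently decodable message, which is what distinguishes this output from a single columnar Arrow batch covering all rows.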

In both modes, users can either specify the target table's schema explicitly through a protocol buffer descriptor or have the sink fetch the schema dynamically from Unity Catalog.
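The two schema options can be pictured as a small enum. This is only a sketch under stated assumptions: SchemaSource and resolve_schema_source are hypothetical names, not types from this PR, and the rule that an explicit descriptor takes precedence over a Unity Catalog lookup is assumed for illustration.

```rust
use std::path::PathBuf;

/// Hypothetical model of where the sink obtains the target table's schema.
#[derive(Debug, PartialEq)]
enum SchemaSource {
    /// Explicit schema: a protobuf descriptor file plus a message type.
    DescriptorFile { path: PathBuf, message_type: String },
    /// Dynamic schema: fetched from Unity Catalog for the given table.
    UnityCatalog { endpoint: String, table_name: String },
}

/// Pick a schema source.
/// Assumption: an explicit descriptor, when given, wins over a catalog lookup.
fn resolve_schema_source(
    descriptor: Option<(&str, &str)>,
    unity_catalog_endpoint: &str,
    table_name: &str,
) -> SchemaSource {
    match descriptor {
        Some((path, message_type)) => SchemaSource::DescriptorFile {
            path: PathBuf::from(path),
            message_type: message_type.to_string(),
        },
        None => SchemaSource::UnityCatalog {
            endpoint: unity_catalog_endpoint.to_string(),
            table_name: table_name.to_string(),
        },
    }
}
```

With the values from the configuration in this description, passing Some(("example_proto_desc.pb", "package.ZerobusExample")) would select the descriptor file, while passing None would fall back to the Unity Catalog endpoint and table name.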

Vector configuration

[sinks.databricks_zerobus]
type = "databricks_zerobus"
inputs = ["logs"]
ingestion_endpoint = "https://91041497925470.zerobus.us-west-2.cloud.databricks.com"
table_name = "main.default.zerobus_table"
unity_catalog_endpoint = "https://logfood-us-west-2-mt.cloud.databricks.com/"
[sinks.databricks_zerobus.schema]
path = "example_proto_desc.pb"
type = "path"
message_type = "package.ZerobusExample"
[sinks.databricks_zerobus.auth]
strategy = "oauth"
client_id = "<client id>"
client_secret = "<secret>"

How did you test this PR?

Unit tests, small toy examples, and running the sink in production with real traffic.

Change Type

  • Bug fix
  • New feature
  • Dependencies
  • Non-functional (chore, refactoring, docs)
  • Performance

Is this a breaking change?

  • Yes
  • No

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on our guidelines.
  • No. A maintainer will apply the no-changelog label to this PR.

References

[1] https://docs.databricks.com/aws/en/ingestion/zerobus-overview
[2] https://github.com/databricks/zerobus-sdk

Notes

  • Please read our Vector contributor resources.
  • Do not hesitate to use @vectordotdev/vector to reach out to us regarding this PR.
  • Some CI checks run only after we manually approve them.
    • We recommend adding a pre-push hook, please see this template.
    • Alternatively, we recommend running the following locally before pushing to the remote branch:
      • make fmt
      • make check-clippy (if there are failures it's possible some of them can be fixed with make clippy-fix)
      • make test
  • After a review is requested, please avoid force pushes to help us review incrementally.
    • Feel free to push as many commits as you want. They will be squashed into one before merging.
    • For example, you can run git merge origin master and git push.
  • If this PR introduces changes to Vector dependencies (modifies Cargo.lock), please
    run make build-licenses to regenerate the license inventory and commit the changes (if any). More details here.

@github-actions github-actions bot added the domain: sinks and domain: external docs labels Mar 3, 2026
@flaviofcruz flaviofcruz changed the title feat(databricks zerobus): add new databricks_zerobus for ingesting da… feat(databricks zerobus): add new databricks_zerobus for Databricks ingestion Mar 3, 2026
@flaviofcruz flaviofcruz force-pushed the upstream-databricks-zerobus branch 2 times, most recently from ebb9b8d to 2368e4a Compare March 12, 2026 16:57
@flaviofcruz flaviofcruz force-pushed the upstream-databricks-zerobus branch from 2368e4a to 42bf043 Compare March 12, 2026 17:04
@flaviofcruz flaviofcruz marked this pull request as ready for review March 12, 2026 17:05
@flaviofcruz flaviofcruz requested review from a team as code owners March 12, 2026 17:05
@flaviofcruz flaviofcruz changed the title feat(databricks zerobus): add new databricks_zerobus for Databricks ingestion feat(sinks): add new databricks_zerobus for Databricks ingestion Mar 12, 2026
@github-actions github-actions bot added the domain: ci label Mar 12, 2026
@drichards-87 drichards-87 self-assigned this Mar 12, 2026
@drichards-87 drichards-87 removed their assignment Mar 12, 2026

Labels

  • domain: ci (Anything related to Vector's CI environment)
  • domain: external docs (Anything related to Vector's external, public documentation)
  • domain: sinks (Anything related to the Vector's sinks)

3 participants