
Add UseQueryForMetadata connection property for Thrift metadata operations#1242

Open

gopalldb wants to merge 1 commit into databricks:main from gopalldb:thrift-meta

Conversation

@gopalldb (Collaborator) commented Mar 2, 2026

Summary

  • Add opt-in connection property UseQueryForMetadata that makes the Thrift path use SQL SHOW commands (via DatabricksMetadataQueryClient) instead of native Thrift RPCs for metadata operations
  • Rename DatabricksMetadataSdkClient → DatabricksMetadataQueryClient and the internal field sdkClient → queryExecutionClient, since the class now serves both SEA and Thrift paths

Problem

Thrift metadata operations (GetTables, GetSchemas, GetColumns, etc.) pass filter patterns directly to the server, which treats _ as a single-character wildcard. This causes incorrect results — e.g., querying for catalog a_b also returns axb, a1b, etc.

The SEA metadata path already solved this by using SQL SHOW commands with LIKE patterns (via CommandBuilder + WildcardUtil), which handle _ correctly.
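The mismatch can be illustrated with a small, self-contained sketch. This is not the driver's actual WildcardUtil or CommandBuilder code; it only reproduces the SQL LIKE semantics that the Thrift server applies to metadata filter patterns:

```java
import java.util.regex.Pattern;

// Illustrative sketch of the wildcard mismatch described above; not the
// driver's actual WildcardUtil implementation.
public class WildcardDemo {

  // Thrift metadata filter semantics: '_' matches any single character,
  // '%' matches any run of characters (same as SQL LIKE).
  public static boolean likeMatch(String pattern, String value) {
    StringBuilder regex = new StringBuilder();
    for (char c : pattern.toCharArray()) {
      if (c == '_') {
        regex.append('.');
      } else if (c == '%') {
        regex.append(".*");
      } else {
        regex.append(Pattern.quote(String.valueOf(c)));
      }
    }
    return value.matches(regex.toString());
  }

  public static void main(String[] args) {
    // The bug: a lookup for the literal catalog name "a_b" also matches
    // "axb" and "a1b", because '_' is interpreted as a wildcard.
    System.out.println(likeMatch("a_b", "a_b")); // true
    System.out.println(likeMatch("a_b", "axb")); // true  <- incorrect result
    System.out.println(likeMatch("a_b", "a1b")); // true  <- incorrect result
  }
}
```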

Solution

When UseQueryForMetadata=1 and the client type is THRIFT, DatabricksSession creates a DatabricksMetadataQueryClient wrapping the Thrift IDatabricksClient. This reuses the entire SEA metadata implementation — no code duplication needed — because DatabricksMetadataQueryClient only calls executeStatement() and getConnectionContext() on the underlying client, both of which DatabricksThriftServiceClient already implements.

How it works

DatabricksSession.getDatabricksMetadataClient():

  • THRIFT + UseQueryForMetadata=0 (default): returns the Thrift client cast to IDatabricksMetadataClient (unchanged behavior)
  • THRIFT + UseQueryForMetadata=1: returns a DatabricksMetadataQueryClient wrapping the Thrift client, which executes SHOW SQL commands for metadata
  • SEA: returns DatabricksMetadataQueryClient as before

The feature is also correctly wired in all SEA→Thrift fallback paths (temporary redirect and rate limit).
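The branching above can be sketched as a self-contained toy. The interfaces are stubbed and the constructor is simplified; only the dispatch structure mirrors what the PR describes:

```java
// Self-contained sketch of the getDatabricksMetadataClient() dispatch
// described above. Interfaces are stubs; only the branching mirrors the PR.
interface IDatabricksClient {}

interface IDatabricksMetadataClient {}

// The Thrift client implements both, so the default path can cast it directly.
class ThriftClientStub implements IDatabricksClient, IDatabricksMetadataClient {}

// Wraps a client that can execute SQL, issuing SHOW commands for metadata.
class MetadataQueryClientStub implements IDatabricksMetadataClient {
  final IDatabricksClient queryExecutionClient;

  MetadataQueryClientStub(IDatabricksClient client) {
    this.queryExecutionClient = client;
  }
}

public class SessionSketch {
  enum ClientType { THRIFT, SEA }

  final ClientType clientType;
  final boolean useQueryForMetadata;
  final IDatabricksClient client;

  SessionSketch(ClientType type, boolean useQueryForMetadata, IDatabricksClient client) {
    this.clientType = type;
    this.useQueryForMetadata = useQueryForMetadata;
    this.client = client;
  }

  IDatabricksMetadataClient getDatabricksMetadataClient() {
    if (clientType == ClientType.SEA || useQueryForMetadata) {
      // SEA always uses SHOW commands; Thrift opts in via UseQueryForMetadata=1.
      return new MetadataQueryClientStub(client);
    }
    // Thrift default: native Thrift metadata RPCs (unchanged behavior).
    return (IDatabricksMetadataClient) client;
  }
}
```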

Changes

File | Change
--- | ---
DatabricksJdbcUrlParams.java | Add USE_QUERY_FOR_METADATA enum constant
IDatabricksConnectionContext.java | Add useQueryForMetadata() interface method
DatabricksConnectionContext.java | Implement useQueryForMetadata() accessor
DatabricksSession.java | Create DatabricksMetadataQueryClient for Thrift when enabled; update getDatabricksMetadataClient() dispatch; handle SEA→Thrift fallback paths
DatabricksMetadataSdkClient.java → DatabricksMetadataQueryClient.java | Rename class; rename sdkClient field → queryExecutionClient
DatabricksSessionTest.java | Add tests for enabled/disabled dispatch behavior
DatabricksMetadataSdkClientTest.java → DatabricksMetadataQueryClientTest.java | Rename test class
NEXT_CHANGELOG.md | Add changelog entry

Backward Compatibility

  • Opt-in only: UseQueryForMetadata=0 by default — existing Thrift behavior is completely unchanged
  • Independent of EnableShowCommandForGetFunctions: that property continues to work separately
  • SEA path unchanged: DatabricksMetadataQueryClient is still used for SEA as before
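Opting in would look like the snippet below. The host and httpPath are placeholders, and UseQueryForMetadata=1 is the only property introduced by this PR:

```java
// Example of opting in via the connection URL. Host and httpPath are
// placeholder values for illustration only.
public class UrlExample {
  public static String buildUrl() {
    return "jdbc:databricks://example.cloud.databricks.com:443/default"
        + ";transportMode=http"
        + ";httpPath=/sql/1.0/warehouses/0000000000000000"
        + ";UseQueryForMetadata=1";
  }

  public static void main(String[] args) {
    System.out.println(buildUrl());
  }
}
```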

Test plan

  • DatabricksSessionTest — 16/16 pass (includes 2 new tests verifying dispatch with UseQueryForMetadata=1 and default)
  • DatabricksMetadataQueryClientTest — 44/44 pass (all existing metadata tests pass with renamed class)
  • mvn spotless:check — clean
  • mvn clean install -DskipTests — builds successfully

🤖 Generated with Claude Code


Add opt-in connection property `UseQueryForMetadata` that makes the Thrift path
use SQL SHOW commands (via DatabricksMetadataQueryClient) instead of native
Thrift RPCs for metadata operations. This fixes incorrect wildcard matching
where `_` was treated as a single-character wildcard in Thrift metadata filters.

Also rename DatabricksMetadataSdkClient to DatabricksMetadataQueryClient since
it is now used by both SEA and Thrift paths.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Gopal Lal <gopal.lal@databricks.com>
@shkelzeen commented:
What is the performance implication here? With this change you will now fetch schemas, tables, columns, and more using compute resources. Would the latency depend on cluster utilization, and have you run any benchmarks to measure the difference in latency?

