
Commit 2f31635

feat(dbt): enhance DBTCloud integration with bulk job ingestion (#15264)
1 parent ee700c5 commit 2f31635

7 files changed: +1568 -31 lines changed


metadata-ingestion/docs/sources/dbt/dbt-cloud_pre.md

Lines changed: 29 additions & 1 deletion
@@ -5,8 +5,36 @@ This source pulls dbt metadata directly from the dbt Cloud APIs.
 Create a [service account token](https://docs.getdbt.com/docs/dbt-cloud-apis/service-tokens) with the "Metadata Only" permission.
 This is a read-only permission.
 
-You'll need to have a dbt Cloud job set up to run your dbt project, and "Generate docs on run" should be enabled.
+#### Operating Modes
+
+The dbt Cloud source supports two modes of operation:
+
+##### 1. Explicit Mode (Default)
+
+Specify a single dbt Cloud job to ingest metadata from. You'll need to have a dbt Cloud job set up to run your dbt project, and "Generate docs on run" should be enabled.
+
+Note: As this is ingesting only one job, we expect it to process all/most of the models, or else multiple job ingestion might be required.
 
 To get the required IDs, go to the job details page (this is the one with the "Run History" table), and look at the URL.
 It should look something like this: https://cloud.getdbt.com/next/deploy/107298/projects/175705/jobs/148094.
 In this example, the account ID is 107298, the project ID is 175705, and the job ID is 148094.
+
+##### 2. Auto-Discovery Mode
+
+Automatically discovers and ingests metadata from all eligible jobs in a dbt Cloud project. This mode:
+
+- Discovers all jobs in the specified project's **production environment only**
+- Filters to jobs with **"Generate docs on run" enabled** (`generate_docs=True`)
+- Always uses the **latest run** for each job (ignores `run_id` configuration)
+- Supports optional regex-based filtering to include/exclude specific job IDs
+- Ingests metadata from multiple jobs in a single run
+
+**When to use auto-discovery:**
+
+- You have multiple dbt Cloud jobs in a project and want to ingest all of them
+- You want to automatically pick up new jobs without updating configuration
+
+**Requirements:**
+
+- Jobs must be in the production environment
+- Jobs must have "Generate docs on run" enabled
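
The auto-discovery rules in the documentation diff above (production environment only, "Generate docs on run" enabled, optional allow/deny regexes on job IDs) can be sketched as a small filter. This is an illustrative sketch only, not the code added by this commit: the `Job` dataclass, its field names, the `select_jobs` helper, and the choice of `re.match` for pattern matching are all assumptions made for the example.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Sequence


@dataclass
class Job:
    # Hypothetical record; field names are assumptions, not the dbt Cloud API schema.
    job_id: int
    is_production: bool
    generate_docs: bool


def select_jobs(
    jobs: Iterable[Job],
    allow: Sequence[str] = (".*",),
    deny: Sequence[str] = (),
) -> List[Job]:
    """Keep production jobs with docs generation whose ID passes allow/deny."""
    selected = []
    for job in jobs:
        # Requirement from the docs: production environment and generate_docs enabled.
        if not (job.is_production and job.generate_docs):
            continue
        job_id = str(job.job_id)
        # Assumption: a deny match takes precedence over an allow match.
        if any(re.match(pattern, job_id) for pattern in deny):
            continue
        if any(re.match(pattern, job_id) for pattern in allow):
            selected.append(job)
    return selected


jobs = [
    Job(148094, is_production=True, generate_docs=True),
    Job(148095, is_production=True, generate_docs=False),  # docs not generated
    Job(200001, is_production=False, generate_docs=True),  # not production
    Job(148777, is_production=True, generate_docs=True),   # excluded by deny
]
picked = select_jobs(jobs, allow=["148.*"], deny=["1487.*"])
print([job.job_id for job in picked])  # prints [148094]
```

With the default patterns (allow everything, deny nothing), the same input would keep both eligible production jobs, 148094 and 148777.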

metadata-ingestion/docs/sources/dbt/dbt-cloud_recipe.yml

Lines changed: 15 additions & 4 deletions
@@ -8,12 +8,23 @@ source:
 
     account_id: "${DBT_ACCOUNT_ID}" # set to your dbt cloud account id
     project_id: "${DBT_PROJECT_ID}" # set to your dbt cloud project id
+
+    # Mode 1: Explicit Mode (specify a single job)
     job_id: "${DBT_JOB_ID}" # set to your dbt cloud job id
-    run_id: # set to your dbt cloud run id. This is optional, and defaults to the latest run
+    run_id: # optional: set to a specific dbt cloud run id. Defaults to the latest run
 
-    target_platform: postgres
+    # Mode 2: Auto-Discovery Mode (automatically discover all eligible jobs)
+    # Uncomment the section below to enable auto-discovery
+    # Note: When auto_discovery is enabled, job_id can be omitted (will be ignored if provided)
+    # and run_id is ignored (always uses the latest run)
+    # auto_discovery:
+    #   enabled: true
+    #   job_id_pattern: # optional
+    #     allow:
+    #       - ".*" # regex pattern to include jobs (default: include all)
+    #     # deny:
+    #     #   - "test.*" # optional: regex pattern to exclude specific jobs
 
-    # Options
-    target_platform: "${TARGET_PLATFORM_ID}" # e.g. bigquery/postgres/etc.
+    target_platform: "${TARGET_PLATFORM_ID}" # e.g. bigquery/postgres/snowflake/etc.
 
 # sink configs
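
The recipe comments above say that when `auto_discovery` is enabled, `job_id` is ignored if provided and `run_id` is always ignored. A minimal sketch of that precedence, assuming the recipe's `config` block has already been loaded into a plain dictionary: `resolve_mode` is a hypothetical helper, not a function from this commit, and raising an error when explicit mode lacks a `job_id` is an assumption rather than documented behavior.

```python
from typing import Any, Dict, Optional


def resolve_mode(config: Dict[str, Any]) -> Dict[str, Optional[Any]]:
    """Return the effective mode and the job/run IDs that would be honored."""
    auto = config.get("auto_discovery") or {}
    if auto.get("enabled"):
        # Auto-discovery: job_id and run_id are ignored, latest run is used per job.
        return {"mode": "auto_discovery", "job_id": None, "run_id": None}
    if config.get("job_id") is None:
        # Assumption: explicit mode cannot proceed without a job_id.
        raise ValueError("explicit mode requires job_id")
    # Explicit mode: run_id is optional; None stands for "use the latest run".
    return {
        "mode": "explicit",
        "job_id": config["job_id"],
        "run_id": config.get("run_id"),
    }


explicit = resolve_mode({"job_id": "148094", "run_id": None})
auto = resolve_mode(
    {"job_id": "148094", "run_id": "42", "auto_discovery": {"enabled": True}}
)
print(explicit["mode"], auto["mode"])  # prints: explicit auto_discovery
```

In the second call both IDs are supplied yet neither survives, which mirrors the note in the recipe comments.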
