3 changes: 3 additions & 0 deletions docs/how/updating-datahub.md
@@ -55,8 +55,11 @@ This file documents any backwards-incompatible changes in DataHub and assists pe

### Deprecations

### Breaking Changes

### Other Notable Changes

- #15118: (Ingestion) The Oracle source now includes stored procedures, functions, packages, and materialized views with automatic lineage generation. Use `procedure_pattern` to filter procedures if needed. See the Oracle source documentation for permissions and configuration details.
- #14717: The Tableau ingestion source now enables `extract_lineage_from_unsupported_custom_sql_queries` by default. This improves lineage quality by using DataHub's SQL parser when the Tableau Catalog API fails to return lineage for Custom SQL queries.
- #14824: DataHub now supports CDC (Change Data Capture) mode for generating MetadataChangeLogs with guaranteed ordering based on database transaction commits. CDC mode is optional and disabled by default. When enabled via `CDC_MCL_PROCESSING_ENABLED=true`, MCLs are generated from Debezium-captured database changes rather than directly from GMS. This provides stronger ordering guarantees and decoupled processing. Requires MySQL 5.7+ or PostgreSQL 10+ with replication enabled. See [CDC Configuration Guide](configure-cdc.md) for setup instructions.
- Added a multi-client search engine shim for Elasticsearch and OpenSearch. It lets DataHub work with ES 7.17 (with API compatibility mode for ES 8.x servers), ES 8.x, and OpenSearch 2.x through a unified interface. The shim auto-detects the search engine type and stays backward compatible with existing `RestHighLevelClient` usage. See [elasticsearch-search-client-shim.md](./elasticsearch-search-client-shim.md) for configuration details.
13 changes: 12 additions & 1 deletion metadata-ingestion/docs/sources/oracle/oracle.md
@@ -1 +1,12 @@
-As a SQL-based service, the Oracle integration is also supported by our SQL profiler. See here for more details on configuration.
+The Oracle source extracts metadata from Oracle databases, including:
+
+- **Tables and Views**: Standard relational tables and views with column information, constraints, and comments
+- **Stored Procedures**: Functions, procedures, and packages with source code, arguments, and dependency tracking
+- **Materialized Views**: Materialized views with proper lineage and refresh information
+- **Lineage**: Automatic lineage generation from stored procedure definitions and materialized view queries via SQL parsing
+- **Usage Statistics**: Query execution statistics and table access patterns (when audit data is available)
+- **Operations**: Data modification events (CREATE, INSERT, UPDATE, DELETE) from audit trail data
+
+The connector uses the `python-oracledb` driver and supports both thin mode (default, no Oracle client required) and thick mode (requires Oracle client installation).
+
+As a SQL-based service, the Oracle integration is also supported by our SQL profiler for table and column statistics.
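
For readers unfamiliar with the two driver modes mentioned in this file, the following is a minimal sketch of how a standalone script might connect with `python-oracledb` in either mode. The helper names `easy_connect_dsn` and `connect` are hypothetical and are not the connector's actual code; the parameter names simply mirror the recipe options `enable_thick_mode` and `thick_mode_lib_dir`. Only `oracledb.init_oracle_client` and `oracledb.connect` are driver calls.

```python
from typing import Optional


def easy_connect_dsn(host_port: str, service_name: str) -> str:
    """Build an Easy Connect string such as 'localhost:1521/svc'."""
    return f"{host_port}/{service_name}"


def connect(
    user: str,
    password: str,
    host_port: str,
    service_name: str,
    enable_thick_mode: bool = False,
    thick_mode_lib_dir: Optional[str] = None,
):
    """Open a connection in thin mode (default) or thick mode (hypothetical helper)."""
    # Imported lazily so easy_connect_dsn() works without the driver installed.
    import oracledb

    if enable_thick_mode:
        # Thick mode loads the Oracle Client libraries and must be initialised
        # before the first connection is created.
        if thick_mode_lib_dir:
            oracledb.init_oracle_client(lib_dir=thick_mode_lib_dir)
        else:
            oracledb.init_oracle_client()

    return oracledb.connect(
        user=user,
        password=password,
        dsn=easy_connect_dsn(host_port, service_name),
    )
```

In thin mode nothing beyond the Python package is needed, which is why it is a sensible default; thick mode is only worth the extra installation step when a feature that depends on the Oracle Client libraries is required.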
63 changes: 63 additions & 0 deletions metadata-ingestion/docs/sources/oracle/oracle_pre.md
@@ -17,8 +17,71 @@ The following table contains a brief description of what each data dictionary vi
| `ALL_CONSTRAINTS` or `DBA_CONSTRAINTS` | Get constraint definitions on tables |
| `ALL_CONS_COLUMNS` or `DBA_CONS_COLUMNS` | Get list of columns that are specified in constraints |
| `ALL_USERS` or `DBA_USERS` | Get all schema names |
| `ALL_OBJECTS` or `DBA_OBJECTS` | Get stored procedures, functions, and packages |
| `ALL_SOURCE` or `DBA_SOURCE` | Get source code for stored procedures and functions |
| `ALL_ARGUMENTS` or `DBA_ARGUMENTS` | Get arguments for stored procedures and functions |
| `ALL_DEPENDENCIES` or `DBA_DEPENDENCIES` | Get dependency information for database objects |
| `ALL_MVIEWS` or `DBA_MVIEWS` | Get materialized views and their definitions |

#### Data Dictionary Views accessible information and required privileges

- `ALL_` views display all the information accessible to the user used for ingestion, including information from the user's schema as well as information from objects in other schemas, if the user has access to those objects by way of grants of privileges or roles.
- `DBA_` views display all relevant information in the entire database. They can be queried only by users with the `SYSDBA` administrative privilege, the `SELECT ANY DICTIONARY` system privilege, or the `SELECT_CATALOG_ROLE` role, or by users who have been granted privileges on the views directly.

#### Required Permissions

The following permissions are required based on features used:

**Basic Metadata (Tables & Views)**

```sql
-- Using data_dictionary_mode: ALL (default)
GRANT SELECT ON ALL_TABLES TO datahub_user;
GRANT SELECT ON ALL_TAB_COLS TO datahub_user;
GRANT SELECT ON ALL_TAB_COMMENTS TO datahub_user;
GRANT SELECT ON ALL_COL_COMMENTS TO datahub_user;
GRANT SELECT ON ALL_VIEWS TO datahub_user;
GRANT SELECT ON ALL_CONSTRAINTS TO datahub_user;
GRANT SELECT ON ALL_CONS_COLUMNS TO datahub_user;

-- Using data_dictionary_mode: DBA (elevated permissions)
GRANT SELECT ON DBA_TABLES TO datahub_user;
GRANT SELECT ON DBA_TAB_COLS TO datahub_user;
GRANT SELECT ON DBA_TAB_COMMENTS TO datahub_user;
GRANT SELECT ON DBA_COL_COMMENTS TO datahub_user;
GRANT SELECT ON DBA_VIEWS TO datahub_user;
GRANT SELECT ON DBA_CONSTRAINTS TO datahub_user;
GRANT SELECT ON DBA_CONS_COLUMNS TO datahub_user;
```

**Stored Procedures (enabled by default)**

```sql
-- For ALL mode
GRANT SELECT ON ALL_OBJECTS TO datahub_user;
GRANT SELECT ON ALL_SOURCE TO datahub_user;
GRANT SELECT ON ALL_ARGUMENTS TO datahub_user;
GRANT SELECT ON ALL_DEPENDENCIES TO datahub_user;

-- For DBA mode
GRANT SELECT ON DBA_OBJECTS TO datahub_user;
GRANT SELECT ON DBA_SOURCE TO datahub_user;
GRANT SELECT ON DBA_ARGUMENTS TO datahub_user;
GRANT SELECT ON DBA_DEPENDENCIES TO datahub_user;
```

**Materialized Views (enabled by default)**

```sql
-- For ALL mode
GRANT SELECT ON ALL_MVIEWS TO datahub_user;

-- For DBA mode
GRANT SELECT ON DBA_MVIEWS TO datahub_user;
```

**Database Name Resolution**

```sql
GRANT SELECT ON V_$DATABASE TO datahub_user;
```
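
The GRANT statements above follow a regular pattern: a dictionary mode prefix (`ALL_` or `DBA_`) plus a view suffix per feature. As a rough sketch, a setup script could generate only the grants a given configuration needs. The function `grant_statements` and the `FEATURE_VIEWS` mapping are hypothetical helpers written for this illustration; the view names are copied from the SQL blocks above.

```python
from typing import Dict, Iterable, List

# View suffixes per feature, taken from the GRANT statements in this file.
FEATURE_VIEWS: Dict[str, List[str]] = {
    "basic": [
        "TABLES", "TAB_COLS", "TAB_COMMENTS", "COL_COMMENTS",
        "VIEWS", "CONSTRAINTS", "CONS_COLUMNS",
    ],
    "stored_procedures": ["OBJECTS", "SOURCE", "ARGUMENTS", "DEPENDENCIES"],
    "materialized_views": ["MVIEWS"],
}


def grant_statements(
    user: str,
    mode: str = "ALL",
    features: Iterable[str] = ("basic",),
) -> List[str]:
    """Return GRANT SELECT statements for the chosen mode and features."""
    mode = mode.upper()
    if mode not in ("ALL", "DBA"):
        raise ValueError(f"data_dictionary_mode must be ALL or DBA, got {mode!r}")

    statements = []
    for feature in features:
        for suffix in FEATURE_VIEWS[feature]:
            statements.append(f"GRANT SELECT ON {mode}_{suffix} TO {user};")
    return statements


if __name__ == "__main__":
    for stmt in grant_statements(
        "datahub_user", "DBA", ["basic", "stored_procedures", "materialized_views"]
    ):
        print(stmt)
```

The `V_$DATABASE` grant is not mode dependent, so it is left out of the mapping and would be added separately.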
24 changes: 23 additions & 1 deletion metadata-ingestion/docs/sources/oracle/oracle_recipe.yml
@@ -2,7 +2,7 @@ source:
   type: oracle
   config:
     # Coordinates
-    host_port: localhost:5432
+    host_port: localhost:1521
     database: dbname
 
     # Credentials
@@ -11,6 +11,28 @@ source:
 
     # Options
     service_name: svc # omit database if using this option
+
+    # Data Dictionary Mode
+    data_dictionary_mode: "ALL" # or "DBA" for full database access
+
+    # Stored Procedures
+    include_stored_procedures: true
+    procedure_pattern:
+      allow:
+        - "SCHEMA.*" # Include all procedures from SCHEMA
+      deny:
+        - "SYS.*" # Exclude system procedures
+
+    # Materialized Views
+    include_materialized_views: true
+
+    # Usage and Operations (requires audit data or query logs)
+    include_usage_stats: true
+    include_operational_stats: true
+
+    # Oracle Client Configuration (optional)
+    enable_thick_mode: false # Set to true to use Oracle thick client
+    # thick_mode_lib_dir: "/path/to/oracle/client" # Required on Mac/Windows if thick mode enabled
 
 sink:
   # sink configs
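
The `procedure_pattern` entries in the recipe are regular expressions rather than globs, which has some non-obvious consequences. The sketch below imitates the matching behaviour under stated assumptions: patterns are matched from the start of the name, matching is case-insensitive by default, and a deny match overrides any allow match. The function `is_allowed` is a hypothetical stand-in, not DataHub's implementation, and the exact string the Oracle source matches against (for example, whether a database prefix is included) is not specified in this diff.

```python
import re
from typing import Iterable


def is_allowed(
    name: str,
    allow: Iterable[str] = (".*",),
    deny: Iterable[str] = (),
    ignore_case: bool = True,
) -> bool:
    """Return True if name matches an allow regex and no deny regex."""
    flags = re.IGNORECASE if ignore_case else 0
    # Assumed semantics: deny wins over allow.
    if any(re.match(pattern, name, flags) for pattern in deny):
        return False
    return any(re.match(pattern, name, flags) for pattern in allow)


if __name__ == "__main__":
    allow, deny = ["SCHEMA.*"], ["SYS.*"]
    for candidate in ("SCHEMA.LOAD_ORDERS", "SCHEMA2.PROC", "SYS.DBMS_JOB", "SYSTEM.HELPER"):
        print(candidate, is_allowed(candidate, allow, deny))
```

Because `.` matches any character, `"SCHEMA.*"` also admits names such as `SCHEMA2.PROC`, and `"SYS.*"` also excludes a schema named `SYSTEM`. Escaping the dot, as in `SCHEMA\..*`, restricts the pattern to a single schema.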
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
{
-"generated_at": "2025-10-23T14:26:01.879155+00:00",
+"generated_at": "2025-10-27T17:35:54.648985+00:00",
"generated_by": "metadata-ingestion/scripts/capability_summary.py",
"plugin_details": {
"abs": {
@@ -2130,12 +2130,19 @@
},
{
"capability": "LINEAGE_FINE",
-"description": "Enabled by default to get lineage for views via `include_view_column_lineage`",
+"description": "Enabled by default to get lineage for stored procedures via `include_lineage` and for views via `include_view_column_lineage`",
"subtype_modifier": [
"Stored Procedure",
"View"
],
"supported": true
},
{
"capability": "USAGE_STATS",
"description": "Enabled by default via SQL aggregator when processing observed queries",
"subtype_modifier": null,
"supported": true
},
{
"capability": "DESCRIPTIONS",
"description": "Enabled by default",
@@ -2162,8 +2169,9 @@
},
{
"capability": "LINEAGE_COARSE",
-"description": "Enabled by default to get lineage for views via `include_view_lineage`",
+"description": "Enabled by default to get lineage for stored procedures via `include_lineage` and for views via `include_view_lineage`",
"subtype_modifier": [
"Stored Procedure",
"View"
],
"supported": true