@douenergy
Contributor

@douenergy douenergy commented Oct 8, 2025

The current Redshift metadata query uses information_schema.tables and information_schema.columns, which only show schemas that are in the user's search_path or owned by the user. This causes visibility issues such as Canner/WrenAI#1953.

Summary by CodeRabbit

  • Bug Fixes

    • More accurate Redshift metadata: corrected nullability and data type reporting; improved table/column comments.
    • Excludes system/internal schemas and dropped/system columns; respects namespace privileges.
    • Consistent ordering by schema, table, then column; better handling of views alongside tables.
  • Performance

    • Faster and more reliable retrieval of Redshift table and column metadata.

@github-actions github-actions bot added the ibis and python labels Oct 8, 2025
@coderabbitai
Contributor

coderabbitai bot commented Oct 8, 2025

Walkthrough

The Redshift metadata query in get_table_list was replaced: it now queries PostgreSQL catalog tables (pg_class/pg_namespace/pg_attribute) and uses format_type/current_database() for types and catalogs, with revised filters, nullability logic, and ordering. External interface and Table/Column construction remain unchanged.

Changes

| Cohort / File(s) | Summary of Changes |
|---|---|
| Redshift metadata query rewrite (ibis-server/app/model/metadata/redshift.py) | Replaced information_schema-based SQL with a catalog-driven query against pg_class/pg_namespace/pg_attribute; selected fields now use current_database()/nspname/relname/attname/format_type; nullability moved to a CASE on attnotnull; filters restrict relkind to 'r' and 'v', exclude system schemas/columns, exclude dropped columns, require USAGE on namespace, and order by schema, table, column position; existing helper transforms and Table/Column construction preserved. |
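
For readers without the diff open, the change summarized above can be sketched roughly as follows. This is a reconstruction from the summary, not the query in the PR: the aliases, the list of excluded schemas, the pg_description joins used for comments, and the helper group_columns_by_table are all assumptions made for illustration.

```python
from collections import defaultdict

# Sketch of a catalog-driven metadata query, reconstructed from the summary.
# Assumptions (not taken from the PR): the aliases, the excluded-schema list,
# and the pg_description joins used to fetch comments.
CATALOG_METADATA_SQL = """
SELECT
    current_database()                   AS table_catalog,
    nsp.nspname                          AS table_schema,
    c.relname                            AS table_name,
    a.attname                            AS column_name,
    format_type(a.atttypid, a.atttypmod) AS data_type,
    CASE WHEN a.attnotnull THEN 'NO' ELSE 'YES' END AS is_nullable,
    td.description                       AS table_comment,
    cd.description                       AS column_comment
FROM pg_catalog.pg_namespace nsp
JOIN pg_catalog.pg_class c ON c.relnamespace = nsp.oid
JOIN pg_catalog.pg_attribute a ON a.attrelid = c.oid
LEFT JOIN pg_catalog.pg_description td
       ON td.objoid = c.oid AND td.objsubid = 0
LEFT JOIN pg_catalog.pg_description cd
       ON cd.objoid = c.oid AND cd.objsubid = a.attnum
WHERE c.relkind IN ('r', 'v')                 -- regular tables and views
  AND nsp.nspname NOT IN ('information_schema', 'pg_catalog')
  AND a.attnum > 0                            -- skip system columns
  AND NOT a.attisdropped                      -- skip dropped columns
  AND has_schema_privilege(nsp.oid, 'USAGE')  -- respect namespace privileges
ORDER BY nsp.nspname, c.relname, a.attnum
"""


def group_columns_by_table(rows):
    """Group result rows by (catalog, schema, table), keeping column order."""
    tables = defaultdict(list)
    for catalog, schema, table, column, data_type, is_nullable, *_comments in rows:
        tables[(catalog, schema, table)].append(
            {"name": column, "type": data_type, "nullable": is_nullable == "YES"}
        )
    return dict(tables)
```

Because the query orders by schema, table, and attnum, a single pass over the rows is enough to rebuild each table with its columns in declaration order.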

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant Metadata as RedshiftMetadata
  participant Conn as SQLConnector
  participant Redshift as Redshift

  Client->>Metadata: get_table_list()
  Metadata->>Conn: execute(catalog-based SQL)
  Conn->>Redshift: SELECT from pg_namespace/pg_class/pg_attribute
  Redshift-->>Conn: rows (catalog, schema, table, column, type, nullable, comment)
  Conn-->>Metadata: result set
  Note right of Metadata: Map rows → Table/Column objects<br/>Use helpers: _format_redshift_compact_table_name, _transform_redshift_column_type
  Metadata-->>Client: list[Table]

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20–30 minutes

  • Pay attention to the new SQL correctness and edge cases:
    • type extraction via format_type(...) and its parsing in _transform_redshift_column_type
    • nullability logic change from information_schema to attnotnull CASE expression
    • privilege check (has_schema_privilege(nsp.oid, 'USAGE')) and schema exclusions
    • handling of system/dropped columns and ordering semantics

Poem

I thump the logs and sniff each name,
From pg_class I learn the game.
Columns, types, and schemas bright—
I hop through catalogs all night.
A carrot cheer for metadata’s light. 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: querying Redshift system catalogs instead of information_schema to show full table metadata, which directly addresses the PR's core objective. |

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 923d6a7 and 7d1b853.

📒 Files selected for processing (1)
  • ibis-server/app/model/metadata/redshift.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: ci
🔇 Additional comments (1)
ibis-server/app/model/metadata/redshift.py (1)

114-153: Use pg_catalog for foreign keys in get_constraints

get_constraints still relies on information_schema, which has the same schema-visibility limitations that led to pg_catalog in get_table_list. Confirm whether Redshift’s pg_catalog.pg_constraint (joined with pg_class and pg_attribute) exposes foreign key metadata, and refactor get_constraints to use it for consistency and full coverage.

@douenergy douenergy marked this pull request as draft October 13, 2025 01:58
@goldmedal
Contributor

@douenergy, is this PR tested well?

@douenergy douenergy force-pushed the redshift-full-metadata branch from 7d1b853 to 218ebfa Compare November 19, 2025 05:13
@douenergy
Contributor Author

The Oracle test failures are not related to this PR. I have tested it manually in my local environment.

[attached image]

@douenergy douenergy marked this pull request as ready for review November 19, 2025 05:32
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
ibis-server/app/model/metadata/redshift.py (1)

66-66: format_type output breaks REDSHIFT_TYPE_MAPPING for length/precision-qualified types.

format_type(a.atttypid, a.atttypmod) returns strings like "character varying(100)" or "numeric(10,2)", which will not match the base-type keys in REDSHIFT_TYPE_MAPPING, so those columns will resolve to UNKNOWN and generate warnings instead of the expected mapped types. Normalize data_type by stripping modifiers before lookup.

You can fix this inside _transform_redshift_column_type:

     def _transform_redshift_column_type(
         self, data_type: str
     ) -> RustWrenEngineColumnType:
@@
-        # Convert to lowercase for comparison
-        normalized_type = data_type.lower()
-
-        # Use the module-level mapping table
-        mapped_type = REDSHIFT_TYPE_MAPPING.get(
-            normalized_type, RustWrenEngineColumnType.UNKNOWN
-        )
+        # Convert to lowercase and strip type modifiers (e.g., "varchar(100)" -> "varchar")
+        normalized_type = data_type.lower()
+        # Remove everything after the first parenthesis or bracket
+        if "(" in normalized_type:
+            normalized_type = normalized_type.split("(", 1)[0].strip()
+        elif "[" in normalized_type:
+            normalized_type = normalized_type.split("[", 1)[0].strip()
+
+        # Use the module-level mapping table
+        mapped_type = REDSHIFT_TYPE_MAPPING.get(
+            normalized_type, RustWrenEngineColumnType.UNKNOWN
+        )

This is the same underlying issue noted in the earlier review and still appears unresolved.

Also applies to: 166-188
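
As a standalone illustration of the normalization discussed in this comment, here is a minimal sketch. It is a variant of the suggested diff rather than a copy: instead of truncating at the first parenthesis, it removes the modifier wherever it appears, because PostgreSQL's format_type can return strings such as "timestamp(6) without time zone" (whether Redshift emits that form has not been confirmed here). EXAMPLE_TYPE_MAPPING, strip_type_modifiers, and lookup_type are hypothetical names; the real REDSHIFT_TYPE_MAPPING and its values are not reproduced.

```python
import re

# Hypothetical stand-in for REDSHIFT_TYPE_MAPPING; the real keys and values
# live in redshift.py and are not reproduced here.
EXAMPLE_TYPE_MAPPING = {
    "character varying": "VARCHAR",
    "numeric": "DECIMAL",
    "integer": "INTEGER",
    "timestamp without time zone": "TIMESTAMP",
}


def strip_type_modifiers(data_type: str) -> str:
    """Lowercase a format_type() string and drop (...) and [] modifiers."""
    normalized = data_type.lower()
    normalized = re.sub(r"\([^)]*\)", "", normalized)  # e.g. "(100)", "(10,2)"
    normalized = normalized.replace("[]", "")          # array suffix
    return " ".join(normalized.split())                # collapse leftover spaces


def lookup_type(data_type: str) -> str:
    """Map a raw type string to an engine type, falling back to UNKNOWN."""
    return EXAMPLE_TYPE_MAPPING.get(strip_type_modifiers(data_type), "UNKNOWN")
```

With this approach the mapping only needs base-type keys such as "character varying" and "numeric"; anything else still resolves to UNKNOWN, as before.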

🧹 Nitpick comments (2)
ibis-server/app/model/metadata/redshift.py (2)

61-79: Catalog-based table/column query looks sound; verify desired object coverage.

The pg_class/pg_namespace/pg_attribute join, filters, and ordering should give a consistent view of regular tables and views with correct nullability and comments. If you also want materialized views or other relation kinds exposed in metadata, consider extending c.relkind IN ('r', 'v') accordingly once you’ve confirmed the relkind values used by Redshift in your environment.


114-153: Constraints still use information_schema; visibility may lag the new catalog-based table list.

get_constraints continues to query information_schema.*; if Redshift enforces the same search_path/ownership restrictions there, you might list tables from schemas whose foreign keys never appear. Consider moving constraints to a catalog-based query as well or at least confirming that the current views expose all constraints you expect after this change.
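
If constraints were moved to the catalogs as well, a starting point might look like the sketch below. It is written against PostgreSQL's pg_constraint layout and has not been verified on Redshift, which is exactly what the comment above asks to confirm. The aliases and the constant name FOREIGN_KEY_SQL are assumptions, and the query deliberately stops at table-level relationships: resolving column pairs would require unpacking conkey/confkey, which is left out here.

```python
# PostgreSQL-style sketch of a catalog-based foreign-key listing.
# Unverified on Redshift; aliases and output column names are assumptions.
FOREIGN_KEY_SQL = """
SELECT
    con.conname     AS constraint_name,
    src_nsp.nspname AS table_schema,
    src.relname     AS table_name,
    ref_nsp.nspname AS referenced_schema,
    ref.relname     AS referenced_table
FROM pg_catalog.pg_constraint con
JOIN pg_catalog.pg_class src ON src.oid = con.conrelid
JOIN pg_catalog.pg_namespace src_nsp ON src_nsp.oid = src.relnamespace
JOIN pg_catalog.pg_class ref ON ref.oid = con.confrelid
JOIN pg_catalog.pg_namespace ref_nsp ON ref_nsp.oid = ref.relnamespace
WHERE con.contype = 'f'                           -- foreign keys only
  AND has_schema_privilege(src_nsp.oid, 'USAGE')  -- same privilege filter as the table list
ORDER BY src_nsp.nspname, src.relname, con.conname
"""
```

Reusing the has_schema_privilege filter keeps the constraint list aligned with whatever schemas the table list exposes.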

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 7d1b853 and 218ebfa.

📒 Files selected for processing (1)
  • ibis-server/app/model/metadata/redshift.py (1 hunks)

@goldmedal
Contributor

@douenergy did you check whether a user with lower permissions could see tables they shouldn't have access to through the pg_catalog path?
