-
Notifications
You must be signed in to change notification settings - Fork 141
feat(ibis): Show full Redshift table metadata by querying system catalogs #1345
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
WalkthroughThe Redshift metadata query in get_table_list was replaced: it now queries PostgreSQL catalog tables (pg_class/pg_namespace/pg_attribute) and uses format_type/current_database() for types and catalogs, with revised filters, nullability logic, and ordering. External interface and Table/Column construction remain unchanged. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant Metadata as RedshiftMetadata
participant Conn as SQLConnector
participant Redshift as Redshift
Client->>Metadata: get_table_list()
Metadata->>Conn: execute(catalog-based SQL)
Conn->>Redshift: SELECT from pg_namespace/pg_class/pg_attribute
Redshift-->>Conn: rows (catalog, schema, table, column, type, nullable, comment)
Conn-->>Metadata: result set
Note right of Metadata: Map rows → Table/Column objects<br/>Use helpers: _format_redshift_compact_table_name, _transform_redshift_column_type
Metadata-->>Client: list[Table]
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20–30 minutes
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
ibis-server/app/model/metadata/redshift.py(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: ci
🔇 Additional comments (1)
ibis-server/app/model/metadata/redshift.py (1)
114-153: Use pg_catalog for foreign keys in get_constraintsget_constraints still relies on information_schema, which has the same schema-visibility limitations that led to pg_catalog in get_table_list. Confirm whether Redshift’s pg_catalog.pg_constraint (joined with pg_class and pg_attribute) exposes foreign key metadata, and refactor get_constraints to use it for consistency and full coverage.
|
@douenergy, is this PR tested well? |
7d1b853 to
218ebfa
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
ibis-server/app/model/metadata/redshift.py (1)
66-66: format_type output breaks REDSHIFT_TYPE_MAPPING for length/precision-qualified types.
format_type(a.atttypid, a.atttypmod)returns strings like"character varying(100)"or"numeric(10,2)", which will not match the base-type keys inREDSHIFT_TYPE_MAPPING, so those columns will resolve toUNKNOWNand generate warnings instead of the expected mapped types. Normalizedata_typeby stripping modifiers before lookup.You can fix this inside
_transform_redshift_column_type:def _transform_redshift_column_type( self, data_type: str ) -> RustWrenEngineColumnType: @@ - # Convert to lowercase for comparison - normalized_type = data_type.lower() - - # Use the module-level mapping table - mapped_type = REDSHIFT_TYPE_MAPPING.get( - normalized_type, RustWrenEngineColumnType.UNKNOWN - ) + # Convert to lowercase and strip type modifiers (e.g., "varchar(100)" -> "varchar") + normalized_type = data_type.lower() + # Remove everything after the first parenthesis or bracket + if "(" in normalized_type: + normalized_type = normalized_type.split("(", 1)[0].strip() + elif "[" in normalized_type: + normalized_type = normalized_type.split("[", 1)[0].strip() + + # Use the module-level mapping table + mapped_type = REDSHIFT_TYPE_MAPPING.get( + normalized_type, RustWrenEngineColumnType.UNKNOWN + )This is the same underlying issue noted in the earlier review and still appears unresolved.
Also applies to: 166-188
🧹 Nitpick comments (2)
ibis-server/app/model/metadata/redshift.py (2)
61-79: Catalog-based table/column query looks sound; verify desired object coverage.The pg_class/pg_namespace/pg_attribute join, filters, and ordering should give a consistent view of regular tables and views with correct nullability and comments. If you also want materialized views or other relation kinds exposed in metadata, consider extending
c.relkind IN ('r', 'v')accordingly once you’ve confirmed the relkind values used by Redshift in your environment.
114-153: Constraints still use information_schema; visibility may lag the new catalog-based table list.
get_constraintscontinues to queryinformation_schema.*; if Redshift enforces the same search_path/ownership restrictions there, you might list tables from schemas whose foreign keys never appear. Consider moving constraints to a catalog-based query as well or at least confirming that the current views expose all constraints you expect after this change.
|
@douenergy did you check if a user with a lower permission could get the tables that he shouldn't see through the pg_catalog path? |

The current Redshift metadata query uses information_schema.tables and information_schema.columns, which only shows schemas that are in the user's search_path or owned by the user. This causes visibility issues like Canner/WrenAI#1953
Summary by CodeRabbit
Bug Fixes
Performance