Feature/dbt macros #85

Open
littlemex wants to merge 4 commits into main from feature/dbt-macros

Conversation

@littlemex
Owner

No description provided.

@amazon-q-developer

Code review in progress. Analyzing for code quality issues and best practices. You can monitor the review status in the checks section at the bottom of this pull request. Detailed findings will be posted upon completion.

Using Amazon Q Developer for GitHub

Amazon Q Developer¹ is an AI-powered assistant that integrates directly into your GitHub workflow, enhancing your development process with intelligent features for code development, review, and transformation.

Slash Commands

Command        Description
/q <message>   Chat with the agent to ask questions or request revisions
/q review      Requests an Amazon Q powered code review
/q help        Displays usage information

Features

Agentic Chat
Enables interactive conversation with Amazon Q to ask questions about the pull request or request specific revisions. Use /q <message> in comment threads or the review body to engage with the agent directly.

Code Review
Analyzes pull requests for code quality, potential issues, and security concerns. Provides feedback and suggested fixes. Automatically triggered on new or reopened PRs (can be disabled for AWS registered installations), or manually with /q review slash command in a comment.

Customization

You can create project-specific rules for Amazon Q Developer to follow:

  1. Create a .amazonq/rules folder in your project root.
  2. Add Markdown files in this folder to define rules (e.g., cdk-rules.md).
  3. Write detailed prompts in these files, such as coding standards or best practices.
  4. Amazon Q Developer will automatically use these rules when generating code or providing assistance.

Example rule:

All Amazon S3 buckets must have encryption enabled, enforce SSL, and block public access.
All Amazon DynamoDB Streams tables must have encryption enabled.
All Amazon SNS topics must have encryption enabled and enforce SSL.
All Amazon SQS queues must enforce SSL.

Feedback

To provide feedback on Amazon Q Developer, create an issue in the Amazon Q Developer public repository.

For more detailed information, visit the Amazon Q for GitHub documentation.

Footnotes

  1. Amazon Q Developer uses generative AI. You should verify generated code before using it in your environment. See the AWS Responsible AI Policy.

@amazon-q-developer (bot) left a comment


🚀 Feature Review: dbt Macros Enhancement

This PR introduces significant enhancements to the multitenant analytics platform, adding comprehensive dbt macro support and local development capabilities. The changes demonstrate a well-thought-out approach to scaling multi-tenant data processing.

Strengths

  1. Excellent Local Development Support: The addition of --local flags across all ETL managers provides a valuable development environment using Docker Compose
  2. Comprehensive dbt Macro System: The new macros for dynamic tenant schema detection and batch processing show good architectural thinking for scalability
  3. Consistent Script Renaming: The migration from phase-specific script names to more descriptive names (e.g., aurora-sql-execute.sh) improves maintainability
  4. Robust Configuration Structure: The enhanced config.json with separate remote/local configurations supports multiple deployment scenarios

🔧 Areas for Improvement

  1. Security Concerns: Hardcoded credentials in config.json pose security risks and should use environment variables

  2. Code Quality Issues:

    • SQL generation logic in dbt macros has structural problems that could cause compilation errors
    • Hardcoded container names create tight coupling
    • Missing error handling for edge cases (empty tenant lists)
  3. Configuration Inconsistencies: Mixed references to Glue and PostgreSQL/Redshift adapters could cause confusion

🎯 Recommendations

  1. Priority 1 (Security): Replace hardcoded credentials with environment variable references
  2. Priority 2 (Functionality): Fix the SQL generation logic in union_zero_etl_tenant_tables macro to prevent compilation errors
  3. Priority 3 (Robustness): Add proper error handling for empty tenant schema scenarios

📊 Impact Assessment

  • Positive: Significantly improves developer experience and system scalability
  • Risk: Some implementation issues could cause runtime failures if not addressed
  • Complexity: Adds substantial functionality while maintaining reasonable code organization

The PR represents a major step forward in making the platform more developer-friendly and scalable. With the identified issues addressed, this will be a valuable addition to the codebase.


Coming soon: You'll be able to request the agent to implement changes and create commits on your pull request's source branch. This may trigger any configured CI/CD workflows automatically.

You can disable any CI/CD workflow that you have configured to run on the source branch of the pull request. For more information, see Disabling and enabling a workflow in the GitHub documentation.

echo " ./2-etl-manager.sh -p $PATTERN -c $CONFIG_FILE --local"
echo ""
echo "Or test local PostgreSQL connection:"
echo " docker exec -it multitenant-analytics-platform-postgres-1 psql -U dbt_user -d postgres"

Good addition of local execution support! The --local flag provides a valuable development environment option. However, there's a potential issue with the hardcoded container name in the local environment info.

Suggested change
echo " docker exec -it multitenant-analytics-platform-postgres-1 psql -U dbt_user -d postgres"
echo " docker exec -it \$(docker compose ps -q postgres) psql -U dbt_user -d postgres"

# Function to transfer files to Docker container based on config.json
transfer_files_to_docker_container() {
local config_file="$1"
local container_name="multitenant-analytics-platform-dbt-local-1"

The hardcoded container name creates a tight coupling and potential brittleness. Consider making this configurable or using dynamic container discovery.

Suggested change
local container_name="multitenant-analytics-platform-dbt-local-1"
local container_name=$(docker compose ps -q dbt-local 2>/dev/null | head -1)
if [[ -z "$container_name" ]]; then
    container_name="multitenant-analytics-platform-dbt-local-1"  # fallback
fi


# Verify transfer
print_info "Verifying transferred files in Docker container..."
local verify_output=$(docker exec "$container_name" bash -c "cd $container_path && echo 'VERIFY: Current directory:' && pwd && echo 'VERIFY: Directory contents:' && ls -la && echo 'VERIFY: Key files check:' && if [ -f config.json ]; then echo 'VERIFY: config.json exists'; else echo 'VERIFY: config.json MISSING'; fi && if [ -f scripts/aurora-sql-execute.sh ]; then echo 'VERIFY: scripts/aurora-sql-execute.sh exists'; else echo 'VERIFY: scripts/auora.sh MISSING'; fi && if [ -f sql/aurora/schema/create-tenant-schemas.sql ]; then echo 'VERIFY: sql/aurora/schema/create-tenant-schemas.sql exists'; else echo 'VERIFY: sql/aurora/schema/create-tenant-schemas.sql MISSING'; fi" 2>/dev/null)

There's a typo in the verification message that could cause confusion during debugging.

Suggested change
local verify_output=$(docker exec "$container_name" bash -c "cd $container_path && echo 'VERIFY: Current directory:' && pwd && echo 'VERIFY: Directory contents:' && ls -la && echo 'VERIFY: Key files check:' && if [ -f config.json ]; then echo 'VERIFY: config.json exists'; else echo 'VERIFY: config.json MISSING'; fi && if [ -f scripts/aurora-sql-execute.sh ]; then echo 'VERIFY: scripts/aurora-sql-execute.sh exists'; else echo 'VERIFY: scripts/auora.sh MISSING'; fi && if [ -f sql/aurora/schema/create-tenant-schemas.sql ]; then echo 'VERIFY: sql/aurora/schema/create-tenant-schemas.sql exists'; else echo 'VERIFY: sql/aurora/schema/create-tenant-schemas.sql MISSING'; fi" 2>/dev/null)
local verify_output=$(docker exec "$container_name" bash -c "cd $container_path && echo 'VERIFY: Current directory:' && pwd && echo 'VERIFY: Directory contents:' && ls -la && echo 'VERIFY: Key files check:' && if [ -f config.json ]; then echo 'VERIFY: config.json exists'; else echo 'VERIFY: config.json MISSING'; fi && if [ -f scripts/aurora-sql-execute.sh ]; then echo 'VERIFY: scripts/aurora-sql-execute.sh exists'; else echo 'VERIFY: scripts/aurora-sql-execute.sh MISSING'; fi && if [ -f sql/aurora/schema/create-tenant-schemas.sql ]; then echo 'VERIFY: sql/aurora/schema/create-tenant-schemas.sql exists'; else echo 'VERIFY: sql/aurora/schema/create-tenant-schemas.sql MISSING'; fi" 2>/dev/null)

Comment on lines +139 to +142
{% if sample_tenant is none %}
{% set tenant_schemas = get_zero_etl_tenant_schemas(zeroetl_database) %}
{% set sample_tenant = tenant_schemas[0] %}
{% endif %}

🛑 Potential Runtime Error: This code assumes that tenant_schemas will always have at least one element, but if get_zero_etl_tenant_schemas() returns an empty list, accessing tenant_schemas[0] will cause an index error.

Suggested change
{% if sample_tenant is none %}
{% set tenant_schemas = get_zero_etl_tenant_schemas(zeroetl_database) %}
{% set sample_tenant = tenant_schemas[0] %}
{% endif %}
{% if sample_tenant is none %}
{% set tenant_schemas = get_zero_etl_tenant_schemas(zeroetl_database) %}
{% if tenant_schemas|length > 0 %}
{% set sample_tenant = tenant_schemas[0] %}
{% else %}
{{ log("No tenant schemas found for column introspection", info=true) }}
{{ return([]) }}
{% endif %}
{% endif %}

Comment on lines +63 to +95
{% for batch in batches %}
{% if batches|length > 1 %}
{# Process each batch as a CTE #}
batch_{{ loop.index0 }} AS (
{% endif %}

{% for tenant_schema in batch %}
{% set table_ref = zeroetl_database ~ '.' ~ tenant_schema ~ '.' ~ table_name %}

SELECT
'{{ tenant_schema }}'::varchar(50) as tenant_id,
{{ select_columns }}
FROM {{ table_ref }}

{% if not loop.last %}
UNION ALL
{% endif %}
{% endfor %}

{% if batches|length > 1 %}
){% if not loop.last %},{% endif %}
{% endif %}
{% endfor %}

{# If there are multiple batches, UNION them at the end #}
{% if batches|length > 1 %}
{% for batch in batches %}
SELECT * FROM batch_{{ loop.index0 }}
{% if not loop.last %}
UNION ALL
{% endif %}
{% endfor %}
{% endif %}

The batch processing logic has a structural issue. The current implementation generates invalid SQL when there are multiple batches because it creates CTEs inside a loop without proper WITH clause structure. This will cause SQL compilation errors.

Suggested change
{% for batch in batches %}
{% if batches|length > 1 %}
{# Process each batch as a CTE #}
batch_{{ loop.index0 }} AS (
{% endif %}
{% for tenant_schema in batch %}
{% set table_ref = zeroetl_database ~ '.' ~ tenant_schema ~ '.' ~ table_name %}
SELECT
'{{ tenant_schema }}'::varchar(50) as tenant_id,
{{ select_columns }}
FROM {{ table_ref }}
{% if not loop.last %}
UNION ALL
{% endif %}
{% endfor %}
{% if batches|length > 1 %}
){% if not loop.last %},{% endif %}
{% endif %}
{% endfor %}
{# If there are multiple batches, UNION them at the end #}
{% if batches|length > 1 %}
{% for batch in batches %}
SELECT * FROM batch_{{ loop.index0 }}
{% if not loop.last %}
UNION ALL
{% endif %}
{% endfor %}
{% endif %}
{% if batches|length > 1 %}
{# Multiple batches: use CTE structure #}
WITH
{% for batch in batches %}
batch_{{ loop.index0 }} AS (
{% for tenant_schema in batch %}
{% set table_ref = zeroetl_database ~ '.' ~ tenant_schema ~ '.' ~ table_name %}
SELECT
'{{ tenant_schema }}'::varchar(50) as tenant_id,
{{ select_columns }}
FROM {{ table_ref }}
{% if not loop.last %}
UNION ALL
{% endif %}
{% endfor %}
){% if not loop.last %},{% endif %}
{% endfor %}
{# Final UNION of all batches #}
{% for batch in batches %}
SELECT * FROM batch_{{ loop.index0 }}
{% if not loop.last %}
UNION ALL
{% endif %}
{% endfor %}
{% else %}
{# Single batch: direct UNION #}
{% for tenant_schema in batches[0] %}
{% set table_ref = zeroetl_database ~ '.' ~ tenant_schema ~ '.' ~ table_name %}
SELECT
'{{ tenant_schema }}'::varchar(50) as tenant_id,
{{ select_columns }}
FROM {{ table_ref }}
{% if not loop.last %}
UNION ALL
{% endif %}
{% endfor %}
{% endif %}

Comment on lines +30 to +36
"local": {
"host": "localhost",
"port": 5432,
"database": "multitenant_analytics",
"username": "dbt_user",
"password": "dbt_password"
},

🛑 Security Vulnerability: Hardcoded credentials in configuration files pose a significant security risk. These credentials could be exposed in version control, logs, or during deployment processes.

Suggested change
"local": {
"host": "localhost",
"port": 5432,
"database": "multitenant_analytics",
"username": "dbt_user",
"password": "dbt_password"
},
"local": {
"host": "localhost",
"port": 5432,
"database": "multitenant_analytics",
"username": "${LOCAL_DB_USER:-dbt_user}",
"password": "${LOCAL_DB_PASSWORD:-dbt_password}"
},
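Note that a plain JSON parser will not expand `${VAR:-default}` placeholders; whatever script loads config.json must do the expansion itself. A minimal sketch, where the function name `expand_env` and the sample keys are illustrative:

```python
import json
import os
import re


def expand_env(value: str) -> str:
    """Expand ${VAR} and ${VAR:-default} placeholders from the environment.

    Neither json.loads nor os.path.expandvars understands the :-default
    form, so a small regex-based expander is used instead.
    """
    pattern = re.compile(r"\$\{(\w+)(?::-([^}]*))?\}")
    return pattern.sub(lambda m: os.environ.get(m.group(1), m.group(2) or ""), value)


# Hypothetical excerpt of the "local" block from config.json
raw = '{"username": "${LOCAL_DB_USER:-dbt_user}", "password": "${LOCAL_DB_PASSWORD:-dbt_password}"}'
local_cfg = {k: expand_env(v) if isinstance(v, str) else v
             for k, v in json.loads(raw).items()}
```

With neither environment variable set, the defaults (`dbt_user` / `dbt_password`) apply, preserving the current local-development behavior while keeping real credentials out of version control.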

+materialized: table
+tags: ["marts"]

# dbt-glue specific configurations

The comment mentions "dbt-glue specific configurations" but the project appears to be using PostgreSQL/Redshift adapters based on the profiles and other configurations. This could cause confusion about the intended target platform.

Suggested change
# dbt-glue specific configurations
# Multi-tenant dbt configurations

Comment on lines +34 to +36
# Glue Interactive Sessions settings
glue_session_role: "dbt-glue-interactive-session-role"
glue_region: "us-east-1"

These Glue-specific configurations seem inconsistent with the PostgreSQL/Redshift setup used elsewhere in the project. Consider removing these if not using AWS Glue, or clarify the multi-adapter strategy.

Suggested change
# Glue Interactive Sessions settings
glue_session_role: "dbt-glue-interactive-session-role"
glue_region: "us-east-1"
# Multi-tenant configuration - supports 1000+ tenants
