
Conversation

@Hexeong Hexeong (Contributor) commented Dec 27, 2025

Related issue

Work done

  1. Implemented a script that runs the side infrastructure (redis, redis-exporter, alloy) split out from the server's docker-compose, using remote-SSH and docker-compose
  2. Moved the nginx setup out of userdata so it now runs over remote-SSH
  3. The Korean comments in the existing setup script were getting corrupted during compression/decompression, so I switched them to English comments!

Notes

Review requests (optional)

@Hexeong Hexeong force-pushed the feat/6-integrate-side-infra branch from 2065e3a to b003435 Compare December 27, 2025 12:07
@coderabbitai coderabbitai bot commented Dec 28, 2025

Walkthrough

This change includes expanded Terraform variable inputs, new module variables, a change to EC2 provisioning (Docker installation split out of cloud-init, plus remote script execution via null_resource triggers), a data source for discovering the monitoring server, configuration and docker-compose-based deployment scripts for the side infrastructure (Redis, Redis Exporter, Grafana Alloy), an Alloy/Loki log-collection template, and an updated secrets submodule pointer. Overall, files related to infrastructure configuration/provisioning and side-infrastructure deployment were added or modified.

Changes

Cohort / File(s) Summary
Environment Configuration
environment/prod/main.tf, environment/prod/variables.tf, environment/stage/main.tf, environment/stage/variables.tf
Adds six variables (ssh_key_path, work_dir, alloy_env_name, redis_version, redis_exporter_version, alloy_version) to the prod and stage environments and passes them to the module calls.
Module Variable Declarations
modules/app_stack/variables.tf
Adds the six input variables above to the app_stack module (SSH key path, working directory, Alloy environment name, and Redis/Redis Exporter/Alloy Docker image tags).
EC2 & Orchestration
modules/app_stack/ec2.tf
Adds a data source for discovering the monitoring server, splits the Docker installation out of the cloud-init configuration (Part 1 kept, Nginx bootstrap removed), and adds two null_resource triggers (update_nginx, update_side_infra) that handle template rendering and remote execution.
Provisioning Scripts
modules/app_stack/scripts/docker_setup.sh, modules/app_stack/scripts/nginx_setup.sh.tftpl, modules/app_stack/scripts/side_infra_setup.sh.tftpl
Adds/updates the Docker installation script, improves the Nginx+Certbot setup template, and adds a new side-infrastructure deployment script (generates and restarts docker-compose for Alloy/Redis/Redis Exporter).
Security Groups
modules/app_stack/security_groups.tf
Adds a monitoring EC2 data source and a TCP 8081 ingress rule on the API security group for the monitoring instance's private IP.
Logging & Monitoring Configuration
config/side-infra/config.alloy.tftpl
Adds an Alloy template covering Spring log collection and Loki source/processing/write configuration (batch control, static label injection, etc.).
Submodule Update
config/secrets
Updates the secrets submodule commit reference (metadata pointer change).
Module Outputs / S3
modules/app_stack/s3.tf, modules/app_stack/outputs.tf, modules/app_stack/variables.tf
Adds and tidies the app-stack S3, output, and variable files (including the new variables).

Sequence Diagram(s)

sequenceDiagram
    participant EC2 as EC2 instance
    participant CloudInit as Cloud-init
    participant Trigger as Null Resource<br/>(triggers)
    participant Scripts as Setup scripts
    participant Services as Services<br/>(Docker)
    participant Monitor as Monitoring server

    EC2->>CloudInit: Instance initialization
    CloudInit->>Scripts: Run docker_setup.sh (Part 1)
    Scripts->>Services: Install Docker engine
    Services-->>EC2: Docker ready

    Trigger->>Trigger: Detect nginx template change
    Trigger->>Scripts: Run nginx_setup.sh.tftpl
    Scripts->>Services: Install and configure Nginx + Certbot
    Services-->>EC2: Nginx ready

    Trigger->>Trigger: Detect side_infra template change
    EC2->>Monitor: Look up monitoring server IP
    Monitor-->>EC2: Return monitoring IP
    Trigger->>Scripts: Run side_infra_setup.sh.tftpl
    Scripts->>Scripts: Render Alloy config (inject monitoring IP)
    Scripts->>Services: Start redis via docker-compose
    Scripts->>Services: Start redis-exporter
    Scripts->>Services: Start alloy (log agent)
    Services->>Monitor: Alloy ships logs to Loki
    Monitor-->>Services: Confirm log ingestion
    Services-->>EC2: Side infra ready

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 Redis hops and Alloy takes flight,
Nginx stands tall in TLS light,
Monitoring logs flow, tidy and bright,
The side infra shines through the night! ✨

Pre-merge checks and finishing touches

✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The PR title clearly summarizes the main change: integrating the side infrastructure into the Infra repository.
Description check ✅ Passed The PR description systematically covers the related issue and the three work items, though the Notes and Review requests sections are empty.
Linked Issues check ✅ Passed The PR satisfies all requirements of issue #6: the side infrastructure (Redis, Redis-Exporter, Alloy) is managed with Terraform, with automated deployment and configuration.
Out of Scope Changes check ✅ Passed All code changes fall within the scope of issue #6; the side-infrastructure split, remote execution scripts, and comment fixes are integrated.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

📜 Recent review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between deb153f and 18a401e.

📒 Files selected for processing (1)
  • modules/app_stack/security_groups.tf
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-12-29T14:32:47.078Z
Learnt from: Hexeong
Repo: solid-connection/solid-connection-infra PR: 11
File: modules/app_stack/ec2.tf:1-11
Timestamp: 2025-12-29T14:32:47.078Z
Learning: In Terraform, the data source aws_instance (singular) will fail at plan/apply if zero or more than one match is found. Do not add extra validation to enforce a single match; rely on Terraform's built-in behavior. If a single specific instance is required, use a more precise data source filter or reference the instance by ID in the configuration.

Applied to files:

  • modules/app_stack/security_groups.tf
🔇 Additional comments (1)
modules/app_stack/security_groups.tf (1)

30-37: Needs updating when the monitoring instance IP changes

The ingress rule for port 8081 hardcodes the monitoring instance's private IP. Since this value comes from the data source in Lines 1-11, if the monitoring instance is recreated or its IP changes, the security group rule will not be updated until the next terraform apply.

If the monitoring instance's lifecycle is managed independently of this stack, either document that the security group rule may temporarily reference a stale IP after an IP change, or consider assigning an Elastic IP to the monitoring instance.
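
Worth noting: an Elastic IP only stabilizes the instance's public address, while the rule here uses the private IP. If the monitoring instance is Terraform-managed elsewhere, one alternative is pinning the private IP on the instance itself. A minimal sketch with hypothetical names and values:

resource "aws_instance" "monitoring" {
  ami           = var.monitoring_ami       # hypothetical variable
  instance_type = "t3.small"               # illustrative
  subnet_id     = var.monitoring_subnet_id # hypothetical variable

  # Pinning the private IP keeps the 8081 ingress rule valid
  # even if the instance is replaced within the same subnet.
  private_ip = "10.0.1.50"

  tags = { Name = "solid-connection-monitoring" }
}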



@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (12)
modules/app_stack/variables.tf (1)

118-148: Add validation rules for the new variables.

The new variables lack validation rules, which could lead to runtime errors when invalid values are provided. Consider adding validation for:

  • ssh_key_path: Validate the path exists and is readable
  • work_dir: Validate it's a valid directory path format
  • Version variables (redis_version, redis_exporter_version, alloy_version): Consider validating the format matches expected Docker tag patterns
🔎 Example validation rules
 variable "ssh_key_path" {
   description = "Path to the SSH private key file for remote-exec"
   type        = string
+
+  validation {
+    condition     = can(file(var.ssh_key_path))
+    error_message = "The ssh_key_path must be a valid file path."
+  }
 }

 variable "redis_version" {
   description = "Docker image tag for Redis"
   type        = string
+
+  validation {
+    condition     = can(regex("^[0-9]+\\.[0-9]+\\.[0-9]+$", var.redis_version))
+    error_message = "Redis version must follow semantic versioning (e.g., 7.2.0)."
+  }
 }
config/side-infra/config.alloy.tftpl (1)

11-11: Consider making the log path configurable.

The log path is hardcoded to /var/log/spring/*.log. If different environments or deployments need different log locations, consider making this a template variable.

🔎 Example parameterization
 local.file_match "spring_logs" {
-  path_targets = [{ __path__ = "/var/log/spring/*.log" }]  // service log file path
+  path_targets = [{ __path__ = "${log_path}" }]
 }

Then pass log_path as a template variable when rendering this file.
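
For illustration, the rendering call could then pass the path alongside the existing loki_ip (a sketch; the exact locals and variables in this PR may differ):

locals {
  alloy_config = templatefile("${path.module}/../../config/side-infra/config.alloy.tftpl", {
    loki_ip  = data.aws_instance.monitoring_server.private_ip
    log_path = "/var/log/spring/*.log" # could come from a new variable instead
  })
}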

modules/app_stack/scripts/docker_setup.sh (1)

1-27: Add idempotency checks and error handling.

The script lacks:

  1. Checks for existing Docker installation (will fail if Docker is already installed)
  2. Explicit error handling (set -e or set -euo pipefail)
  3. Retry logic for apt operations

These improvements would make the script more robust for re-runs and handle transient failures.

🔎 Suggested improvements
 #!/bin/bash
+set -euo pipefail
+
+# Check if Docker is already installed
+if command -v docker &> /dev/null; then
+  echo "Docker is already installed. Skipping installation."
+  exit 0
+fi

 # 1. 필수 패키지 설치
 apt-get update
 apt-get install -y ca-certificates curl gnupg lsb-release
modules/app_stack/scripts/nginx_setup.sh.tftpl (2)

6-8: Add validation for template variables.

The script doesn't validate that the template variables (domain_name, email, conf_file_name) are non-empty or properly formatted. Invalid values could cause the script to fail in unexpected ways.

🔎 Add input validation
 # --- variables setting ---
 DOMAIN="${domain_name}"
 EMAIL="${email}"
 CONF_NAME="${conf_file_name}"
+
+# Validate required variables
+if [[ -z "$DOMAIN" ]] || [[ -z "$EMAIL" ]] || [[ -z "$CONF_NAME" ]]; then
+  echo "Error: Required variables are not set"
+  exit 1
+fi

22-31: Make certificate issuance idempotent.

The script will fail if the SSL certificate already exists for the domain. Consider checking if the certificate exists before attempting to issue a new one.

🔎 Add certificate existence check
 # 3. Issue SSL certificate (Non-interactive mode)
+
+# Check if certificate already exists
+if [ -d "/etc/letsencrypt/live/$DOMAIN" ]; then
+  echo "Certificate already exists for $DOMAIN. Skipping issuance."
+else
+  systemctl stop nginx
+
+  certbot certonly --standalone \
+    --non-interactive \
+    --agree-tos \
+    --email "$EMAIL" \
+    -d "$DOMAIN"
+
+  echo "Certificate obtained successfully."
+fi
-systemctl stop nginx
-
-certbot certonly --standalone \
-  --non-interactive \
-  --agree-tos \
-  --email "$EMAIL" \
-  -d "$DOMAIN"
-
-echo "Certificate obtained successfully."
modules/app_stack/ec2.tf (1)

87-87: Verify cloud-init completion timeout.

The cloud-init status --wait command will wait indefinitely for cloud-init to complete. If cloud-init hangs or fails, the provisioner will block forever. Consider adding a timeout.

🔎 Add timeout to cloud-init wait
-      "cloud-init status --wait > /dev/null", # Docker 설치 대기
+      "timeout 300 cloud-init status --wait > /dev/null || (echo 'cloud-init timeout or failure' && exit 1)", # Docker 설치 대기 (5분 타임아웃)
modules/app_stack/scripts/side_infra_setup.sh.tftpl (4)

1-13: Consider adding set -u for safer error handling.

While set -e is present, adding set -u would catch undefined variable references and prevent silent failures when template variables aren't properly substituted.

🔎 Proposed enhancement
 #!/bin/bash
 
-set -e
+set -eu

30-30: Remove deprecated version field from docker-compose.yml.

The version field in Docker Compose files is deprecated and no longer necessary in recent versions of Docker Compose.

🔎 Proposed fix
 cat <<EOF > "$WORK_DIR/docker-compose.side-infra.yml"
-version: '3.8'
-
 services:

36-36: Verify the necessity of host networking mode.

All three services use network_mode: "host", which bypasses Docker's network isolation and directly exposes services on the host network. While this simplifies networking, it reduces security isolation and can lead to port conflicts.

If host networking is required for performance or specific architectural reasons, ensure this is documented. Otherwise, consider using Docker networks with explicit port mappings for better isolation.

Also applies to: 47-47, 58-58
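
If isolation is preferred later, a bridge-network variant might look like the following (a sketch only; the PR keeps host networking as-is for migration parity):

services:
  redis:
    image: redis:$REDIS_VERSION
    networks: [side-infra]
    ports:
      - "127.0.0.1:6379:6379"   # loopback-only exposure on the host
    command: redis-server --protected-mode yes

  redis-exporter:
    image: oliver006/redis_exporter:$REDIS_EXPORTER_VERSION
    networks: [side-infra]
    environment:
      REDIS_ADDR: "redis:6379"  # container DNS name instead of 127.0.0.1
    depends_on:
      - redis

networks:
  side-infra: {}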


72-75: Health check verification is optional; Docker Compose V2 is already ensured.

The script uses Docker Compose V2 syntax (docker compose with a space), which is correctly installed by docker_setup.sh as a prerequisite before this script runs (via docker-compose-plugin package). Verifying Docker Compose V2 availability is not necessary.

Adding health checks after docker compose up -d would be a nice-to-have enhancement for better visibility, but the current implementation with error handling on cleanup operations (|| true) is sufficient for the script's purpose.

environment/stage/variables.tf (1)

107-135: Add validation and improve variable documentation.

The new variables lack validation rules and some have vague descriptions:

  1. alloy_env_name (lines 117-120): The description "Alloy Env Name" is not descriptive enough. Clarify what values are expected (e.g., "production", "dev", "staging").

  2. Version variables (lines 122-135): Consider adding validation to ensure Docker image tags follow expected formats.

  3. ssh_key_path (lines 107-110): Could benefit from validation to ensure it's a valid file path.

🔎 Proposed enhancements
 variable "alloy_env_name" {
-  description = "Alloy Env Name"
+  description = "Environment name for Alloy (e.g., 'production', 'dev', 'staging')"
   type        = string
+  validation {
+    condition     = contains(["production", "dev", "staging"], var.alloy_env_name)
+    error_message = "alloy_env_name must be one of: production, dev, staging"
+  }
 }
 
 variable "redis_version" {
   description = "Docker image tag for Redis"
   type        = string
+  validation {
+    condition     = can(regex("^[a-zA-Z0-9._-]+$", var.redis_version))
+    error_message = "redis_version must be a valid Docker image tag"
+  }
 }
 
 variable "redis_exporter_version" {
   description = "Docker image tag for Redis Exporter"
   type        = string
+  validation {
+    condition     = can(regex("^[a-zA-Z0-9._-]+$", var.redis_exporter_version))
+    error_message = "redis_exporter_version must be a valid Docker image tag"
+  }
 }
 
 variable "alloy_version" {
   description = "Docker image tag for Grafana Alloy"
   type        = string
+  validation {
+    condition     = can(regex("^[a-zA-Z0-9._-]+$", var.alloy_version))
+    error_message = "alloy_version must be a valid Docker image tag"
+  }
 }
environment/prod/variables.tf (1)

107-135: Add validation and improve variable documentation (same as stage).

The new variables have the same issues as in environment/stage/variables.tf:

  1. alloy_env_name lacks clear documentation about expected values
  2. Version variables lack validation for Docker tag formats
  3. ssh_key_path could benefit from path validation

Additionally, consider whether these identical variable definitions across stage and prod environments could be DRYed up using a shared variables module or common configuration to reduce duplication and maintenance burden.

Refer to the proposed validation enhancements in environment/stage/variables.tf. The same improvements should be applied here for consistency.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1ea76ec and e7760fc.

📒 Files selected for processing (12)
  • README.md
  • config/secrets
  • config/side-infra/config.alloy.tftpl
  • environment/prod/main.tf
  • environment/prod/variables.tf
  • environment/stage/main.tf
  • environment/stage/variables.tf
  • modules/app_stack/ec2.tf
  • modules/app_stack/scripts/docker_setup.sh
  • modules/app_stack/scripts/nginx_setup.sh.tftpl
  • modules/app_stack/scripts/side_infra_setup.sh.tftpl
  • modules/app_stack/variables.tf
🔇 Additional comments (10)
config/secrets (1)

1-1: The review comment references incorrect commit hashes that do not exist in this repository's history. The old commit hash c1cf69a9de6f6b766750395875cd5bdcb16a0e96 is not present in any branch or tag. The config/secrets submodule was first added in commit df43044 (feat: terraform으로 prod/stage 환경 IaC 구현) and subsequently updated in commit e7760fc (refactor: private IP를 사용하여 접근하도록 수정). The current commit 83f176821a253bb18f5ca36f1f24e8ce2e7c91d7 is the result of these normal commits and is fully visible in the repository history. No hidden or obscured changes are present.

Likely an incorrect or invalid review comment.

modules/app_stack/scripts/nginx_setup.sh.tftpl (1)

52-54: LGTM! Strong TLS configuration.

The TLS configuration uses modern protocols (TLSv1.2+) and strong cipher suites, which aligns with current security best practices.

modules/app_stack/ec2.tf (3)

58-94: Consider null_resource re-execution behavior.

The null_resource uses SHA256 hash of the rendered template as a trigger. This means:

  1. Any change to the script will trigger re-execution (intended)
  2. The script runs via remote-exec every time the hash changes
  3. If the script is not idempotent, this could cause issues

Since the nginx setup script has non-idempotent operations (like cert issuance), repeated executions could fail or cause rate limiting from Let's Encrypt.

Consider whether the scripts are designed to be re-run safely. Based on the earlier review of nginx_setup.sh.tftpl, the certificate issuance step needs idempotency improvements.
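
For reference, the trigger pattern under discussion looks roughly like this (a sketch; the actual resource wiring in ec2.tf may differ, and the instance reference is hypothetical):

resource "null_resource" "update_nginx" {
  triggers = {
    # Re-run the provisioner whenever the rendered script changes.
    script_hash = sha256(templatefile("${path.module}/scripts/nginx_setup.sh.tftpl", {
      domain_name    = var.domain_name
      email          = var.email
      conf_file_name = var.conf_file_name
    }))
  }

  connection {
    type        = "ssh"
    host        = aws_instance.api.public_ip # hypothetical instance name
    user        = "ubuntu"
    private_key = file(var.ssh_key_path)
  }

  provisioner "remote-exec" {
    inline = [
      "cloud-init status --wait > /dev/null", # wait for Docker install
      # ...then render and execute the script...
    ]
  }
}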


97-143: Good approach to side infra provisioning.

Using null_resource with remote-exec for side infrastructure deployment is appropriate here. The trigger based on script hash ensures updates are applied when configuration changes.

The nested templatefile() for Alloy config correctly injects the monitoring server's private IP, ensuring proper integration with the logging backend.
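
A sketch of the nested-render idea being praised here (the exact expression in the PR may differ):

resource "null_resource" "update_side_infra" {
  # triggers/connection omitted for brevity

  provisioner "file" {
    # Outer render: the deploy script. Inner render: the Alloy config
    # it writes out, with the monitoring server's private IP injected.
    content = templatefile("${path.module}/scripts/side_infra_setup.sh.tftpl", {
      alloy_config = templatefile("${path.module}/../../config/side-infra/config.alloy.tftpl", {
        loki_ip = data.aws_instance.monitoring_server.private_ip
      })
    })
    destination = "/tmp/side_infra_setup.sh"
  }
}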


104-106: Verify monitoring server allows inbound traffic on port 3100.

The Alloy configuration sends logs to Loki on port 3100 (http://${loki_ip}:3100/loki/api/v1/push). Both the API server and monitoring server are in the same VPC, so routing is available. However, the monitoring server's security group (not managed in this module) must allow inbound port 3100 traffic from the API server's security group (api_sg). Verify this rule exists in the monitoring server's security group configuration.
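
If the rule is missing, it could be added on the monitoring side with something like this (a sketch; the monitoring security group is managed outside this module, and the names here are assumptions):

resource "aws_security_group_rule" "loki_from_api" {
  type                     = "ingress"
  from_port                = 3100
  to_port                  = 3100
  protocol                 = "tcp"
  security_group_id        = aws_security_group.monitoring_sg.id # hypothetical name
  source_security_group_id = var.api_sg_id                       # hypothetical variable
  description              = "Allow Loki push from API server (Alloy)"
}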

README.md (1)

6-42: LGTM! Clear documentation of new structure.

The updated directory structure clearly documents the new side-infra configuration directory and the scripts for deployment. The organization is logical and makes it easy to understand where files are located.

environment/prod/main.tf (1)

49-58: LGTM! Proper variable propagation.

The new variables are correctly passed through to the module. The comments clearly indicate the purpose of each section (SSH key path and Side Infra variables).

environment/stage/main.tf (1)

49-58: LGTM! Consistent with production environment.

The stage environment correctly mirrors the production environment's variable propagation, ensuring consistency across environments.

config/side-infra/config.alloy.tftpl (1)

25-25: No action required. The ALLOY_ENV environment variable is properly configured in the deployment setup. The side_infra_setup.sh.tftpl script correctly passes ALLOY_ENV=$ALLOY_ENV_NAME to the Alloy container via the docker-compose environment section (line 57), ensuring logs receive the environment label as intended.
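
For context, the compose fragment the script generates for this would look roughly like the following (paraphrased from the description above; the image name is an assumption, with the tag supplied by alloy_version):

services:
  alloy:
    image: grafana/alloy:$ALLOY_VERSION
    environment:
      ALLOY_ENV: "$ALLOY_ENV_NAME"  # consumed as the env label in config.alloy.tftpl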

modules/app_stack/scripts/side_infra_setup.sh.tftpl (1)

23-26: LGTM!

The heredoc approach for creating the Alloy configuration file is clean and appropriate.

Comment on lines +1 to +11
data "aws_instance" "monitoring_server" {
filter {
name = "tag:Name"
values = ["solid-connection-monitoring"]
}

filter {
name = "instance-state-name"
values = ["running"]
}
}

@coderabbitai coderabbitai bot Dec 28, 2025


⚠️ Potential issue | 🟠 Major

Add error handling for missing monitoring server.

The data source will fail the Terraform plan if no monitoring server is found or if multiple servers match the filters. Consider adding validation or documentation about this prerequisite.

🔎 Add validation check

You can add a validation step after the module that uses this data source:

# In the calling module (environment/prod or stage)
locals {
  monitoring_server_id = module.prod_stack.monitoring_server_id
}

# Add output in modules/app_stack/outputs.tf
output "monitoring_server_id" {
  value = data.aws_instance.monitoring_server.id
  description = "ID of the monitoring server"
}

Alternatively, add a precondition check in Terraform 1.2+:

 data "aws_instance" "monitoring_server" {
+  lifecycle {
+    precondition {
+      condition     = length(data.aws_instance.monitoring_server.*.id) == 1
+      error_message = "Exactly one monitoring server must exist with tag Name=solid-connection-monitoring in running state"
+    }
+  }
+
   filter {
     name   = "tag:Name"
     values = ["solid-connection-monitoring"]
   }

Committable suggestion skipped: line range outside the PR's diff.

@Hexeong Hexeong (Contributor, Author) replied:

Since it's declared with the singular aws_instance data source anyway, it errors out if there are zero matches or more than one! So no need to worry about that part!

@coderabbitai coderabbitai bot replied:

@Hexeong, you're absolutely right! I appreciate the clarification. The data "aws_instance" resource (singular) indeed enforces exactly-one semantics automatically, so Terraform will fail during the plan phase if zero or multiple instances match those filters. My suggested validation was redundant. Thank you for pointing that out!

(\(\
(  ·.·)
o_(")(")*
*thoughtful rabbit*

✏️ Learnings added
Learnt from: Hexeong
Repo: solid-connection/solid-connection-infra PR: 11
File: modules/app_stack/ec2.tf:1-11
Timestamp: 2025-12-29T14:32:47.078Z
Learning: In Terraform, `data "aws_instance"` (singular) automatically fails if zero or more than one instance matches the filters, so explicit validation for this case is not needed.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

Comment on lines +16 to +21
# 1. Create working and log directories
mkdir -p "$WORK_DIR/config/side-infra"
mkdir -p "$WORK_DIR/logs"
# Set log directory permissions (Shared by App and Alloy)
chmod 777 "$WORK_DIR/logs"
chown -R ubuntu:ubuntu "$WORK_DIR"
@coderabbitai coderabbitai bot Dec 28, 2025

⚠️ Potential issue | 🟠 Major

Reduce overly permissive directory permissions.

Setting the log directory to 777 grants read, write, and execute permissions to all users, which poses a security risk. If the Alloy container needs write access, consider using more restrictive permissions (e.g., 775) combined with proper user/group configuration in the container setup.

🔎 Proposed fix
 mkdir -p "$WORK_DIR/config/side-infra"
 mkdir -p "$WORK_DIR/logs"
 # Set log directory permissions (Shared by App and Alloy)
-chmod 777 "$WORK_DIR/logs"
+chmod 775 "$WORK_DIR/logs"
 chown -R ubuntu:ubuntu "$WORK_DIR"
🤖 Prompt for AI Agents
In modules/app_stack/scripts/side_infra_setup.sh.tftpl around lines 16 to 21,
the script sets overly-permissive permissions on the logs directory (chmod 777);
change this to a more restrictive mode such as chmod 775 and ensure group
ownership is set to the group the Alloy container runs as (e.g., chown -R
ubuntu:<alloy-group> "$WORK_DIR" or set the correct group name), so the
container has write access without granting world-writable permissions; update
any container/user docs or deployment config to use that group if needed.

@Hexeong Hexeong (Contributor, Author) replied:

Here's the current deployment situation!

Situation:

  • The host machine (Ubuntu) creates the folder. (Owner: ubuntu)

  • The Spring Boot application (a container) must write logs to this folder.

  • Alloy (a container) must read this folder.

Given this, unless Spring Boot runs as the root user, permission errors can occur!

@coderabbitai coderabbitai bot replied:

@Hexeong, thanks for explaining the deployment situation! So in this container setup, multiple processes need access to the same directory.

Even in this scenario, there are safer options than chmod 777:

Recommended approaches

1. Run the containers as the ubuntu user (simplest)

Configure the Spring Boot and Alloy containers to run as the ubuntu user in Docker Compose:

services:
  spring-app:
    user: "1000:1000"  # the ubuntu user's UID:GID
    volumes:
      - ${WORK_DIR}/logs:/logs

  alloy:
    user: "1000:1000"
    volumes:
      - ${WORK_DIR}/logs:/logs:ro  # read-only

With this, chmod 755 is sufficient.

2. Use a shared group

# Assign the docker group to the log directory
chgrp docker "$WORK_DIR/logs"
chmod 775 "$WORK_DIR/logs"

Then configure the containers to belong to the docker group.

Security concerns

With chmod 777:

  • Every user on the system can modify or delete the logs
  • A malicious process could tamper with the logs or read sensitive information
  • Other containers or services could unintentionally modify the logs

Could you check which user the Spring Boot container currently runs as? Then I can suggest a more specific fix.

@Hexeong Hexeong (Contributor, Author) replied:

For one, option 2 seems likely to fail.

  • Suppose the Spring Boot app runs inside the container as a non-root regular user. (The security best practice.)
  • Does that in-container user belong to the host's docker group? No.
  • So Linux treats this process as 'Others'.

Result: Permission Denied (no write permission).

And option 1 needs additional server-side work before it can be applied, so it seems better to do it in another PR. Also, since this PR is a code migration from server to the infra repo, I think controlling variables as much as possible is the right call.
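
A quick probe of the permission behavior discussed here, assuming Docker is available on the host (UID 9999 stands in for an arbitrary non-root container user):

# Check whether an arbitrary container UID can write into the host-mounted logs dir.
# With chmod 777 this prints "writable"; with 755 it prints "not writable".
docker run --rm -u 9999:9999 -v "$WORK_DIR/logs":/logs alpine \
  sh -c 'touch /logs/.probe 2>/dev/null && echo writable || echo not writable'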

@coderabbitai coderabbitai bot replied:

@Hexeong, understood! Your explanation of how in-container users map to host permissions is accurate, and minimizing the scope of changes in a migration PR is a sound approach.

I agree that chmod 777 is needed in the current situation. Pulling the server repository's container configuration changes into this PR would raise complexity and risk.

Suggestion

If you plan to improve the permission settings in a separate PR later, how about recording it as a GitHub issue? For example:

Issue title: "Improve log directory permissions from 777 to more restrictive settings"

Body:

  • Current state: chmod 777 (kept for migration compatibility)
  • Goal: adjust the user settings of the Spring Boot and Alloy containers and move to chmod 775 or 755
  • Scope: changes needed in both the infra and server repositories

This makes the technical debt explicitly trackable, and the context is immediately available when you pick it up later.

Shall I create the issue for you?

Comment on lines +33 to +48
  redis:
    image: redis:$REDIS_VERSION
    container_name: redis
    network_mode: "host"
    restart: always
    command: redis-server --bind 127.0.0.1 --protected-mode yes

  redis-exporter:
    image: oliver006/redis_exporter:$REDIS_EXPORTER_VERSION
    container_name: redis-exporter
    environment:
      REDIS_ADDR: "127.0.0.1:6379"
    depends_on:
      - redis
    network_mode: "host"
    restart: always
@coderabbitai coderabbitai bot Dec 28, 2025

⚠️ Potential issue | 🟠 Major

Add health checks and consider readiness dependencies.

The redis-exporter service uses depends_on which only ensures Redis starts before the exporter, but doesn't guarantee Redis is ready to accept connections. This could cause the exporter to fail or enter a restart loop during initialization.

🔎 Proposed enhancement with health checks
   redis:
     image: redis:$REDIS_VERSION
     container_name: redis
     network_mode: "host"
     restart: always
     command: redis-server --bind 127.0.0.1 --protected-mode yes
+    healthcheck:
+      test: ["CMD", "redis-cli", "ping"]
+      interval: 5s
+      timeout: 3s
+      retries: 3
 
   redis-exporter:
     image: oliver006/redis_exporter:$REDIS_EXPORTER_VERSION
     container_name: redis-exporter
     environment:
       REDIS_ADDR: "127.0.0.1:6379"
     depends_on:
-      - redis
+      redis:
+        condition: service_healthy
     network_mode: "host"
     restart: always
🤖 Prompt for AI Agents
In modules/app_stack/scripts/side_infra_setup.sh.tftpl around lines 33-48, the
redis-exporter can start before Redis is ready because depends_on only orders
container start, not readiness; add a proper healthcheck to the redis service
(e.g., a redis-cli PING or tcp check with sensible interval/timeout/retries) and
change redis-exporter to wait for Redis healthy (use depends_on with condition:
service_healthy if your compose version supports it) or, if compose version does
not support health conditions, add a small wrapper/startup check for
redis-exporter that polls Redis until it responds before launching the exporter.
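
A minimal version of the polling wrapper that prompt mentions (a sketch assuming bash with /dev/tcp support on the host; not part of the PR):

#!/bin/bash
# Block until Redis accepts TCP connections on 127.0.0.1:6379,
# then let the caller proceed with starting the exporter.
until (exec 3<>/dev/tcp/127.0.0.1/6379) 2>/dev/null; do
  echo "waiting for redis..."
  sleep 1
done
echo "redis is up; starting exporter"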

@Hexeong Hexeong (Contributor, Author) replied:

Since this PR is a migration from the server code, I'd rather keep the existing configuration for now to control variables as much as possible. I'll handle this issue in a separate PR later.

@coderabbitai coderabbitai bot replied:

@Hexeong, understood! Minimizing variables during the migration is a good approach.

Shall I create a GitHub issue so you can refer back to this health-check improvement when you tackle it later?

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
modules/app_stack/security_groups.tf (1)

60-67: Consider referencing a security group instead, for better maintainability

The current implementation hardcodes the monitoring instance's private IP as a CIDR block. This approach has the following limitations:

  • If the monitoring instance is recreated and receives a new IP, the rule is not updated until the next terraform apply
  • IP-based references may require manual intervention when the instance is replaced

Consider assigning a dedicated security group to the monitoring instance and referencing that SG instead. SG membership then survives instance recreation, which is more robust.

🔎 Proposed change to a security-group reference

Assuming the monitoring instance uses a security group called monitoring_sg:

  ingress {
-   description = "Allow 8081 from EC2: (${data.aws_instance.monitoring_ec2.tags.Name})"
+   description = "Allow 8081 from monitoring EC2"
    from_port   = 8081
    to_port     = 8081
    protocol    = "tcp"
-
-   cidr_blocks = ["${data.aws_instance.monitoring_ec2.private_ip}/32"]
+   security_groups = [data.aws_security_group.monitoring_sg.id]
  }

This requires a data source referencing the monitoring SG:

data "aws_security_group" "monitoring_sg" {
  filter {
    name   = "tag:Name"
    values = ["solid-connection-monitoring-sg"]
  }
}
📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e7760fc and deb153f.

📒 Files selected for processing (1)
  • modules/app_stack/security_groups.tf
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-12-29T14:32:47.078Z
Learnt from: Hexeong
Repo: solid-connection/solid-connection-infra PR: 11
File: modules/app_stack/ec2.tf:1-11
Timestamp: 2025-12-29T14:32:47.078Z
Learning: In Terraform, the data source aws_instance (singular) will fail at plan/apply if zero or more than one match is found. Do not add extra validation to enforce a single match; rely on Terraform's built-in behavior. If a single specific instance is required, use a more precise data source filter or reference the instance by ID in the configuration.

Applied to files:

  • modules/app_stack/security_groups.tf

Comment on lines +1 to +11
data "aws_instance" "monitoring_ec2" {
filter {
name = "tag:Name"
values = ["solid-connection-monitoring"]
}

filter {
name = "instance-state-name"
values = ["running"]
}
}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if monitoring instance is managed in the same workspace
echo "=== Searching for monitoring instance definition ==="
rg "solid-connection-monitoring" --type=tf -C3

echo -e "\n=== Checking for depends_on configurations ==="
rg "depends_on.*monitoring" --type=tf -C3

echo -e "\n=== Searching for aws_instance resource with monitoring tag ==="
rg "aws_instance.*monitoring_ec2|solid-connection-monitoring" --type=tf -B5 -A10

echo -e "\n=== Check all aws_instance resource definitions ==="
rg 'resource "aws_instance"' --type=tf -B2 -A15

Repository: solid-connection/solid-connection-infra

Length of output: 4339


Hard dependency on an external monitoring instance

The data "aws_instance" data source must find exactly one running instance at execution time. If the monitoring instance ("solid-connection-monitoring") has not been created beforehand, or is in a stopped state, terraform plan/apply will fail.

Since the monitoring instance is managed outside this module, clearly document the precondition that it must be running before this infrastructure is deployed. Also keep in mind that if the monitoring instance is recreated and its IP changes, the corresponding security group rule will not be updated until the next terraform apply.

🤖 Prompt for AI Agents
In modules/app_stack/security_groups.tf around lines 1-11, the data
"aws_instance" lookup requires exactly one running instance and will fail if the
external monitoring instance doesn't exist or is stopped; change the
implementation to tolerate zero or multiple matches and avoid hard dependency by
using data "aws_instances" (or a tag-based data source) to collect zero-or-more
instances, then conditionally create or populate any security group rules only
when an IP/address is present (e.g., use for_each/count guarded by
length(data.aws_instances.*) > 0 or a conditional local that picks the current
private/public IP(s)); also add a short module-level note in README explaining
the external monitoring instance is managed outside this module and that IP
changes require reapply.
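
A sketch of the tolerant variant this prompt describes (illustrative only; the thread above settled on keeping the singular data source, and the API SG name here is hypothetical):

data "aws_instances" "monitoring" {
  filter {
    name   = "tag:Name"
    values = ["solid-connection-monitoring"]
  }
  instance_state_names = ["running"]
}

locals {
  # Empty list when no monitoring instance is running.
  monitoring_private_ips = data.aws_instances.monitoring.private_ips
}

# Create the 8081 rule only when an instance was found.
resource "aws_security_group_rule" "monitoring_8081" {
  count             = length(local.monitoring_private_ips) > 0 ? 1 : 0
  type              = "ingress"
  from_port         = 8081
  to_port           = 8081
  protocol          = "tcp"
  security_group_id = aws_security_group.api_sg.id # hypothetical name
  cidr_blocks       = ["${local.monitoring_private_ips[0]}/32"]
}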

@whqtker whqtker (Member) left a comment

Great work!

This is minor, so I'll just leave it here: it would be good for the Docker setup script to have a set -e command as well! The other scripts have it.

@Hexeong Hexeong force-pushed the feat/6-integrate-side-infra branch from deb153f to 18a401e Compare December 29, 2025 14:44
@whqtker whqtker self-requested a review December 29, 2025 14:44


Development

Successfully merging this pull request may close these issues.

feat: Manage the side infrastructure (Redis, Redis-Exporter, Alloy) with Terraform

3 participants