diff --git a/documentdb-playground/telemetry/README.md b/documentdb-playground/telemetry/README.md new file mode 100644 index 00000000..f20df3a7 --- /dev/null +++ b/documentdb-playground/telemetry/README.md @@ -0,0 +1,311 @@ +# DocumentDB Multi-Tenant Telemetry Setup + +This directory contains scripts to set up complete multi-tenant telemetry infrastructure for DocumentDB on Azure Kubernetes Service (AKS) with namespace-based isolation and dedicated monitoring stacks per team. + +## Prerequisites + +- Azure CLI installed and configured +- kubectl installed +- Helm installed +- jq installed (for JSON parsing) +- An active Azure subscription +- Existing AKS cluster with DocumentDB Operator installed + +## Scripts Overview + +### deploy-multi-tenant-telemetry.sh + +**Primary deployment script** that sets up complete multi-tenant infrastructure: +- Creates isolated namespaces for teams (sales-namespace, accounts-namespace) +- Deploys DocumentDB clusters per team with proper CNPG configuration +- Sets up dedicated OpenTelemetry Collectors with CPU/memory monitoring +- Installs separate Prometheus and Grafana instances per team +- Configures proper RBAC and service accounts + +**Usage:** +```bash +# Deploy complete multi-tenant stack +./deploy-multi-tenant-telemetry.sh + +# Deploy only DocumentDB clusters +./deploy-multi-tenant-telemetry.sh --documentdb-only + +# Deploy only telemetry stack +./deploy-multi-tenant-telemetry.sh --telemetry-only + +# Skip waiting for deployments (for status checking) +./deploy-multi-tenant-telemetry.sh --skip-wait +``` + +### setup-grafana-dashboards.sh + +**Automated dashboard creation** that programmatically sets up monitoring dashboards: +- Creates comprehensive CPU and Memory monitoring dashboards +- Configures namespace-specific metric filtering +- Includes pod count and resource utilization metrics +- Uses Grafana API for automated deployment + +**Usage:** +```bash +# Create dashboard for sales team +./setup-grafana-dashboards.sh sales-namespace + +# Create dashboard for accounts team +./setup-grafana-dashboards.sh accounts-namespace +``` + +### delete-multi-tenant-telemetry.sh + +**Application cleanup script** that removes multi-tenant applications while preserving infrastructure: +- Deletes DocumentDB clusters per team +- Removes OpenTelemetry collectors +- Cleans up Prometheus and Grafana monitoring stacks +- Deletes team namespaces and associated resources + +**Usage:** +```bash +# Delete everything (applications only, keeps infrastructure) +./delete-multi-tenant-telemetry.sh --delete-all + +# Delete only DocumentDB clusters +./delete-multi-tenant-telemetry.sh --delete-documentdb + +# Delete only monitoring (Prometheus/Grafana) +./delete-multi-tenant-telemetry.sh --delete-monitoring + +# Delete with no confirmation prompts +./delete-multi-tenant-telemetry.sh --delete-all --force +``` + +### Infrastructure Management Scripts + +#### create-cluster.sh +**Infrastructure setup** - Creates AKS cluster and operators only: +```bash +# Create cluster + DocumentDB operator + OpenTelemetry operator +./create-cluster.sh --install-all + +# Create cluster only +./create-cluster.sh + +# Install operators on existing cluster +./create-cluster.sh --install-operator +``` + +#### delete-cluster.sh +**Infrastructure cleanup** - Removes cluster and all Azure resources: +```bash +# Delete entire AKS cluster and Azure resources +./delete-cluster.sh --delete-all + +# Delete only cluster (keeps resource group) +./delete-cluster.sh --delete-cluster +``` + +## Script Organization 
+ +### Infrastructure vs Applications + +Our scripts are organized with **clean separation of concerns**: + +| **Infrastructure Scripts** | **Application Scripts** | +|---------------------------|-------------------------| +| `create-cluster.sh` | `deploy-multi-tenant-telemetry.sh` | +| `delete-cluster.sh` | `delete-multi-tenant-telemetry.sh` | +| | `setup-grafana-dashboards.sh` | + +**Infrastructure Scripts** manage: +- ✅ AKS cluster creation/deletion +- ✅ Azure resource management +- ✅ DocumentDB operator installation +- ✅ OpenTelemetry operator installation +- ✅ Core platform components (cert-manager, CSI drivers) + +**Application Scripts** manage: +- 📦 DocumentDB cluster deployments per team +- 🔧 OpenTelemetry collector configurations +- 📊 Monitoring stacks (Prometheus, Grafana) +- 🏠 Team namespaces and application resources + +### Benefits of This Approach + +- **🔄 Reusable Infrastructure**: Create cluster once, deploy multiple application stacks +- **💰 Cost Optimization**: Delete applications without losing cluster setup +- **🔧 Independent Updates**: Update monitoring without touching infrastructure +- **👥 Team Isolation**: Each team can manage their own application stack +- **🚀 Faster Iterations**: Deploy/destroy applications in seconds, not minutes + +## Architecture Overview + +### Multi-Tenant DocumentDB + Telemetry Stack + +Our implementation provides **complete namespace isolation** with dedicated resources per team: + +``` +┌─── sales-namespace ────────────────────────────┐ ┌─── accounts-namespace ──────────────────────┐ +│ • DocumentDB Cluster (documentdb-sales) │ │ • DocumentDB Cluster (documentdb-accounts) │ +│ • OpenTelemetry Collector (sales-focused) │ │ • OpenTelemetry Collector (accounts-focused)│ +│ • Prometheus Server (prometheus-sales) │ │ • Prometheus Server (prometheus-accounts) │ +│ • Grafana Instance (grafana-sales) │ │ • Grafana Instance (grafana-accounts) │ +│ • Dedicated RBAC & Service Accounts │ │ • Dedicated RBAC & Service Accounts │ +└─────────────────────────────────────────────────┘ └──────────────────────────────────────────────┘ +``` + +### What Gets Deployed + +#### Per Team/Namespace: +- **DocumentDB Cluster**: CNPG-managed PostgreSQL cluster with proper operator integration +- **OpenTelemetry Collector**: Namespace-scoped metric collection focusing on CPU/Memory +- **Prometheus Server**: Time-series database for storing team-specific metrics +- **Grafana Instance**: Visualization dashboard with automated dashboard provisioning +- **RBAC Configuration**: Service accounts, cluster roles, and bindings for secure access + +#### Shared Components: +- **DocumentDB Operator**: Cluster-wide operator managing all DocumentDB instances +- **OpenTelemetry Operator**: Cluster-wide operator managing collector deployments + +## Recommended Workflow + +### 1. Infrastructure Setup (One Time) +```bash +# Create AKS cluster with all required operators +cd scripts/ +./create-cluster.sh --install-all +``` + +### 2. Application Deployment (Repeatable) +```bash +# Deploy multi-tenant DocumentDB + monitoring +./deploy-multi-tenant-telemetry.sh + +# Create automated dashboards +./setup-grafana-dashboards.sh sales-namespace +./setup-grafana-dashboards.sh accounts-namespace +``` + +### 3. 
Access & Monitor +```bash +# Access Grafana dashboards +kubectl port-forward -n sales-namespace svc/grafana-sales 3001:3000 & +kubectl port-forward -n accounts-namespace svc/grafana-accounts 3002:3000 & + +# Open in browser: http://localhost:3001 and http://localhost:3002 +# Login: admin / admin123 +``` + +### 4. Cleanup Applications (Keep Infrastructure) +```bash +# Remove all applications, keep cluster running +./delete-multi-tenant-telemetry.sh --delete-all +``` + +### 5. Full Cleanup (When Done) +```bash +# Delete entire Azure infrastructure +./delete-cluster.sh --delete-all +``` + +## Quick Start Guide + +### 1. Deploy Complete Multi-Tenant Stack +```bash +# Deploy DocumentDB clusters + telemetry for both teams +cd scripts/ +./deploy-multi-tenant-telemetry.sh +``` + +### 2. Create Monitoring Dashboards +```bash +# Create automated dashboards for both teams +./setup-grafana-dashboards.sh sales-namespace +./setup-grafana-dashboards.sh accounts-namespace +``` + +### 3. Access Grafana Dashboards +```bash +# Port-forward to sales Grafana (runs in background) +kubectl port-forward -n sales-namespace svc/grafana-sales 3001:3000 > /dev/null 2>&1 & + +# Port-forward to accounts Grafana (runs in background) +kubectl port-forward -n accounts-namespace svc/grafana-accounts 3002:3000 > /dev/null 2>&1 & + +# Access dashboards in browser: +# Sales Team: http://localhost:3001 +# Accounts Team: http://localhost:3002 +# Login: admin / admin123 +``` + +## Monitoring Capabilities + +### Metrics Collected (CPU & Memory Focus) +- **container_cpu_usage_seconds_total**: CPU usage per container +- **container_memory_working_set_bytes**: Memory usage per container +- **container_spec_memory_limit_bytes**: Memory limits per container +- **Pod count and status metrics** + +### Dashboard Features +- **CPU Usage by Container**: Real-time CPU utilization with 5-minute rate calculation +- **Memory Usage by Container**: Memory consumption in MB per container +- **Memory Usage Percentage**: Memory usage as percentage of configured limits +- **Pod Count Monitoring**: Number of active pods per namespace + +### Namespace Isolation +Each OpenTelemetry collector is configured with strict namespace filtering: +```yaml +metric_relabel_configs: + - source_labels: [namespace] + regex: '^(sales-namespace)$' # Only sales-namespace metrics + action: keep +``` + +## Advanced Usage + +### Deployment Options +```bash +# Deploy only DocumentDB clusters (skip telemetry) +./deploy-multi-tenant-telemetry.sh --documentdb-only + +# Deploy only telemetry stack (skip DocumentDB) +./deploy-multi-tenant-telemetry.sh --telemetry-only + +# Check deployment status without waiting +./deploy-multi-tenant-telemetry.sh --skip-wait +``` + +### Accessing Different Components +```bash +# Check DocumentDB cluster status +kubectl get clusters -n sales-namespace +kubectl get clusters -n accounts-namespace + +# View OpenTelemetry collector logs +kubectl logs -n sales-namespace -l app.kubernetes.io/name=opentelemetry-collector + +# Access Prometheus directly +kubectl port-forward -n sales-namespace svc/prometheus-sales-server 9090:80 +``` + +### Troubleshooting +```bash +# Check all pods status +kubectl get pods -n sales-namespace +kubectl get pods -n accounts-namespace + +# View collector configuration +kubectl get otelcol -n sales-namespace otel-collector-sales -o yaml + +# Check metric collection +kubectl logs -n sales-namespace deployment/otel-collector-sales +``` + +## Cost Management + +**Important**: This setup creates dedicated resources per team. 
Monitor costs and clean up when testing is complete: + +```bash +# Clean up multi-tenant resources +kubectl delete namespace sales-namespace accounts-namespace + +# Or use legacy cleanup (if applicable) +./delete-cluster.sh +``` \ No newline at end of file diff --git a/documentdb-playground/telemetry/otel-collector-accounts.yaml b/documentdb-playground/telemetry/otel-collector-accounts.yaml new file mode 100644 index 00000000..786eeb41 --- /dev/null +++ b/documentdb-playground/telemetry/otel-collector-accounts.yaml @@ -0,0 +1,118 @@ +apiVersion: v1 +kind: ServiceAccount +metadata: + name: otel-collector + namespace: accounts-namespace +--- +apiVersion: rbac.authorization.k8s.io/v1 +kind: ClusterRole +metadata: + name: otel-collector-accounts +rules: +- apiGroups: [""] + resources: ["nodes", "nodes/proxy", "nodes/metrics", "services", "endpoints", "pods"] + verbs: ["get", "list", "watch"] +- nonResourceURLs: ["/metrics", "/metrics/cadvisor"] + verbs: ["get"] +- apiGroups: ["apps"] + resources: ["daemonsets", "deployments", "replicasets"] + verbs: ["get", "list", "watch"] +--- +apiVersion: rbac.authorization.k8s.io/v1 +kind: ClusterRoleBinding +metadata: + name: otel-collector-accounts +roleRef: + apiGroup: rbac.authorization.k8s.io + kind: ClusterRole + name: otel-collector-accounts +subjects: +- kind: ServiceAccount + name: otel-collector + namespace: accounts-namespace +--- +apiVersion: opentelemetry.io/v1beta1 +kind: OpenTelemetryCollector +metadata: + name: documentdb-accounts-collector + namespace: accounts-namespace +spec: + mode: deployment # Single pod per namespace, not DaemonSet + replicas: 1 + serviceAccount: otel-collector + config: + receivers: + # Scrape container CPU/Memory metrics from DocumentDB pods + prometheus: + config: + scrape_configs: + # Container CPU/Memory metrics via Kubernetes API proxy to cAdvisor + - job_name: 'accounts-container-metrics' + kubernetes_sd_configs: + - role: node + relabel_configs: + # Use Kubernetes API proxy to access cAdvisor + - target_label: __address__ + replacement: kubernetes.default.svc:443 + - source_labels: [__meta_kubernetes_node_name] + regex: (.+) + target_label: __metrics_path__ + replacement: '/api/v1/nodes/$1/proxy/metrics/cadvisor' + - source_labels: [__meta_kubernetes_node_name] + target_label: instance + - target_label: tenant + replacement: 'accounts' + scheme: https + tls_config: + ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt + insecure_skip_verify: true + bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token + metric_relabel_configs: + # Filter only accounts namespace containers after scraping + - source_labels: [namespace] + regex: 'accounts-namespace' + action: keep + # Keep only running containers (exclude POD sandbox) + - source_labels: [container] + regex: '^$|POD' + action: drop + + processors: + batch: + timeout: 10s + send_batch_size: 1024 + + attributes: + actions: + - key: service.name + value: "documentdb-accounts-telemetry" + action: insert + - key: telemetry.source + value: "otel-collector-accounts" + action: insert + - key: tenant + value: "accounts" + action: insert + + exporters: + # Export to accounts team's dedicated Prometheus + prometheusremotewrite: + endpoint: "http://prometheus-accounts-server.accounts-namespace.svc.cluster.local:80/api/v1/write" + external_labels: + tenant: "accounts" + cluster: "documentdb-accounts" + + # Alternative: Export to tenant-specific external backend + # azuremonitor: + # instrumentation_key: "${ACCOUNTS_AZURE_MONITOR_KEY}" + + service: + 
pipelines: + metrics: + receivers: [prometheus] + processors: [attributes, batch] + exporters: [prometheusremotewrite] + + telemetry: + logs: + level: "info" \ No newline at end of file diff --git a/documentdb-playground/telemetry/otel-collector-sales.yaml b/documentdb-playground/telemetry/otel-collector-sales.yaml new file mode 100644 index 00000000..96d1c4ea --- /dev/null +++ b/documentdb-playground/telemetry/otel-collector-sales.yaml @@ -0,0 +1,118 @@ +apiVersion: v1 +kind: ServiceAccount +metadata: + name: otel-collector + namespace: sales-namespace +--- +apiVersion: rbac.authorization.k8s.io/v1 +kind: ClusterRole +metadata: + name: otel-collector-sales +rules: +- apiGroups: [""] + resources: ["nodes", "nodes/proxy", "nodes/metrics", "services", "endpoints", "pods"] + verbs: ["get", "list", "watch"] +- nonResourceURLs: ["/metrics", "/metrics/cadvisor"] + verbs: ["get"] +- apiGroups: ["apps"] + resources: ["daemonsets", "deployments", "replicasets"] + verbs: ["get", "list", "watch"] +--- +apiVersion: rbac.authorization.k8s.io/v1 +kind: ClusterRoleBinding +metadata: + name: otel-collector-sales +roleRef: + apiGroup: rbac.authorization.k8s.io + kind: ClusterRole + name: otel-collector-sales +subjects: +- kind: ServiceAccount + name: otel-collector + namespace: sales-namespace +--- +apiVersion: opentelemetry.io/v1beta1 +kind: OpenTelemetryCollector +metadata: + name: documentdb-sales-collector + namespace: sales-namespace +spec: + mode: deployment # Single pod per namespace, not DaemonSet + replicas: 1 + serviceAccount: otel-collector + config: + receivers: + # Scrape container CPU/Memory metrics from DocumentDB pods + prometheus: + config: + scrape_configs: + # Container CPU/Memory metrics via Kubernetes API proxy to cAdvisor + - job_name: 'sales-container-metrics' + kubernetes_sd_configs: + - role: node + relabel_configs: + # Use Kubernetes API proxy to access cAdvisor + - target_label: __address__ + replacement: kubernetes.default.svc:443 + - source_labels: [__meta_kubernetes_node_name] + regex: (.+) + target_label: __metrics_path__ + replacement: '/api/v1/nodes/$1/proxy/metrics/cadvisor' + - source_labels: [__meta_kubernetes_node_name] + target_label: instance + - target_label: tenant + replacement: 'sales' + scheme: https + tls_config: + ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt + insecure_skip_verify: true + bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token + metric_relabel_configs: + # Filter only sales namespace containers after scraping + - source_labels: [namespace] + regex: 'sales-namespace' + action: keep + # Keep only running containers (exclude POD sandbox) + - source_labels: [container] + regex: '^$|POD' + action: drop + + processors: + batch: + timeout: 10s + send_batch_size: 1024 + + attributes: + actions: + - key: service.name + value: "documentdb-sales-telemetry" + action: insert + - key: telemetry.source + value: "otel-collector-sales" + action: insert + - key: tenant + value: "sales" + action: insert + + exporters: + # Export to sales team's dedicated Prometheus + prometheusremotewrite: + endpoint: "http://prometheus-sales-server.sales-namespace.svc.cluster.local:80/api/v1/write" + external_labels: + tenant: "sales" + cluster: "documentdb-sales" + + # Alternative: Export to tenant-specific external backend + # azuremonitor: + # instrumentation_key: "${SALES_AZURE_MONITOR_KEY}" + + service: + pipelines: + metrics: + receivers: [prometheus] + processors: [attributes, batch] + exporters: [prometheusremotewrite] + + telemetry: + 
logs:
+          level: "info"
\ No newline at end of file
diff --git a/documentdb-playground/telemetry/scripts/create-cluster.sh b/documentdb-playground/telemetry/scripts/create-cluster.sh
new file mode 100755
index 00000000..916ff132
--- /dev/null
+++ b/documentdb-playground/telemetry/scripts/create-cluster.sh
@@ -0,0 +1,731 @@
+#!/bin/bash
+
+# DocumentDB AKS Cluster Creation Script
+# This script creates a complete AKS cluster with all dependencies for DocumentDB
+
+set -e  # Exit on any error
+
+# Configuration
+CLUSTER_NAME="ray-ddb-cluster"
+RESOURCE_GROUP="ray-documentdb-rg"
+LOCATION="West US 2"
+NODE_COUNT=2
+NODE_SIZE="Standard_D4s_v5"
+KUBERNETES_VERSION="1.31.11"
+
+# DocumentDB Operator Configuration
+# For testing: use hossain-rayhan/documentdb-operator (fork with Azure enhancements)
+# For production: use microsoft/documentdb-operator (official)
+OPERATOR_GITHUB_ORG="hossain-rayhan"
+OPERATOR_CHART_VERSION="0.1.112"
+
+# Feature flags - set to "true" to enable, "false" to skip
+INSTALL_OPERATOR="${INSTALL_OPERATOR:-false}"
+DEPLOY_INSTANCE="${DEPLOY_INSTANCE:-false}"
+CREATE_STORAGE_CLASS="${CREATE_STORAGE_CLASS:-false}"
+
+
+# GitHub credentials - check environment variables first, can be overridden by command line
+GITHUB_USERNAME="${GITHUB_USERNAME:-}"
+GITHUB_TOKEN="${GITHUB_TOKEN:-}"
+
+# Parse command line arguments
+while [[ $# -gt 0 ]]; do
+    case $1 in
+        --skip-operator)
+            INSTALL_OPERATOR="false"
+            shift
+            ;;
+        --skip-instance)
+            DEPLOY_INSTANCE="false"
+            shift
+            ;;
+        --install-operator)
+            INSTALL_OPERATOR="true"
+            shift
+            ;;
+        --deploy-instance)
+            DEPLOY_INSTANCE="true"
+            shift
+            ;;
+        --install-all)
+            INSTALL_OPERATOR="true"
+            DEPLOY_INSTANCE="true"
+            shift
+            ;;
+
+        --create-storage-class)
+            CREATE_STORAGE_CLASS="true"
+            shift
+            ;;
+        --skip-storage-class)
+            CREATE_STORAGE_CLASS="false"
+            shift
+            ;;
+        --cluster-name)
+            CLUSTER_NAME="$2"
+            shift 2
+            ;;
+        --resource-group)
+            RESOURCE_GROUP="$2"
+            shift 2
+            ;;
+        --location)
+            LOCATION="$2"
+            shift 2
+            ;;
+        --github-username)
+            GITHUB_USERNAME="$2"
+            shift 2
+            ;;
+        --github-token)
+            GITHUB_TOKEN="$2"
+            shift 2
+            ;;
+        -h|--help)
+            echo "Usage: $0 [OPTIONS]"
+            echo ""
+            echo "Options:"
+            echo "  --skip-operator          Skip DocumentDB operator installation (default)"
+            echo "  --skip-instance          Skip DocumentDB instance deployment (default)"
+            echo "  --install-operator       Install DocumentDB operator only (assumes cluster exists)"
+            echo "  --deploy-instance        Deploy DocumentDB instance only (assumes cluster+operator exist)"
+            echo "  --install-all            Create cluster + install operator + deploy instance"
+
+            echo "  --create-storage-class   Create custom Premium SSD storage class"
+            echo "  --skip-storage-class     Use AKS default storage (StandardSSD_LRS) - default"
+            echo "  --cluster-name NAME      AKS cluster name (default: ray-ddb-cluster)"
+            echo "  --resource-group RG      Azure resource group (default: ray-documentdb-rg)"
+            echo "  --location LOCATION      Azure location (default: West US 2)"
+            echo "  --github-username        GitHub username for operator installation"
+            echo "  --github-token           GitHub token for operator installation"
+            echo "  -h, --help               Show this help message"
+            echo ""
+            echo "Examples:"
+            echo "  $0                       # Create cluster only"
+            echo "  $0 --install-operator    # Install operator only (assumes cluster exists)"
+            echo "  $0 --deploy-instance     # Deploy DocumentDB only (assumes cluster+operator exist)"
+
+            echo "  $0 --install-all --github-username myuser --github-token ghp_xxx  # Full setup with GitHub auth"
+            echo "  $0 --install-all         # Create cluster + install operator + deploy 
instance" + exit 0 + ;; + *) + echo "Unknown option: $1" + exit 1 + ;; + esac +done + +# Colors for output +RED='\033[0;31m' +GREEN='\033[0;32m' +YELLOW='\033[1;33m' +BLUE='\033[0;34m' +NC='\033[0m' # No Color + +# Logging function +log() { + echo -e "${BLUE}[$(date +'%Y-%m-%d %H:%M:%S')]${NC} $1" +} + +success() { + echo -e "${GREEN}[$(date +'%Y-%m-%d %H:%M:%S')] ✅ $1${NC}" +} + +warn() { + echo -e "${YELLOW}[$(date +'%Y-%m-%d %H:%M:%S')] ⚠️ $1${NC}" +} + +error() { + echo -e "${RED}[$(date +'%Y-%m-%d %H:%M:%S')] ❌ $1${NC}" + exit 1 +} + +# Check prerequisites +check_prerequisites() { + log "Checking prerequisites..." + + # Check Azure CLI + if ! command -v az &> /dev/null; then + error "Azure CLI not found. Please install Azure CLI first." + fi + + # Check kubectl + if ! command -v kubectl &> /dev/null; then + error "kubectl not found. Please install kubectl first." + fi + + # Check Helm + if ! command -v helm &> /dev/null; then + error "Helm not found. Please install Helm first." + fi + + # Check Azure login + if ! az account show &> /dev/null; then + error "Not logged into Azure. Please run 'az login' first." + fi + + success "All prerequisites met" +} + +# Create resource group +create_resource_group() { + log "Creating resource group: $RESOURCE_GROUP in location: $LOCATION" + + # Check if resource group already exists + if az group show --name $RESOURCE_GROUP &> /dev/null; then + warn "Resource group $RESOURCE_GROUP already exists. Skipping creation." + return 0 + fi + + # Create resource group + az group create --name $RESOURCE_GROUP --location "$LOCATION" + + if [ $? -eq 0 ]; then + success "Resource group created successfully" + else + error "Failed to create resource group" + fi +} + +# Create AKS cluster +create_cluster() { + log "Creating AKS cluster: $CLUSTER_NAME" + + # Check if cluster already exists + if az aks show --resource-group $RESOURCE_GROUP --name $CLUSTER_NAME &> /dev/null; then + warn "Cluster $CLUSTER_NAME already exists. Skipping cluster creation." + else + # Create AKS cluster with managed identity and required addons + az aks create \ + --resource-group $RESOURCE_GROUP \ + --name $CLUSTER_NAME \ + --node-count $NODE_COUNT \ + --node-vm-size $NODE_SIZE \ + --kubernetes-version $KUBERNETES_VERSION \ + --enable-managed-identity \ + --enable-addons monitoring \ + --enable-cluster-autoscaler \ + --min-count 2 \ + --max-count 5 \ + --generate-ssh-keys \ + --network-plugin azure \ + --network-policy azure \ + --load-balancer-sku standard + + if [ $? -eq 0 ]; then + success "AKS cluster created successfully" + else + error "Failed to create AKS cluster" + fi + fi + + # Get cluster credentials + log "Getting cluster credentials..." + az aks get-credentials --resource-group $RESOURCE_GROUP --name $CLUSTER_NAME --overwrite-existing + + # Handle WSL case - copy Windows kubeconfig to WSL + if grep -qi microsoft /proc/version 2>/dev/null; then + log "Detected WSL environment, copying kubeconfig from Windows to WSL..." + WIN_KUBE_CONFIG="/mnt/c/Users/$(whoami)/.kube/config" + if [ -f "$WIN_KUBE_CONFIG" ]; then + mkdir -p ~/.kube + cp "$WIN_KUBE_CONFIG" ~/.kube/config + chmod 600 ~/.kube/config + log "Kubeconfig copied to WSL" + else + warn "Windows kubeconfig not found at expected location" + fi + fi + + success "Cluster credentials configured" +} + +# Install Azure CSI drivers +install_azure_csi_drivers() { + log "Checking Azure CSI drivers..." 
+ + # Check if CSI drivers are already enabled (modern AKS clusters have them by default) + CSI_STATUS=$(az aks show --resource-group $RESOURCE_GROUP --name $CLUSTER_NAME --query "storageProfile" -o json 2>/dev/null) + DISK_CSI_ENABLED=$(echo "$CSI_STATUS" | jq -r '.diskCsiDriver.enabled // false') + FILE_CSI_ENABLED=$(echo "$CSI_STATUS" | jq -r '.fileCsiDriver.enabled // false') + + if [ "$DISK_CSI_ENABLED" == "true" ] && [ "$FILE_CSI_ENABLED" == "true" ]; then + success "Azure CSI drivers already enabled (Disk: ✅, File: ✅)" + return 0 + fi + + log "CSI drivers not fully enabled - installing..." + log "Current status: Disk=$DISK_CSI_ENABLED, File=$FILE_CSI_ENABLED" + + # Azure Disk CSI driver (only if not enabled) + if [ "$DISK_CSI_ENABLED" != "true" ]; then + log "Enabling Azure Disk CSI driver..." + az aks update --resource-group $RESOURCE_GROUP --name $CLUSTER_NAME --enable-disk-driver >/dev/null 2>&1 + fi + + # Azure File CSI driver (only if not enabled) + if [ "$FILE_CSI_ENABLED" != "true" ]; then + log "Enabling Azure File CSI driver..." + az aks update --resource-group $RESOURCE_GROUP --name $CLUSTER_NAME --enable-file-driver >/dev/null 2>&1 + fi + + success "Azure CSI drivers configured" +} + +# Verify Azure Load Balancer (built-in to AKS) +configure_load_balancer() { + log "Verifying Azure Load Balancer..." + + # Azure Load Balancer is built into AKS, just verify it's working + if kubectl get service kubernetes -n default >/dev/null 2>&1; then + success "Azure Load Balancer verified (built-in to AKS)" + else + warn "Unable to verify Kubernetes API service" + fi +} + +# Install cert-manager +install_cert_manager() { + log "Installing cert-manager..." + + # Check if already installed + if helm list -n cert-manager | grep -q cert-manager; then + warn "cert-manager already installed. Skipping installation." + return 0 + fi + + # Add Jetstack Helm repository + helm repo add jetstack https://charts.jetstack.io + helm repo update + + # Install cert-manager + helm install cert-manager jetstack/cert-manager \ + --namespace cert-manager \ + --create-namespace \ + --version v1.13.2 \ + --set installCRDs=true \ + --set prometheus.enabled=false \ + --set webhook.timeoutSeconds=30 + + # Wait for cert-manager to be ready + log "Waiting for cert-manager to be ready..." + sleep 30 + kubectl wait --for=condition=ready pod -l app.kubernetes.io/instance=cert-manager -n cert-manager --timeout=300s || warn "cert-manager pods may still be starting" + + success "cert-manager installed" +} + +# Create optimized storage class for Azure (optional) +create_storage_class() { + if [ "$CREATE_STORAGE_CLASS" != "true" ]; then + warn "Skipping custom storage class creation (using AKS default StandardSSD_LRS)" + return 0 + fi + + log "Creating DocumentDB custom Premium SSD storage class..." + + # Check if storage class already exists + if kubectl get storageclass documentdb-storage &> /dev/null; then + warn "DocumentDB storage class already exists. Skipping creation." + return 0 + fi + + kubectl apply -f - < /dev/null; then + error "Cannot reach ghcr.io. Please check your internet connection and firewall settings." + fi + + # Install DocumentDB operator using enhanced fork with Azure support + log "Installing DocumentDB operator from GitHub Container Registry (enhanced fork with Azure support)..." + + # Check for GitHub authentication + if [ -z "$GITHUB_TOKEN" ] || [ -z "$GITHUB_USERNAME" ]; then + error "DocumentDB operator installation requires GitHub authentication. 
+ +GitHub credentials can be provided via: +1. Environment variables (recommended): + export GITHUB_USERNAME='your-github-username' + export GITHUB_TOKEN='your-github-token' + +2. Command line arguments: + --github-username --github-token + +To create a GitHub token: +1. Go to https://github.com/settings/tokens +2. Generate a new token with 'read:packages' scope +3. Set the environment variables as shown above + +Then run the script again with --install-operator" + fi + + # Authenticate with GitHub Container Registry + log "Authenticating with GitHub Container Registry..." + if ! echo "$GITHUB_TOKEN" | helm registry login ghcr.io --username "$GITHUB_USERNAME" --password-stdin; then + error "Failed to authenticate with GitHub Container Registry. Please verify your GITHUB_TOKEN and GITHUB_USERNAME." + fi + + # Install DocumentDB operator from OCI registry + log "Pulling and installing DocumentDB operator from ghcr.io/${OPERATOR_GITHUB_ORG}/documentdb-operator..." + helm install documentdb-operator \ + oci://ghcr.io/${OPERATOR_GITHUB_ORG}/documentdb-operator \ + --version ${OPERATOR_CHART_VERSION} \ + --namespace documentdb-operator \ + --create-namespace \ + --wait \ + --timeout 10m + + if [ $? -eq 0 ]; then + success "DocumentDB operator installed successfully from ${OPERATOR_GITHUB_ORG}/documentdb-operator:${OPERATOR_CHART_VERSION}" + else + error "Failed to install DocumentDB operator from OCI registry. Please verify: +- Your GitHub token has 'read:packages' scope +- You have access to ${OPERATOR_GITHUB_ORG}/documentdb-operator repository +- The chart version ${OPERATOR_CHART_VERSION} exists" + fi + + # Wait for operator to be ready + log "Waiting for DocumentDB operator to be ready..." + kubectl wait --for=condition=ready pod -l app.kubernetes.io/name=documentdb-operator -n documentdb-operator --timeout=300s || warn "DocumentDB operator pods may still be starting" + + success "DocumentDB operator installed" +} + +# Deploy DocumentDB instance (optional) +deploy_documentdb_instance() { + if [ "$DEPLOY_INSTANCE" != "true" ]; then + warn "Skipping DocumentDB instance deployment (--skip-instance specified or not enabled)" + return 0 + fi + + log "Deploying DocumentDB instance..." + + # Check if operator is installed + if ! kubectl get deployment -n documentdb-operator documentdb-operator &> /dev/null; then + error "DocumentDB operator not found. Cannot deploy instance without operator." + fi + + # Create DocumentDB namespace + kubectl apply -f - < /dev/null; then + warn "OpenTelemetry Operator already installed. Skipping installation." + return 0 + fi + + # Install OpenTelemetry Operator + log "Installing OpenTelemetry Operator from upstream..." + kubectl apply -f https://github.com/open-telemetry/opentelemetry-operator/releases/latest/download/opentelemetry-operator.yaml + + # Wait for operator to be ready + log "Waiting for OpenTelemetry Operator to be ready..." + kubectl wait --for=condition=available deployment/opentelemetry-operator-controller-manager -n opentelemetry-operator-system --timeout=300s || warn "OpenTelemetry Operator may still be starting" + + success "OpenTelemetry Operator installed (ready for multi-tenant collectors)" +} + +# Print summary +print_summary() { + echo "" + echo "==================================================" + echo "🎉 AKS CLUSTER SETUP COMPLETE!" 
+    echo "=================================================="
+    echo "Cluster Name: $CLUSTER_NAME"
+    echo "Resource Group: $RESOURCE_GROUP"
+    echo "Location: $LOCATION"
+    echo "Operator Installed: $INSTALL_OPERATOR"
+    echo "Instance Deployed: $DEPLOY_INSTANCE"
+    echo "OpenTelemetry Operator: Installed"
+    echo "Custom Storage Class: $CREATE_STORAGE_CLASS"
+    echo ""
+    echo "✅ Components installed:"
+    echo "   - AKS cluster with managed nodes"
+    echo "   - Azure CSI drivers (Disk & File)"
+    echo "   - Azure Load Balancer (built-in)"
+    echo "   - cert-manager"
+    if [ "$CREATE_STORAGE_CLASS" == "true" ]; then
+        echo "   - DocumentDB Premium SSD storage class"
+    else
+        echo "   - Using AKS default StandardSSD_LRS storage"
+    fi
+    if [ "$INSTALL_OPERATOR" == "true" ]; then
+        echo "   - DocumentDB operator"
+    fi
+    if [ "$DEPLOY_INSTANCE" == "true" ]; then
+        echo "   - DocumentDB instance (sample-documentdb)"
+    fi
+    echo "   - OpenTelemetry Operator (for multi-tenant collectors)"
+    echo ""
+    echo "💡 Next steps:"
+    echo "   - Verify cluster: kubectl get nodes"
+    echo "   - Check all pods: kubectl get pods --all-namespaces"
+    if [ "$INSTALL_OPERATOR" == "true" ]; then
+        echo "   - Check operator: kubectl get pods -n documentdb-operator"
+    fi
+    if [ "$DEPLOY_INSTANCE" == "true" ]; then
+        echo "   - Check DocumentDB: kubectl get documentdb -n documentdb-instance-ns"
+        echo "   - Check service status: kubectl get svc -n documentdb-instance-ns"
+        echo "   - Wait for LoadBalancer IP: kubectl get svc documentdb-service-sample-documentdb -n documentdb-instance-ns -w"
+        echo "   - Once IP is assigned, connect: mongodb://docdbadmin:SecurePassword123!@<EXTERNAL-IP>:10260/"
+    fi
+    if [ "$ENABLE_TELEMETRY" == "true" ]; then
+        echo "   - Check telemetry: kubectl get pods -n documentdb-telemetry"
+        echo "   - Access Grafana: kubectl port-forward -n documentdb-telemetry svc/grafana 3000:80"
+        echo "   - Access Prometheus: kubectl port-forward -n documentdb-telemetry svc/prometheus-server 9090:80"
+        echo "   - Grafana login: admin / admin123"
+    fi
+    echo ""
+    echo "⚠️ IMPORTANT: Run './delete-cluster.sh' when done to avoid Azure charges!"
+    echo "=================================================="
+}
+
+# Main execution
+main() {
+    log "Starting DocumentDB AKS cluster setup..."
+    log "Configuration:"
+    log "  Cluster: $CLUSTER_NAME"
+    log "  Resource Group: $RESOURCE_GROUP"
+    log "  Location: $LOCATION"
+    log "  Install Operator: $INSTALL_OPERATOR"
+    log "  Deploy Instance: $DEPLOY_INSTANCE"
+    log "  Enable Telemetry: $ENABLE_TELEMETRY"
+    if [ ! -z "$GITHUB_USERNAME" ]; then
+        log "  GitHub Username: $GITHUB_USERNAME"
+        log "  GitHub Token: ${GITHUB_TOKEN:+***provided***}"
+    fi
+    echo ""
+
+    # Validate GitHub credentials if operator installation is requested
+    if [ "$INSTALL_OPERATOR" == "true" ] && ([ -z "$GITHUB_TOKEN" ] || [ -z "$GITHUB_USERNAME" ]); then
+        error "DocumentDB operator installation requires GitHub authentication.
+
+GitHub credentials can be provided via:
+
+1. Environment variables (recommended):
+   export GITHUB_USERNAME='your-github-username'
+   export GITHUB_TOKEN='your-github-token'
+
+2. Command line arguments:
+   --github-username <username> --github-token <token>
+
+Example with command line:
+  $0 --install-operator --github-username myuser --github-token ghp_xxxxxxxxxxxx
+
+To create a GitHub token:
+1. Go to https://github.com/settings/tokens
+2. Generate a new token with 'read:packages' scope
+3. 
Set via environment variables or command line arguments" + fi + + check_prerequisites + + # Simple logic based on parameters + if [ "$INSTALL_OPERATOR" == "true" ] && [ "$DEPLOY_INSTANCE" != "true" ]; then + # Case 1: --install-operator only + log "🔧 Installing operator only (assumes cluster exists)" + setup_kubeconfig + install_documentdb_operator + + elif [ "$DEPLOY_INSTANCE" == "true" ] && [ "$INSTALL_OPERATOR" != "true" ]; then + # Case 2: --deploy-instance only + log "🚀 Deploying DocumentDB instance only (assumes cluster+operator exist)" + setup_kubeconfig + deploy_documentdb_instance + + elif [ "$INSTALL_OPERATOR" == "true" ] && [ "$DEPLOY_INSTANCE" == "true" ]; then + # Case 3: --install-all (both flags set) + log "🎯 Installing everything: cluster + operator + instance" + setup_cluster_infrastructure + install_documentdb_operator + deploy_documentdb_instance + + else + # Case 4: No flags - create cluster only + log "🏗️ Creating cluster only (no operator, no instance)" + setup_cluster_infrastructure + fi + + # Always install OpenTelemetry Operator (infrastructure component for multi-tenant collectors) + log "📊 Installing OpenTelemetry Operator (infrastructure)..." + setup_kubeconfig # Ensure we have cluster access + install_opentelemetry_operator + + print_summary +} + +# Helper function to set up cluster infrastructure +setup_cluster_infrastructure() { + # Check if cluster already exists + CLUSTER_EXISTS=$(az aks show --resource-group $RESOURCE_GROUP --name $CLUSTER_NAME --query "name" -o tsv 2>/dev/null) + + if [ "$CLUSTER_EXISTS" == "$CLUSTER_NAME" ]; then + log "✅ Cluster $CLUSTER_NAME already exists, skipping infrastructure setup" + setup_kubeconfig + else + log "Creating new cluster and infrastructure..." + create_resource_group + create_cluster + install_azure_csi_drivers + configure_load_balancer + install_cert_manager + create_storage_class + fi +} + +# Helper function to set up kubeconfig +setup_kubeconfig() { + # Verify cluster exists + if ! az aks show --resource-group $RESOURCE_GROUP --name $CLUSTER_NAME >/dev/null 2>&1; then + error "Cluster $CLUSTER_NAME not found. Create cluster first." + fi + + # Get cluster credentials + log "Getting cluster credentials..." + az aks get-credentials --resource-group $RESOURCE_GROUP --name $CLUSTER_NAME --overwrite-existing + + # Handle WSL case + if grep -qi microsoft /proc/version 2>/dev/null; then + log "Detected WSL environment, copying kubeconfig from Windows to WSL..." 
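+        # Assumes the Windows user name matches the WSL user name reported by whoami; adjust the path if they differ.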
+ WIN_KUBE_CONFIG="/mnt/c/Users/$(whoami)/.kube/config" + if [ -f "$WIN_KUBE_CONFIG" ]; then + mkdir -p ~/.kube + cp "$WIN_KUBE_CONFIG" ~/.kube/config + chmod 600 ~/.kube/config + log "Kubeconfig copied to WSL" + fi + fi + + success "Cluster credentials configured" +} + +# Run main function +main "$@" \ No newline at end of file diff --git a/documentdb-playground/telemetry/scripts/delete-cluster.sh b/documentdb-playground/telemetry/scripts/delete-cluster.sh new file mode 100755 index 00000000..72cdd379 --- /dev/null +++ b/documentdb-playground/telemetry/scripts/delete-cluster.sh @@ -0,0 +1,407 @@ +#!/bin/bash + +# DocumentDB AKS Cluster Deletion Script +# This script comprehensively deletes the AKS cluster and all associated Azure resources + +set -e # Exit on any error + +# Configuration (should match create-cluster.sh) +CLUSTER_NAME="ray-ddb-cluster" +RESOURCE_GROUP="ray-documentdb-rg" +LOCATION="West US 2" + +# Deletion scope flags +DELETE_INSTANCE="${DELETE_INSTANCE:-false}" +DELETE_OPERATOR="${DELETE_OPERATOR:-false}" +DELETE_CLUSTER="${DELETE_CLUSTER:-false}" +DELETE_ALL="${DELETE_ALL:-false}" + +# Parse command line arguments +while [[ $# -gt 0 ]]; do + case $1 in + --cluster-name) + CLUSTER_NAME="$2" + shift 2 + ;; + --resource-group) + RESOURCE_GROUP="$2" + shift 2 + ;; + --delete-instance) + DELETE_INSTANCE="true" + shift + ;; + --delete-operator) + DELETE_OPERATOR="true" + shift + ;; + --delete-cluster) + DELETE_CLUSTER="true" + shift + ;; + --delete-all) + DELETE_ALL="true" + DELETE_INSTANCE="true" + DELETE_OPERATOR="true" + DELETE_CLUSTER="true" + shift + ;; + --force) + FORCE_DELETE="true" + shift + ;; + -h|--help) + echo "Usage: $0 [OPTIONS]" + echo "" + echo "Options:" + echo " --delete-instance Delete DocumentDB instance only" + echo " --delete-operator Delete DocumentDB operator only" + echo " --delete-cluster Delete AKS cluster only" + echo " --delete-all Delete everything (instance + operator + cluster)" + echo " --cluster-name NAME AKS cluster name (default: ray-ddb-cluster)" + echo " --resource-group RG Azure resource group (default: ray-documentdb-rg)" + echo " --force Skip confirmation prompts" + echo " -h, --help Show this help message" + echo "" + echo "Examples:" + echo " $0 --delete-instance # Delete DocumentDB instance only" + echo " $0 --delete-operator # Delete operator only" + echo " $0 --delete-cluster # Delete cluster only" + echo " $0 --delete-all # Delete everything" + exit 0 + ;; + *) + echo "Unknown option: $1" + exit 1 + ;; + esac +done + +# Colors for output +RED='\033[0;31m' +GREEN='\033[0;32m' +YELLOW='\033[1;33m' +BLUE='\033[0;34m' +NC='\033[0m' # No Color + +# Logging function +log() { + echo -e "${BLUE}[$(date +'%Y-%m-%d %H:%M:%S')]${NC} $1" +} + +success() { + echo -e "${GREEN}[$(date +'%Y-%m-%d %H:%M:%S')] ✅ $1${NC}" +} + +warn() { + echo -e "${YELLOW}[$(date +'%Y-%m-%d %H:%M:%S')] ⚠️ $1${NC}" +} + +error() { + echo -e "${RED}[$(date +'%Y-%m-%d %H:%M:%S')] ❌ $1${NC}" +} + +# Check prerequisites +check_prerequisites() { + log "Checking prerequisites..." + + # Check Azure CLI + if ! command -v az &> /dev/null; then + error "Azure CLI not found. Cannot proceed with deletion." + exit 1 + fi + + # Check kubectl + if ! command -v kubectl &> /dev/null; then + warn "kubectl not found. Some cleanup steps may be skipped." + fi + + # Check Azure login + if ! az account show &> /dev/null; then + error "Not logged into Azure. Please run 'az login' first." 
+ exit 1 + fi + + success "Prerequisites met" +} + +# Confirmation prompt +confirm_deletion() { + if [ "$FORCE_DELETE" == "true" ]; then + return 0 + fi + + echo "" + echo "⚠️ WARNING: This will permanently delete the following resources:" + + if [ "$DELETE_INSTANCE" == "true" ]; then + echo " - DocumentDB instances and namespaces" + fi + + if [ "$DELETE_OPERATOR" == "true" ]; then + echo " - DocumentDB operator" + fi + + if [ "$DELETE_CLUSTER" == "true" ]; then + echo " - AKS Cluster: $CLUSTER_NAME" + echo " - Resource Group: $RESOURCE_GROUP (and ALL resources within it)" + echo " - All associated Azure resources (LoadBalancers, Disks, Network Security Groups, etc.)" + echo "" + echo "💰 This action will stop all Azure charges for these resources." + fi + + echo "" + read -p "Are you sure you want to proceed? Type 'yes' to confirm: " confirmation + + if [ "$confirmation" != "yes" ]; then + echo "Deletion cancelled." + exit 0 + fi +} + +# Delete DocumentDB instances (legacy single-tenant only) +delete_documentdb_instances() { + log "Deleting legacy DocumentDB instances..." + + if command -v kubectl &> /dev/null && kubectl cluster-info &> /dev/null; then + # Delete legacy DocumentDB instances (single-tenant setup) + kubectl delete documentdb --all -n documentdb-instance-ns --ignore-not-found=true || warn "No legacy DocumentDB instances found" + + # Delete legacy DocumentDB namespace + kubectl delete namespace documentdb-instance-ns --ignore-not-found=true || warn "Legacy DocumentDB namespace not found" + + warn "⚠️ For multi-tenant DocumentDB cleanup, use: ./delete-multi-tenant-telemetry.sh" + success "Legacy DocumentDB instances cleanup completed" + else + warn "kubectl not available or cluster not accessible. Skipping DocumentDB cleanup." + fi +} + +# Delete DocumentDB operator +delete_documentdb_operator() { + log "Deleting DocumentDB operator..." + + if command -v kubectl &> /dev/null && kubectl cluster-info &> /dev/null; then + # Delete operator using Helm if available + if command -v helm &> /dev/null; then + helm uninstall documentdb-operator -n documentdb-operator --ignore-not-found 2>/dev/null || warn "DocumentDB operator Helm release not found" + fi + + # Delete operator namespace + kubectl delete namespace documentdb-operator --ignore-not-found=true || warn "Failed to delete DocumentDB operator namespace" + + success "DocumentDB operator deleted" + else + warn "kubectl not available or cluster not accessible. Skipping operator cleanup." + fi +} + +# Delete cert-manager +delete_cert_manager() { + log "Deleting cert-manager..." + + if command -v kubectl &> /dev/null && kubectl cluster-info &> /dev/null && command -v helm &> /dev/null; then + helm uninstall cert-manager -n cert-manager --ignore-not-found 2>/dev/null || warn "cert-manager Helm release not found" + kubectl delete namespace cert-manager --ignore-not-found=true || warn "Failed to delete cert-manager namespace" + success "cert-manager deleted" + else + warn "kubectl or helm not available. Skipping cert-manager cleanup." + fi +} + +# Delete Load Balancer services +delete_load_balancer_services() { + log "Deleting LoadBalancer services..." 
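+    # Removing Service objects of type LoadBalancer first prompts Azure to release the
+    # associated public IPs and load-balancer rules before the cluster itself is deleted.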
+ + if command -v kubectl &> /dev/null && kubectl cluster-info &> /dev/null; then + # Delete all LoadBalancer services to trigger Azure LoadBalancer cleanup + kubectl get services --all-namespaces -o json | \ + jq -r '.items[] | select(.spec.type=="LoadBalancer") | "\(.metadata.namespace) \(.metadata.name)"' | \ + while read namespace name; do + if [ -n "$namespace" ] && [ -n "$name" ]; then + log "Deleting LoadBalancer service: $name in namespace: $namespace" + kubectl delete service "$name" -n "$namespace" --ignore-not-found=true || warn "Failed to delete service $name" + fi + done 2>/dev/null || warn "Failed to query LoadBalancer services" + + # Wait a moment for Azure to process the deletions + log "Waiting for Azure LoadBalancer cleanup..." + sleep 30 + + success "LoadBalancer services deleted" + else + warn "kubectl not available. Skipping LoadBalancer service cleanup." + fi +} + +# Delete AKS cluster +delete_aks_cluster() { + log "Deleting AKS cluster: $CLUSTER_NAME" + + # Check if cluster exists + if ! az aks show --resource-group $RESOURCE_GROUP --name $CLUSTER_NAME &> /dev/null; then + warn "AKS cluster $CLUSTER_NAME not found. Skipping cluster deletion." + return 0 + fi + + # Delete the AKS cluster + log "This may take 10-15 minutes..." + az aks delete --resource-group $RESOURCE_GROUP --name $CLUSTER_NAME --yes --no-wait + + # Wait for deletion to complete + log "Waiting for AKS cluster deletion to complete..." + while az aks show --resource-group $RESOURCE_GROUP --name $CLUSTER_NAME &> /dev/null; do + log "Cluster still exists, waiting..." + sleep 30 + done + + success "AKS cluster deleted" +} + +# Delete resource group and all resources +delete_resource_group() { + log "Deleting resource group: $RESOURCE_GROUP" + + # Check if resource group exists + if ! az group show --name $RESOURCE_GROUP &> /dev/null; then + warn "Resource group $RESOURCE_GROUP not found. Skipping resource group deletion." + return 0 + fi + + # Delete the entire resource group (this removes all resources within it) + log "This may take 10-20 minutes..." + az group delete --name $RESOURCE_GROUP --yes --no-wait + + # Wait for deletion to complete + log "Waiting for resource group deletion to complete..." + while az group show --name $RESOURCE_GROUP &> /dev/null; do + log "Resource group still exists, waiting..." + sleep 60 + done + + success "Resource group deleted" +} + +# Clean up local kubectl context +cleanup_kubectl_context() { + log "Cleaning up local kubectl context..." + + if command -v kubectl &> /dev/null; then + # Remove the cluster context + kubectl config delete-context "$CLUSTER_NAME" 2>/dev/null || warn "kubectl context not found" + kubectl config delete-cluster "$CLUSTER_NAME" 2>/dev/null || warn "kubectl cluster config not found" + kubectl config unset "users.clusterUser_${RESOURCE_GROUP}_${CLUSTER_NAME}" 2>/dev/null || warn "kubectl user config not found" + + success "kubectl context cleaned up" + else + warn "kubectl not available. Skipping kubectl context cleanup." + fi +} + +# Verify cleanup +verify_cleanup() { + log "Verifying cleanup..." + + # Check if resource group still exists + if az group show --name $RESOURCE_GROUP &> /dev/null; then + error "Resource group $RESOURCE_GROUP still exists. Manual cleanup may be required." 
+ return 1 + fi + + success "✅ All Azure resources have been successfully deleted" + success "✅ No Azure charges should be incurred for these resources" +} + +# Print summary +print_summary() { + echo "" + echo "==================================================" + echo "🗑️ SELECTIVE DELETION COMPLETE!" + echo "==================================================" + echo "Deleted Resources:" + + if [ "$DELETE_INSTANCE" == "true" ]; then + echo " - DocumentDB instances and namespaces" + fi + + if [ "$DELETE_OPERATOR" == "true" ]; then + echo " - DocumentDB operator" + fi + + if [ "$DELETE_CLUSTER" == "true" ]; then + echo " - AKS Cluster: $CLUSTER_NAME" + echo " - Resource Group: $RESOURCE_GROUP" + echo " - All associated Azure resources" + fi + + echo "" + echo "✅ Cleanup completed successfully" + + if [ "$DELETE_CLUSTER" == "true" ]; then + echo "✅ All Azure charges for these resources have been stopped" + echo "" + echo "💡 If you need to recreate the cluster:" + echo " ./create-cluster.sh --install-all" + else + echo "" + echo "💡 Next steps based on what's still running:" + if [ "$DELETE_INSTANCE" == "true" ] && [ "$DELETE_OPERATOR" == "false" ]; then + echo " - Deploy new instance: ./create-cluster.sh --deploy-instance" + fi + if [ "$DELETE_OPERATOR" == "true" ] && [ "$DELETE_CLUSTER" == "false" ]; then + echo " - Install operator: ./create-cluster.sh --install-operator" + echo " - Deploy instance: ./create-cluster.sh --deploy-instance" + fi + fi + echo "==================================================" +} + +# Main execution +main() { + log "Starting DocumentDB AKS selective deletion..." + log "Target cluster: $CLUSTER_NAME in resource group: $RESOURCE_GROUP" + log "Deletion scope:" + log " Instance: $DELETE_INSTANCE" + log " Operator: $DELETE_OPERATOR" + log " Cluster: $DELETE_CLUSTER" + echo "" + + # Check if any deletion flag is set + if [ "$DELETE_INSTANCE" != "true" ] && [ "$DELETE_OPERATOR" != "true" ] && [ "$DELETE_CLUSTER" != "true" ]; then + error "No deletion scope specified. Use --delete-instance, --delete-operator, --delete-cluster, or --delete-all" + exit 1 + fi + + # Execute deletion steps + check_prerequisites + confirm_deletion + + log "🗑️ Beginning selective deletion process..." + + # Selective deletion based on flags + if [ "$DELETE_INSTANCE" == "true" ]; then + delete_documentdb_instances + fi + + if [ "$DELETE_OPERATOR" == "true" ]; then + delete_documentdb_operator + fi + + if [ "$DELETE_CLUSTER" == "true" ]; then + delete_cert_manager + delete_load_balancer_services + delete_aks_cluster + delete_resource_group + cleanup_kubectl_context + verify_cleanup + fi + + # Show summary + print_summary +} + +# Handle script interruption +trap 'echo -e "\n${RED}Script interrupted. 
Some resources may not have been deleted.${NC}"; exit 1' INT + +# Run main function +main "$@" \ No newline at end of file diff --git a/documentdb-playground/telemetry/scripts/delete-multi-tenant-telemetry.sh b/documentdb-playground/telemetry/scripts/delete-multi-tenant-telemetry.sh new file mode 100755 index 00000000..80682c98 --- /dev/null +++ b/documentdb-playground/telemetry/scripts/delete-multi-tenant-telemetry.sh @@ -0,0 +1,378 @@ +#!/bin/bash + +# Multi-Tenant DocumentDB + Telemetry Cleanup Script +# This script removes all multi-tenant DocumentDB applications and monitoring stack + +set -e + +# Configuration +TEAMS=("sales" "accounts") +NAMESPACES=("sales-namespace" "accounts-namespace") + +# Cleanup scope flags +DELETE_DOCUMENTDB="${DELETE_DOCUMENTDB:-false}" +DELETE_COLLECTORS="${DELETE_COLLECTORS:-false}" +DELETE_MONITORING="${DELETE_MONITORING:-false}" +DELETE_NAMESPACES="${DELETE_NAMESPACES:-false}" +DELETE_ALL="${DELETE_ALL:-false}" + +# Parse command line arguments +while [[ $# -gt 0 ]]; do + case $1 in + --delete-documentdb) + DELETE_DOCUMENTDB="true" + shift + ;; + --delete-collectors) + DELETE_COLLECTORS="true" + shift + ;; + --delete-monitoring) + DELETE_MONITORING="true" + shift + ;; + --delete-namespaces) + DELETE_NAMESPACES="true" + shift + ;; + --delete-all) + DELETE_ALL="true" + DELETE_DOCUMENTDB="true" + DELETE_COLLECTORS="true" + DELETE_MONITORING="true" + DELETE_NAMESPACES="true" + shift + ;; + --force) + FORCE_DELETE="true" + shift + ;; + -h|--help) + echo "Usage: $0 [OPTIONS]" + echo "" + echo "Multi-tenant DocumentDB and telemetry cleanup script" + echo "" + echo "Options:" + echo " --delete-documentdb Delete DocumentDB clusters only" + echo " --delete-collectors Delete OpenTelemetry collectors only" + echo " --delete-monitoring Delete Prometheus/Grafana monitoring only" + echo " --delete-namespaces Delete team namespaces (includes all above)" + echo " --delete-all Delete everything (DocumentDB + collectors + monitoring + namespaces)" + echo " --force Skip confirmation prompts" + echo " -h, --help Show this help message" + echo "" + echo "Examples:" + echo " $0 --delete-all # Remove everything" + echo " $0 --delete-documentdb # Remove only DocumentDB clusters" + echo " $0 --delete-monitoring # Remove only Prometheus/Grafana" + echo " $0 --delete-all --force # Remove everything without confirmation" + echo "" + echo "Affected namespaces: ${NAMESPACES[*]}" + exit 0 + ;; + *) + echo "Unknown option: $1" + echo "Use --help for usage information" + exit 1 + ;; + esac +done + +# Colors for output +RED='\033[0;31m' +GREEN='\033[0;32m' +YELLOW='\033[1;33m' +BLUE='\033[0;34m' +NC='\033[0m' # No Color + +# Logging functions +log() { + echo -e "${BLUE}[$(date +'%Y-%m-%d %H:%M:%S')]${NC} $1" +} + +success() { + echo -e "${GREEN}[$(date +'%Y-%m-%d %H:%M:%S')] ✅ $1${NC}" +} + +warn() { + echo -e "${YELLOW}[$(date +'%Y-%m-%d %H:%M:%S')] ⚠️ $1${NC}" +} + +error() { + echo -e "${RED}[$(date +'%Y-%m-%d %H:%M:%S')] ❌ $1${NC}" + exit 1 +} + +# Check prerequisites +check_prerequisites() { + log "Checking prerequisites..." + + # Check kubectl + if ! command -v kubectl &> /dev/null; then + error "kubectl not found. Cannot proceed with cleanup." + fi + + # Check cluster access + if ! kubectl cluster-info &> /dev/null; then + error "Cannot access Kubernetes cluster. Please check your kubectl configuration." + fi + + # Check Helm + if ! command -v helm &> /dev/null; then + warn "Helm not found. Some monitoring cleanup may require manual intervention." 
+ fi + + success "Prerequisites met" +} + +# Confirmation prompt +confirm_deletion() { + if [ "$FORCE_DELETE" == "true" ]; then + return 0 + fi + + echo "" + echo "⚠️ WARNING: This will permanently delete the following multi-tenant resources:" + echo "" + + if [ "$DELETE_DOCUMENTDB" == "true" ] || [ "$DELETE_ALL" == "true" ]; then + echo "📦 DocumentDB Clusters:" + for team in "${TEAMS[@]}"; do + echo " - documentdb-$team (in ${team}-namespace)" + done + fi + + if [ "$DELETE_COLLECTORS" == "true" ] || [ "$DELETE_ALL" == "true" ]; then + echo "🔧 OpenTelemetry Collectors:" + for team in "${TEAMS[@]}"; do + echo " - documentdb-${team}-collector (in ${team}-namespace)" + done + fi + + if [ "$DELETE_MONITORING" == "true" ] || [ "$DELETE_ALL" == "true" ]; then + echo "📊 Monitoring Stacks:" + for team in "${TEAMS[@]}"; do + echo " - prometheus-$team (Helm release)" + echo " - grafana-$team (Helm release)" + done + fi + + if [ "$DELETE_NAMESPACES" == "true" ] || [ "$DELETE_ALL" == "true" ]; then + echo "🏠 Namespaces:" + for ns in "${NAMESPACES[@]}"; do + echo " - $ns (and ALL resources within it)" + done + fi + + echo "" + echo "💡 This will NOT affect:" + echo " - AKS cluster infrastructure" + echo " - DocumentDB operator" + echo " - OpenTelemetry operator" + echo " - Other namespaces" + echo "" + + read -p "Are you sure you want to proceed? (yes/no): " -r + if [[ ! $REPLY =~ ^[Yy][Ee][Ss]$ ]]; then + log "Operation cancelled by user" + exit 0 + fi +} + +# Delete DocumentDB clusters +delete_documentdb_clusters() { + log "Deleting DocumentDB clusters..." + + for i in "${!TEAMS[@]}"; do + team="${TEAMS[$i]}" + namespace="${NAMESPACES[$i]}" + + log "Deleting DocumentDB cluster for team: $team" + + # Delete DocumentDB cluster + kubectl delete documentdb documentdb-$team -n $namespace --ignore-not-found=true || warn "DocumentDB cluster for $team not found or failed to delete" + + # Wait for cluster to be fully deleted + log "Waiting for DocumentDB cluster $team to be fully deleted..." + timeout=120 + while kubectl get documentdb documentdb-$team -n $namespace &> /dev/null && [ $timeout -gt 0 ]; do + echo -n "." + sleep 2 + timeout=$((timeout - 2)) + done + echo "" + + if [ $timeout -le 0 ]; then + warn "Timeout waiting for DocumentDB cluster $team to be deleted" + else + success "DocumentDB cluster $team deleted successfully" + fi + + # Delete secrets and configmaps + kubectl delete secret documentdb-credentials -n $namespace --ignore-not-found=true || true + kubectl delete configmap --all -n $namespace --ignore-not-found=true || true + done + + success "DocumentDB clusters cleanup completed" +} + +# Delete OpenTelemetry collectors +delete_otel_collectors() { + log "Deleting OpenTelemetry collectors..." + + for i in "${!TEAMS[@]}"; do + team="${TEAMS[$i]}" + namespace="${NAMESPACES[$i]}" + + log "Deleting OpenTelemetry collector for team: $team" + + # Delete OpenTelemetry collector + kubectl delete otelcol documentdb-${team}-collector -n $namespace --ignore-not-found=true || warn "OpenTelemetry collector for $team not found" + + # Delete collector service account and RBAC + kubectl delete serviceaccount otel-collector-$team -n $namespace --ignore-not-found=true || true + kubectl delete clusterrolebinding otel-collector-$team --ignore-not-found=true || true + done + + success "OpenTelemetry collectors cleanup completed" +} + +# Delete monitoring stack (Prometheus & Grafana) +delete_monitoring_stack() { + log "Deleting monitoring stacks..." + + if ! 
command -v helm &> /dev/null; then + error "Helm is required to delete monitoring stack. Please install Helm or delete manually." + fi + + for team in "${TEAMS[@]}"; do + namespace="${team}-namespace" + + log "Deleting monitoring stack for team: $team" + + # Delete Grafana + log "Deleting Grafana for $team..." + helm uninstall grafana-$team -n $namespace --ignore-not-found 2>/dev/null || warn "Grafana release for $team not found" + + # Delete Prometheus + log "Deleting Prometheus for $team..." + helm uninstall prometheus-$team -n $namespace --ignore-not-found 2>/dev/null || warn "Prometheus release for $team not found" + + # Wait for PVCs to be cleaned up (they may have finalizers) + log "Waiting for persistent volumes to be cleaned up..." + sleep 5 + + # Force delete any remaining PVCs if they exist + kubectl delete pvc --all -n $namespace --ignore-not-found=true || true + done + + success "Monitoring stacks cleanup completed" +} + +# Delete team namespaces +delete_team_namespaces() { + log "Deleting team namespaces..." + + for namespace in "${NAMESPACES[@]}"; do + log "Deleting namespace: $namespace" + + # Delete namespace (this will delete all resources within it) + kubectl delete namespace $namespace --ignore-not-found=true || warn "Failed to delete namespace $namespace" + + # Wait for namespace to be fully deleted + log "Waiting for namespace $namespace to be fully deleted..." + timeout=120 + while kubectl get namespace $namespace &> /dev/null && [ $timeout -gt 0 ]; do + echo -n "." + sleep 2 + timeout=$((timeout - 2)) + done + echo "" + + if [ $timeout -le 0 ]; then + warn "Timeout waiting for namespace $namespace to be deleted" + else + success "Namespace $namespace deleted successfully" + fi + done + + success "Team namespaces cleanup completed" +} + +# Clean up cluster-wide resources specific to multi-tenant setup +cleanup_cluster_resources() { + log "Cleaning up cluster-wide multi-tenant resources..." + + # Delete cluster roles and bindings for each team + for team in "${TEAMS[@]}"; do + kubectl delete clusterrole otel-collector-$team --ignore-not-found=true || true + kubectl delete clusterrolebinding otel-collector-$team --ignore-not-found=true || true + done + + success "Cluster-wide resources cleaned up" +} + +# Main execution function +main() { + log "Starting multi-tenant DocumentDB + telemetry cleanup..." + + check_prerequisites + + # If no specific flags are set, show help + if [ "$DELETE_DOCUMENTDB" != "true" ] && [ "$DELETE_COLLECTORS" != "true" ] && [ "$DELETE_MONITORING" != "true" ] && [ "$DELETE_NAMESPACES" != "true" ] && [ "$DELETE_ALL" != "true" ]; then + warn "No cleanup scope specified. Use --help to see available options." 
+ echo "" + echo "Quick options:" + echo " --delete-all Delete everything" + echo " --delete-documentdb Delete DocumentDB clusters only" + echo " --help Show full help" + exit 1 + fi + + confirm_deletion + + # Execute cleanup in proper order + if [ "$DELETE_DOCUMENTDB" == "true" ] || [ "$DELETE_ALL" == "true" ]; then + delete_documentdb_clusters + fi + + if [ "$DELETE_COLLECTORS" == "true" ] || [ "$DELETE_ALL" == "true" ]; then + delete_otel_collectors + fi + + if [ "$DELETE_MONITORING" == "true" ] || [ "$DELETE_ALL" == "true" ]; then + delete_monitoring_stack + fi + + if [ "$DELETE_NAMESPACES" == "true" ] || [ "$DELETE_ALL" == "true" ]; then + delete_team_namespaces + else + # Clean up cluster resources even if not deleting namespaces + cleanup_cluster_resources + fi + + # Summary + echo "" + echo "==================================================" + echo "🎉 MULTI-TENANT CLEANUP COMPLETE!" + echo "==================================================" + echo "" + echo "✅ Cleanup completed successfully" + echo "" + echo "💡 What was cleaned up:" + [ "$DELETE_DOCUMENTDB" == "true" ] || [ "$DELETE_ALL" == "true" ] && echo " - DocumentDB clusters for teams: ${TEAMS[*]}" + [ "$DELETE_COLLECTORS" == "true" ] || [ "$DELETE_ALL" == "true" ] && echo " - OpenTelemetry collectors for teams: ${TEAMS[*]}" + [ "$DELETE_MONITORING" == "true" ] || [ "$DELETE_ALL" == "true" ] && echo " - Prometheus/Grafana monitoring stacks" + [ "$DELETE_NAMESPACES" == "true" ] || [ "$DELETE_ALL" == "true" ] && echo " - Team namespaces: ${NAMESPACES[*]}" + echo "" + echo "🏗️ Infrastructure still available:" + echo " - AKS cluster (use delete-cluster.sh to remove)" + echo " - DocumentDB operator" + echo " - OpenTelemetry operator" + echo "" + echo "🚀 Ready for new multi-tenant deployments!" 
+ echo " Use: ./deploy-multi-tenant-telemetry.sh" +} + +# Run main function +main "$@" \ No newline at end of file diff --git a/documentdb-playground/telemetry/scripts/deploy-multi-tenant-telemetry.sh b/documentdb-playground/telemetry/scripts/deploy-multi-tenant-telemetry.sh new file mode 100755 index 00000000..ccfdce3a --- /dev/null +++ b/documentdb-playground/telemetry/scripts/deploy-multi-tenant-telemetry.sh @@ -0,0 +1,551 @@ +#!/bin/bash + +# Multi-Tenant DocumentDB + Telemetry Deployment Script +# This script deploys complete DocumentDB clusters with isolated monitoring stacks for different teams + +set -e + +# Configuration +SALES_NAMESPACE="sales-namespace" +ACCOUNTS_NAMESPACE="accounts-namespace" +TELEMETRY_NAMESPACE="documentdb-telemetry" + +# Deployment options +DEPLOY_DOCUMENTDB=true +DEPLOY_TELEMETRY=true +SKIP_WAIT=false + +# Parse command line arguments +usage() { + echo "Usage: $0 [OPTIONS]" + echo "" + echo "Options:" + echo " --telemetry-only Deploy only telemetry stack (skip DocumentDB)" + echo " --documentdb-only Deploy only DocumentDB (skip telemetry)" + echo " --skip-wait Skip waiting for deployments to be ready" + echo " --help Show this help message" + echo "" + echo "Examples:" + echo " $0 # Deploy everything (DocumentDB + Telemetry)" + echo " $0 --telemetry-only # Deploy only collectors, Prometheus, Grafana" + echo " $0 --documentdb-only # Deploy only DocumentDB clusters" +} + +while [[ $# -gt 0 ]]; do + case $1 in + --telemetry-only) + DEPLOY_DOCUMENTDB=false + shift + ;; + --documentdb-only) + DEPLOY_TELEMETRY=false + shift + ;; + --skip-wait) + SKIP_WAIT=true + shift + ;; + --help) + usage + exit 0 + ;; + *) + error "Unknown option: $1" + usage + exit 1 + ;; + esac +done + +# Colors for output +RED='\033[0;31m' +GREEN='\033[0;32m' +YELLOW='\033[1;33m' +BLUE='\033[0;34m' +NC='\033[0m' # No Color + +log() { + echo -e "${BLUE}[$(date +'%Y-%m-%d %H:%M:%S')]${NC} $1" +} + +success() { + echo -e "${GREEN}[$(date +'%Y-%m-%d %H:%M:%S')] ✅${NC} $1" +} + +warn() { + echo -e "${YELLOW}[$(date +'%Y-%m-%d %H:%M:%S')] ⚠️${NC} $1" +} + +error() { + echo -e "${RED}[$(date +'%Y-%m-%d %H:%M:%S')] ❌${NC} $1" + exit 1 +} + +# Check if OpenTelemetry Operator is installed +check_prerequisites() { + log "Checking prerequisites..." + + if ! kubectl get namespace opentelemetry-operator-system > /dev/null 2>&1; then + error "OpenTelemetry Operator is not installed. Please install it first." + fi + + if ! helm version > /dev/null 2>&1; then + error "Helm is not installed. Please install Helm first." + fi + + # Add Prometheus Helm repo if not already added + if ! helm repo list | grep -q prometheus-community; then + log "Adding Prometheus Helm repository..." + helm repo add prometheus-community https://prometheus-community.github.io/helm-charts + helm repo update + fi + + # Add Grafana Helm repo if not already added + if ! helm repo list | grep -q grafana; then + log "Adding Grafana Helm repository..." + helm repo add grafana https://grafana.github.io/helm-charts + helm repo update + fi + + success "Prerequisites check completed" +} + +# Create namespaces for teams +create_namespaces() { + log "Creating team namespaces..." + + # Sales namespace + if ! kubectl get namespace $SALES_NAMESPACE > /dev/null 2>&1; then + kubectl create namespace $SALES_NAMESPACE + kubectl label namespace $SALES_NAMESPACE team=sales + success "Created sales namespace: $SALES_NAMESPACE" + else + log "Sales namespace already exists: $SALES_NAMESPACE" + fi + + # Accounts namespace + if ! 
kubectl get namespace $ACCOUNTS_NAMESPACE > /dev/null 2>&1; then + kubectl create namespace $ACCOUNTS_NAMESPACE + kubectl label namespace $ACCOUNTS_NAMESPACE team=accounts + success "Created accounts namespace: $ACCOUNTS_NAMESPACE" + else + log "Accounts namespace already exists: $ACCOUNTS_NAMESPACE" + fi +} + +# Deploy Prometheus for a namespace +deploy_prometheus() { + local namespace=$1 + local team=$2 + + log "Deploying Prometheus for $team team in namespace: $namespace" + + helm upgrade --install prometheus-$team prometheus-community/prometheus \ + --namespace $namespace \ + --set server.persistentVolume.size=10Gi \ + --set server.retention=15d \ + --set server.global.scrape_interval=15s \ + --set server.global.evaluation_interval=15s \ + --set alertmanager.enabled=false \ + --set prometheus-node-exporter.enabled=false \ + --set prometheus-pushgateway.enabled=false \ + --set kube-state-metrics.enabled=false \ + --set server.service.type=ClusterIP \ + --set server.ingress.enabled=false \ + --wait --timeout=300s + + success "Prometheus deployed for $team team" +} + +# Deploy Grafana for a namespace +deploy_grafana() { + local namespace=$1 + local team=$2 + local prometheus_url="http://prometheus-$team-server.$namespace.svc.cluster.local" + + log "Deploying Grafana for $team team in namespace: $namespace" + + # Create Grafana values for this team + cat > /tmp/grafana-$team-values.yaml < /dev/null && pwd )" + TELEMETRY_DIR="$(dirname "$SCRIPT_DIR")" + + # Deploy Sales collector + if [ -f "$TELEMETRY_DIR/otel-collector-sales.yaml" ]; then + log "Deploying Sales team OpenTelemetry Collector..." + kubectl apply -f "$TELEMETRY_DIR/otel-collector-sales.yaml" + success "Sales collector deployed" + else + error "Sales collector configuration not found: $TELEMETRY_DIR/otel-collector-sales.yaml" + fi + + # Deploy Accounts collector + if [ -f "$TELEMETRY_DIR/otel-collector-accounts.yaml" ]; then + log "Deploying Accounts team OpenTelemetry Collector..." + kubectl apply -f "$TELEMETRY_DIR/otel-collector-accounts.yaml" + success "Accounts collector deployed" + else + error "Accounts collector configuration not found: $TELEMETRY_DIR/otel-collector-accounts.yaml" + fi +} + +# Deploy monitoring stack for each team +deploy_monitoring_stacks() { + log "Deploying monitoring stacks for each team..." + + # Deploy Sales monitoring stack + deploy_prometheus $SALES_NAMESPACE "sales" + deploy_grafana $SALES_NAMESPACE "sales" + + # Deploy Accounts monitoring stack + deploy_prometheus $ACCOUNTS_NAMESPACE "accounts" + deploy_grafana $ACCOUNTS_NAMESPACE "accounts" + + success "All monitoring stacks deployed" +} + +# Deploy DocumentDB instance for a team +deploy_documentdb() { + local namespace=$1 + local team=$2 + local cluster_name="documentdb-$team" + + log "Deploying DocumentDB cluster for $team team in namespace: $namespace" + + # Create DocumentDB credentials secret (must be named 'documentdb-credentials') + cat > /tmp/documentdb-$team-secret.yaml < /tmp/documentdb-$team-cluster.yaml < /dev/null; then + echo "Error: Cannot connect to Grafana at $grafana_url" + echo "Make sure port-forward is running: kubectl port-forward -n $NAMESPACE svc/grafana-$TEAM ${1}:3000" + return 1 + fi + + # Create the dashboard + response=$(curl -s -X POST \ + -H "Content-Type: application/json" \ + -u "$auth" \ + -d "$DASHBOARD_JSON" \ + "$grafana_url/api/dashboards/db") + + if echo "$response" | grep -q '"status":"success"'; then + dashboard_url=$(echo "$response" | jq -r '.url') + echo "✅ Dashboard created successfully!" 
+ echo "🔗 Access it at: $grafana_url$dashboard_url" + else + echo "❌ Error creating dashboard:" + echo "$response" | jq '.' + return 1 + fi +} + +# Create dashboards based on namespace +case $NAMESPACE in + "sales-namespace") + echo "Setting up Sales team dashboard..." + create_dashboard 3001 + ;; + "accounts-namespace") + echo "Setting up Accounts team dashboard..." + create_dashboard 3002 + ;; + *) + echo "Unknown namespace: $NAMESPACE" + echo "Supported namespaces: sales-namespace, accounts-namespace" + exit 1 + ;; +esac + +echo "" +echo "Dashboard setup complete! 🎉" +echo "" +echo "To view your dashboard:" +echo "1. Open your browser to the URL shown above" +echo "2. Login with username: admin, password: admin123" +echo "3. The dashboard should be available in your dashboards list" +echo "" +echo "The dashboard includes:" +echo "- CPU Usage by Container" +echo "- Memory Usage by Container" +echo "- Memory Usage Percentage" +echo "- Pod Count" +echo "" +echo "All metrics are filtered to show only workloads in the $NAMESPACE namespace." \ No newline at end of file diff --git a/documentdb-playground/telemetry/telemetry-design.md b/documentdb-playground/telemetry/telemetry-design.md new file mode 100644 index 00000000..567f3611 --- /dev/null +++ b/documentdb-playground/telemetry/telemetry-design.md @@ -0,0 +1,637 @@ +# DocumentDB Telemetry Architecture Design + +## Overview + +This document outlines the telemetry architecture for collecting CPU and memory metrics from DocumentDB instances running on Kubernetes and visualizing them through Grafana dashboards. + +## Current DocumentDB Architecture + +### Pod Structure +Each DocumentDB instance consists of: +- **1 Pod per instancePerNode** (currently limited to 1) +- **2 Containers per Pod**: + 1. **PostgreSQL Container**: The main DocumentDB engine (based on PostgreSQL with DocumentDB extensions) + 2. **Gateway Container**: DocumentDB gateway sidecar for MongoDB API compatibility + +### Deployment Flow +1. **Cluster Preparation**: Install dependencies (CloudNative-PG operator, storage classes, etc.) +2. **Operator Installation**: Deploy DocumentDB operator +3. 
**Instance Deployment**: Create DocumentDB custom resources + +## Proposed Telemetry Architecture + +### Architecture Decision: DaemonSet vs Sidecar + +**RECOMMENDED: DaemonSet Approach (One Collector Per Node)** + +For DocumentDB monitoring, we recommend **one OpenTelemetry Collector per node** (DaemonSet) rather than sidecar injection: + +#### **Why DaemonSet is Better for DocumentDB:** + +| Factor | DaemonSet (✅ Recommended) | Sidecar | +|--------|---------------------------|---------| +| **Resource Usage** | 50MB RAM per node | 50MB RAM per DocumentDB pod | +| **Node Metrics** | ✅ Full node visibility | ❌ No node-level metrics | +| **Scalability** | Linear with nodes | Linear with pods | +| **Management** | Simple (3-5 collectors) | Complex (10+ collectors) | +| **DocumentDB Context** | Perfect for current 1-pod-per-node | Overkill for current setup | + +#### **Resource Comparison Example:** +```yaml +# Scenario: 9 DocumentDB pods across 3 nodes (3 pods per node) +# instancesPerNode: 3 (maximum supported) + +# DaemonSet: 3 collectors total (1 per node) +Total Resources: 150MB RAM, 150m CPU + +# Sidecar: 9 collectors (1 per DocumentDB pod) +Total Resources: 450MB RAM, 450m CPU + +# DaemonSet saves: 67% resources +``` + +#### **When to Consider Sidecar:** +- High-cardinality custom application metrics +- Pod-specific configuration requirements +- Multi-tenant isolation needs +- Different metric collection intervals per pod + +#### **For DocumentDB Use Case:** +- ✅ **Infrastructure monitoring focus** (CPU, memory, I/O) +- ✅ **Node-level context important** (node resources affect DocumentDB performance) +- ✅ **Current architecture**: 1 pod per node, future support for up to 3 pods per node +- ✅ **Resource efficiency** critical for production deployments + +### Architecture Components + +``` +┌─────────────────────────────────────────────────────────────────┐ +│ Grafana Dashboard │ +│ (Visualization Layer) │ +└─────────────────────────┬───────────────────────────────────────┘ + │ +┌─────────────────────────┴───────────────────────────────────────┐ +│ Prometheus │ +│ (Metrics Storage) │ +└─────────────────────────┬───────────────────────────────────────┘ + │ +┌─────────────────────────┴───────────────────────────────────────┐ +│ OpenTelemetry Collector (DaemonSet) │ +│ (Unified Metrics Collection) │ +│ ┌─────────────────────────────────────────────────────────────┐│ +│ │ Receivers: ││ +│ │ • kubeletstats (cAdvisor + Node metrics) ││ +│ │ • k8s_cluster (Kube State Metrics) ││ +│ │ • prometheus (scraping endpoints) ││ +│ │ • filelog (container logs) ││ +│ └─────────────────────────────────────────────────────────────┘│ +│ ┌─────────────────────────────────────────────────────────────┐│ +│ │ Processors: ││ +│ │ • resource detection ││ +│ │ • attribute enhancement ││ +│ │ • metric filtering ││ +│ └─────────────────────────────────────────────────────────────┘│ +│ ┌─────────────────────────────────────────────────────────────┐│ +│ │ Exporters: ││ +│ │ • prometheusremotewrite ││ +│ └─────────────────────────────────────────────────────────────┘│ +└─────────────────────────┬───────────────────────────────────────┘ + │ +┌─────────────────────────┴───────────────────────────────────────┐ +│ Kubernetes Cluster │ +│ ┌─────────────────────────────────────────────────────────────┐│ +│ │ DocumentDB Pods ││ +│ │ ┌─────────────────┐ ┌─────────────────┐ ││ +│ │ │ PostgreSQL │ │ Gateway │ ││ +│ │ │ Container │ │ Container │ ││ +│ │ │ (DocumentDB) │ │ (MongoDB API) │ ││ +│ │ └─────────────────┘ 
└─────────────────┘ ││ +│ └─────────────────────────────────────────────────────────────┘│ +└─────────────────────────────────────────────────────────────────┘ +``` + +### 1. Metrics Collection Layer (OpenTelemetry Collector) + +The OpenTelemetry Collector runs as a DaemonSet on each node and provides unified collection of all metrics through various receivers: + +#### A. Kubelet Stats Receiver (Replaces cAdvisor + Node Exporter) +- **Source**: Kubelet's built-in metrics API +- **Container Metrics Collected**: + - CPU usage (cores, percentage) + - Memory usage (RSS, cache, swap) + - Memory limits and requests + - CPU limits and requests + - Network I/O + - Filesystem I/O +- **Node Metrics Collected**: + - Node CPU utilization + - Node memory utilization + - Node filesystem usage + - Node network statistics + +#### B. Kubernetes Cluster Receiver (Replaces Kube State Metrics) +- **Source**: Kubernetes API server +- **Metrics Collected**: + - Pod status and phases + - Container restart counts + - Resource requests and limits + - DocumentDB custom resource status + - Node status and conditions + +#### C. Prometheus Receiver (For Application Metrics) +- **Source**: Application metrics endpoints from DocumentDB containers +- **Use Case**: Custom DocumentDB application metrics +- **Future Enhancement**: Gateway container request metrics (Read/Write operations) + +#### D. OTLP Receiver (Optional Future Enhancement) +- **Source**: Direct OpenTelemetry instrumentation from applications +- **Use Case**: High-performance metrics collection from DocumentDB Gateway +- **Protocol**: Native OpenTelemetry Protocol (OTLP) + +#### OpenTelemetry Collector Configuration +```yaml +receivers: + kubeletstats: + collection_interval: 20s + auth_type: "serviceAccount" + endpoint: "https://${env:K8S_NODE_NAME}:10250" + insecure_skip_verify: true + metric_groups: + - container + - pod + - node + - volume + metrics: + k8s.container.cpu_limit: + enabled: true + k8s.container.cpu_request: + enabled: true + k8s.container.memory_limit: + enabled: true + k8s.container.memory_request: + enabled: true + + k8s_cluster: + auth_type: serviceAccount + node: ${env:K8S_NODE_NAME} + metadata_exporters: [prometheus] + + # Application metrics from DocumentDB Gateway containers + prometheus/gateway: + config: + scrape_configs: + - job_name: 'documentdb-gateway' + kubernetes_sd_configs: + - role: pod + relabel_configs: + - source_labels: [__meta_kubernetes_pod_label_app] + regex: 'documentdb.*' + action: keep + - source_labels: [__meta_kubernetes_pod_container_name] + regex: 'gateway' + action: keep + - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape] + action: keep + regex: true + - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path] + action: replace + target_label: __metrics_path__ + regex: (.+) + - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port] + action: replace + regex: ([^:]+)(?::\d+)?;(\d+) + replacement: $1:$2 + target_label: __address__ + + # Future: Native OTLP for high-performance metrics + otlp: + protocols: + grpc: + endpoint: 0.0.0.0:4317 + http: + endpoint: 0.0.0.0:4318 + +processors: + resourcedetection: + detectors: [env, k8snode, kubernetes] + timeout: 2s + override: false + + attributes/documentdb: + actions: + - key: documentdb.instance + from_attribute: k8s.pod.label.app + action: insert + - key: documentdb.component + from_attribute: k8s.container.name + action: insert + - key: documentdb.operation_type + from_attribute: operation + action: 
insert + + filter/documentdb: + metrics: + include: + match_type: regexp + resource_attributes: + - key: k8s.pod.label.app + value: "documentdb.*" + +exporters: + prometheusremotewrite: + endpoint: "http://prometheus:9090/api/v1/write" + tls: + insecure: true + +service: + pipelines: + metrics: + receivers: [kubeletstats, k8s_cluster, prometheus/gateway, otlp] + processors: [resourcedetection, attributes/documentdb, filter/documentdb] + exporters: [prometheusremotewrite] +``` + +### 2. Metrics Storage Layer + +#### Prometheus Configuration (Simplified) +Since OpenTelemetry Collector handles all metric collection and forwarding, Prometheus configuration is simplified: + +```yaml +# Prometheus receives metrics via remote write from OpenTelemetry Collector +global: + scrape_interval: 15s + evaluation_interval: 15s + +# OpenTelemetry Collector pushes metrics here +remote_write_configs: [] # Not needed as OTel pushes via API + +# Optional: Direct scraping of Prometheus metrics from OTel Collector itself +scrape_configs: + - job_name: 'otel-collector' + static_configs: + - targets: ['otel-collector:8888'] # OTel Collector's own metrics +``` + +### 3. Visualization Layer + +#### Grafana Dashboard Structure + +##### Panel 1: DocumentDB Instance Overview +- **Metrics**: + - Total number of DocumentDB instances + - Instance health status + - Pod restarts in last 24h + +##### Panel 2: CPU Metrics +- **PostgreSQL Container CPU**: + - `rate(k8s_container_cpu_time{k8s_container_name="postgres",k8s_pod_label_app=~"documentdb.*"}[5m])` +- **Gateway Container CPU**: + - `rate(k8s_container_cpu_time{k8s_container_name="gateway",k8s_pod_label_app=~"documentdb.*"}[5m])` +- **CPU Utilization vs Limits**: + - `(rate(k8s_container_cpu_time[5m]) / k8s_container_cpu_limit) * 100` + +##### Panel 3: Memory Metrics +- **PostgreSQL Container Memory**: + - `k8s_container_memory_usage{k8s_container_name="postgres",k8s_pod_label_app=~"documentdb.*"}` +- **Gateway Container Memory**: + - `k8s_container_memory_usage{k8s_container_name="gateway",k8s_pod_label_app=~"documentdb.*"}` +- **Memory Utilization vs Limits**: + - `(k8s_container_memory_usage / k8s_container_memory_limit) * 100` + +##### Panel 4: Gateway Application Metrics (Future Enhancement) +- **Read Operations per Second**: + - `rate(documentdb_gateway_read_operations_total[5m])` +- **Write Operations per Second**: + - `rate(documentdb_gateway_write_operations_total[5m])` +- **Operation Latency**: + - `histogram_quantile(0.95, rate(documentdb_gateway_operation_duration_seconds_bucket[5m]))` +- **Error Rate**: + - `rate(documentdb_gateway_errors_total[5m]) / rate(documentdb_gateway_operations_total[5m]) * 100` + +##### Panel 5: Resource Efficiency +- **CPU Requests vs Usage** +- **Memory Requests vs Usage** +- **Resource waste indicators** + +## Application Metrics Integration (Future Enhancement) + +### Gateway Container Metrics + +When the DocumentDB Gateway container starts emitting application metrics, the DaemonSet architecture seamlessly supports this through multiple collection methods: + +#### Method 1: Prometheus Metrics Endpoint (Recommended) +```yaml +# Gateway container exposes metrics on /metrics endpoint +apiVersion: v1 +kind: Pod +metadata: + annotations: + prometheus.io/scrape: "true" + prometheus.io/port: "8080" + prometheus.io/path: "/metrics" +spec: + containers: + - name: gateway + image: ghcr.io/microsoft/documentdb/documentdb-local:16 + ports: + - containerPort: 8080 + name: metrics +``` + +#### Method 2: OTLP Direct Push (High Performance) 
+```yaml +# Gateway pushes metrics directly to OTel Collector +# No scraping needed, lower latency, higher throughput +environment: + - name: OTEL_EXPORTER_OTLP_ENDPOINT + value: "http://localhost:4317" # OTel Collector on same node + - name: OTEL_SERVICE_NAME + value: "documentdb-gateway" +``` + +### Expected Gateway Metrics + +#### Request Metrics +- `documentdb_gateway_requests_total{method, status}` - Total API requests +- `documentdb_gateway_request_duration_seconds` - Request latency histogram +- `documentdb_gateway_active_connections` - Current active connections + +#### Operation Metrics +- `documentdb_gateway_read_operations_total{database, collection}` - Read operations +- `documentdb_gateway_write_operations_total{database, collection}` - Write operations +- `documentdb_gateway_delete_operations_total{database, collection}` - Delete operations +- `documentdb_gateway_query_operations_total{database, collection}` - Query operations + +#### Performance Metrics +- `documentdb_gateway_operation_duration_seconds{operation_type}` - Operation latency +- `documentdb_gateway_cache_hits_total` - Cache hit rate +- `documentdb_gateway_cache_misses_total` - Cache miss rate +- `documentdb_gateway_connection_pool_size` - Connection pool metrics + +#### Error Metrics +- `documentdb_gateway_errors_total{error_type, operation}` - Error counts +- `documentdb_gateway_timeouts_total{operation}` - Timeout counts +- `documentdb_gateway_retries_total{operation}` - Retry attempts + +### DaemonSet Advantages for Application Metrics + +#### ✅ **Perfect Compatibility** +- **Prometheus scraping**: OTel Collector autodiscovers Gateway pods +- **OTLP push**: Gateway can push directly to collector on same node +- **Service discovery**: Automatic discovery of new DocumentDB instances +- **Label propagation**: Kubernetes labels automatically added to metrics + +#### ✅ **Network Efficiency** +- **Local collection**: Metrics collected on same node (low latency) +- **Reduced hops**: No cross-node network traffic for metrics +- **Batch processing**: Efficient batching before sending to Prometheus + +#### ✅ **Operational Benefits** +- **Single configuration**: Same collector handles infra + app metrics +- **Unified pipeline**: Infrastructure and application metrics in same flow +- **Consistent labeling**: Same resource detection and attribute processing +- **Simplified debugging**: One place to troubleshoot metrics collection + +### Updated Architecture with Application Metrics + +``` +┌─────────────────────────────────────────────────────────────────┐ +│ Grafana Dashboard │ +│ Infrastructure + Application Metrics │ +└─────────────────────────┬───────────────────────────────────────┘ + │ +┌─────────────────────────┴───────────────────────────────────────┐ +│ Prometheus │ +│ (Unified Storage) │ +└─────────────────────────┬───────────────────────────────────────┘ + │ +┌─────────────────────────┴───────────────────────────────────────┐ +│ OpenTelemetry Collector (DaemonSet) │ +│ (Unified Collection Agent) │ +│ ┌─────────────────────────────────────────────────────────────┐│ +│ │ Receivers: ││ +│ │ • kubeletstats (Infrastructure metrics) ││ +│ │ • k8s_cluster (Kubernetes metrics) ││ +│ │ • prometheus (Gateway /metrics scraping) ││ +│ │ • otlp (Gateway direct push) ← NEW ││ +│ └─────────────────────────────────────────────────────────────┘│ +└─────────────────────────┬───────────────────────────────────────┘ + │ +┌─────────────────────────┴───────────────────────────────────────┐ +│ DocumentDB Pods │ +│ ┌─────────────────┐ 
┌─────────────────┐ │ +│ │ PostgreSQL │ │ Gateway │ │ +│ │ Container │ │ Container │ │ +│ │ │ │ • /metrics ← NEW│ │ +│ │ │ │ • OTLP push ← NEW│ │ +│ └─────────────────┘ └─────────────────┘ │ +└─────────────────────────────────────────────────────────────────┘ +``` + +## Implementation Plan + +### Phase 1: OpenTelemetry Collector Setup +1. **Deploy OpenTelemetry Operator** + ```bash + # Install OpenTelemetry Operator + kubectl apply -f https://github.com/open-telemetry/opentelemetry-operator/releases/latest/download/opentelemetry-operator.yaml + ``` + +2. **Deploy OpenTelemetry Collector as DaemonSet** + ```yaml + apiVersion: opentelemetry.io/v1alpha1 + kind: OpenTelemetryCollector + metadata: + name: documentdb-metrics-collector + namespace: documentdb-telemetry + spec: + mode: daemonset + serviceAccount: otel-collector + config: | + # [OpenTelemetry configuration from above] + ``` + +3. **Deploy Prometheus (Simplified)** + ```bash + # Deploy Prometheus without Node Exporter or Kube State Metrics + helm install prometheus prometheus-community/prometheus \ + --namespace monitoring \ + --create-namespace \ + --set nodeExporter.enabled=false \ + --set kubeStateMetrics.enabled=false \ + --set server.persistentVolume.enabled=true + ``` + +### Phase 2: DocumentDB Application Metrics Integration +1. **Gateway Container Enhancement** + - Add metrics endpoint (`/metrics` on port 8080) + - Implement OpenTelemetry instrumentation + - Add prometheus annotations to pods + +2. **Collector Configuration Update** + ```yaml + # Add to existing OTel Collector config + receivers: + prometheus/gateway: + config: + scrape_configs: + - job_name: 'documentdb-gateway' + kubernetes_sd_configs: + - role: pod + ``` + +3. **Enhanced Dashboards** + - Add application metrics panels + - Create alerts for operation errors + - Add capacity planning metrics + +### Phase 3: Advanced Application Monitoring +1. **Create DocumentDB-specific Grafana dashboard** +2. **Implement custom metrics for DocumentDB operations** +3. 
**Add capacity planning metrics**
+
+## Configuration Examples
+
+### DocumentDB Pod Labels for Monitoring
+The DocumentDB operator should add these labels to pods for proper metric collection:
+
+```yaml
+metadata:
+  labels:
+    app.kubernetes.io/name: documentdb
+    app.kubernetes.io/instance: "{{ .Values.documentdb.name }}"
+    app.kubernetes.io/component: database
+    documentdb.microsoft.com/instance: "{{ .Values.documentdb.name }}"
+```
+
+### Prometheus Recording Rules (Updated for OpenTelemetry metrics)
+```yaml
+groups:
+  - name: documentdb.rules
+    rules:
+      - record: documentdb:cpu_usage_rate
+        expr: rate(k8s_container_cpu_time{k8s_container_name=~"postgres|gateway",k8s_pod_label_app=~"documentdb.*"}[5m])
+
+      - record: documentdb:memory_usage_bytes
+        expr: k8s_container_memory_usage{k8s_container_name=~"postgres|gateway",k8s_pod_label_app=~"documentdb.*"}
+
+      - record: documentdb:cpu_utilization_percent
+        expr: (documentdb:cpu_usage_rate / k8s_container_cpu_limit) * 100
+```
+
+### Alert Rules (Updated for OpenTelemetry metrics)
+```yaml
+groups:
+  - name: documentdb.alerts
+    rules:
+      - alert: DocumentDBHighCPUUsage
+        expr: documentdb:cpu_utilization_percent > 80
+        for: 5m
+        labels:
+          severity: warning
+        annotations:
+          summary: "DocumentDB instance {{ $labels.k8s_pod_name }} has high CPU usage"
+          description: "CPU usage is above 80% for 5 minutes"
+
+      - alert: DocumentDBHighMemoryUsage
+        expr: (documentdb:memory_usage_bytes / k8s_container_memory_limit) * 100 > 85
+        for: 5m
+        labels:
+          severity: warning
+        annotations:
+          summary: "DocumentDB instance {{ $labels.k8s_pod_name }} has high memory usage"
+          description: "Memory usage is above 85% for 5 minutes"
+```
+
+## Deployment Instructions
+
+### 1. Deploy OpenTelemetry Monitoring Stack
+```bash
+# Create telemetry namespace
+kubectl create namespace documentdb-telemetry
+
+# Deploy OpenTelemetry Operator
+kubectl apply -f https://github.com/open-telemetry/opentelemetry-operator/releases/latest/download/opentelemetry-operator.yaml
+
+# Deploy OpenTelemetry Collector
+kubectl apply -f documentdb-playground/telemetry/otel-collector.yaml
+
+# Deploy Prometheus (simplified, without Node Exporter or Kube State Metrics)
+helm install prometheus prometheus-community/prometheus \
+  --namespace documentdb-telemetry \
+  --set prometheus-node-exporter.enabled=false \
+  --set kube-state-metrics.enabled=false
+
+# Deploy Grafana
+helm install grafana grafana/grafana \
+  --namespace documentdb-telemetry
+```
+
+### 2. Configure DocumentDB for Monitoring
+Update the DocumentDB operator to include monitoring labels and annotations in the CNPG cluster specification.
+
+### 3. Import Grafana Dashboard
+Import the pre-built DocumentDB dashboard JSON into Grafana for immediate visualization.
+
+## Security Considerations
+
+1. **RBAC**: Ensure the OpenTelemetry Collector has only the minimal permissions required for Kubelet API access
+2. **Network Policies**: Restrict access to metrics endpoints and collector APIs (see the example policy below)
+3. **Data Retention**: Configure appropriate retention policies for metrics in Prometheus
+4. **Authentication**: Secure Grafana with proper authentication
+5. **Service Account**: Use a dedicated service account for the OpenTelemetry Collector with appropriately scoped cluster roles
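+
+For consideration 2, the sketch below shows one way to restrict ingress to the collector. It is an example only: the namespace, pod label selectors, and ports are assumptions based on the deployments described in this document and should be adjusted to match the actual manifests.
+
+```yaml
+# Example NetworkPolicy (assumed labels and ports): only Prometheus may scrape the
+# collector's own metrics (8888), and only in-cluster workloads may push OTLP
+# (4317/4318). Adjust selectors to the labels your collector pods actually carry.
+apiVersion: networking.k8s.io/v1
+kind: NetworkPolicy
+metadata:
+  name: otel-collector-ingress
+  namespace: documentdb-telemetry
+spec:
+  podSelector:
+    matchLabels:
+      app.kubernetes.io/component: opentelemetry-collector   # assumed operator-applied label
+  policyTypes:
+    - Ingress
+  ingress:
+    - from:
+        - podSelector:
+            matchLabels:
+              app.kubernetes.io/name: prometheus              # assumed Prometheus pod label
+      ports:
+        - protocol: TCP
+          port: 8888   # collector self-metrics
+    - from:
+        - namespaceSelector: {}                               # tighten to DocumentDB namespaces as needed
+      ports:
+        - protocol: TCP
+          port: 4317   # OTLP gRPC
+        - protocol: TCP
+          port: 4318   # OTLP HTTP
+```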
+
+## OpenTelemetry RBAC Configuration
+```yaml
+apiVersion: v1
+kind: ServiceAccount
+metadata:
+  name: otel-collector
+  namespace: documentdb-telemetry
+---
+apiVersion: rbac.authorization.k8s.io/v1
+kind: ClusterRole
+metadata:
+  name: otel-collector
+rules:
+- apiGroups: [""]
+  resources: ["nodes", "nodes/proxy", "nodes/metrics", "nodes/stats", "namespaces", "services", "endpoints", "pods"]
+  verbs: ["get", "list", "watch"]
+- apiGroups: ["apps"]
+  resources: ["daemonsets", "deployments", "replicasets", "statefulsets"]
+  verbs: ["get", "list", "watch"]
+- apiGroups: ["db.microsoft.com"]
+  resources: ["documentdbs"]
+  verbs: ["get", "list", "watch"]
+- nonResourceURLs: ["/metrics", "/metrics/cadvisor"]
+  verbs: ["get"]
+---
+apiVersion: rbac.authorization.k8s.io/v1
+kind: ClusterRoleBinding
+metadata:
+  name: otel-collector
+roleRef:
+  apiGroup: rbac.authorization.k8s.io
+  kind: ClusterRole
+  name: otel-collector
+subjects:
+- kind: ServiceAccount
+  name: otel-collector
+  namespace: documentdb-telemetry
+```
+
+## Monitoring Best Practices
+
+1. **Label Consistency**: Use consistent labeling across all DocumentDB resources
+2. **Metric Cardinality**: Avoid high-cardinality labels that could impact Prometheus performance
+3. **Alert Thresholds**: Set realistic thresholds based on workload patterns
+4. **Dashboard Organization**: Group related metrics and use consistent color schemes
+5. **Performance Impact**: Monitor the monitoring stack's own resource usage
+
+## Future Enhancements
+
+1. **Custom DocumentDB Metrics**: Implement DocumentDB-specific application metrics
+2. **Distributed Tracing**: Add OpenTelemetry tracing for request-level visibility
+3. **Log Aggregation**: Integrate with the ELK stack for log analysis
+4. **Capacity Planning**: Implement predictive analytics for resource planning
+5. **Multi-Cloud Support**: Extend monitoring to work across different cloud providers
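+
+## Verifying Metric Flow (Example)
+
+A quick way to confirm that DocumentDB container metrics are reaching Prometheus through the collector. This is a sketch: it assumes the Helm release name `prometheus` in the `documentdb-telemetry` namespace (so the chart's default service is `prometheus-server` on port 80) and the OpenTelemetry metric names used throughout this document; adjust to match the actual deployment.
+
+```bash
+# Port-forward the Prometheus server deployed in the instructions above (assumed service name/port)
+kubectl port-forward -n documentdb-telemetry svc/prometheus-server 9090:80 &
+PF_PID=$!
+sleep 3
+
+# Count series for a DocumentDB container metric collected via kubeletstats
+curl -s 'http://localhost:9090/api/v1/query' \
+  --data-urlencode 'query=k8s_container_memory_usage{k8s_container_name=~"postgres|gateway"}' \
+  | jq '.data.result | length'
+
+# A non-zero count means metrics are flowing from the collector's
+# prometheusremotewrite exporter into Prometheus.
+
+kill $PF_PID
+```
\ No newline at end of file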