Digitalis.IO Monitoring Stack

A complete, production-ready monitoring solution by Digitalis.IO

Deploy Grafana, Prometheus, Loki, Tempo, Mimir, and Alertmanager to AWS, Hetzner Cloud, or Exoscale with just a few commands.

What's Inside

This monitoring stack provides:

Grafana: Beautiful, interactive dashboards for visualizing all your metrics, logs, and traces
Prometheus: Powerful time-series database for collecting and storing metrics
Loki: Efficient log aggregation and querying system inspired by Prometheus
Tempo: Distributed tracing backend for tracking requests across your services
Mimir: Horizontally scalable, long-term storage for Prometheus metrics
Alertmanager: Intelligent alert routing and notification management

Architecture Overview

┌─────────────────────────────────────────────────────────────┐
│                         Grafana                             │
│              (Visualization & Dashboards)                   │
└─────────────────────────────────────────────────────────────┘
                              │
        ┌─────────────────────┼─────────────────────┐
        │                     │                     │
┌───────▼────────┐   ┌────────▼────────┐   ┌───────▼────────┐
│   Prometheus   │   │      Loki       │   │     Tempo      │
│   (Metrics)    │   │     (Logs)      │   │    (Traces)    │
└────────────────┘   └─────────────────┘   └────────────────┘
        │                     │                     │
        └─────────────────────┼─────────────────────┘
                              │
                    ┌─────────▼──────────┐
                    │  Long-term Storage │
                    │  (S3/Object Store) │
                    └────────────────────┘

Component Roles

Grafana: Your central dashboard hub - query, visualize, and alert on all your data
Prometheus: Scrapes metrics from your applications and infrastructure at a regular, configurable interval
Loki: Stores logs efficiently and lets you query them using LogQL (like Prometheus for logs)
Tempo: Captures distributed traces to show you exactly how requests flow through your system
Mimir: Provides long-term retention for your Prometheus metrics by storing them in object storage
Alertmanager: Deduplicates, groups, and routes alerts to the right notification channels

Quick Start

Choose your deployment method based on your cloud provider and preferred tools:

AWS Deployments

🚀 CloudFormation (Recommended for AWS)

The easiest way to get started on AWS - no DevOps experience required!

Deploy in one command:

cd cloudformation
./deploy-stack-simple.sh

Or click to launch directly in AWS Console:

Features:

Auto-detects your default VPC and subnet
Creates EC2 instances with all monitoring services
Optional S3 buckets for long-term storage
Simple web-based configuration wizard
Complete in 5-10 minutes

📖 Full CloudFormation Guide - Step-by-step instructions, troubleshooting, cost estimates, and more

⚙️ Terraform for AWS

For infrastructure-as-code enthusiasts or existing Terraform users:

cd terraform/aws
cp terraform.tfvars.example terraform.tfvars
# Edit terraform.tfvars with your VPC and subnet IDs
terraform init
terraform apply
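You can look up the VPC and subnet IDs for terraform.tfvars with the AWS CLI. This is a sketch. It assumes the AWS CLI is installed and configured, that the account has a default VPC in the current region, and that terraform.tfvars.example uses the `vpc_id` and `subnet_ids` names shown in the module example under Examples. `default_vpc_tfvars` and `hcl_list` are hypothetical helpers, not part of this repository:

```shell
#!/bin/sh
# Hypothetical helpers (not part of this repository) for filling in
# terraform.tfvars. Assumes the AWS CLI is installed and configured.

# Print the arguments as an HCL list of strings, e.g. ["a", "b"].
hcl_list() {
  out=""
  for item in "$@"; do
    if [ -z "$out" ]; then
      out="\"$item\""
    else
      out="$out, \"$item\""
    fi
  done
  printf '[%s]\n' "$out"
}

# Print vpc_id and subnet_ids lines for the account's default VPC.
default_vpc_tfvars() {
  vpc_id=$(aws ec2 describe-vpcs \
    --filters Name=isDefault,Values=true \
    --query 'Vpcs[0].VpcId' --output text) || return 1
  # The CLI prints "None" when the region has no default VPC.
  if [ "$vpc_id" = "None" ]; then
    echo "no default VPC in this region" >&2
    return 1
  fi
  # Text output separates the subnet IDs with whitespace.
  subnets=$(aws ec2 describe-subnets \
    --filters "Name=vpc-id,Values=$vpc_id" \
    --query 'Subnets[].SubnetId' --output text) || return 1
  printf 'vpc_id     = "%s"\n' "$vpc_id"
  # $subnets is left unquoted on purpose so each ID becomes one argument.
  printf 'subnet_ids = %s\n' "$(hcl_list $subnets)"
}
```

`default_vpc_tfvars` prints a `vpc_id` line and a `subnet_ids` list. Copy them into terraform.tfvars in place of the example values rather than appending them, because duplicate assignments in a tfvars file are an error.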

Features:

Full control over all configuration options
Auto-scaling groups for high availability
Network load balancer support
CloudWatch integration for Grafana
Comprehensive IAM role management

📖 Terraform AWS Guide - Configuration examples, variables reference, and advanced setups

Hetzner Cloud

Deploy to European cloud infrastructure with excellent price/performance:

cd terraform/hetzner
export HCLOUD_TOKEN="your-hetzner-token"
cp terraform.tfvars.example terraform.tfvars
# Edit terraform.tfvars with your preferences
terraform init
terraform apply

Features:

Native Hetzner Object Storage integration
Private networking between instances
Placement groups for high availability
Auto-generated SSH keys
Cost-effective pricing (~€8-50/month)

📖 Terraform Hetzner Guide - Server types, locations, object storage setup

Exoscale

Deploy on Swiss infrastructure with a focus on security and privacy:

cd terraform/exoscale
export EXOSCALE_API_KEY="your-api-key"
export EXOSCALE_API_SECRET="your-secret-key"
terraform init
terraform apply

📖 Terraform Exoscale Guide - Zone selection, instance types, and configuration

Detailed Documentation

Deployment Guides

Guide	Description	Best For
CloudFormation Deployment	AWS-native infrastructure deployment with step-by-step instructions	AWS users, beginners, quick deployments
Terraform - General Guide	Multi-cloud Terraform deployment covering AWS, Hetzner, and Exoscale	Infrastructure-as-code users, multi-cloud deployments
Terraform AWS Module	Technical reference for integrating the AWS module into your Terraform projects	Advanced Terraform users, module integration
Terraform Hetzner Module	Technical reference for the Hetzner Cloud Terraform module	Hetzner users, cost-conscious deployments

Post-Deployment

After your infrastructure is deployed, you need to configure the monitoring stack:

Option 1: Configuration Wizard (Recommended)

Find your server IP from the deployment outputs
Open your browser to https://YOUR-SERVER-IP:9443
Follow the wizard to configure all services
Access Grafana at https://YOUR-SERVER-IP
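The wizard can take several minutes to become reachable after the instance boots. Below is a minimal wait loop. It assumes `curl` is installed and that the wizard serves a self-signed certificate, which is why `-k` is used in the usage example. `retry_until` is a hypothetical helper, not part of this repository:

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository): run a command until it
# succeeds or the attempt budget runs out.
# Usage: retry_until ATTEMPTS DELAY_SECONDS command [args...]
retry_until() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    # Run the command quietly; success ends the loop immediately.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # Skip the sleep after the final failed attempt.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}
```

For example, `retry_until 60 10 curl -skf -o /dev/null https://YOUR-SERVER-IP:9443` keeps trying for about 10 minutes (60 attempts, 10 seconds apart) before giving up.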

Option 2: Manual Configuration

If you prefer command-line configuration, you can use Ansible or Docker Compose to deploy the monitoring services to your infrastructure.

Service Endpoints

After deployment, services are accessible on these ports:

Service	Port	URL Example
Configuration Wizard	9443	`https://your-ip:9443`
Grafana	443	`https://your-ip`
Prometheus	9090	`http://your-ip:9090`
Loki	3100	`http://your-ip:3100`
Alertmanager	9093	`http://your-ip:9093`
Mimir	9009	`http://your-ip:9009`
Tempo	3200	`http://your-ip:3200`
OpenTelemetry	4317/4318	`http://your-ip:4317` (gRPC) / `4318` (HTTP)
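To see which services are answering, you can probe each one on its upstream health or readiness path: Grafana `/api/health`, Prometheus and Alertmanager `/-/ready`, and Loki, Mimir, and Tempo `/ready`. This sketch assumes the ports listed in the Service Endpoints table, plain HTTP for every service except Grafana, and `curl`. `health_url` and `check_stack` are hypothetical helpers:

```shell
#!/bin/sh
# Hypothetical helper: print the health/readiness URL for a service.
# Ports follow the Service Endpoints table; paths are the upstream defaults
# and may differ if the stack's configuration changes them.
health_url() {
  host=$1
  case $2 in
    grafana)      echo "https://$host/api/health" ;;
    prometheus)   echo "http://$host:9090/-/ready" ;;
    alertmanager) echo "http://$host:9093/-/ready" ;;
    loki)         echo "http://$host:3100/ready" ;;
    mimir)        echo "http://$host:9009/ready" ;;
    tempo)        echo "http://$host:3200/ready" ;;
    *)            return 1 ;;
  esac
}

# Probe every service and report its status; returns 1 if any probe fails.
# -k accepts a self-signed certificate on Grafana (an assumption about this
# deployment); -f makes curl fail on HTTP error codes.
check_stack() {
  rc=0
  for svc in grafana prometheus alertmanager loki mimir tempo; do
    if curl -skf --max-time 5 -o /dev/null "$(health_url "$1" "$svc")"; then
      echo "$svc: ready"
    else
      echo "$svc: NOT ready"
      rc=1
    fi
  done
  return $rc
}
```

Run it as `check_stack YOUR-SERVER-IP`. Services still starting up will report "NOT ready" for a few minutes.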

Examples

Example 1: Quick Test Environment (AWS CloudFormation)

Deploy a minimal stack for testing in under 5 minutes:

cd cloudformation
INSTANCE_TYPE=t3.small STACK_NAME=test-monitoring ./deploy-stack-simple.sh

Result: Single t3.small instance (~$15/month) with all monitoring services

Example 2: Production Setup (AWS Terraform)

High-availability deployment with auto-scaling:

module "monitoring" {
  source = "git@bitbucket.org:digitalisio/ap-monitoring-stack.git//terraform/aws"

  region     = "us-east-1"
  vpc_id     = "vpc-123456"
  subnet_ids = ["subnet-abc", "subnet-def", "subnet-xyz"]

  # Auto-scaling: 2-6 instances based on load
  enable_auto_scaling = true
  min_size           = 2
  max_size           = 6
  desired_capacity   = 3
  instance_type      = "t3.large"

  # Load balancer for HA
  enable_load_balancer = true

  # S3 storage for long-term retention
  enable_mimir_bucket  = true
  enable_loki_bucket   = true
  enable_tempo_bucket  = true
  enable_backup_bucket = true

  # CloudWatch integration
  enable_cloudwatch_datasource = true

  # Restrict access to office network
  allowed_external_cidrs = ["203.0.113.0/24"]

  tags = {
    Environment = "Production"
    Team        = "DevOps"
  }
}

Result: Enterprise-ready monitoring with HA, auto-scaling, and CloudWatch available as a Grafana data source

Example 3: Cost-Optimized Hetzner Deployment

Deploy on Hetzner Cloud for excellent price/performance:

cd terraform/hetzner
export HCLOUD_TOKEN="your-token"

cat > terraform.tfvars <<EOF
project_name = "monitoring"
location     = "hel1"
server_type  = "cx32"
instance_count = 2

enable_private_network = true
enable_object_storage  = true
object_storage_region  = "fsn1"

enable_mimir_bucket = true
enable_loki_bucket  = true

allowed_external_cidrs = ["YOUR.IP.HERE/32"]
EOF

terraform init && terraform apply

Result: 2-server monitoring cluster (~€40/month) with private networking and object storage

Example 4: Single-Server Development (AWS)

Minimal setup for development or personal use:

cd cloudformation
INSTANCE_TYPE=t3.medium \
AWS_KEY_NAME=my-key \
LOKI_BUCKET=dev-logs-$(date +%s) \
STACK_NAME=dev-monitoring \
./deploy-stack-simple.sh

Result: Single t3.medium instance (~$30/month) with Loki log storage

Common Use Cases

Monitoring Kubernetes Clusters

Deploy the monitoring stack in your cloud provider
Install Prometheus exporters in your K8s cluster
Configure Prometheus to scrape your cluster endpoints
Import Kubernetes dashboards into Grafana
Set up alerts for pod failures, high memory, etc.
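In Prometheus, a scrape job is a named list of targets under `scrape_configs`. The sketch below prints one as YAML. It assumes the exporters in your cluster are reachable from the monitoring host, for example through a NodePort Service. `render_scrape_job` and the target addresses are hypothetical, and where the stack keeps its `prometheus.yml` depends on your deployment:

```shell
#!/bin/sh
# Hypothetical helper: print a Prometheus static scrape job as YAML,
# indented to sit under an existing top-level "scrape_configs:" key.
# Usage: render_scrape_job JOB_NAME TARGET [TARGET...]
render_scrape_job() {
  job=$1
  shift
  printf '  - job_name: "%s"\n' "$job"
  printf '    static_configs:\n'
  printf '      - targets:\n'
  for target in "$@"; do
    printf '          - "%s"\n' "$target"
  done
}
```

For example, `render_scrape_job kube-state-metrics k8s-node-1:30080 k8s-node-2:30080` prints a job for two hypothetical NodePort endpoints. Run `promtool check config prometheus.yml` after editing the file to catch indentation mistakes before reloading Prometheus.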

Application Performance Monitoring (APM)

Deploy the stack with Tempo enabled
Instrument your application with OpenTelemetry
Send traces to http://your-ip:4317 (gRPC) or :4318 (HTTP)
Visualize traces in Grafana's Explore view
Create dashboards showing request latency, error rates, etc.
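Before instrumenting a real application, you can confirm that the OTLP receiver accepts data by posting a single span as OTLP/HTTP JSON to port 4318 at `/v1/traces`. In the JSON encoding, trace and span IDs are hex strings. This sketch assumes `curl`, `od`, and GNU `date` (for nanosecond timestamps via `%N`). `rand_hex`, `otlp_span_json`, `send_test_span`, and the `smoke-test` service name are hypothetical:

```shell
#!/bin/sh
# Hypothetical helpers for sending one test span over OTLP/HTTP JSON.

# Print N random bytes as lowercase hex (2*N characters, no newline).
rand_hex() {
  od -An -N"$1" -tx1 /dev/urandom | tr -d ' \n'
}

# Build a one-span payload. kind 1 is SPAN_KIND_INTERNAL.
# Usage: otlp_span_json SERVICE_NAME SPAN_NAME START_NS END_NS
otlp_span_json() {
  printf '{"resourceSpans":[{"resource":{"attributes":[{"key":"service.name","value":{"stringValue":"%s"}}]},' "$1"
  printf '"scopeSpans":[{"spans":[{"traceId":"%s","spanId":"%s","name":"%s","kind":1,' \
    "$(rand_hex 16)" "$(rand_hex 8)" "$2"
  printf '"startTimeUnixNano":"%s","endTimeUnixNano":"%s"}]}]}]}\n' "$3" "$4"
}

# POST one test span to the OTLP/HTTP receiver on HOST.
send_test_span() {
  start=$(date +%s%N)          # GNU date; BSD/macOS date lacks %N
  end=$((start + 50000000))    # pretend the span lasted 50 ms
  otlp_span_json smoke-test test-span "$start" "$end" |
    curl -sf -X POST -H 'Content-Type: application/json' \
      --data-binary @- "http://$1:4318/v1/traces"
}
```

After `send_test_span YOUR-SERVER-IP`, search for the `smoke-test` service in Grafana's Explore view with the Tempo data source selected.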

Centralized Log Aggregation

Deploy with Loki S3/object storage enabled
Install a log shipper on your servers, such as Promtail, Grafana Alloy, Fluentd, or Fluent Bit
Configure the shippers to push to http://your-ip:3100/loki/api/v1/push
Query logs in Grafana using LogQL
Create alerts based on log patterns
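Before configuring a shipper, you can push a log line by hand through Loki's HTTP push API (`POST /loki/api/v1/push`). Its JSON body groups lines into streams identified by labels, and each line is a `[timestamp in nanoseconds, line]` pair. This sketch assumes `curl` and GNU `date`. The `job` label and the helper names are hypothetical:

```shell
#!/bin/sh
# Hypothetical helpers for pushing a single line to Loki.

# Escape backslashes and double quotes for use inside a JSON string.
# (Control characters such as newlines are not handled in this sketch.)
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# Build a push payload with one stream labelled job=JOB and one line.
# Usage: loki_push_json JOB TIMESTAMP_NS LINE
loki_push_json() {
  printf '{"streams":[{"stream":{"job":"%s"},"values":[["%s","%s"]]}]}\n' \
    "$(json_escape "$1")" "$2" "$(json_escape "$3")"
}

# Usage: push_log_line HOST JOB LINE  (-f makes curl fail on HTTP errors)
push_log_line() {
  loki_push_json "$2" "$(date +%s%N)" "$3" |
    curl -sf -X POST -H 'Content-Type: application/json' \
      --data-binary @- "http://$1:3100/loki/api/v1/push"
}
```

After `push_log_line YOUR-SERVER-IP smoke-test "hello from $(hostname)"`, the LogQL query `{job="smoke-test"}` in Grafana should return the line.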

Infrastructure Monitoring

Deploy the stack using the CloudFormation or Terraform guides
Install node_exporter on all servers you want to monitor
Configure Prometheus scrape configs for your exporters
Import pre-built dashboards (Node Exporter Full, AWS CloudWatch, etc.)
Set up alerts for disk space, CPU, memory thresholds
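For the disk-space alert, node_exporter's `node_filesystem_avail_bytes` and `node_filesystem_size_bytes` metrics give the free-space percentage per mount point. The sketch below writes a Prometheus alerting-rule file. The 10% default threshold, the 10-minute `for` duration, the filesystem-type filter, and the `write_disk_rule` helper are all assumptions you should adjust. How rule files are loaded depends on the stack's Prometheus configuration:

```shell
#!/bin/sh
# Hypothetical helper: write an alerting rule that fires when a filesystem
# has less than THRESHOLD percent free space for 10 minutes.
# Usage: write_disk_rule FILE [THRESHOLD_PERCENT]
write_disk_rule() {
  threshold=${2:-10}
  # Unquoted heredoc so $threshold expands; \$labels stays literal for Prometheus.
  cat > "$1" <<EOF
groups:
  - name: node-disk
    rules:
      - alert: DiskSpaceLow
        expr: |
          100 * node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"}
            / node_filesystem_size_bytes{fstype!~"tmpfs|overlay"} < $threshold
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Less than $threshold% disk free on {{ \$labels.instance }} ({{ \$labels.mountpoint }})"
EOF
}
```

Validate the generated file with `promtool check rules FILE` before loading it.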

Security Best Practices

Change default passwords: Update Grafana admin password immediately
Restrict network access: Use allowed_external_cidrs to limit access to trusted IPs
Use HTTPS: All services support TLS - configure certificates for production
Keep S3 encryption on: It is enabled by default in the Terraform modules
Regular updates: Keep monitoring software up-to-date
IAM roles: Use cloud provider IAM roles instead of access keys where possible
Network isolation: Deploy in private subnets with bastion hosts for production
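A typo in `allowed_external_cidrs` can lock you out or expose the stack more widely than intended. Below is a sketch of a strict IPv4 CIDR check to run before `terraform apply`, plus a way to turn your current public IP into a `/32`. `is_ipv4_cidr` and `my_ip_cidr` are hypothetical helpers. `checkip.amazonaws.com` is used only as one example of a what-is-my-IP service:

```shell
#!/bin/sh
# Hypothetical helper: return 0 if $1 is a valid IPv4 CIDR, e.g. 203.0.113.0/24.
is_ipv4_cidr() {
  case $1 in
    */*) ;;
    *) return 1 ;;
  esac
  addr=${1%/*}
  prefix=${1#*/}
  # Prefix length: digits only, 0-32.
  case $prefix in
    ''|*[!0-9]*) return 1 ;;
  esac
  [ "$prefix" -le 32 ] || return 1
  # Address: digits and dots only, no leading or trailing dot.
  case $addr in
    ''|*[!0-9.]*|.*|*.) return 1 ;;
  esac
  # Split on dots: exactly four non-empty octets, each 0-255.
  old_ifs=$IFS
  IFS=.
  set -- $addr
  IFS=$old_ifs
  [ "$#" -eq 4 ] || return 1
  for octet in "$@"; do
    [ -n "$octet" ] && [ "$octet" -le 255 ] || return 1
  done
  return 0
}

# Hypothetical helper: print your current public IPv4 address as a /32.
my_ip_cidr() {
  ip=$(curl -sf https://checkip.amazonaws.com) || return 1
  ip=$(printf '%s' "$ip" | tr -d '[:space:]')
  is_ipv4_cidr "$ip/32" || return 1
  printf '%s/32\n' "$ip"
}
```

For example, the output of `my_ip_cidr` can replace `YOUR.IP.HERE/32` in the Hetzner example's terraform.tfvars.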

Troubleshooting

Can't access services after deployment

Check security groups/firewall: Ensure your IP is in allowed_external_cidrs
Wait for services to start: Give it 5-10 minutes after deployment
Verify instance is running: Check your cloud provider console
Use HTTPS for Grafana: Try https:// instead of http://

Services are slow or crashing

Increase instance size: Move to a larger instance type
Add more instances: Enable auto-scaling or increase instance_count
Enable S3/object storage: Move data to object storage to reduce local disk usage
Check logs: SSH to instances and check service logs

Bucket creation fails

Error: "Bucket name already exists"

Solution: S3 bucket names must be globally unique. Either:

Remove the custom bucket name variables so that unique names are generated automatically
Choose a name that is less likely to collide, such as company-mimir-$(date +%s)
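S3 bucket names must also follow AWS's naming rules: 3 to 63 characters, using lowercase letters, digits, dots, and hyphens, and beginning and ending with a letter or digit. The sketch below appends a timestamp and checks these core rules before you pass the name to a deployment. `unique_bucket_name` and `valid_bucket_name` are hypothetical helpers. The check is deliberately partial: AWS also rejects, for example, names formatted like IP addresses and some reserved prefixes and suffixes.

```shell
#!/bin/sh
# Hypothetical helper: return 0 if $1 meets the core S3 bucket-naming rules.
valid_bucket_name() {
  name=$1
  len=${#name}
  [ "$len" -ge 3 ] && [ "$len" -le 63 ] || return 1
  case $name in
    *[!a-z0-9.-]*) return 1 ;;          # only lowercase letters, digits, dots, hyphens
    [!a-z0-9]*|*[!a-z0-9]) return 1 ;;  # must start and end with a letter or digit
    *..*) return 1 ;;                   # no two adjacent periods
  esac
  return 0
}

# Hypothetical helper: print PREFIX-<unix timestamp>, lowercased; fail if invalid.
unique_bucket_name() {
  name=$(printf '%s-%s' "$1" "$(date +%s)" | tr '[:upper:]' '[:lower:]')
  valid_bucket_name "$name" || return 1
  printf '%s\n' "$name"
}
```

For example: `LOKI_BUCKET=$(unique_bucket_name acme-loki) STACK_NAME=dev-monitoring ./deploy-stack-simple.sh`.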

Deployment stuck or times out

Check quota limits: You may have hit cloud provider limits
Verify credentials: Ensure API keys/credentials are valid
Check VPC/networking: Ensure VPC has internet gateway (for public deployments)
Review CloudFormation/Terraform events: Look for specific error messages

Support and Contributing

Getting Help

Documentation Issues: Check the deployment guide for your platform
CloudFormation Errors: Review the Events tab in AWS CloudFormation console
Terraform Errors: Run terraform plan to see what will change before applying
Service Issues: SSH to instances and check logs in /var/log/

Contributing

Improvements and bug fixes are welcome! Please:

Test changes in a non-production environment
Update relevant documentation
Submit detailed pull requests
Follow existing code style and patterns

About Digitalis.IO

Digitalis.IO specializes in cloud infrastructure, DevOps automation, and observability solutions. We help organizations build, deploy, and monitor modern cloud-native applications with best-in-class open-source tools.

Our Services:

☁️ Cloud Infrastructure Design & Migration
🔧 DevOps & Platform Engineering
📊 Observability & Monitoring Solutions
🚀 Kubernetes & Container Orchestration
🔒 Security & Compliance

Learn More:

🌐 Visit our website: digitalis.io
📧 Contact us for consulting and support
💼 Enterprise support packages available

License

This project is part of the Digitalis.IO monitoring stack. See the LICENSE file for details.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Any use of third-party trademarks or logos is subject to those third parties' policies.

Third-party products and services used in this project:

Grafana® - Grafana Labs
Prometheus® - Cloud Native Computing Foundation (CNCF)
Loki™ - Grafana Labs
Tempo™ - Grafana Labs
Mimir™ - Grafana Labs
Alertmanager - Prometheus project
Amazon Web Services (AWS)® - Amazon.com, Inc.
Hetzner Cloud™ - Hetzner Online GmbH
Exoscale™ - Exoscale
Terraform® - HashiCorp, Inc.
CloudFormation™ - Amazon Web Services, Inc.

What's Next

Coming Soon

Google Cloud Platform support
Azure support
Kubernetes Helm charts for container-native deployments
Pre-configured dashboards for common services
Backup and restore automation

After Deployment

Import dashboards: Browse https://grafana.com/grafana/dashboards/
Configure data sources: Set up Prometheus, Loki, and Tempo in Grafana
Set up alerts: Create alert rules in Prometheus and route the resulting notifications through Alertmanager
Configure notifications: Add Slack, PagerDuty, email channels
Secure your installation: Change passwords, restrict access, enable TLS
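Alertmanager sends alerts to the receivers defined in its configuration file. The sketch below writes a minimal configuration with a single Slack receiver. The webhook URL, channel, grouping labels, timings, helper name, and target file path are all placeholders or assumptions. Where this stack keeps its Alertmanager configuration is not covered here:

```shell
#!/bin/sh
# Hypothetical helper: write a minimal Alertmanager config that sends every
# alert to one Slack channel through an incoming webhook.
# Usage: write_slack_config FILE WEBHOOK_URL CHANNEL
write_slack_config() {
  cat > "$1" <<EOF
route:
  receiver: slack
  group_by: ['alertname', 'instance']
  group_wait: 30s
  repeat_interval: 4h
receivers:
  - name: slack
    slack_configs:
      - api_url: '$2'
        channel: '$3'
        send_resolved: true
EOF
}
```

Quote the channel name (for example `'#alerts'`) so YAML does not treat `#` as a comment. Validate the file with `amtool check-config FILE` before reloading Alertmanager.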

Ready to get started? Choose your deployment method above and follow the guide! 🚀

Made with ❤️ by Digitalis.IO
