# hdfs-cloud-starter

Deploy a production-ready Hadoop HDFS cluster on GCP and AWS using Terraform and Ansible.

## Features

- Automated GCP infrastructure (VPC, firewall, VMs)
- One-command Hadoop/HDFS installation and configuration
- Secure SSH access (restricted to your IP)
- Easy scaling and cleanup
## Prerequisites

- Terraform (>= 1.0)
- Ansible (>= 2.9)
- Google Cloud CLI (`gcloud`)
- `jq` (for JSON parsing)
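
You can confirm the tools are installed and meet the minimum versions with:

```bash
# Verify the required tooling is installed and on PATH.
terraform version     # expect >= 1.0
ansible --version     # expect >= 2.9
gcloud --version
jq --version
```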
## Quick Start

### 1. Authenticate with Google Cloud

```bash
gcloud auth login
gcloud auth application-default login
gcloud config set project YOUR_PROJECT_ID
```
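
If the project is new, the Compute Engine API may not be enabled yet; Terraform needs it to create the VPC and VMs. Enabling it is a one-time, idempotent step:

```bash
# Enable the Compute Engine API for the project (skip if already enabled).
gcloud services enable compute.googleapis.com
```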
### 2. Configure the deployment

Edit `providers/gcp/terraform/terraform.tfvars`:

```hcl
gcp_project_id = "your-gcp-project-id"
gcp_region = "us-central1"
gcp_zone = "us-central1-a"
instance_type = "e2-medium"
image = "ubuntu-os-cloud/ubuntu-2204-lts"
worker_count = 2
allow_ssh_cidr = "YOUR_IP/32" # get your IP: curl -4 ifconfig.me
ssh_public_key_file = "~/.ssh/id_rsa.pub"
```

### 3. Deploy

```bash
chmod +x hdfs-deploy.sh
./hdfs-deploy.sh deploy
```
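
The deploy script wraps provisioning and configuration into one command. If you prefer to run the stages separately (for example while debugging), the sequence below is a rough sketch of what it presumably does; check `hdfs-deploy.sh` itself for the authoritative order and flags:

```bash
# Rough manual equivalent of `./hdfs-deploy.sh deploy` (assumed workflow).
cd providers/gcp/terraform
terraform init
terraform apply                 # provisions VPC, firewall rules, and VMs
cd ../../..
./generate-inventory.sh         # assumed to build ansible/inventory.yml from Terraform outputs
ansible-playbook -i ansible/inventory.yml ansible/hdfs-cluster.yml
```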
## Access the Cluster

- NameNode Web UI: http://MASTER_IP:9870
- Secondary NameNode UI: http://MASTER_IP:9868
- SSH to master: `ssh ubuntu@MASTER_IP`
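
As a quick sanity check, you can confirm the NameNode web UI responds before opening it in a browser (this assumes the firewall rules allow HTTP traffic from your IP to port 9870):

```bash
# Replace MASTER_IP with the address from the deploy output; a 200 means the
# NameNode web UI is up and reachable from your machine.
curl -s -o /dev/null -w "%{http_code}\n" http://MASTER_IP:9870
```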
## Verify the Cluster

```bash
ssh ubuntu@MASTER_IP
sudo su - hadoop
hdfs dfs -ls /
hdfs dfs -mkdir /test
hdfs dfs -put /etc/hosts /test/
hdfs dfsadmin -report
```

## Cleanup

```bash
./hdfs-deploy.sh destroy
```

## Project Structure

```
hdfs-cloud-starter/
├── hdfs-deploy.sh
├── generate-inventory.sh
├── providers/
│   └── gcp/
│       └── terraform/
│           ├── main.tf
│           ├── variables.tf
│           ├── outputs.tf
│           └── terraform.tfvars
├── ansible/
│   ├── hdfs-cluster.yml
│   ├── inventory.yml
│   └── templates/
│       ├── core-site.xml.j2
│       ├── hdfs-site.xml.j2
│       └── workers.j2
└── examples/
    └── word-count/
```
## Customization

- More workers: change `worker_count` in `terraform.tfvars` (see the sketch below)
- Instance type: change `instance_type`
- Hadoop version/config: edit `ansible/hdfs-cluster.yml` and the templates
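
For example, scaling out the cluster is a tfvars edit plus a redeploy. The sketch below assumes re-running the deploy command is safe and re-applies both Terraform and the Ansible playbook against the updated inventory:

```bash
# Example: scale out to 4 workers (sketch under the assumptions above).
# 1. In providers/gcp/terraform/terraform.tfvars set: worker_count = 4
# 2. Re-run the deployment:
./hdfs-deploy.sh deploy
# 3. On the master, `hdfs dfsadmin -report` should now list 4 live DataNodes.
```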
## Troubleshooting

- SSH issues: check your current IP and update `allow_ssh_cidr`
- Instance status: `gcloud compute instances list`
- Service status: SSH to the master, run `jps`, and check the logs (see below)
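
A typical first check after SSHing to the master looks like the sketch below; the log path is an assumption, so adjust it to wherever the playbook installs Hadoop:

```bash
# Health check on the master node (run as the hadoop user).
ssh ubuntu@MASTER_IP
sudo su - hadoop
jps                                           # expect NameNode and SecondaryNameNode here
tail -n 50 $HADOOP_HOME/logs/*namenode*.log   # assumed log location; adjust if needed
```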
## License

MIT License