This repository delivers the Terraform, Ansible, and driver customizations required to build an OpenNebula Hosted Cloud on Scaleway Elastic Metal. It extends the upstream one-deploy and one-deploy-validation projects via git submodules and adds Scaleway-specific infrastructure modules, inventories, and Flexible IP (FIP) drivers.
Use the in-repo deployment guide for a narrative, end-to-end walkthrough; the README below highlights the main entry points and recent platform changes.
- Scaleway Hosted Cloud Overview
- Requirements
- Repository Setup
- Secrets and Environment Variables
- Infrastructure Provisioning
- Inventory and Parameters
- Networking Configuration
- OpenNebula Deployment Workflow
- Validation Suite
- Troubleshooting & Known Issues
- CI/CD Roadmap
- Extending the Cloud
## Scaleway Hosted Cloud Overview

- The target architecture runs one OpenNebula frontend (also acting as a KVM node) and one or more KVM hypervisors on EM-A610R-NVMe servers connected through a private VPC and optional public Flexible IPs.
- Terraform modules under `scw/` create networking (VPC, VLANs, Flexible IP routing), bare-metal instances, and dynamic inventories.
- Ansible roles in `submodule-one-deploy` configure OpenNebula, while `roles/one-driver` provides a custom VNM bridge hook to allocate/detach Flexible IPs through the Scaleway APIs (reworked in commits `967216f` and `a165376` to handle multi-NIC workloads).
- `deployment_guide.md` documents the architecture diagrams, hardware SKUs, and provisioning prerequisites in detail.
## Requirements

| Component | Version / Notes |
|---|---|
| OpenTofu | ≥ 1.5.0 (used by the `scw/*` modules) |
| Python / pip | Needed for Hatch and the Ansible tooling |
| Hatch | Used to manage the `scaleway-default` execution environment |
| Ansible | Driven by the Makefile targets |
| Scaleway credentials | API access key, secret key, organization/project IDs, Flexible IP token |
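A quick sanity check that the tooling is in place (version numbers in the comments are the documented minimums, not pins):

```bash
tofu version             # expect >= 1.5.0
python3 -m pip --version
hatch --version
ansible --version
```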
## Repository Setup

Clone the repository and initialize the submodules:

```bash
git clone https://github.com/OpenNebula/hosted-cloud-scaleway.git
cd hosted-cloud-scaleway
git submodule update --init --remote --merge
```

Then install the local tooling:

```bash
pip install hatch
make submodule-requirements   # installs collection dependencies via submodule-one-deploy
```

The Makefile shortcuts (`deployment`, `validation`, `specifics`, `submodule-requirements`) proxy into the submodules with the Scaleway inventory pre-selected.
## Secrets and Environment Variables

Setup follows the template in `.secret.skel`:

```bash
cp .secret.skel .secret
vim .secret   # fill TF_VAR_*, SCW_* values, Flexible IP token, OpenNebula password, etc.
source .secret
```

Key variables:

- `TF_VAR_customer_name`, `TF_VAR_project_name`, `TF_VAR_project_fullname`: naming for resources/state.
- `SCW_ACCESS_KEY` / `SCW_SECRET_KEY` plus `SCW_DEFAULT_ORGANIZATION_ID`, `SCW_DEFAULT_REGION`, and `SCW_DEFAULT_ZONE`.
- Flexible IP defaults (`TF_VAR_private_subnet`, `TF_VAR_worker_count`) consumed by the Terraform modules and the driver defaults (`scw_flexible_ip_*`).

`.secret` stays git-ignored; never commit credential material.
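A minimal `.secret` sketch; every value below is a placeholder, and the authoritative variable list lives in `.secret.skel`:

```bash
# .secret - sourced into the shell; never committed
export TF_VAR_customer_name="acme"
export TF_VAR_project_name="one-cloud"
export TF_VAR_project_fullname="ACME OpenNebula Hosted Cloud"

export SCW_ACCESS_KEY="SCWXXXXXXXXXXXXXXXXX"
export SCW_SECRET_KEY="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
export SCW_DEFAULT_ORGANIZATION_ID="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
export SCW_DEFAULT_REGION="fr-par"
export SCW_DEFAULT_ZONE="fr-par-2"

export TF_VAR_private_subnet="10.0.0.0/24"   # placeholder CIDR
export TF_VAR_worker_count=1

# The Flexible IP token and OpenNebula password also belong here,
# under the names defined in .secret.skel.
```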
## Infrastructure Provisioning

Modules under `scw/` are executed sequentially with the OpenTofu CLI:

| Order | Module | Purpose |
|---|---|---|
| 001 | `terraform_state_management` | Bootstrap state bucket/project metadata |
| 002 | `vpc` | Create VPC, subnets, VLAN assignments |
| 003 | `opennebula_instances` | Provision frontend & hypervisors + cloud-init assets |
| 004 | `opennebula_instances_net` | Configure networking (netplan, bridges, VLAN tags) |
| 005 | `opennebula_inventories` | Render `inventory/scaleway.yml` from module outputs |
Example run:
```bash
cd scw/002.vpc
tofu init
tofu plan
tofu apply
cd ../..
```

Consult `deployment_guide.md#4-infrastructure-deployment-tofu-modules` for module-specific inputs, expected outputs, and screenshots.
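Because the modules are strictly ordered, a convenience loop like the one below applies them all; this is a sketch that assumes each module directory follows the `NNN.name` pattern shown in the table above:

```bash
# Apply every scw/ module in numeric order (001 through 005).
for module in scw/0*/; do
  (cd "$module" && tofu init && tofu apply)
done
```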
## Inventory and Parameters

`inventory/scaleway.yml` is auto-generated by module 005 but can be overridden for PoCs. Adapt these key parameters:

| Description | Variable(s) | Files / Templates |
|---|---|---|
| SSH user, key path | `ansible_user`, `ansible_ssh_private_key_file` | `inventory/group_vars/all.yml` |
| Frontend + node metadata | `frontend.hosts.*`, `node.hosts.*` | `inventory/scaleway.yml` |
| Scaleway project / Flexible IP identifiers | `scw_project_id`, `scw_server_id`, `scw_flexible_ip_token`, `scw_flexible_ip_zone` | `inventory/scaleway.yml`, `roles/one-driver/defaults/main.yaml` |
| OpenNebula credentials | `one_pass`, `one_version` | `inventory/scaleway.yml`, `.secret` |
| VNM templates | `vn.pubridge.template.*`, `vn.vxlan.template.*` | `inventory/scaleway.yml` |
| Validation knobs | `validation.*` | `inventory/group_vars/all.yml` |
`inventory/group_vars/all.yml` also defines which cloud validation tests will be executed (core services, storage/network benchmarks, connectivity matrix, marketplace VM instantiation, etc.).
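For orientation, a trimmed-down `inventory/scaleway.yml` could look like the sketch below; host aliases, IPs, and IDs are illustrative, and module 005 renders the real values:

```yaml
all:
  vars:
    ansible_user: ubuntu                                     # illustrative
    one_version: "6.10"                                      # illustrative
    one_pass: "change-me"                                    # comes from .secret in practice
    scw_project_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
    scw_flexible_ip_zone: "fr-par-2"
frontend:
  hosts:
    fe1:
      ansible_host: 203.0.113.10                             # placeholder address
      scw_server_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
node:
  hosts:
    kvm1:
      ansible_host: 203.0.113.11
      scw_server_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
```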
## Networking Configuration

- Cloud-init assets under `scw/004.opennebula_instances_net/template/` apply a deterministic netplan layout: `br0` for public/FIP traffic, `brvmtovm` for host-to-host VXLAN (`vmtovm` altname), and VLAN subinterfaces for private routing.
- `cloud_init_custom.tmpl` hard-codes `enp5s0` as the bare-metal NIC and wires in Tofu outputs such as `base_public_ip`, `gateway`, `private_network_vlan_assignment`, `vmtovm_vlan_assignment`, and IPAM settings.
- After provisioning, run an Ansible ping to verify reachability:

```bash
ansible -i inventory/scaleway.yml all -m ping -b
```

Refer to `deployment_guide.md#5-inventory-validation-ansible` for expected output and troubleshooting tips (missing SSH key, mismatch between the generated PEM and the inventory, etc.).
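For quick reference, a healthy host answers with the standard Ansible ping payload, roughly like this (the host alias depends on your inventory):

```
kvm1 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}
```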
## OpenNebula Deployment Workflow

1. Review the custom roles and hooks (`roles/one-driver`, `playbooks/scaleway.yml`).
2. Deploy the base OpenNebula stack (frontend, KVM nodes, shared configs):

   ```bash
   make deployment
   ```

3. Apply the Scaleway-specific driver hooks and Flexible IP sync (the `specifics` target invokes the `one-driver` role on frontend + nodes using the Hatch environment):

   ```bash
   make specifics
   ```

4. Run the validation suite:

   ```bash
   make validation
   ```

Each step is re-runnable; the Ansible plays are idempotent and the Flexible IP hooks now cope with multi-NIC VMs (commit `a165376`).
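Under the hood, `make specifics` drives Ansible from the Hatch-managed environment; an equivalent manual invocation would look roughly like the line below (the exact command lives in the Makefile, so treat this as an assumption):

```bash
hatch run scaleway-default:ansible-playbook -i inventory/scaleway.yml playbooks/scaleway.yml
```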
## Validation Suite

The defaults in `inventory/group_vars/all.yml` enable:

- Core service health checks (`oned`, `gate`, `flow`, `fireedge`).
- Storage benchmark VM instantiation on `pub3`.
- Network benchmarks between all hypervisors (iperf, ping).
- Connectivity matrix across hosts and `brvmtovm`.
- Marketplace VM deploy & smoke tests (Alpine Linux 3.21 template, optional VNET creation).

Disable tests by setting the corresponding `validation.run_*` flag to `false`. Validation output is saved under `/tmp/cloud_verification_report.html` (and other paths documented in the file).
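For example, to skip the storage benchmark while keeping everything else (the key name here is illustrative; check `inventory/group_vars/all.yml` for the real flag names):

```yaml
validation:
  run_storage_benchmark: false   # illustrative flag name
```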
## Troubleshooting & Known Issues

- **Flexible IP attach/detach:** The `roles/one-driver/templates/vnm/bridge/{pre,clean}.d` hooks log verbosely to `/var/log/vnm/scw-flexip-pre.log` (attach) and `/var/log/vnm/scw-flexip-clean.log` (detach). Inspect those files when a driver action stalls; the logs capture every API call/response. Recent fixes (`4399aed`, `a165376`) ensure bridges are cleaned when VMs mix public and private NICs. Re-run `make specifics` after updating the scripts so hosts download the latest hooks.
- **Ubuntu gateway for Flexible IPs:** When a Flexible IP lives outside the VM gateway netmask, Ubuntu does not auto-create the route after attaching the public NIC, so outbound traffic stalls. To persist the fix, drop a small netplan file and apply it (the alternative `ip route add` command disappears after reboot):

  ```yaml
  # /etc/netplan/99-flexip-route.yaml
  network:
    version: 2
    renderer: networkd
    ethernets:
      eth0:
        routes:
          - to: "62.210.0.1/32"
            via: 0.0.0.0
  ```

  Apply it with `sudo netplan apply`. The `ETH0_ROUTES` context setting remains broken by OpenNebula/one-apps#284 (VNET-independent `ROUTES`) and OpenNebula/one#7348 (`ETHx_ROUTES`), so codifying the route via netplan is the only reliable workaround today.
- **Host synchronization:** The role runs `onehost sync --force` for each registered host. Inspect the Ansible output if the sync fails; hosts remain operational but may use outdated hooks.
- **Networking drift:** Re-apply module `004.opennebula_instances_net` or the netplan templates if manual edits break VLAN alt-names or `brvmtovm` routes.
- **Credentials:** A missing Flexible IP token (`scw_flexible_ip_token`) or project ID causes the driver role to abort early via assertions.
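When debugging a stalled Flexible IP operation, tailing both hook logs side by side is usually the quickest way in:

```bash
sudo tail -f /var/log/vnm/scw-flexip-pre.log /var/log/vnm/scw-flexip-clean.log
```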
## CI/CD Roadmap

`deployment_guide.md#7-cicd-pipeline-wip` outlines a GitHub Actions pipeline (WIP) that would:

- Validate inputs (tokens, CIDRs, host IPs).
- Run `tofu init` / `tofu plan`.
- Require manual approval for `tofu apply`.
- Configure Ansible, then manually trigger `one-deploy-validation`, `one-deploy`, and an eventual `tofu destroy`.
A reference Mermaid diagram is provided in the guide for future automation work.
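As a starting point for that work, a skeleton along these lines would match the described stages; the workflow, job, and environment names are illustrative, not part of the repo:

```yaml
# .github/workflows/deploy.yml - illustrative sketch only
name: hosted-cloud-scaleway
on: workflow_dispatch            # manual trigger, matching the WIP design

jobs:
  plan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          submodules: recursive
      - uses: opentofu/setup-opentofu@v1
      - run: cd scw/002.vpc && tofu init && tofu plan
  apply:
    needs: plan
    runs-on: ubuntu-latest
    environment: production      # environment protection rules supply the manual approval gate
    steps:
      - uses: actions/checkout@v4
        with:
          submodules: recursive
      - uses: opentofu/setup-opentofu@v1
      - run: cd scw/002.vpc && tofu init && tofu apply -auto-approve
```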
## Extending the Cloud

To onboard a new hypervisor (see the command sketch after this list):

1. Rerun the provisioning modules (especially `003` and `004`) with an increased `TF_VAR_worker_count`.
2. Regenerate the inventories (`005`) and verify SSH access.
3. Apply `make deployment` followed by `make specifics` so hooks and Flexible IP metadata land on the new host.
4. Re-run validation to ensure the additional capacity integrates cleanly.
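A minimal sketch of that sequence, assuming the `NNN.name` module layout shown earlier and a worker count bumped from 1 to 2:

```bash
export TF_VAR_worker_count=2

(cd scw/003.opennebula_instances && tofu apply)
(cd scw/004.opennebula_instances_net && tofu apply)
(cd scw/005.opennebula_inventories && tofu apply)

ansible -i inventory/scaleway.yml all -m ping -b   # confirm SSH to the new host
make deployment && make specifics && make validation
```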
For deeper background, diagrams, and step-by-step screenshots, consult deployment_guide.md.