Merged
4 changes: 4 additions & 0 deletions .gitignore
@@ -17,3 +17,7 @@ gen
.idea/
.idea/workspace.xml

# Self-sched-deploy generated files
scripts/self-sched-deploy/vars/config.env
scripts/self-sched-deploy/vars/state.env
scripts/self-sched-deploy/repos/
1 change: 1 addition & 0 deletions bootstrap.sh
@@ -4,4 +4,5 @@ source .ansible/bin/activate
pip3 install -q --upgrade pip
pip3 install -q 'ansible<12.0.0' netaddr
pip3 install -q jmespath --force
pip3 install -q yq
ansible-galaxy collection install ansible.utils --force
393 changes: 393 additions & 0 deletions scripts/self-sched-deploy/Makefile

Large diffs are not rendered by default.

163 changes: 163 additions & 0 deletions scripts/self-sched-deploy/README.md
@@ -0,0 +1,163 @@
# Self-Scheduling OCP Deployment

Interactive OCP deployment with QUADS self-scheduling lab assignment.

This tool provides an easy way to deploy an OpenShift cluster on Red Hat labs (Scale Lab, Performance Lab) by combining:
- **ansible-quads-ssm**: Self-service lab assignment via QUADS API
- **jetlag**: Assisted Installer-based OCP deployment

## Prerequisites

- **Jetlag environment**: Run `source bootstrap.sh` from the jetlag repo root to set up the virtual environment
- **GNU Make**: For running the Makefile
- **jq**: For JSON parsing (r630 server detection)
- **Pull Secret**: Download from [console.redhat.com](https://console.redhat.com/openshift/install/pull-secret)
- **SSH Keys**: Default `~/.ssh/id_rsa` and `~/.ssh/id_rsa.pub`
- **QUADS Account**: Access to the QUADS self-scheduling system

## Quick Start

```bash
# Bootstrap jetlag environment (from repo root)
source bootstrap.sh

# Navigate to the self-sched-deploy directory
cd scripts/self-sched-deploy

# Run full interactive deployment
make
```

The tool will:
1. Prompt for all required configuration
2. Clone/update the ansible-quads-ssm repository
3. Create a QUADS assignment for lab hosts
4. Generate jetlag inventory
5. Set up the bastion node
6. Deploy the OCP cluster

## Usage

### Full Deployment

```bash
# Interactive deployment (prompts for all configuration)
make
```

### Individual Phases

Each phase prompts for required configuration:

```bash
make repos # Clone/update ansible-quads-ssm repository
make create-assignment # QUADS assignment
make inventory # Generate jetlag inventory
make bastion # Set up bastion node
make cluster # Deploy OCP cluster
```

### Re-running Phases

To re-run a specific phase after a failure, run its target directly:

```bash
# Re-run cluster deployment only
make cluster

# Re-run bastion setup and cluster deployment
make bastion && make cluster
```

### Utility Commands

```bash
make status # Show current deployment status
make clean # Remove generated config and state files
make clean-all # Also remove cloned repository
make help # Show all available targets
```

## Configuration Options

The interactive prompts collect the following:

### QUADS Configuration
- **API Server**: QUADS server URL (e.g., `quads2.rdu2.scalelab.redhat.com`)
- **Username**: Your username (without domain)
- **User Domain**: Email domain (e.g., `redhat.com`)
- **Password**: QUADS password

### Lab Configuration
- **Lab**: `scalelab` or `performancelab`

### OCP Configuration
- **Build Type**: `ga` (General Availability), `dev` (Development), or `ci` (Continuous Integration)
- **Version**: OCP version (e.g., `latest-4.17`, `candidate-4.17`, `4.19.0-0.nightly-2025-02-25-035256`)

### Cluster Configuration
- **Cluster Type**: `mno` (Multi-Node OpenShift) or `sno` (Single-Node OpenShift)
- **Worker Count**: Number of worker nodes (MNO only)
- **Network Stack**: `ipv4`, `ipv6`, or `dual` (dual-stack)

### Paths
- **Pull Secret**: Path to your `pull-secret.txt` file (default: jetlag root)

## File Structure

```
scripts/self-sched-deploy/
├── Makefile # Main orchestration
├── prompt-config.sh # Interactive configuration
├── check-r630.sh # r630 server detection script
├── vars/
│ ├── config.env # Generated: current run config
│ └── state.env # Generated: assignment state
├── templates/
│ └── quads_config.yml.j2 # QUADS config template
├── repos/ # Cloned repositories (gitignored)
│ └── ansible-quads-ssm/
└── README.md
```

The jetlag configuration is generated by copying `ansible/vars/all.sample.yml` and modifying it with `sed` based on user configuration.
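
The copy-and-edit step described above could look roughly like the following. This is a sketch, not the Makefile's actual recipe: the helper name `set_yaml_key`, the destination file `all.yml`, and the key shown in the usage comment are assumptions made for illustration.

```bash
#!/usr/bin/env bash
# Sketch only: set_yaml_key is a hypothetical helper, not part of the Makefile.
# It rewrites a top-level "key: value" line in a YAML file using sed.
set_yaml_key() {
  local file="$1" key="$2" value="$3" tmp
  tmp="$(mktemp)"
  # Replace the whole line that starts with "key:"; all other lines pass through.
  sed "s|^${key}:.*|${key}: ${value}|" "$file" > "$tmp"
  # Write back through the original path so the file keeps its permissions.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Usage (destination path and key name are assumptions):
#   cp ansible/vars/all.sample.yml ansible/vars/all.yml
#   set_yaml_key ansible/vars/all.yml cluster_type mno
```

Writing to a temporary file avoids the differences between GNU and BSD `sed -i`. The helper only handles top-level scalar keys and values that do not contain `|`.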

## Automatic Assignment Detection

When you run any target, the tool checks for an existing assignment in `vars/state.env`. If found, it prompts whether to reuse the existing assignment or create a new one. This allows you to:

1. Resume a failed deployment without creating a new assignment
2. Re-run specific phases after fixing issues
3. Iterate on cluster configuration without waiting for new hosts
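
A minimal sketch of the detection described above, assuming `vars/state.env` holds plain `KEY=value` lines and records the assignment under a key named `CLOUD_NAME`. Both the key name and the function name `existing_cloud` are hypothetical; the real state file may use different names.

```bash
#!/usr/bin/env bash
# Sketch only: prints the recorded cloud name and returns 0 if the state file
# holds one; returns 1 if the file is missing or has no assignment.
# Assumption: the assignment is stored as a line of the form CLOUD_NAME=<name>.
existing_cloud() {
  local state_file="$1" cloud
  [[ -f "$state_file" ]] || return 1
  # Take the last CLOUD_NAME line in case the file was appended to.
  cloud="$(sed -n 's/^CLOUD_NAME=//p' "$state_file" | tail -n 1)"
  [[ -n "$cloud" ]] || return 1
  printf '%s\n' "$cloud"
}

# Usage:
#   if cloud="$(existing_cloud vars/state.env)"; then
#     read -r -p "Reuse assignment ${cloud}? [Y/n] " answer
#   fi
```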

## Network Stacks

### IPv4 Single-Stack (default)
Standard IPv4-only deployment. No special requirements.

### IPv6 Single-Stack
IPv6-only deployment. Automatically enables bastion registry for disconnected installation.

### Dual-Stack
Both IPv4 and IPv6. Standard connected installation using the IPv4 network.
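
The relationship between the three stacks and the bastion registry can be summarized as a small lookup. The function name below is hypothetical, and treating the result as the value for a jetlag setting such as `use_bastion_registry` is an assumption rather than something this README states.

```bash
#!/usr/bin/env bash
# Sketch only: maps the chosen network stack to whether the bastion registry
# (disconnected installation) is enabled, following the descriptions above.
bastion_registry_for_stack() {
  case "$1" in
    ipv6)      echo true ;;   # IPv6-only: disconnected install via bastion registry
    ipv4|dual) echo false ;;  # connected install over the IPv4 network
    *)         echo "unknown network stack: $1" >&2; return 1 ;;
  esac
}

# Usage:
#   bastion_registry_for_stack ipv6
```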

## Automatic Features

### r630 Server Detection
During inventory generation, the tool checks whether any allocated server is a Dell r630. If so, it sets `reset_idrac: true` in the jetlag configuration, which clears the iDRAC job queues and resets the iDRAC service before deployment.
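
The detection relies on the management hostname pattern `mgmt-[rack]-[unit]-[model].domain` found in each node's `pm_addr` field. The sketch below shows the same extraction with bash parameter expansion instead of `sed`; the function names and the example hostnames are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: derive the server model from a management hostname of the form
# mgmt-[rack]-[unit]-[model].domain
model_from_pm_addr() {
  local host="${1%%.*}"          # drop the domain: everything from the first dot
  printf '%s\n' "${host##*-}"    # keep what follows the last hyphen
}

# Returns 0 if any of the given pm_addr values belongs to an r630.
has_r630() {
  local addr
  for addr in "$@"; do
    if [[ "$(model_from_pm_addr "$addr")" == "r630" ]]; then
      return 0
    fi
  done
  return 1
}

# Usage (made-up hostnames):
#   has_r630 mgmt-rack1-u01-r640.example.com mgmt-rack1-u02-r630.example.com
```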

## Troubleshooting

### Assignment Failed
Check your QUADS credentials and ensure you have access to self-scheduling.

### Inventory Generation Failed
Verify the cloud name in `vars/state.env` matches an active QUADS assignment.

### Cluster Deployment Failed
Re-run with:
```bash
make cluster
```

The tool prompts for configuration again and offers to reuse the existing assignment recorded in `vars/state.env`.
77 changes: 77 additions & 0 deletions scripts/self-sched-deploy/check-r630.sh
@@ -0,0 +1,77 @@
#!/bin/bash
# Check if allocated servers include r630 models
# Downloads ocpinventory.json from QUADS and parses pm_addr fields
#
# Usage: ./check-r630.sh <lab> <cloud_name> [quads_server]
# Output: prints "r630" and exits 0 if an r630 is found; prints "none" and exits 1 otherwise.
# Exits 2 on usage errors, missing dependencies, or unavailable inventory.

set -e

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
JETLAG_ROOT="$(cd "${SCRIPT_DIR}/../.." && pwd)"

# Activate venv to access yq
if [[ -f "${JETLAG_ROOT}/.ansible/bin/activate" ]]; then
source "${JETLAG_ROOT}/.ansible/bin/activate"
fi

LAB="${1:-scalelab}"
CLOUD_NAME="${2}"
QUADS_SERVER="${3}"

if [[ -z "$CLOUD_NAME" ]]; then
echo "Usage: $0 <lab> <cloud_name> [quads_server]" >&2
exit 2
fi

# Check for required dependencies
if ! command -v jq &> /dev/null; then
echo "Error: jq is required but not installed" >&2
exit 2
fi

# Use provided QUADS server or map from lab (from ansible/vars/lab.yml)
if [[ -n "$QUADS_SERVER" ]]; then
QUADS_HOST="$QUADS_SERVER"
else
LAB_YML="${JETLAG_ROOT}/ansible/vars/lab.yml"
QUADS_HOST=$(yq -r ".labs.${LAB}.quads" "$LAB_YML")
if [[ -z "$QUADS_HOST" || "$QUADS_HOST" == "null" ]]; then
echo "Error: Could not find QUADS server for lab '$LAB' in $LAB_YML" >&2
exit 2
fi
fi

# Download ocpinventory.json, retrying until nodes data is available
INVENTORY_URL="http://${QUADS_HOST}/instack/${CLOUD_NAME}_ocpinventory.json"
MAX_RETRIES=12
RETRY_INTERVAL=10

for ((i=1; i<=MAX_RETRIES; i++)); do
# Tolerate curl failures so that "set -e" does not abort before the retries run
INVENTORY_JSON=$(curl -s "$INVENTORY_URL" || true)
if [[ -n "$INVENTORY_JSON" && "$INVENTORY_JSON" != "null" ]] && \
echo "$INVENTORY_JSON" | jq -e '.nodes | length > 0' &>/dev/null; then
break
fi
if [[ $i -eq $MAX_RETRIES ]]; then
echo "Error: ocpinventory.json not available after $MAX_RETRIES attempts" >&2
exit 2
fi
echo "Waiting for ocpinventory.json to be ready... (attempt $i/$MAX_RETRIES)" >&2
sleep "$RETRY_INTERVAL"
done

# Extract server models from pm_addr fields
# Pattern: mgmt-[rack]-[unit]-[model].domain → extract [model]
MODELS=$(echo "$INVENTORY_JSON" | jq -r '.nodes[].pm_addr' | \
sed 's/\..*//' | \
sed 's/.*-//')

# Check for r630
if echo "$MODELS" | grep -q "^r630$"; then
echo "r630"
exit 0
fi

echo "none"
exit 1