refactor(ci): Add actions.summerwind.dev ARC runner deployment option #797
Changes from all commits
23deab0
3e77a4d
032362e
5fda0db
f2d1e36
f4c59fa
9366b81
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -139,3 +139,58 @@ helm uninstall <runner-set-name> -n actions-runner-system | |
| # Remove the controller (after all runner sets are removed) | ||
| helm uninstall arc -n actions-runner-system | ||
| ``` | ||
|
|
||
| --- | ||
|
|
||
| ## Alternative: Using `actions.summerwind.dev` ARC | ||
|
|
||
| Instead of the official GitHub ARC controller above, you can use the community [actions-runner-controller](https://github.com/actions/actions-runner-controller) (`actions.summerwind.dev`). This uses `RunnerDeployment` CRDs instead of runner scale sets. | ||
|
|
||
| ### 1. Install the Controller | ||
|
|
||
| ```bash | ||
| helm repo add actions-runner-controller https://actions-runner-controller.github.io/actions-runner-controller | ||
| helm repo update | ||
| helm install actions-runner-controller actions-runner-controller/actions-runner-controller \ | ||
| --namespace actions-runner-system \ | ||
| --create-namespace | ||
| ``` | ||
|
|
||
| ### 2. Create a GitHub App | ||
|
|
||
| Follow the same steps as [Section 1](#1-create-a-github-app) and [Section 2](#2-install-the-github-app) above to create and install a GitHub App. | ||
|
|
||
| ### 3. Create the Kubernetes Secret | ||
|
|
||
| Create a secret named `controller-manager` in the `actions-runner-system` namespace with your GitHub App credentials: | ||
|
|
||
| ```bash | ||
| kubectl create secret generic controller-manager \ | ||
| -n actions-runner-system \ | ||
| --from-literal=github_app_id=<your-app-id> \ | ||
| --from-literal=github_app_installation_id=<your-installation-id> \ | ||
| --from-file=github_app_private_key=<path-to-your-pem-file> | ||
| ``` | ||
|
|
||
| ### 4. Apply Runner Resources | ||
|
|
||
| ```bash | ||
| # RBAC for runner pods | ||
| kubectl apply -f scripts/k8s-runner-resources/arc-runner-rbac.yaml | ||
|
|
||
| # CPU runner deployment | ||
| kubectl apply -f scripts/k8s-runner-resources/arc-runner-cpu.yaml | ||
|
|
||
| # GPU runner deployment | ||
| kubectl apply -f scripts/k8s-runner-resources/arc-runner-gpu.yaml | ||
|
|
||
| # Autoscaler | ||
| kubectl apply -f scripts/k8s-runner-resources/arc-runner-autoscaler.yaml | ||
| ``` | ||
|
Comment on lines +175 to +189
Document the missing cluster prerequisites before applying the manifests. The manifests referenced on lines 179-188 depend on preexisting cluster state. |
||
|
|
||
| ### 5. Verify | ||
|
|
||
| ```bash | ||
| kubectl get runnerdeployments -n actions-runner-system | ||
| kubectl get pods -n actions-runner-system | ||
| ``` | ||
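
The review comment on step 4 asks for the cluster prerequisites to be documented. A minimal pre-flight sketch for that step follows. The helper name `missing_secrets` is hypothetical and not part of this PR; the secret names in the usage note are the ones visible in `arc-runner-cpu.yaml` and in step 3. The GPU manifest is not shown here, so any secrets it needs are not covered.

```bash
# Hypothetical pre-flight helper (an illustration, not part of this PR).
# Usage: missing_secrets <namespace> <secret-name>...
# Prints the name of every secret that `kubectl get secret` cannot confirm.
missing_secrets() {
  local namespace="$1"
  shift
  local name
  for name in "$@"; do
    # A non-zero exit means "not found", but also covers connection or
    # permission errors, so read the output as "not confirmed present".
    if ! kubectl get secret "$name" -n "$namespace" >/dev/null 2>&1; then
      printf '%s\n' "$name"
    fi
  done
}
```

For example, `missing_secrets actions-runner-system controller-manager huggingface-secret openai-api-key anthropic-api-key xai-api-key` prints nothing when every listed secret can be read, and one name per line otherwise.
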
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,107 @@ | ||
| apiVersion: actions.summerwind.dev/v1alpha1 | ||
| kind: HorizontalRunnerAutoscaler | ||
| metadata: | ||
|   name: arc-runner-4-h100-autoscaler | ||
|   namespace: actions-runner-system | ||
| spec: | ||
|   scaleTargetRef: | ||
|     kind: RunnerDeployment | ||
|     name: arc-runner-4-gpu-h100 | ||
|
|
||
|   minReplicas: 4 | ||
|   maxReplicas: 20 | ||
|
|
||
|   metrics: | ||
|     - type: TotalNumberOfQueuedAndInProgressWorkflowRuns | ||
|       repositoryNames: | ||
|         - lightseekorg/smg | ||
|
Comment on lines +14 to +17
Analysis chain (web query result):

The `TotalNumberOfQueuedAndInProgressWorkflowRuns` metric counts all queued and in-progress workflow runs across the `repositoryNames` listed in the HorizontalRunnerAutoscaler (HRA) spec. It does not filter by the labels of the target RunnerDeployment or RunnerSet. The metric polls GitHub's API for pending workflow runs in the listed repositories and uses the total to suggest desired replicas (up to `maxReplicas`), regardless of which runners (identified by labels) those jobs target.

If multiple HRAs watch the same repositories but target different RunnerDeployments with different runner labels, they all observe the same queue depth, because each HRA computes its metric independently from the repository-scoped workflow run counts, not from label-matched jobs or current runner availability. Jobs queued for specific labels are only assignable to matching runners, but the scaling decision does not filter the count by label, which can over-scale every pool unless managed carefully (for example, with unique repositories per HRA).

All five GPU autoscalers will scale based on the same repo-wide queue metric, causing unrelated pools to scale unnecessarily. Consider scoping each HRA to a dedicated repository or workflow label, or using a different metric that respects runner labels, to prevent over-scaling.
Comment on lines +15 to +17
Each autoscaler is configured with |
||
| --- | ||
| apiVersion: actions.summerwind.dev/v1alpha1 | ||
| kind: HorizontalRunnerAutoscaler | ||
| metadata: | ||
|   name: arc-runner-a10-autoscaler | ||
|   namespace: actions-runner-system | ||
| spec: | ||
|   scaleTargetRef: | ||
|     kind: RunnerDeployment | ||
|     name: arc-runner-gpu-a10 | ||
|
|
||
|   minReplicas: 2 | ||
|   maxReplicas: 4 | ||
|
|
||
|   metrics: | ||
|     - type: TotalNumberOfQueuedAndInProgressWorkflowRuns | ||
|       repositoryNames: | ||
|         - lightseekorg/smg | ||
| --- | ||
| apiVersion: actions.summerwind.dev/v1alpha1 | ||
| kind: HorizontalRunnerAutoscaler | ||
| metadata: | ||
|   name: arc-runner-1-h100-autoscaler | ||
|   namespace: actions-runner-system | ||
| spec: | ||
|   scaleTargetRef: | ||
|     kind: RunnerDeployment | ||
|     name: arc-runner-1-gpu-h100 | ||
|
|
||
|   minReplicas: 4 | ||
|   maxReplicas: 20 | ||
|
|
||
|   metrics: | ||
|     - type: TotalNumberOfQueuedAndInProgressWorkflowRuns | ||
|       repositoryNames: | ||
|         - lightseekorg/smg | ||
| --- | ||
| apiVersion: actions.summerwind.dev/v1alpha1 | ||
| kind: HorizontalRunnerAutoscaler | ||
| metadata: | ||
|   name: arc-runner-1-autoscaler | ||
|   namespace: actions-runner-system | ||
| spec: | ||
|   scaleTargetRef: | ||
|     kind: RunnerDeployment | ||
|     name: arc-runner-1-gpu | ||
|
|
||
|   minReplicas: 2 | ||
|   maxReplicas: 16 | ||
|
|
||
|   metrics: | ||
|     - type: TotalNumberOfQueuedAndInProgressWorkflowRuns | ||
|       repositoryNames: | ||
|         - lightseekorg/smg | ||
| --- | ||
| apiVersion: actions.summerwind.dev/v1alpha1 | ||
| kind: HorizontalRunnerAutoscaler | ||
| metadata: | ||
|   name: arc-runner-2-h100-autoscaler | ||
|   namespace: actions-runner-system | ||
| spec: | ||
|   scaleTargetRef: | ||
|     kind: RunnerDeployment | ||
|     name: arc-runner-2-gpu-h100 | ||
|
|
||
|   minReplicas: 2 | ||
|   maxReplicas: 10 | ||
|
|
||
|   metrics: | ||
|     - type: TotalNumberOfQueuedAndInProgressWorkflowRuns | ||
|       repositoryNames: | ||
|         - lightseekorg/smg | ||
| --- | ||
| apiVersion: actions.summerwind.dev/v1alpha1 | ||
| kind: HorizontalRunnerAutoscaler | ||
| metadata: | ||
|   name: arc-cpu-runner-autoscaler | ||
|   namespace: actions-runner-system | ||
| spec: | ||
|   scaleTargetRef: | ||
|     kind: RunnerDeployment | ||
|     name: arc-runner-cpu | ||
|
|
||
|   minReplicas: 4 | ||
|   maxReplicas: 8 | ||
|
|
||
|   metrics: | ||
|     - type: TotalNumberOfQueuedAndInProgressWorkflowRuns | ||
|       repositoryNames: | ||
|         - lightseekorg/smg | ||
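
To put a number on the over-scaling concern raised in the review comment above, here is a rough sketch. It assumes a simplified rule: each pool's desired replica count is the shared queue depth clamped to that pool's `minReplicas` and `maxReplicas`. That rule is an illustration of the behavior the comment describes, not the controller's actual code. The function names are hypothetical, and the min:max pairs are copied from the six autoscalers in this file.

```bash
# Simplified model (an assumption for illustration): desired replicas equal
# the queued plus in-progress run count, clamped to [min, max].
clamp_replicas() {
  local count="$1"
  local min="$2"
  local max="$3"
  if [ "$count" -lt "$min" ]; then
    printf '%s\n' "$min"
  elif [ "$count" -gt "$max" ]; then
    printf '%s\n' "$max"
  else
    printf '%s\n' "$count"
  fi
}

# Sums the desired replicas of all six pools when every autoscaler sees the
# same queue depth. Pairs are minReplicas:maxReplicas, in file order.
total_replicas() {
  local queue="$1"
  local total=0
  local pair
  for pair in 4:20 2:4 4:20 2:16 2:10 4:8; do
    total=$((total + $(clamp_replicas "$queue" "${pair%%:*}" "${pair##*:}")))
  done
  printf '%s\n' "$total"
}
```

Under this model, `total_replicas 0` prints 18 (the idle floor), `total_replicas 5` prints 29, and `total_replicas 20` prints 78. So five queued runs that target a single pool would still request 29 runners across all pools.
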
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,48 @@ | ||
| apiVersion: actions.summerwind.dev/v1alpha1 | ||
| kind: RunnerDeployment | ||
| metadata: | ||
|   name: arc-runner-cpu | ||
|   namespace: actions-runner-system | ||
| spec: | ||
|   replicas: 4 | ||
|   template: | ||
|     spec: | ||
|       ephemeral: true | ||
|       repository: lightseekorg/smg | ||
|       labels: | ||
|         - k8s-runner-cpu | ||
|       serviceAccountName: arc-runner-sa | ||
|
|
||
|       containers: | ||
|         - name: runner | ||
|           image: fra.ocir.io/idqj093njucb/action-runner:v0.0.1 | ||
|           resources: | ||
|             requests: | ||
|               cpu: "8" | ||
|               memory: "16Gi" | ||
|             limits: | ||
|               cpu: "8" | ||
|               memory: "16Gi" | ||
|           env: | ||
|             - name: HF_TOKEN | ||
|               valueFrom: | ||
|                 secretKeyRef: | ||
|                   key: HUGGINGFACE_API_KEY | ||
|                   name: huggingface-secret | ||
|             - name: OPENAI_API_KEY | ||
|               valueFrom: | ||
|                 secretKeyRef: | ||
|                   key: OPENAI_API_KEY | ||
|                   name: openai-api-key | ||
|             - name: ANTHROPIC_API_KEY | ||
|               valueFrom: | ||
|                 secretKeyRef: | ||
|                   key: ANTHROPIC_API_KEY | ||
|                   name: anthropic-api-key | ||
|             - name: XAI_API_KEY | ||
|               valueFrom: | ||
|                 secretKeyRef: | ||
|                   key: XAI_API_KEY | ||
|                   name: xai-api-key | ||
|         - name: docker | ||
|           image: fra.ocir.io/idqj093njucb/docker:dind | ||
|
Comment on lines +9 to +48
The Docker-in-Docker (dind) configuration for the CPU runner is incomplete. Suggested spec:

    spec:
      ephemeral: true
      repository: lightseekorg/smg
      labels:
        - k8s-runner-cpu
      serviceAccountName: arc-runner-sa
      volumes:
        - name: docker-sock
          emptyDir: {}
        - name: docker-storage
          emptyDir: {}
      containers:
        - name: runner
          image: fra.ocir.io/idqj093njucb/action-runner:v0.0.1
          resources:
            requests:
              cpu: "8"
              memory: "16Gi"
            limits:
              cpu: "8"
              memory: "16Gi"
          volumeMounts:
            - name: docker-sock
              mountPath: /var/run
          env:
            - name: DOCKER_HOST
              value: unix:///var/run/docker.sock
            - name: HF_TOKEN
              valueFrom:
                secretKeyRef:
                  key: HUGGINGFACE_API_KEY
                  name: huggingface-secret
            - name: OPENAI_API_KEY
              valueFrom:
                secretKeyRef:
                  key: OPENAI_API_KEY
                  name: openai-api-key
            - name: ANTHROPIC_API_KEY
              valueFrom:
                secretKeyRef:
                  key: ANTHROPIC_API_KEY
                  name: anthropic-api-key
            - name: XAI_API_KEY
              valueFrom:
                secretKeyRef:
                  key: XAI_API_KEY
                  name: xai-api-key
        - name: docker
          image: fra.ocir.io/idqj093njucb/docker:dind
          securityContext:
            privileged: true
          resources:
            requests:
              cpu: "1"
              memory: "2Gi"
            limits:
              cpu: "2"
              memory: "4Gi"
          env:
            - name: DOCKER_TLS_CERTDIR
              value: ""
            - name: DOCKER_DRIVER
              value: overlay2
          volumeMounts:
            - name: docker-sock
              mountPath: /var/run
            - name: docker-storage
              mountPath: /var/lib/docker

Comment on lines +16 to +48
Missing volumes and volume mounts for Docker socket sharing. The runner container references

Without shared volumes, the runner and DinD containers cannot communicate.

Proposed fix to add volumes section:

           serviceAccountName: arc-runner-sa
    +
    +      volumes:
    +        - name: docker-sock
    +          emptyDir: {}
    +        - name: docker-storage
    +          emptyDir:
    +            medium: Memory
    +            sizeLimit: 4Gi
           containers:
             - name: runner
               image: fra.ocir.io/idqj093njucb/action-runner:v0.0.1
               resources:
                 requests:
                   cpu: "8"
                   memory: "16Gi"
                 limits:
                   cpu: "8"
                   memory: "16Gi"
    +          volumeMounts:
    +            - name: docker-sock
    +              mountPath: /var/run
               env:
    +            - name: DOCKER_HOST
    +              value: unix:///var/run/docker.sock
                 - name: HF_TOKEN

Comment on lines +47 to +48
Docker-in-Docker sidecar is missing critical configuration.

Without these, the DinD sidecar will fail to function, and the runner container won't be able to use Docker.

Proposed fix based on GPU runner configuration:

             - name: docker
               image: fra.ocir.io/idqj093njucb/docker:dind
    +          securityContext:
    +            privileged: true  # Required for DinD
    +          resources:
    +            requests:
    +              cpu: "1"
    +              memory: "2Gi"
    +            limits:
    +              cpu: "2"
    +              memory: "4Gi"
    +          env:
    +            - name: DOCKER_TLS_CERTDIR
    +              value: ""  # Disables TLS for shared socket use
    +            - name: DOCKER_DRIVER
    +              value: overlay2
    +          volumeMounts:
    +            - name: docker-sock
    +              mountPath: /var/run
    +            - name: docker-storage
    +              mountPath: /var/lib/docker

|
||
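
Even with the shared socket that the comments above propose, a job step could start before the daemon in the `docker` sidecar accepts connections. A minimal readiness-wait sketch follows. It assumes the `docker` CLI is on the runner image's PATH and that `DOCKER_HOST` points at the shared socket. `wait_for_docker` is a hypothetical helper, and the PR does not show whether the custom runner image already does something similar.

```bash
# Hypothetical helper (an illustration, not part of this PR).
# Usage: wait_for_docker [attempts] [delay-seconds]
# Polls `docker info` until the daemon answers or the attempts run out.
wait_for_docker() {
  local attempts="${1:-30}"
  local delay="${2:-1}"
  local i=0
  while [ "$i" -lt "$attempts" ]; do
    # `docker info` exits non-zero while the daemon is unreachable.
    if docker info >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

A workflow step could call `wait_for_docker 60 1 || exit 1` before its first `docker` command, so a slow sidecar start fails the job with a clear cause instead of a socket error halfway through.
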