feat: add scripts for the multi-cluster AI with KAITO demo #1
base: main
Signed-off-by: michaelawyu <chenyu1@microsoft.com>
Pull Request Overview
This PR adds a comprehensive multi-cluster AI demo setup using KubeFleet and Kaito. The setup automates the creation of Azure infrastructure, AKS clusters, and deploys AI model serving capabilities across multiple clusters with intelligent routing.
Key changes:
- Automated Azure resource provisioning including virtual networks, subnets, AKS clusters, and Azure Container Registry
- KubeFleet multi-cluster setup with hub and member cluster configuration
- Istio service mesh deployment for cross-cluster networking
- Kaito AI model serving infrastructure with GPU provisioning
- Semantic router for intelligent query routing between AI models
Reviewed Changes
Copilot reviewed 8 out of 9 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| multi-cluster-ai-with-kaito/setup.sh | Main orchestration script that coordinates the entire setup process |
| multi-cluster-ai-with-kaito/azresources.sh | Azure resource creation functions for VNets, subnets, AKS clusters, and ACR |
| multi-cluster-ai-with-kaito/kubefleet_setup.sh | KubeFleet hub and member agent installation and configuration |
| multi-cluster-ai-with-kaito/istio.sh | Multi-cluster Istio service mesh setup across member clusters |
| multi-cluster-ai-with-kaito/kaito.sh | KAITO installation with GPU provisioner setup for AI workloads |
| multi-cluster-ai-with-kaito/kubefleet_placement.sh | Resource placement logic for workspaces, inference pools, and gateways |
| multi-cluster-ai-with-kaito/semantic_router.sh | Semantic router deployment for intelligent query routing |
| .gitignore | Ignore patterns for temporary files and cloned repositories |
| README.md | Updated project description |
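The table splits the demo into one script per concern, with setup.sh as the entry point. The following is a minimal sketch of how such an entry point could be structured. It assumes the step order follows the table and that each step script can simply be sourced; the real setup.sh may instead call functions defined in these files, in a different order. It checks for required CLIs first (`az`, `kubectl`, and `helm` are implied by the PR; the list is otherwise an assumption), then sources each script and stops at the first missing one. `check_prereqs`, `run_steps`, and `STEP_SCRIPTS` are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of an entry-point script in the spirit of setup.sh.
# The step order mirrors the file table and is an assumption.
set -euo pipefail

STEP_SCRIPTS=(
  azresources.sh          # VNets, subnets, AKS clusters, ACR
  kubefleet_setup.sh      # KubeFleet hub and member agents
  istio.sh                # multi-cluster Istio service mesh
  kaito.sh                # KAITO and the GPU provisioner
  kubefleet_placement.sh  # workspaces, inference pools, gateways
  semantic_router.sh      # semantic router
)

# check_prereqs TOOL...: fail early if any required CLI is not on PATH.
check_prereqs() {
  local missing=() tool
  for tool in "$@"; do
    command -v "$tool" > /dev/null 2>&1 || missing+=("$tool")
  done
  if (( ${#missing[@]} > 0 )); then
    echo "missing required tools: ${missing[*]}" >&2
    return 1
  fi
}

# run_steps DIR: source each step script from DIR in order, stopping at the
# first one that is missing.
run_steps() {
  local dir="$1" script
  for script in "${STEP_SCRIPTS[@]}"; do
    if [[ ! -f "$dir/$script" ]]; then
      echo "missing step script: $dir/$script" >&2
      return 1
    fi
    echo "==> $script"
    # shellcheck source=/dev/null
    source "$dir/$script"
  done
}
```

For example: `check_prereqs az kubectl helm && run_steps ./multi-cluster-ai-with-kaito`. Failing before any Azure resources are created is friendlier to first-time users than failing halfway through provisioning.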
We need a step-by-step README inside the multi-cluster cookbook. I know setup.sh is probably all one needs, but it's not very friendly to beginners.

    kubectl config use-context $FLEET_HUB_CTX

    echo "Installing related resources on the KubeFleet hub cluster..."
    helm upgrade --install $1 \
Is there any reason to install this in the default namespace?
BTW, why not install it on cluster3 directly?
Hi Ryan! We use the default namespace purely for simplicity. As for the second question: we currently keep a separate hub cluster because, in upstream deployments, the hub cluster can actually run workloads without additional configuration, so putting the charts there gets a bit tricky (the deployments would start running on the hub, etc.).
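If the hub-side chart is later moved out of the `default` namespace, Helm's `--namespace` and `--create-namespace` flags cover that case. The following is a hypothetical wrapper, not code from the PR: `install_on_hub` and the release, chart path, namespace, and context names are placeholders. It prints the command by default and runs it only when `DRY_RUN=0`.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: install a chart on the KubeFleet hub cluster into a
# dedicated namespace rather than "default". All names are placeholders.
set -euo pipefail

# Print commands by default; set DRY_RUN=0 to actually execute them.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [[ "$DRY_RUN" == "1" ]]; then
    echo "$*"
  else
    "$@"
  fi
}

# install_on_hub RELEASE CHART NAMESPACE HUB_CONTEXT
install_on_hub() {
  local release="$1" chart="$2" namespace="$3" hub_ctx="$4"
  # --create-namespace lets Helm create NAMESPACE if it does not exist yet;
  # --kube-context targets the hub without switching the current context.
  run helm upgrade --install "$release" "$chart" \
    --namespace "$namespace" --create-namespace \
    --kube-context "$hub_ctx"
}
```

Usage: `DRY_RUN=0 install_on_hub my-release ./my-chart my-namespace "$FLEET_HUB_CTX"`. One possible benefit of `--kube-context` over `kubectl config use-context` is that a later step cannot silently inherit the hub as its current context.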

    connect_to_multi_cluster_service_mesh $MEMBER_1 $MEMBER_1_CTX $MEMBER_2_CTX $MEMBER_2 $MEMBER_2_ADDR $MEMBER_3_CTX $MEMBER_3 $MEMBER_3_ADDR
    connect_to_multi_cluster_service_mesh $MEMBER_2 $MEMBER_2_CTX $MEMBER_1_CTX $MEMBER_1 $MEMBER_1_ADDR $MEMBER_3_CTX $MEMBER_3 $MEMBER_3_ADDR
    connect_to_multi_cluster_service_mesh $MEMBER_3 $MEMBER_3_CTX $MEMBER_1_CTX $MEMBER_1 $MEMBER_1_ADDR $MEMBER_2_CTX $MEMBER_2 $MEMBER_2_ADDR
Which one is the master among these three? We are creating remote secrets on all of them.
Hi Ryan! For this part we are using a multi-primary Istio service mesh setup, so every cluster must be aware of every other cluster. I didn't pick the primary-remote pattern because it requires some additional setup.
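In a multi-primary mesh there is no single master: each cluster's control plane needs a remote secret for every other cluster, so N clusters need N(N-1) secrets (6 for the three members here). The three `connect_to_multi_cluster_service_mesh` calls appear to do this exchange one cluster at a time. The following is a minimal sketch of the same exchange. It assumes the `istioctl create-remote-secret ... | kubectl apply -f -` pattern from Istio's multi-primary installation guide. `exchange_remote_secrets` and the cluster and context names are hypothetical, and the sketch only prints commands unless `DRY_RUN=0`.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: wire up a multi-primary Istio mesh by installing, in
# every cluster, a remote secret for every *other* cluster.
set -euo pipefail

# Print commands by default; set DRY_RUN=0 to actually execute them.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [[ "$DRY_RUN" == "1" ]]; then
    echo "$*"
  else
    eval "$*"  # eval so the pipe inside the command string is honored
  fi
}

# exchange_remote_secrets NAME1 CTX1 NAME2 CTX2 ...
exchange_remote_secrets() {
  local -a names=() ctxs=()
  while (( $# >= 2 )); do
    names+=("$1"); ctxs+=("$2"); shift 2
  done
  local i j
  for i in "${!names[@]}"; do
    for j in "${!names[@]}"; do
      [[ "$i" == "$j" ]] && continue  # a cluster never needs its own secret
      # Secret for cluster j's API server, applied in cluster i, so that
      # cluster i's istiod can discover endpoints running in cluster j.
      run "istioctl create-remote-secret --context=${ctxs[j]} --name=${names[j]} | kubectl apply -f - --context=${ctxs[i]}"
    done
  done
}
```

Usage: `exchange_remote_secrets member-1 ctx-1 member-2 ctx-2 member-3 ctx-3` prints six commands, one per ordered pair of distinct clusters. This is the "remote secrets on all of them" pattern from the question above.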

    @@ -1,2 +1,2 @@
     # KubeFleet Cookbook
    -Examples and guides on using KubeFleet to manage multicluster scenarios.
    +A collection of various demos, tutorials, and labs for using the KubeFleet project.
LiteLLM setup tutorial link:
https://github.com/kaito-project/kaito-cookbook/blob/master/examples/litellm/README.md
Thanks! I'll add a separate README for the LiteLLM setup.
This PR adds the setup scripts for the multi-cluster AI with KAITO demo.