@michaelawyu

This PR adds the setup scripts for the multi-cluster AI with KAITO demo.

Signed-off-by: michaelawyu <chenyu1@microsoft.com>
Copilot AI review requested due to automatic review settings November 4, 2025 19:39

Copilot AI left a comment

Pull Request Overview

This PR adds a comprehensive multi-cluster AI demo setup using KubeFleet and KAITO. The setup automates the creation of Azure infrastructure and AKS clusters, and deploys AI model serving capabilities across multiple clusters with intelligent routing.

Key changes:

  • Automated Azure resource provisioning including virtual networks, subnets, AKS clusters, and Azure Container Registry
  • KubeFleet multi-cluster setup with hub and member cluster configuration
  • Istio service mesh deployment for cross-cluster networking
  • KAITO AI model serving infrastructure with GPU provisioning
  • Semantic router for intelligent query routing between AI models

Reviewed Changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| multi-cluster-ai-with-kaito/setup.sh | Main orchestration script that coordinates the entire setup process |
| multi-cluster-ai-with-kaito/azresources.sh | Azure resource creation functions for VNets, subnets, AKS clusters, and ACR |
| multi-cluster-ai-with-kaito/kubefleet_setup.sh | KubeFleet hub and member agent installation and configuration |
| multi-cluster-ai-with-kaito/istio.sh | Multi-cluster Istio service mesh setup across member clusters |
| multi-cluster-ai-with-kaito/kaito.sh | KAITO installation with GPU provisioner setup for AI workloads |
| multi-cluster-ai-with-kaito/kubefleet_placement.sh | Resource placement logic for workspaces, inference pools, and gateways |
| multi-cluster-ai-with-kaito/semantic_router.sh | Semantic router deployment for intelligent query routing |
| .gitignore | Ignore patterns for temporary files and cloned repositories |
| README.md | Updated project description |

Contributor

@ryanzhang-oss ryanzhang-oss left a comment

We need a step-by-step README inside the multi-cluster cookbook. I know setup.sh is probably all one needs, but it's not very friendly to beginners.

kubectl config use-context $FLEET_HUB_CTX

echo "Installing related resources on the KubeFleet hub cluster..."
helm upgrade --install $1 \
Contributor

Is there any reason to install this in the default namespace?

Contributor

By the way, why not install it on cluster3 directly?

Author

@michaelawyu michaelawyu Nov 6, 2025

Hi Ryan! We are using the default namespace just for simplicity. We currently keep a separate hub cluster because, in upstream deployments, the hub cluster can actually run workloads without additional configuration, so putting the charts there can be a bit tricky (deployments will run, etc.).
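
For reference, moving the release out of the default namespace would mostly be a matter of passing `--namespace` and `--create-namespace` to the same `helm upgrade --install` call. The sketch below only prints the command instead of running it; the function name `helm_install_cmd`, the chart path, and the namespace `demo-ns` are hypothetical placeholders and are not taken from the PR's scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the PR): print a helm command that installs
# a release into a dedicated namespace rather than "default".
helm_install_cmd() {
  local release="$1"            # release name (the script passes this as $1)
  local chart="$2"              # chart reference; placeholder, the real path is not shown here
  local namespace="${3:-default}"
  # --create-namespace makes helm create the namespace if it does not exist yet.
  printf 'helm upgrade --install %s %s --namespace %s --create-namespace\n' \
    "$release" "$chart" "$namespace"
}

helm_install_cmd my-release ./path/to/chart demo-ns
```

Omitting the third argument falls back to the default namespace, which matches the behavior discussed in this thread.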

Comment on lines +36 to +39
connect_to_multi_cluster_service_mesh $MEMBER_1 $MEMBER_1_CTX $MEMBER_2_CTX $MEMBER_2 $MEMBER_2_ADDR $MEMBER_3_CTX $MEMBER_3 $MEMBER_3_ADDR
connect_to_multi_cluster_service_mesh $MEMBER_2 $MEMBER_2_CTX $MEMBER_1_CTX $MEMBER_1 $MEMBER_1_ADDR $MEMBER_3_CTX $MEMBER_3 $MEMBER_3_ADDR
connect_to_multi_cluster_service_mesh $MEMBER_3 $MEMBER_3_CTX $MEMBER_1_CTX $MEMBER_1 $MEMBER_1_ADDR $MEMBER_2_CTX $MEMBER_2 $MEMBER_2_ADDR

Contributor

Who is the master among those 3? We are creating remote secrets on all of them.

Author

Hi Ryan! For this part we are using a multi-primary Istio service mesh setup, so all clusters must be aware of each other. I didn't pick the primary-remote pattern, as that one requires some additional setup.
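
To illustrate why every cluster ends up with remote secrets: in a multi-primary mesh there is no single master, and each control plane needs credentials for every other cluster's API server. A minimal sketch of that pairing, assuming the standard `istioctl create-remote-secret` flow from the Istio multi-cluster installation guide; the function name `print_remote_secret_cmds` and the `name:context` argument format are hypothetical, and the commands are only printed, not executed. It is not the body of `connect_to_multi_cluster_service_mesh`, which is not shown in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the PR): print the remote-secret exchange
# for a multi-primary mesh. Each argument is "clusterName:kubeContext",
# an assumed format chosen for this sketch.
print_remote_secret_cmds() {
  local src dst
  for src in "$@"; do
    for dst in "$@"; do
      # A cluster does not need a remote secret for itself.
      [ "$src" = "$dst" ] && continue
      # Generate a secret holding src's API server credentials and apply it
      # to dst, so dst's control plane can discover endpoints in src.
      printf 'istioctl create-remote-secret --context=%s --name=%s | kubectl apply -f - --context=%s\n' \
        "${src#*:}" "${src%%:*}" "${dst#*:}"
    done
  done
}

print_remote_secret_cmds member-1:ctx-1 member-2:ctx-2 member-3:ctx-3
```

With three clusters this prints six commands, one per ordered pair, which lines up with the three calls quoted above each naming two peers.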

@@ -1,2 +1,2 @@
 # KubeFleet Cookbook
-Examples and guides on using KubeFleet to manage multicluster scenarios.
+A collection of various demos, tutorials, and labs for using the KubeFleet project.
Collaborator

Author

Thanks; let me add a separate README for LiteLLM setup.

Signed-off-by: michaelawyu <chenyu1@microsoft.com>
Signed-off-by: michaelawyu <chenyu1@microsoft.com>
Signed-off-by: michaelawyu <chenyu1@microsoft.com>

5 participants