---
title: Troubleshooting Guide
description: Common issues and solutions for development and deployment of the AI on Edge Flagship Accelerator, including environment setup, infrastructure deployment, and git workflow problems
author: Edge AI Team
ms.date: 2025-06-06
ms.topic: troubleshooting
estimated_reading_time: 8
keywords:
---
This guide covers common issues encountered during development, testing, and deployment of the AI on Edge Flagship Accelerator. Use this as a reference for quick resolution of frequent problems.
Symptoms: Dev Container fails to build or start

Solutions:

- Check Docker Desktop:

  ```bash
  # Verify Docker is running
  docker info

  # Check available space (containers need significant disk space)
  docker system df
  ```

- Clean Docker system:

  ```bash
  # Remove unused containers and images
  docker system prune -a

  # Remove Dev Container specifically
  docker container rm $(docker container ls -aq --filter name=edge-ai)
  ```
- Rebuild container: open the VS Code Command Palette and run `Dev Containers: Rebuild Container` (named `Remote-Containers: Rebuild Container` in older versions of the extension).
Symptoms: Container starts but required tools are not available

Solutions:

- Verify tool availability:

  ```bash
  # Check essential tools
  terraform version
  az version
  kubectl version --client
  npm --version
  ```

- Update container configuration:
  - Check `.devcontainer/devcontainer.json` for tool versions
  - Rebuild with the latest base image
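To check all of these tools in one pass, a small shell function can report which commands are missing from `PATH`. This is a sketch. `check_tools` is a hypothetical helper and is not one of the project's scripts.

```bash
# Hypothetical helper (not part of the project's scripts): report whether each
# given command is on PATH. Returns 0 if all are found, 1 otherwise.
check_tools() {
  local tool missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}
```

Run `check_tools terraform az kubectl npm` inside the container. Any `missing:` line points to a tool that the image did not install.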
Symptoms: Slow performance inside Dev Container

Solutions:

- Allocate more resources to Docker:
  - Increase memory allocation in Docker Desktop settings
  - Allocate more CPU cores if available

- Use bind mounts efficiently:
  - Avoid unnecessary file watchers
  - Use a `.dockerignore` file to exclude large directories
Symptoms: Azure CLI commands fail with authentication errors

Solutions:

- Interactive login:

  ```bash
  # Login and set subscription
  az login
  az account set --subscription "your-subscription-id"

  # Verify authentication
  az account show
  ```

- Service Principal authentication (for CI/CD):

  ```bash
  az login --service-principal \
    -u "$AZURE_CLIENT_ID" \
    -p "$AZURE_CLIENT_SECRET" \
    --tenant "$AZURE_TENANT_ID"
  ```
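In CI/CD, a service principal login that fails because a secret was never set can look like any other authentication error. Below is a minimal sketch of a pre-flight check. It uses a hypothetical `require_env` helper, and because it relies on `printenv`, it only sees exported variables.

```bash
# Hypothetical helper: fail fast when any named environment variable is unset
# or empty. Uses printenv, so only exported variables are detected.
require_env() {
  local name missing=0
  for name in "$@"; do
    if [ -z "$(printenv "$name")" ]; then
      echo "error: $name is not set" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Call `require_env AZURE_CLIENT_ID AZURE_CLIENT_SECRET AZURE_TENANT_ID` before `az login` so that a missing variable fails with a clear message.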
Symptoms: kubectl commands fail to connect to cluster

Solutions:

- Get cluster credentials:

  ```bash
  # For AKS cluster
  az aks get-credentials --resource-group myResourceGroup --name myAKSCluster

  # Verify connection
  kubectl cluster-info
  ```

- Check kubeconfig:

  ```bash
  # View current configuration
  kubectl config view

  # List available contexts
  kubectl config get-contexts

  # Switch context
  kubectl config use-context mycontext
  ```
Symptoms: Terraform operations fail with state lock errors

Solutions:

- Wait for the lock to release (if another operation is running)

- Force unlock (use carefully):

  ```bash
  # Get lock ID from error message
  terraform force-unlock <LOCK_ID>
  ```
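- Use workspace isolation:

  ```bash
  # Create an isolated workspace for testing (new also selects it)
  WORKSPACE="test-$(date +%s)"
  terraform workspace new "$WORKSPACE"

  # Reuse the same name to switch back later
  terraform workspace select "$WORKSPACE"
  ```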
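When the lock error comes from a non-interactive run, you can extract the lock ID from the saved error output. This sketch assumes the output contains a `Lock Info:` block with a line such as `ID: <lock-id>`. That is the layout Terraform typically uses, but check your own output before relying on it. `extract_lock_id` is a hypothetical helper name.

```bash
# Hypothetical helper: print the first lock ID found in saved Terraform error
# output. Assumes a line like "ID:  <lock-id>" in the "Lock Info:" block,
# optionally prefixed by the "│" border drawn around diagnostics.
extract_lock_id() {
  awk '{
    for (i = 1; i < NF; i++) {
      if ($i == "ID:") { print $(i + 1); exit }
    }
  }' "$1"
}
```

For example, capture the error with `terraform plan -no-color 2> tf-error.log` and confirm that no other operation is still running. Then run `terraform force-unlock "$(extract_lock_id tf-error.log)"`.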
Symptoms: Terraform init fails with provider version errors

Solutions:

- Update provider constraints:

  ```hcl
  # In versions.tf
  terraform {
    required_providers {
      azurerm = {
        source  = "hashicorp/azurerm"
        version = "~> 3.0"
      }
    }
  }
  ```

- Upgrade providers:

  ```bash
  # Upgrade to latest compatible versions
  terraform init -upgrade

  # Record selected provider versions and checksums in .terraform.lock.hcl
  terraform providers lock
  ```
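Version errors are easier to read once you know what `~>` allows: only the right-most component written in the constraint may increase. `~> 3.0` therefore accepts any 3.x release but never 4.x, so it cannot be satisfied together with a module that requires azurerm 4.x. The sketch below encodes that rule for plain numeric versions. `satisfies_pessimistic` is a hypothetical name.

```bash
# Hypothetical helper illustrating Terraform's "~>" (pessimistic) operator:
# "~> 3.0" accepts >= 3.0 and < 4.0; "~> 3.0.1" accepts >= 3.0.1 and < 3.1.0.
# Handles plain numeric versions only (no pre-release suffixes).
satisfies_pessimistic() {
  awk -v ver="$1" -v con="$2" 'BEGIN {
    n = split(ver, v, ".")
    m = split(con, c, ".")
    for (i = n + 1; i <= m; i++) v[i] = 0   # treat missing parts as 0
    # Every constraint part except the last must match exactly
    for (i = 1; i < m; i++) if (v[i] + 0 != c[i] + 0) exit 1
    # The last constraint part is only a lower bound
    if (v[m] + 0 < c[m] + 0) exit 1
    exit 0
  }'
}
```

`satisfies_pessimistic 3.117.0 3.0` succeeds. `satisfies_pessimistic 4.0.0 3.0` and `satisfies_pessimistic 3.1.0 3.0.1` both fail.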
Symptoms: Deployment fails due to resource name collisions

Solutions:

- Use unique naming:

  ```hcl
  # Add random suffix
  resource "random_string" "suffix" {
    length  = 8
    special = false
    upper   = false
  }

  locals {
    unique_name = "${var.prefix}-${random_string.suffix.result}"
  }
  ```

- Check existing resources:

  ```bash
  # List resources in resource group
  az resource list --resource-group myResourceGroup
  ```
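A unique suffix does not help if the name breaks a resource type's naming rules. Storage account names, for example, must be 3 to 24 characters long and contain only lowercase letters and digits. A hyphenated name like `${var.prefix}-<suffix>` would therefore be rejected for a storage account. Below is a minimal local check, using a hypothetical helper name.

```bash
# Hypothetical helper: check a candidate name against the storage account
# naming rule (3 to 24 characters, lowercase letters and digits only).
is_valid_storage_account_name() {
  printf '%s' "$1" | grep -Eq '^[a-z0-9]{3,24}$'
}
```

This check does not cover global uniqueness. Confirm that against Azure, for example with `az storage account check-name --name <name>`.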
Symptoms: Bicep build fails with syntax errors

Solutions:

- Check syntax with detailed output:

  ```bash
  # Build with verbose output
  az bicep build --file main.bicep --verbose

  # Lint for issues
  az bicep lint --file main.bicep
  ```

- Validate parameter types:

  ```bicep
  // Ensure parameter decorators are correct
  @description('Resource location')
  @allowed([
    'eastus'
    'westus'
  ])
  param location string
  ```
Symptoms: Template deploys but resources are not configured correctly

Solutions:

- Use what-if deployment:

  ```bash
  # Preview changes before deployment
  az deployment group what-if \
    --resource-group myResourceGroup \
    --template-file main.bicep \
    --parameters @parameters.json
  ```

- Validate incrementally:
  - Deploy smaller components first
  - Add resources incrementally
Symptoms: Deployment fails with insufficient permissions

Solutions:

- Check role assignments:

  ```bash
  # List role assignments for the signed-in account at subscription scope
  az role assignment list --assignee $(az account show --query user.name -o tsv)

  # Check specific resource group
  az role assignment list --resource-group myResourceGroup
  ```

- Required permissions for components:
  - Key Vault: Key Vault Administrator or Contributor
  - Networking: Network Contributor
  - Kubernetes: Azure Kubernetes Service Contributor
  - Storage: Storage Account Contributor
Symptoms: Deployment fails with provider not registered errors

Solutions:

```bash
# Register required providers
az provider register --namespace Microsoft.KeyVault
az provider register --namespace Microsoft.Network
az provider register --namespace Microsoft.ContainerService

# Check registration status
az provider show --namespace Microsoft.KeyVault --query registrationState
```

Symptoms: Git operations fail with SSH authentication errors
Solutions:

- Generate an SSH key (if one does not exist):

  ```bash
  ssh-keygen -t ed25519 -C "your.email@example.com"
  ```

- Add the key to the SSH agent:

  ```bash
  eval "$(ssh-agent -s)"
  ssh-add ~/.ssh/id_ed25519
  ```

- Add the public key to GitHub:
  - Copy the contents of `~/.ssh/id_ed25519.pub` to your GitHub SSH keys

- Test the SSH connection:

  ```bash
  ssh -T git@github.com
  ```
Symptoms: Git merge fails with conflicts

Solutions:

- Resolve conflicts manually:

  ```bash
  # Start merge
  git merge main

  # Edit conflicted files
  # Look for <<<<<<< HEAD markers

  # Stage resolved files
  git add resolved-file.tf

  # Complete merge
  git commit
  ```

- Use merge tools:

  ```bash
  # Configure merge tool
  git config --global merge.tool vimdiff

  # Launch merge tool
  git mergetool
  ```
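Before staging, check that no conflict markers remain. `git diff --name-only --diff-filter=U` lists the files Git still considers unmerged. The sketch below instead searches file contents, which also catches markers in files that were already staged by mistake. `find_conflict_markers` is a hypothetical helper.

```bash
# Hypothetical helper: list files under a directory (default: the current one)
# that still contain lines starting with <<<<<<< or >>>>>>>.
# Skips the .git directory and exits non-zero when nothing is found.
find_conflict_markers() {
  grep -rlE --exclude-dir=.git '^(<{7}|>{7})( |$)' "${1:-.}"
}
```

Run `find_conflict_markers` from the repository root before `git add`. An empty result with a non-zero exit status means no markers were found.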
Symptoms: Git shows detached HEAD warnings

Solutions:

```bash
# Create branch from current state
git checkout -b new-branch-name

# Or return to main; commits made while detached will no longer be on any branch
git checkout main
```

Symptoms: Commits don't follow conventional format
Solutions:

- Amend last commit:

  ```bash
  # Fix most recent commit message
  git commit --amend -m "feat(terraform): add monitoring component"
  ```

- Interactive rebase for multiple commits:

  ```bash
  # Rewrite last 3 commits; mark each message to change as "reword" in the editor
  git rebase -i HEAD~3
  ```

- Use conventional commit format:

  ```text
  feat(scope): description
  fix(scope): description
  docs(scope): description
  chore(scope): description
  ```
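To check a message before pushing, a regular expression can enforce the `type(scope): description` shape. This sketch accepts only the four types listed above and treats the scope as optional. The project's actual lint rules, if any, may allow more types. `is_conventional_commit` is a hypothetical helper.

```bash
# Hypothetical helper: check that the first line of a commit message follows
# type(scope): description, limited to feat, fix, docs, and chore.
is_conventional_commit() {
  printf '%s\n' "$1" | head -n 1 | grep -Eq '^(feat|fix|docs|chore)(\([^()]+\))?: .+'
}
```

For example, `is_conventional_commit "$(git log -1 --format=%s)"` checks the subject of the most recent commit.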
Symptoms: CI lint jobs report errors

Solutions:

- Run specific linters locally:

  ```bash
  # Run Terraform linting
  npm run tflint-fix-all

  # Run markdown linting
  npm run mdlint-fix
  ```

- Review pipeline logs: check the individual lint job output in the Azure Pipelines run to identify which linter and file failed.
Symptoms: Markdown linting fails with formatting errors

Solutions:

- MD025 (multiple H1 headings): use only one `#` heading per file and remove any duplicate H1 headings.

- MD032 (lists need blank lines):

  ```markdown
  Text before list

  - List item 1
  - List item 2

  Text after list
  ```

- MD040 (code blocks need language): specify a language after the opening fence, and add an empty line before and after the block.

  ````markdown
  ```bash
  echo "Specify language for code blocks"
  ```
  ````
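To find MD040 violations before the linter does, a short `awk` script can track whether a fence is open and report opening fences that have nothing after the backticks. This is a sketch with the hypothetical name `find_untyped_fences`. It only understands three-backtick fences, so treat its output as a hint rather than a replacement for the linter.

```bash
# Hypothetical helper: print "file:line" for every opening ``` fence that has no
# language identifier (the MD040 case). Closing fences are skipped by tracking
# whether a block is open. Does not handle ~~~ fences or nested longer fences.
find_untyped_fences() {
  awk '
    FNR == 1 { in_block = 0 }
    /^ *```/ {
      if (in_block) {
        in_block = 0
      } else {
        in_block = 1
        info = $0
        sub(/^ *```/, "", info)
        if (info ~ /^ *$/) print FILENAME ":" FNR
      }
    }
  ' "$@"
}
```

Run it over the files you changed, for example `find_untyped_fences README.md`.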
Symptoms: cspell reports errors for technical terms

Solutions:

- Add to project dictionary:

  ```bash
  # Add technical terms to .cspell-dictionary.txt
  echo "terraform" >> .cspell-dictionary.txt
  echo "kubernetes" >> .cspell-dictionary.txt
  ```

- Inline ignores:

  ```markdown
  <!-- cspell:ignore terratest bicep -->
  This document discusses terratest and bicep.
  ```
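Appending with `echo ... >>` adds a duplicate entry every time the command is rerun. The sketch below is an idempotent version using the hypothetical helper `add_dictionary_words`. It appends a word only when the file does not already contain it as a whole line.

```bash
# Hypothetical helper: append each word to the dictionary file only if it is
# not already present as a whole line. Creates the file if it does not exist.
add_dictionary_words() {
  local dict="$1" word
  shift
  touch "$dict"
  for word in "$@"; do
    grep -qxF -- "$word" "$dict" || printf '%s\n' "$word" >> "$dict"
  done
}
```

Usage: `add_dictionary_words .cspell-dictionary.txt terraform kubernetes`.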
Symptoms: Checkov reports security issues for acceptable configurations

Solutions:

- Skip specific checks:

  ```hcl
  # Terraform example
  resource "azurerm_storage_account" "example" {
    # checkov:skip=CKV_AZURE_33: Public access required for this use case
    public_network_access_enabled = true
  }
  ```

- Configure skip rules:

  ```yaml
  # .checkov.yml
  skip-check:
    - CKV_AZURE_33
    - CKV2_AZURE_1
  ```
Symptoms: Checkov scans take too long

Solutions:

```bash
# Scan only changed folders
npm run checkov-changes

# Scan specific directories
checkov -d src/000-cloud/010-security-identity

# Use parallel processing
checkov --external-checks-dir ./custom-checks --parallel
```

Symptoms: Go tests fail with timeout errors
Solutions:

```bash
# Increase timeout (go test defaults to 10m)
go test -v -timeout 60m ./tests/...

# Run specific test
go test -v -run TestSpecificFunction -timeout 30m
```

Symptoms: Tests fail with Azure authentication errors
Solutions:

Use environment variables for authentication. Set `AZURE_CLIENT_ID`, `AZURE_CLIENT_SECRET`, and `AZURE_TENANT_ID` in the CI/CD pipeline or in your local shell before running the tests.

Symptoms: Test resources are not cleaned up
Solutions:

- Manual cleanup:

  ```bash
  # List test resource groups
  az group list --query "[?contains(name, 'test')].name" -o tsv

  # Delete test resources
  az group delete --name test-resource-group --yes --no-wait
  ```

- Automated cleanup script:

  ```bash
  #!/bin/bash
  # cleanup-test-resources.sh
  # Delete resource groups with "test" in the name that finished provisioning.
  # This does not filter by age; review the matching groups before running it.
  az group list --query "[?contains(name, 'test')]" -o json | \
    jq -r '.[] | select(.properties.provisioningState == "Succeeded") | .name' | \
    xargs -I {} az group delete --name {} --yes --no-wait
  ```
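The automated script above deletes every match immediately. A safer pattern is to preview the deletions first. This sketch uses a hypothetical `delete_groups` function that reads names from standard input. It only prints what it would delete unless `DRY_RUN=0` is set.

```bash
# Hypothetical helper: read resource group names (one per line) from stdin.
# By default it only prints what it would delete; set DRY_RUN=0 to actually
# call az group delete for each name.
delete_groups() {
  local name
  while IFS= read -r name; do
    [ -n "$name" ] || continue
    if [ "${DRY_RUN:-1}" = "0" ]; then
      az group delete --name "$name" --yes --no-wait
    else
      echo "would delete: $name"
    fi
  done
}
```

Preview with `az group list --query "[?contains(name, 'test')].name" -o tsv | delete_groups`. Once the list looks right, rerun the same pipeline ending in `DRY_RUN=0 delete_groups`.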
- Documentation:
  - Check component README files
  - Review existing GitHub issues
  - Consult Azure documentation

- Debugging:
  - Enable verbose logging (for example, `--debug` on Azure CLI commands or `TF_LOG=DEBUG` for Terraform)
  - Use step-by-step troubleshooting
  - Isolate the problem

- Testing:
  - Use minimal reproduction cases
  - Test in isolated environments
  - Verify assumptions
Use GitHub Copilot for troubleshooting. In VS Code chat, describe your issue, for example:

```text
"I'm getting a Terraform state lock error when deploying. How can I resolve this?"
"Checkov is reporting CKV_AZURE_33 for my storage account. Is this a false positive?"
"My Dev Container won't start and Docker Desktop shows an error. What should I check?"
```
- GitHub Issues:
  - Search existing issues first
  - Create new issue with detailed information
  - Include error messages and environment details

- Discussion Forums:
  - Use GitHub Discussions for general questions
  - Share solutions that worked for you
  - Help others with similar issues
Include this information when reporting issues:

```markdown
**Environment:**

- OS: [Windows 11/macOS 13/Ubuntu 22.04]
- Docker Desktop version:
- VS Code version:
- Dev Container: [Yes/No]

**Tools:**

- Terraform version:
- Azure CLI version:
- kubectl version:

**Problem Description:**

Clear description of the issue

**Steps to Reproduce:**

1. Step one
2. Step two
3. See error

**Expected Behavior:**

What should happen

**Actual Behavior:**

What actually happens

**Error Messages:**

[Include full error messages]

**Additional Context:**

Any other relevant information
```

For critical issues:
- Immediate help: Use GitHub Copilot for quick guidance
- Team support: Reach out to project maintainers
- Security issues: Use private reporting for vulnerabilities
- Blocking issues: Create high-priority GitHub issues
Remember: Most issues have been encountered before. Search existing documentation and issues first, then ask for help with specific details about your situation.
For more information about development workflows, see the Development Environment and Contributing Guidelines.
🤖 Crafted with precision by ✨Copilot following brilliant human instruction, then carefully refined by our team of discerning human reviewers.