Skip to content

feat: add CoreDNS hosts plugin support for LocalDNS #8165

Open
saewoni wants to merge 21 commits intomainfrom
sakwa/localdns_poc_clean
Open

feat: add CoreDNS hosts plugin support for LocalDNS #8165
saewoni wants to merge 21 commits intomainfrom
sakwa/localdns_poc_clean

Conversation

@saewoni
Copy link
Contributor

@saewoni saewoni commented Mar 25, 2026

Prior review history: This PR continues the work from #7639, which contains review comments and testing notes. That PR could not be reopened after a branch force-push. Please refer to #7639 for prior context.

What this PR does / why we need it:

Adds CoreDNS hosts plugin support for LocalDNS. When enabled, critical AKS FQDNs (mcr.microsoft.com, packages.aks.azure.com, etc.) are resolved and cached in /etc/localdns/hosts by a systemd timer (aks-hosts-setup), then served authoritatively by the CoreDNS hosts plugin — eliminating upstream DNS lookups for these endpoints.

Key changes:

  • aks-hosts-setup.sh / .service / .timer — resolves AKS FQDNs and writes /etc/localdns/hosts
  • CoreDNS corefile template gains hosts /etc/localdns/hosts { fallthrough } block in both VnetDNS and KubeDNS listeners
  • Two corefile variants generated at provisioning time: LOCALDNS_COREFILE_BASE (vanilla) and LOCALDNS_COREFILE_FULL (with all optional plugins). localdns.sh dynamically selects the full variant once the hosts file is populated.
  • enableLocalDNS() is the single entry point for all localdns setup including hosts plugin
  • E2E: table-driven tests across 7 distros (legacy) and 5 distros (scriptless), validating dig AA flag + IP match against hosts file
  • VHD: packer configs updated to include aks-hosts-setup files on all Linux images

Which issue(s) this PR fixes:

Add aks-hosts-setup.sh, aks-hosts-setup.service, and aks-hosts-setup.timer
to resolve critical AKS FQDNs via LocalDNS hosts plugin. This enables
authoritative DNS responses for MCR and other endpoints, reducing
dependency on external DNS servers during node bootstrap.

Changes include:
- New systemd units for hosts file setup and periodic refresh
- CSE integration: enableAKSHostsSetup() with VHD-presence guards
- CoreDNS corefile generation with hosts plugin support
- aks-node-controller scriptless path support
- E2E tests for Ubuntu 2204/2404 and AzureLinux V3
- ShellSpec unit tests for all new shell scripts
- Proto/pb.go updates for EnableHostsPlugin field
Add file provisioners for aks-hosts-setup.sh, aks-hosts-setup.service,
and aks-hosts-setup.timer to all 10 packer JSON templates, and add
cpAndMode entries to packer_source.sh to place them at:
- /opt/azure/containers/aks-hosts-setup.sh (0755)
- /etc/systemd/system/aks-hosts-setup.service (0644)
- /etc/systemd/system/aks-hosts-setup.timer (0644)

Without this, enableAKSHostsSetup() in CSE silently skips because
the VHD-presence guard finds the files missing.
- Install dnsutils in shellspec.Dockerfile so nslookup is available
  in the CI container, enabling real DNS resolution tests.
- Fix localdns_spec.sh: add missing End statement between two It
  blocks, remove duplicate rm of already-deleted file, and drop
  assertion for non-existent error message.
Remove teleportd/teleport references from cse_cmd.sh, parser.go,
cse_config.sh, cse_helpers.sh, cse_main.sh, baker.go, and types.go.
These were not part of the localdns hosts plugin work and were
accidentally carried over from a prior merge with main.
Restore files that had unrelated changes leaked in from the old merge:
- parser.go: restore SKIP_WAAGENT_HOLD entry that was accidentally deleted
- vmss.go: restore CustomDataWithHack boothook template, CustomDataFlatcar
  path, and injectWriteFilesEntriesToCustomData (only add MockUnknownCloud)
- types.go: restore CustomDataWriteFile type (only add MockUnknownCloud tag
  and localdns helper methods)
- validators.go: restore ValidateNodeExporter and ValidateWaagentLog to
  main's versions (only add localdns hosts plugin validators)
- cse_helpers.sh: restore to main's version (no localdns changes needed)
- .env.sample: restore to main's version
- cse_main.sh: restore SKIP_WAAGENT_HOLD conditional that was
  accidentally removed (stale change from old merge)
- aks_model.go: pass e2e-test=true tag when creating Private DNS zones
  so collectGarbagePrivateDNSZones can clean them up
Two additional SKIP_WAAGENT_HOLD guards in nodePrep (for the unhold
calls) were still missing after the previous fix only restored the
one in basePrep.
…oss all distros

- Replace nslookup "recursion not available" check with dig AA
  (Authoritative Answer) flag, which is stronger proof that the CoreDNS
  hosts plugin served the response rather than forwarding upstream
- Match IPs returned by dig against /etc/localdns/hosts entries
- Remove fake FQDN injection test (won't work since hosts file is
  populated by aks-hosts-setup.service from real DNS resolution)
- Simplify Corefile validation (remove fragile awk section-parsing)
- Consolidate per-distro tests into table-driven Test_LocalDNSHostsPlugin
  covering all 7 supported amd64 distros (Ubuntu 2204/2404, Azure Linux
  V2/V3, CBL Mariner V2, Flatcar, ACL)
- Add table-driven Test_LocalDNSHostsPlugin_Scriptless covering all 5
  distros with scriptless support (Ubuntu 2204/2404, Azure Linux V3,
  Flatcar, ACL)
- Remove duplicate validator calls from dedicated tests since
  ValidateCommonLinux already runs them when EnableHostsPlugin is set
@github-actions
Copy link
Contributor

github-actions bot commented Mar 25, 2026

The latest Buf updates on your PR. Results from workflow Buf CI / buf (pull_request).

BuildFormatLintBreakingUpdated (UTC)
✅ passed✅ passed✅ passed✅ passedMar 26, 2026, 1:37 AM

@saewoni
Copy link
Contributor Author

saewoni commented Mar 25, 2026

Complete PR Comment Review (from PR #7639 + #8165)

Tracked all human reviewer comments across both PRs. Here's the full status:


🟦 Devinwong (PR #8165) — 2 comments

# Comment Status
1 cse_cmd.sh: Can't remove LOCALDNS_GENERATED_COREFILE — breaks forward compat ✅ Already addressed — we kept it, added new vars alongside
2 parser.go: Can't remove existing variable from public contract ✅ Already addressed — we kept it in parser.go

🟦 yewmsft (PR #7639) — 12 unique comments

# Comment Status
1 aks-hosts-setup.sh: If TARGET_CLOUD is empty, exit — don't fall back to public cloud ✅ Already addressed — checked in current code
2 aks-hosts-setup.sh: How long worst case? ✅ Already answered (~45s worst, ~5-10s typical)
3 aks-hosts-setup.sh: nslookupdigdig +short is more portable and trivial to parse ⚠️ Needs action — dig is available on all VHDs. Should switch.
4 aks-hosts-setup.sh: IPv6 validation is loose ⚠️ Low priority nit — works in practice, defense-in-depth improvement
5 cse_main.sh: Always enable hosts systemd unit, mount hosts file in corefile only if enableHostsPlugin=true ✅ Already addressed — enableAKSHostsSetup always runs, corefile selection is conditional
6 cse_config.sh: Don't fail on empty host file, just log error ✅ Already addressed — uses return 1 fallback
7 cse_main.sh: Confirm intended startup ordering — add comment about async behavior ⚠️ Needs action — should add clarifying comment
8 cse_main.sh: TODO/design notes — track or remove ✅ Already addressed — removed in current code
9 localdns.sh: Annotate node based on corefile, not hosts file ✅ Already addressed
10 localdns.sh: Log background PID for annotation process ✅ Already addressed — logging added
11 localdns.sh: Document timeout=0 restart behavior ⚠️ Needs action — stale comment at line 792 references old timeout parameter that no longer exists. Should fix.
12 cse_cmd.sh: Need to remove this line (old LOCALDNS_GENERATED_COREFILE_NO_HOSTS) ✅ Already addressed — that old variable was removed

🟦 awesomenix (PR #7639) — 4 unique comments

# Comment Status
1 baker.go: ShouldEnableLocalDNS should include ShouldEnableHostsPlugin() ⚠️ Needs discussion — awesomenix suggests ShouldEnableLocalDNS() || ShouldEnableHostsPlugin() so correct files are always passed. My position: both corefiles are needed because the decision happens at runtime on the VM.
2 baker.go: GetGeneratedLocalDNSCoreFile can generate the right file needed (single corefile) ⚠️ Needs discussion — same disagreement. awesomenix wants one corefile, I need both for runtime selection.
3 cse_helpers.sh: Move choosing logic inside service ✅ Already addressed — selection logic is now in select_localdns_corefile() inside localdns.sh
4 pipeline yaml: Unintended changes ✅ Already reverted

🟦 cameronmeissner (PR #7639) — 11 comments

# Comment Status
1 validators.go: Use lo.Map instead of manual loop ⚠️ Needs action — style nit
2 validators.go: Reuse existing systemd unit state validators ⚠️ Needs action — should use existing validators instead of ad-hoc scripts
3 validators.go: Same — reuse systemd validators for timer check ⚠️ Needs action — same as above
4 validation.go: Put helper methods that resolve scenario state in types.go ✅ Already addressed — IsHostsPluginEnabled() is in types.go
5 validation.go: Make IsHostsPluginEnabled() a *Scenario helper ✅ Already addressed
6 validators.go: Make FQDN helper a *Scenario helper, check NodeConfig too ⚠️ Needs check
7 vmss.go: Mock via NBC/NodeConfig settings instead of string replacement ⚠️ Needs check — depends on what string replacement was doing
8 test_execution.log: Don't check test logs into version control ✅ Already addressed — removed
9 cse_config.sh: Use base64 -d <<< "$corefile" || exit $ERR_LOCALDNS_FAIL ✅ Already addressed — current code does this
10 cse_config.sh: Is localdns.service written via customdata? ✅ Already addressed — yes it is
11 test_aks_hosts_validation.sh: Where is this invoked? ⚠️ Needs check — cameronmeissner pointed out it's excluded from linting, not actually invoked

🟦 lilypan26 (PR #7639) — 1 comment

# Comment Status
1 cse_config_spec.sh: How is this testing scriptless? Need to update AKSNodeConfig and parser ✅ Already addressed — parser.go updated with new variables

Summary of Action Items

Must-do (reviewer feedback that needs code changes):

  1. ⚠️ nslookupdig (yewmsft) — Switch resolve_ipv4()/resolve_ipv6() in aks-hosts-setup.sh to use dig +short
  2. ⚠️ Fix stale comment (yewmsft) — Line 792 in localdns.sh references timeout=0 parameter that no longer exists
  3. ⚠️ Add startup ordering comment (yewmsft) — In cse_main.sh near enableAKSHostsSetup/enableLocalDNS
  4. ⚠️ Reuse systemd validators (cameronmeissner) — In e2e/validators.go, use existing systemd unit validators instead of ad-hoc SSH commands
  5. ⚠️ lo.Map (cameronmeissner) — Style nit in validators.go

Needs discussion (disagreement with reviewer):

  1. ⚠️ ShouldEnableLocalDNS change (awesomenix) — Whether to merge hosts plugin check into ShouldEnableLocalDNS(). My position: need both corefiles for runtime selection. awesomenix's position: simplify CSE inputs. This is an architectural discussion — need to align.

Needs verification:

  1. ⚠️ test_aks_hosts_validation.sh (cameronmeissner) — Is it actually invoked anywhere, or just excluded from linting?
  2. ⚠️ GetDefaultFQDNsForValidation — Should it also check NodeConfig? (cameronmeissner)

Copilot suggestions worth considering:

  1. Guard resolve_ipv4() pipelinegrep exit 1 under pipefail could abort script. Add || true / || return 0
  2. Validation orderingValidateLocalDNSHostsFile should run before ValidateAKSHostsSetupService to avoid race
  3. IsHostsPluginEnabled() should check EnableLocalDns — For scriptless path, should be EnableLocalDns && EnableHostsPlugin

…mprove e2e validators

- aks-hosts-setup.sh: guard resolve_ipv4() pipeline with || return 0 under pipefail
- aks-hosts-setup.sh: tighten IPv6 regex to reject all-colon strings like ":::::::"
- cse_config.sh: restore LOCALDNS_BASE64_ENCODED_COREFILE in environment file for old VHD compat
- localdns.sh: track annotation background PID and kill in cleanup_localdns_configs
- e2e/types.go: IsHostsPluginEnabled() now checks EnableLocalDns && EnableHostsPlugin for scriptless path
- e2e/validation.go: reorder validators so ValidateLocalDNSHostsFile runs before ValidateAKSHostsSetupService
- e2e/validators.go: fix maxAttempts (60->33) to match ~5 minute polling comment
- spec: add ":::::::" to IPv6 mock test, add LOCALDNS_BASE64_ENCODED_COREFILE env file check
Copilot AI review requested due to automatic review settings March 26, 2026 00:17
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 43 out of 43 changed files in this pull request and generated 2 comments.

…, reuse systemd validators

- aks-hosts-setup.sh: switch from nslookup to dig +short for DNS resolution (yewmsft)
- localdns.sh: fix stale "timeout=0" comment referencing removed parameter (yewmsft)
- cse_main.sh: add startup ordering comment for localdns/aks-hosts-setup (yewmsft)
- validators.go: reuse ValidateSystemdUnitIsNotFailed instead of ad-hoc script (cameronmeissner)
- spec: update mock tests from nslookup format to dig +short format
…r refactored function

- baker.go: GetGeneratedLocalDNSCoreFile now uses includeHostsPlugin=false since old
  VHDs don't provision /etc/localdns/hosts
- cse_main_spec.sh: rewrite tests to set env vars (LOCALDNS_COREFILE_ACTIVE,
  LOCALDNS_COREFILE_EXPERIMENTAL, SHOULD_ENABLE_HOSTS_PLUGIN) instead of positional args,
  matching the refactored select_localdns_corefile() signature
Copilot AI review requested due to automatic review settings March 26, 2026 00:31
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 44 out of 44 changed files in this pull request and generated 3 comments.

Comment on lines +730 to +732
// The actual file writing happens in shell scripts (cse_config.sh) which decode and write
// the selected variant to /opt/azure/containers/localdns/localdns.corefile.
// Runtime selection between variants happens in cse_main.sh based on the availability of /etc/localdns/hosts.
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment says runtime selection between corefile variants happens in cse_main.sh, but the selection logic in this PR is implemented in localdns.sh (select_localdns_corefile() called on localdns service startup/restart) after cse_config.sh writes the environment file. Update this comment to match the actual control flow so future readers know where the dynamic selection happens.

Suggested change
// The actual file writing happens in shell scripts (cse_config.sh) which decode and write
// the selected variant to /opt/azure/containers/localdns/localdns.corefile.
// Runtime selection between variants happens in cse_main.sh based on the availability of /etc/localdns/hosts.
// The actual file writing happens in shell scripts (cse_config.sh), which decode and write
// a selected variant to /opt/azure/containers/localdns/localdns.corefile after populating the env file.
// Runtime selection between LOCALDNS_COREFILE_BASE and LOCALDNS_COREFILE_EXPERIMENTAL happens later in localdns.sh
// (via select_localdns_corefile(), invoked on localdns service start/restart) based on the availability of /etc/localdns/hosts.

Copilot uses AI. Check for mistakes.
Comment on lines +750 to +763
select_localdns_corefile() {
local hosts_file_path="/etc/localdns/hosts"

# Case 1: Both corefile variants available — dynamic selection
if [ -n "${LOCALDNS_COREFILE_EXPERIMENTAL:-}" ] && \
[ -n "${LOCALDNS_COREFILE_ACTIVE:-}" ]; then
echo "Both corefile variants available, selecting based on current state..." >&2
echo "LocalDNS corefile selection: SHOULD_ENABLE_HOSTS_PLUGIN=${SHOULD_ENABLE_HOSTS_PLUGIN:-<unset>}" >&2

if [ "${SHOULD_ENABLE_HOSTS_PLUGIN:-}" = "true" ]; then
echo "Hosts plugin is enabled, checking ${hosts_file_path} for content..." >&2
if [ -f "${hosts_file_path}" ] && \
grep -qE '^[0-9a-fA-F.:]+[[:space:]]+[a-zA-Z]' "${hosts_file_path}"; then
echo "Hosts file has IP mappings, using corefile with hosts plugin" >&2
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

select_localdns_corefile() hardcodes the hosts file path to /etc/localdns/hosts. This makes the function harder to test and inconsistent with annotate_node_with_hosts_plugin_status(), which already supports overriding the hosts file path via LOCALDNS_HOSTS_FILE. Consider using an overridable path here as well (e.g., hosts_file_path="${LOCALDNS_HOSTS_FILE:-/etc/localdns/hosts}") so unit tests can avoid writing to /etc and production can be more flexible if the path ever changes.

Copilot uses AI. Check for mistakes.
Comment on lines +17 to +72
setup() {
# Source localdns.sh to get select_localdns_corefile function
# We set __SOURCED__=1 to only source the functions, not run main execution
# shellcheck disable=SC1090
__SOURCED__=1 . "${LOCALDNS_PATH}"

# Create temp directory for test hosts file
TEST_DIR=$(mktemp -d)
HOSTS_FILE="${TEST_DIR}/hosts"
}

cleanup() {
rm -rf "${TEST_DIR}"
unset LOCALDNS_COREFILE_ACTIVE
unset LOCALDNS_COREFILE_EXPERIMENTAL
unset SHOULD_ENABLE_HOSTS_PLUGIN
}

BeforeEach 'setup'
AfterEach 'cleanup'

Context 'when both corefile variants are available and hosts plugin is enabled'
It 'returns EXPERIMENTAL when hosts file has valid IP mappings'
LOCALDNS_COREFILE_ACTIVE="${COREFILE_NO_HOSTS}"
LOCALDNS_COREFILE_EXPERIMENTAL="${COREFILE_WITH_HOSTS}"
SHOULD_ENABLE_HOSTS_PLUGIN="true"
# Create hosts file with valid IP mappings at the path the function checks
mkdir -p /etc/localdns
echo "10.0.0.1 mcr.microsoft.com" > /etc/localdns/hosts

When call select_localdns_corefile
The output should equal "${COREFILE_WITH_HOSTS}"
The status should be success
The stderr should include "Hosts file has IP mappings"
The stderr should include "using corefile with hosts plugin"
End

It 'returns ACTIVE when hosts file exists but has no IP mappings'
LOCALDNS_COREFILE_ACTIVE="${COREFILE_NO_HOSTS}"
LOCALDNS_COREFILE_EXPERIMENTAL="${COREFILE_WITH_HOSTS}"
SHOULD_ENABLE_HOSTS_PLUGIN="true"
mkdir -p /etc/localdns
echo "# comment only" > /etc/localdns/hosts

When call select_localdns_corefile
The output should equal "${COREFILE_NO_HOSTS}"
The status should be success
The stderr should include "not ready yet, falling back to corefile without hosts plugin"
End

It 'returns ACTIVE when hosts file does not exist'
LOCALDNS_COREFILE_ACTIVE="${COREFILE_NO_HOSTS}"
LOCALDNS_COREFILE_EXPERIMENTAL="${COREFILE_WITH_HOSTS}"
SHOULD_ENABLE_HOSTS_PLUGIN="true"
rm -f /etc/localdns/hosts

Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests create and modify /etc/localdns/hosts directly (and setup() creates a temp HOSTS_FILE that is never used). Writing under /etc in unit tests can require elevated permissions and can leak state across specs if the file isn't cleaned up in AfterEach. Prefer using a temp hosts file and pointing the code under test at it (which likely requires select_localdns_corefile() to accept an overridable hosts path), or ensure /etc/localdns/hosts is removed/restored in cleanup().

Copilot uses AI. Check for mistakes.
…exit code

Use a single name end-to-end: CSE delivers LOCALDNS_COREFILE_BASE,
environment file stores LOCALDNS_COREFILE_BASE, localdns.sh reads
LOCALDNS_COREFILE_BASE. No more rename at the CSE↔VHD boundary.

Also fix select_localdns_corefile Case 3 (nothing available) to return 1
instead of 0, and guard the caller in regenerate_localdns_corefile with
|| true to prevent set -e from aborting before the friendly error message.
Verify that generateLocalDNSFiles falls back to LOCALDNS_GENERATED_COREFILE
when LOCALDNS_COREFILE_BASE is unset, simulating an old AgentBaker service
provisioning a VM with a new VHD.
Copilot AI review requested due to automatic review settings March 26, 2026 01:19
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 44 out of 44 changed files in this pull request and generated 6 comments.

Comments suppressed due to low confidence (2)

spec/parts/linux/cloud-init/artifacts/cse_main_spec.sh:1

  • These tests write to the real /etc/localdns/hosts on the test runner/container filesystem but the cleanup() does not remove it. This can leak state across specs and cause false positives/negatives (e.g., later tests observing an old hosts file). Add cleanup that removes /etc/localdns/hosts (and optionally /etc/localdns if empty) or refactor the function under test to accept an overridable hosts path so the spec can use $TEST_DIR instead.
    spec/parts/linux/cloud-init/artifacts/cse_main_spec.sh:1
  • These tests write to the real /etc/localdns/hosts on the test runner/container filesystem but the cleanup() does not remove it. This can leak state across specs and cause false positives/negatives (e.g., later tests observing an old hosts file). Add cleanup that removes /etc/localdns/hosts (and optionally /etc/localdns if empty) or refactor the function under test to accept an overridable hosts path so the spec can use $TEST_DIR instead.

Comment on lines +80 to +83
It 'creates hosts file with resolved addresses for all critical FQDNs'
When run command bash "${TEST_SCRIPT}"
The status should be success
The file "$HOSTS_FILE" should be exist
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test requires live DNS resolution to succeed in CI. However, aks-hosts-setup.sh explicitly exits 0 without writing a hosts file when no domains resolve (to avoid systemd marking the unit failed). In restricted or flaky network environments, the test can fail even when the script behavior is correct. To make the spec deterministic, prefer using the mock-dig path for 'hosts file is created' assertions (returning stable A/AAAA answers), and reserve live-DNS tests for a smaller, optionally-skipped integration layer.

Copilot uses AI. Check for mistakes.
…ervice

The Before= directive blocks kubelet and localdns startup until
aks-hosts-setup completes DNS resolution (up to 60s). This contradicts
the async design: localdns should start immediately with the base
corefile, and dynamic corefile selection handles the upgrade to the
hosts-plugin variant once the hosts file is populated.
printf '%s\n' is POSIX-portable and won't interpret escape sequences,
unlike echo which is shell-implementation-dependent.
Copilot AI review requested due to automatic review settings March 26, 2026 01:31
- Fix comment in parser/helper.go: selection happens in localdns.sh not cse_main.sh
- Use LOCALDNS_HOSTS_FILE override in select_localdns_corefile() for testability
- Rewrite cse_main_spec.sh to use temp dir instead of /etc/localdns/hosts
- Add actual permissions check to cloud-env file test in cse_config_spec.sh
- Replace brittle tail -n +10 with sed in aks_hosts_setup_spec.sh
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 44 out of 44 changed files in this pull request and generated 3 comments.

Comment on lines +43 to +46
SHOULD_ENABLE_HOSTS_PLUGIN="true"
echo "10.0.0.1 mcr.microsoft.com" > "${LOCALDNS_HOSTS_FILE}"

When call select_localdns_corefile
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This spec writes to the real /etc/localdns/hosts but cleanup() never removes it. That can leak state across specs (and between examples in this file), causing false positives/negatives depending on execution order. Consider deleting /etc/localdns/hosts (and optionally rmdir /etc/localdns when empty) in cleanup(), or refactoring select_localdns_corefile() to allow a test override path so tests can stay within $TEST_DIR.

Copilot uses AI. Check for mistakes.
The contents of file "$AKS_CLOUD_ENV_FILE" should equal "TARGET_CLOUD=AzureUSGovernmentCloud"
End

It 'should set correct permissions on cloud-env file'
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test name says it verifies 0644 permissions on the cloud-env file, but it only asserts the file exists. Add an assertion on the file mode (e.g., via stat) so the test actually covers the permission behavior it describes.

Suggested change
It 'should set correct permissions on cloud-env file'
The file "$AKS_CLOUD_ENV_FILE" should be exist
cloud_env_perms=$(printf "0%s" "$(stat -c "%a" "$AKS_CLOUD_ENV_FILE")")
The value "$cloud_env_perms" should equal "0644"

Copilot uses AI. Check for mistakes.
Comment on lines +7 to +24
# Helper to build a test script that uses the real system dig.
# Overrides only HOSTS_FILE and TARGET_CLOUD, preserving everything else
# (cloud selection, resolution loop, atomic write) from the real script.
# Uses sed to strip the shebang, set -euo pipefail, and HOSTS_FILE= lines
# so the test is not brittle to comment changes at the top of the script.
build_test_script() {
local test_dir="$1"
local hosts_file="$2"
local target_cloud="${3:-AzurePublicCloud}"
local test_script="${test_dir}/aks-hosts-setup-test.sh"

cat > "${test_script}" << EOF
#!/usr/bin/env bash
set -uo pipefail
HOSTS_FILE="${hosts_file}"
export TARGET_CLOUD="${target_cloud}"
EOF
sed -e '/^#!\/bin\/bash/d' -e '/^set -euo pipefail/d' -e '/^HOSTS_FILE=/d' "${SCRIPT_PATH}" >> "${test_script}"
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

build_test_script()/build_mock_test_script() depend on the first 9 lines of aks-hosts-setup.sh staying exactly the same (via tail -n +10). That makes the test brittle: any added comment/header line in the script will silently change test behavior. Prefer patching the script more robustly (e.g., sourcing it, or using sed to override HOSTS_FILE / TARGET_CLOUD assignments) rather than relying on fixed line numbers.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants