-
Notifications
You must be signed in to change notification settings - Fork 2
Description
Describe the bug
For some reason, I have been needing to comment out 127.0.0.1 sddc-manager-a.site-a.vcf.lab from /etc/hosts on sddc-manager-a.site-a.vcf.lab, or deployment steps repeatedly fail at "Install certificate on SDDC manager" and "Upload personality to SDDC manager". But I have also not exhaustively tested beyond the personality step, bear in mind.
For the certificate installation issue, reviewing the logs will suggest that it cannot validate the certificate against 127.0.0.1 and thus it won't install on the SDDC manager. I didn't check the logs for the personality upload, but assume it's a similar issue.
This KB is a close match: https://knowledge.broadcom.com/external/article/399945/sddc-manager-certificate-installation-ta.html
However while it offers the right solution, the cause may not be applicable. Not to mention the personality upload failure is undocumented so far - I discovered that because on this deployment, I actually uncommented the line immediately after passing the first blocker. In my successful deployments prior, I left it commented-out and forgot about following through with the KB's outstanding suggestion, there.
While I can reproduce these issues consistently, I don't have the impression many others experience this.
Then again, the barrier to entry for VCF (and Broadcom products as a whole) is taller than ever before, and maybe there just hasn't been enough attempts/exposure. Anyway, is there an optional parameter combination that's leading to this problem? Or an environmental issue?
Reproduction steps
- Deploy and start Holorouter, create a config, and start the deployment.
- Import the config
Title : HoloDeck 9.0 Config File
Name :
ConfigId : aaq3
InstanceId : holodeck
description : Holodeck
ConfigPath : /holodeck-runtime/config/aaq3.json
masterPassword : password
vSANMode : OSA
rootFolderPath : /holodeck-runtime
rootFolderName : holodeck-runtime
output : output
logs : /holodeck-runtime/logs/holodeck-runtime-log_08Sep2025_04_55.log
state : /holodeck-runtime/state/holodeck-runtime-state_08Sep2025_04_55_01.json
bin : bin
templates : templates
holorouterStatus : Configured
isPreCheckCompleted : true
Target : @{hostname=example.com; username=user; password=password; targetHostApiType=VirtualCenter; datacenter=Datacenter; cluster=cluster; datastore=ESXi-NVMe; networkPortGroup=DPortGroup
Holodeck-A; isDeprecatedCPU=False}
holodeck-sddc : @{Site-A=; Site-B=}
- Initiate the deployment:
New-HoloDeckInstance -Version 9.0 -InstanceID Holodeck -WorkloadDomainType SharedSSO -NsxEdgeClusterMgmtDomain -NsxEdgeClusterWkldDomain -DeployVcfAutomation -DeploySupervisor -LogLevel DEBUG -Site a -DepotType Offline - Answer pre-check prompts accordingly. Nothing special here that would influence the bug/outcome in a meaningful way.
- Wait for the deployment to fail while waiting for cluster bringup.
- Log into the SDDC Manager appliance via SSH as the vcf user, switch to root with
su -, then edit /etc/hosts to comment out the last line,127.0.0.1 sddc-manager-a.site-a.vcf.lab - Wait for retry or hit retry in the installer.
- Optionally uncomment the line again when you're past that step, then wait for NSX deployment steps, and eventually uploading of a personality to SDDC manager. It'll fail again.
- Comment out the line again and wait/retry. It should pass.
Expected behavior
Certificate installation/validation should not fail, nor should the personality upload step.
Additional context
I'm honestly unsure if this is an SDDC Manager, a VCF installer, or a Holodeck bug. Or why it's not a common problem (or so it seems).
I would suspect more that the deployment options in some combination may be playing a part in this, and/or it's partly environmental. Coincidentally during a production setup run, I did see that if the VCF installer is configured to do an in-place conversion into an SDDC Manager ("reuse VCF installer"), it expects 127.0.0.1: Incorrect SDDC Manager hostname '<actual FQDN>' provided whilst using the option to reuse VCF installer, expected hostname '127.0.0.1' - so, going on a limb here, it almost seems like the Holodeck deployment is configuring for in-place conversion from what I saw in /etc/hosts, but we are definitely not doing an in-place conversion. I could be totally off-base, though.
FWIW, there are some other bugs I ran into and resolved, in case they have some influence:
- AAAA records must be filtered in Holorouter's dnsmasq for underlying/upstream environments that support those lookups..
- Somewhat related to Issue 6, I found that if you do a
tdnf update, then the process will eventually result in the creation of/etc/systemd/network/99-dhcp-en.network. A reboot not needed to see that, but I did one anyway.
Per that issue, having multiple files in /etc/systemd/network results in what seems like a powershell parsing error in Holodeck. So, before beginning a deployment, you may need to delete that file (I am not using DHCP, mind). - FRR won't load if kubernetes experiences disk pressure during Holorouter's first time deploying something, I did run into that initially, despite sticking the Holorouter VM on an all-flash iSCSI target device. Works fine on local NVMe.