
Conversation

@Rob-Johnson (Contributor) commented Nov 21, 2025

Building on @wjhoward's work in #4164, this manages per-namespace k8s Services in a cluster. It is largely based on what had previously been built in setup_istio_mesh.py, which is now unused. This also tidies up what's left of the old setup_istio_mesh code.

Notable changes:

  • I'm using k8s server-side apply to handle applying patches to the Service. I think this is now the recommended approach?
  • Adding the name param to the ContainerPort we add for the service. This means I don't have to look up the correct container_port for the service (and hope it isn't different between backends in a service); k8s can figure out which one to use by name. NB: this will trigger a big bounce.
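The named-ContainerPort point can be sketched with plain dicts (field names follow the k8s API; the service name and port values here are illustrative, not the PR's actual config):

```python
# Illustrative: give the container port a name in the pod spec...
container_port = {"name": "main", "containerPort": 8888}

# ...then the Service's targetPort can reference the port *name* instead of a
# number. kubelet resolves the numeric port per-pod, so backends that happen
# to listen on different container ports still match the same Service port.
service_spec = {
    "selector": {"paasta.yelp.com/service": "example"},
    "ports": [{"port": 80, "targetPort": "main"}],
}
```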

Limitations:

  • This won't support cross-region advertising: we'll only see matching backends from the cluster the service belongs to. It also won't support the advertise/discover keys - everything in the cluster will be reachable.
  • I've chosen to put the Service in the k8s namespace matching the service that the smartstack namespace belongs to (i.e. whichever service's smartstack.yaml it is in). That means:
    • any backends for a service/instance that register in a k8s namespace other than the default paastasvc-$service won't show up.
    • any backends for a service/instance whose registrations value points at a namespace belonging to a different service won't show up.

@Rob-Johnson Rob-Johnson force-pushed the namespace-service-manager branch 6 times, most recently from af4acb5 to 529251a on November 24, 2025 14:10
@Rob-Johnson Rob-Johnson marked this pull request as ready for review November 24, 2025 14:19
@Rob-Johnson Rob-Johnson requested a review from a team as a code owner November 24, 2025 14:19
@Rob-Johnson Rob-Johnson requested a review from itsgc November 25, 2025 11:55
@nemacysts (Member) left a comment


Mostly just nits, apart from get_service_kube_namespace() probably needing to actually read the service config to get the correct k8s namespace.

Comment on lines +234 to +235
def get_service_kube_namespace(service: str) -> str:
return f"paastasvc-{sanitise_kubernetes_name(service)}"

i think we'll need to load the actual config so that we can use the namespace getter - we allow folks to set custom namespaces and there's a small number of services still in the paasta namespace :(
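A rough sketch of what this comment is asking for, with a stubbed-out config lookup standing in for paasta's real service-config loader (the CONFIGS dict, its contents, and the simplified sanitiser are invented for illustration):

```python
from typing import Optional

# Hypothetical stand-in for loading each service's real config; in paasta this
# would come from the actual service-config loader, not a module-level dict.
CONFIGS = {
    "legacy-svc": {"namespace": "paasta"},  # a few services still live in `paasta`
    "normal-svc": {},                       # no override: the default applies
}


def sanitise_kubernetes_name(name: str) -> str:
    # simplified stand-in for paasta's real sanitiser
    return name.replace("_", "--").lower()


def get_service_kube_namespace(service: str) -> str:
    # honour a custom namespace from the service's config before falling
    # back to the paastasvc-<service> default
    configured: Optional[str] = CONFIGS.get(service, {}).get("namespace")
    if configured:
        return configured
    return f"paastasvc-{sanitise_kubernetes_name(service)}"
```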

Comment on lines +238 to +250
def is_external_routing_enabled(config: Mapping) -> bool:
    """Check if external routing is enabled for a namespace config.

    Args:
        config: Namespace configuration dict

    Returns:
        True if routing.external is set to True
    """
    routing_cfg = config.get("routing", {})
    if not isinstance(routing_cfg, Mapping):
        return False
    return bool(routing_cfg.get("external"))
imo, this might be better placed in the instance class (and we can probably assume that these are in the right format/type due to the schema, right?)
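A minimal sketch of what hoisting this onto a config class might look like; RegistrationConfig is a hypothetical name, and per the comment it leans on the schema to guarantee that routing, when present, is a mapping:

```python
from typing import Any, Mapping


class RegistrationConfig:
    """Hypothetical wrapper around a smartstack namespace config."""

    def __init__(self, config: Mapping[str, Any]) -> None:
        self.config = config

    def is_external_routing_enabled(self) -> bool:
        # the schema guarantees `routing`, when present, is a mapping,
        # so no isinstance check is needed here
        return bool(self.config.get("routing", {}).get("external", False))
```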

Comment on lines +1451 to +1454
if prometheus_port and all(
    port.container_port != prometheus_port for port in ports
):
    ports.append(V1ContainerPort(container_port=prometheus_port))
Suggested change:
-    if prometheus_port and all(
-        port.container_port != prometheus_port for port in ports
-    ):
-        ports.append(V1ContainerPort(container_port=prometheus_port))
+    if prometheus_port and prometheus_port != self.get_container_port():
+        ports.append(V1ContainerPort(container_port=prometheus_port))

should be the same since the only other port we're adding a V1ContainerPort for is the app one, right?

try:
    with open(file_path) as f:
        svc_namespaces = yaml.safe_load(f)
        if not isinstance(svc_namespaces, Mapping):
imo, we can probably skip the isinstance checks - we should be guaranteed to see valid smartstack.yaml files since there's a schema

    return f"paastasvc-{sanitise_kubernetes_name(service)}"


def is_external_routing_enabled(config: Mapping) -> bool:
heh, i guess we don't have a typed dict for smartstack configs - maybe next year (:p)!

Comment on lines +367 to +376
patch_kwargs = dict(
    name=service.metadata.name,
    namespace=kube_namespace,
    body=serialized,
    field_manager=FIELD_MANAGER,
    force=True,
    content_type="application/apply-patch+yaml",
)
if dry_run:
    patch_kwargs["dry_run"] = "All"
imo, it'd be nice to keep using client-side apply just to avoid having to manage these kwargs dicts :p

(paasta is the only thing managing these, so SSA/client-side shouldn't be particularly different)

it is somewhat nice to have the server do validation for dry-runs, but not 100% sure how much more safety that buys us since these are pretty simple objects

try:
    kube_client.core.create_namespaced_service(**create_kwargs)
    created = True
    if not dry_run:

technically we don't need this since we're setting dry_run to true, right?


create_kwargs = dict(namespace=kube_namespace, body=service)
if dry_run:
    create_kwargs["dry_run"] = "All"
if we wanna SSA this, we're missing the other kwargs here, right?

Comment on lines +395 to +408
if created and not dry_run:
    log.warning(
        f"Rolling back Service {kube_namespace}/{service.metadata.name} after apply error"
    )
    try:
        kube_client.core.delete_namespaced_service(
            name=service.metadata.name,
            namespace=kube_namespace,
        )
    except Exception:
        log.exception(
            f"Failed to rollback Service {kube_namespace}/{service.metadata.name} after apply error"
        )
raise
hmm, I wonder if we should just create the Service as we'd want it to look over Create+Patch?

(i think that's the pattern we take in skj and whatnot - 3 different paths for create vs modify vs delete)
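The three-path pattern described here (create vs modify vs delete, rather than create-then-patch with rollback) reduces to a dispatch on the state diff; all names in this sketch are illustrative:

```python
from typing import Optional


def reconcile(existing: Optional[dict], desired: Optional[dict]) -> str:
    """Pick exactly one action from the desired-vs-existing diff."""
    if existing is None and desired is not None:
        return "create"
    if existing is not None and desired is None:
        return "delete"
    if existing != desired:
        return "modify"
    return "noop"
```

Building the Service exactly as it should look and taking one of these paths avoids the partially-created state that the rollback logic above has to clean up.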

svc = build_namespace_service(kube_namespace, namespace)
try:
    server_side_apply_service(kube_client, kube_namespace, svc, dry_run)
    yield (True, f"Reconciled Service {kube_namespace}/{svc.metadata.name}")
I assume we're using an iterator here to avoid having to maintain a list of results to return at the end?
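That reading matches the usual reason for a generator here: yielding (ok, message) tuples as each namespace is reconciled lets the caller stream or log results without the function accumulating a list. A minimal sketch (names are illustrative):

```python
from typing import Iterator, List, Tuple


def reconcile_namespaces(namespaces: List[str]) -> Iterator[Tuple[bool, str]]:
    for ns in namespaces:
        try:
            # building + applying the Service for `ns` would happen here (elided)
            yield (True, f"Reconciled Service {ns}")
        except Exception as e:
            # a failure for one namespace doesn't abort the rest
            yield (False, f"Failed to reconcile {ns}: {e}")
```

A caller can then consume lazily, e.g. `success = all(ok for ok, _ in reconcile_namespaces(nss))`.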

