Member

@jopemachine jopemachine commented Nov 27, 2025

resolves #6867 (BA-3103)

Checklist: (if applicable)

  • Mention to the original issue

📚 Documentation preview 📚: https://sorna--6992.org.readthedocs.build/en/6992/


📚 Documentation preview 📚: https://sorna-ko--6992.org.readthedocs.build/ko/6992/

@jopemachine jopemachine changed the title feat: Add ScalingGroup strawberry-base GQL type feat(BA-3103): Add ScalingGroup strawberry-base GQL type Nov 27, 2025
@github-actions github-actions bot added size:L 100~500 LoC comp:manager Related to Manager component labels Nov 27, 2025
@jopemachine jopemachine changed the title feat(BA-3103): Add ScalingGroup strawberry-base GQL type feat(BA-3103): Add ScalingGroup strawberry-base GQL type Nov 27, 2025
@github-actions github-actions bot added size:XL 500~ LoC and removed size:L 100~500 LoC labels Nov 27, 2025
@github-actions github-actions bot added the area:docs Documentations label Nov 28, 2025
@jopemachine jopemachine marked this pull request as ready for review November 28, 2025 04:27
Co-authored-by: octodog <mu001@lablup.com>
Comment on lines +36 to +37
@strawberry.field(description="Added in 25.18.0. List scaling groups")
async def scaling_groups(
Collaborator

Is this scaling_groups query meant to be used for listing all scaling groups? Does it overlap with the existing query?

Comment on lines +86 to +107
    description="Session types that can be scheduled in this scaling group. Valid values: 'interactive', 'batch', 'inference'. Requests for unlisted types are rejected."
)
pending_timeout: float = strawberry.field(
    description="Maximum time in seconds a session can wait in PENDING state before automatic cancellation. Zero means no timeout. Used to prevent indefinite resource waiting when no agents are available."
)
config: JSON = strawberry.field(
    description="Scheduler-specific configuration options. Contents depend on the scheduler implementation (fifo/lifo/drf). Used for advanced scheduling behavior customization."
)
agent_selection_strategy: str = strawberry.field(
    description="Algorithm for selecting target agents when scheduling sessions. 'dispersed' spreads sessions across available agents, 'concentrated' packs sessions onto fewer agents, 'roundrobin' cycles through agents sequentially."
)
agent_selector_config: JSON = strawberry.field(
    description="Configuration for the agent selection strategy. Structure varies by strategy - for example, concentrated strategy may specify endpoint spreading rules."
)
enforce_spreading_endpoint_replica: bool = strawberry.field(
    description="When true with concentrated strategy, forces inference service replicas to be distributed across different agents instead of co-locating on same agent. Improves fault tolerance for model serving."
)
allow_fractional_resource_fragmentation: bool = strawberry.field(
    description="Whether agents accept session requests that allocate fractional resources (e.g., 0.5 GPU) causing resource fragmentation. When false, agents reject sessions that would prevent future efficient resource allocation."
)
route_cleanup_target_statuses: list[str] = strawberry.field(
    description="List of route health statuses that trigger automatic cleanup of service routes. Valid values: 'healthy', 'unhealthy', 'degraded'. Default: ['unhealthy']."
Collaborator

Let's use a multi-line string with textwrap for the description.
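
One way to read this suggestion, as a minimal sketch rather than the change actually made in the PR: write each description as an indented triple-quoted string and normalize it with `textwrap.dedent`. The helper name `_desc` and the constant name are hypothetical; only the description text is taken from the diff above.

```python
import textwrap


def _desc(text: str) -> str:
    """Hypothetical helper: drop the common indentation and surrounding blank lines."""
    return textwrap.dedent(text).strip()


# Description text copied from the pending_timeout field in the diff,
# wrapped over several source lines for readability.
PENDING_TIMEOUT_DESCRIPTION = _desc(
    """
    Maximum time in seconds a session can wait in PENDING state
    before automatic cancellation. Zero means no timeout.
    Used to prevent indefinite resource waiting when no agents are available.
    """
)

# In the schema this would presumably be passed as
# strawberry.field(description=PENDING_TIMEOUT_DESCRIPTION).
print(PENDING_TIMEOUT_DESCRIPTION)
```

Note that `dedent` keeps the line breaks inside the string; it only removes the shared leading whitespace.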

Comment on lines +101 to +102
    description="When true with concentrated strategy, forces inference service replicas to be distributed across different agents instead of co-locating on same agent. Improves fault tolerance for model serving."
)
Collaborator

It would be better to first describe what the value is, and then explain what happens when it is set to true.
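
A possible rewording along those lines, offered only as an illustration: the first sentence states what the flag represents, and the second describes the behavior when it is true. The wording and the constant name are assumptions, not text from the PR.

```python
import textwrap

# Hypothetical rewording of the enforce_spreading_endpoint_replica description:
# sentence 1 says what the value is, sentence 2 says what happens when it is true.
ENFORCE_SPREADING_DESCRIPTION = textwrap.dedent(
    """\
    Whether inference service replicas must be spread across different agents.
    When true with the concentrated strategy, replicas are distributed across
    different agents instead of being co-located on the same agent, which
    improves fault tolerance for model serving.
    """
).strip()

print(ENFORCE_SPREADING_DESCRIPTION)
```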

Comment on lines +150 to +181
status=ScalingGroupStatus(
    is_active=data.status.is_active,
    is_public=data.status.is_public,
),
metadata=ScalingGroupMetadata(
    description=data.metadata.description,
    created_at=data.metadata.created_at,
),
wsproxy=ScalingGroupNetworkConfig(
    wsproxy_addr=data.wsproxy.wsproxy_addr,
    wsproxy_api_token=data.wsproxy.wsproxy_api_token,
    use_host_network=data.wsproxy.use_host_network,
),
driver=ScalingGroupDriverConfig(
    name=data.driver.name,
    options=data.driver.options,
),
scheduler=ScalingGroupSchedulerConfig(
    name=data.scheduler.name,
    options=ScalingGroupSchedulerOptions(
        allowed_session_types=[
            st.value for st in data.scheduler.options.allowed_session_types
        ],
        pending_timeout=data.scheduler.options.pending_timeout.total_seconds(),
        config=data.scheduler.options.config,
        agent_selection_strategy=data.scheduler.options.agent_selection_strategy.value,
        agent_selector_config=data.scheduler.options.agent_selector_config,
        enforce_spreading_endpoint_replica=data.scheduler.options.enforce_spreading_endpoint_replica,
        allow_fractional_resource_fragmentation=data.scheduler.options.allow_fractional_resource_fragmentation,
        route_cleanup_target_statuses=data.scheduler.options.route_cleanup_target_statuses,
    ),
),
Collaborator

It would be good to add a from_dataclass method to each type.
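
A rough sketch of what per-type `from_dataclass` constructors could look like, so that the parent type delegates to its nested types instead of building everything inline. Plain dataclasses stand in for the Strawberry types here, and `StatusData`, `MetadataData`, `ScalingGroupData`, and the top-level `ScalingGroup` class are hypothetical names; only `ScalingGroupStatus`, `ScalingGroupMetadata`, and their fields appear in the diff.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


# Hypothetical data-layer dataclasses (the real ones are not shown in the diff).
@dataclass
class StatusData:
    is_active: bool
    is_public: bool


@dataclass
class MetadataData:
    description: str
    created_at: datetime


@dataclass
class ScalingGroupData:
    status: StatusData
    metadata: MetadataData


# Stand-ins for the GQL types; in the PR these would be Strawberry types.
@dataclass
class ScalingGroupStatus:
    is_active: bool
    is_public: bool

    @classmethod
    def from_dataclass(cls, data: StatusData) -> "ScalingGroupStatus":
        return cls(is_active=data.is_active, is_public=data.is_public)


@dataclass
class ScalingGroupMetadata:
    description: str
    created_at: datetime

    @classmethod
    def from_dataclass(cls, data: MetadataData) -> "ScalingGroupMetadata":
        return cls(description=data.description, created_at=data.created_at)


@dataclass
class ScalingGroup:
    status: ScalingGroupStatus
    metadata: ScalingGroupMetadata

    @classmethod
    def from_dataclass(cls, data: ScalingGroupData) -> "ScalingGroup":
        # Each nested type converts itself, so the parent only delegates.
        return cls(
            status=ScalingGroupStatus.from_dataclass(data.status),
            metadata=ScalingGroupMetadata.from_dataclass(data.metadata),
        )


example = ScalingGroup.from_dataclass(
    ScalingGroupData(
        status=StatusData(is_active=True, is_public=False),
        metadata=MetadataData(
            description="gpu pool",
            created_at=datetime(2025, 11, 27, tzinfo=timezone.utc),
        ),
    )
)
print(example)
```

The same pattern would extend to the wsproxy, driver, and scheduler types, with the enum and timedelta conversions moved into the scheduler options constructor.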

Comment on lines +191 to +194
DESCRIPTION = "description"
CREATED_AT = "created_at"
IS_ACTIVE = "is_active"
IS_PUBLIC = "is_public"
Collaborator

Ordering by description doesn't seem strictly necessary.

Comment on lines +98 to +103
@staticmethod
def description(ascending: bool = True) -> QueryOrder:
    if ascending:
        return ScalingGroupRow.description.asc()
    else:
        return ScalingGroupRow.description.desc()
Collaborator

It would be better for the backend not to provide ordering by description.

Labels

area:docs Documentations comp:manager Related to Manager component size:XL 500~ LoC

3 participants