self_hosted_inference_core is the service-runtime kernel for local and
self-hosted inference backends.
It owns the runtime concerns that sit between raw process placement and backend-specific boot logic:
- backend registration
- runtime instance registration
- startup-kind handling
- readiness orchestration
- health monitoring
- lease and reuse semantics
- endpoint publication
- backend-to-consumer compatibility calculation
It does not own transport mechanics or client protocol execution.
external_runtime_transport owns process placement and IO lifecycle.
req_llm remains the data-plane client after an endpoint has been resolved.
```
external_runtime_transport
  -> self_hosted_inference_core
  -> concrete backend package or attach adapter
  -> req_llm consumers through EndpointDescriptor
```
That split keeps service lifecycle in the runtime stack and keeps request execution in the client layer.
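To make the northbound seam concrete, the shape of the endpoint handed to req_llm consumers can be pictured as a plain map. This is an illustrative sketch only: `base_url` and `lease_ref` appear in the usage example later in this README, but the remaining field names here are assumptions, not the package's actual EndpointDescriptor struct.

```elixir
# Hypothetical EndpointDescriptor shape handed northbound to req_llm
# consumers. Field names beyond base_url and lease_ref are illustrative
# assumptions, not the package's real struct definition.
endpoint = %{
  base_url: "http://127.0.0.1:8080/v1",
  lease_ref: "lease-abc123",
  protocol: :openai_chat_completions,
  management_mode: :jido_managed
}

# req_llm executes requests against endpoint.base_url while the
# runtime stack keeps the lease referenced by lease_ref alive.
endpoint.base_url
```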
Two backend shapes are now proven:
- built-in attach adapter: SelfHostedInferenceCore.Ollama
- concrete spawned backend package: llama_cpp_ex
SelfHostedInferenceCore.Ollama proves the first truthful
management_mode: :externally_managed path.
It attaches to an already running Ollama daemon, owns readiness and health
interpretation above the transport seam, and publishes the same northbound
endpoint contract used by the spawned path.
llama_cpp_ex plugs into the kernel by implementing
SelfHostedInferenceCore.Backend and owns:
- llama-server boot-spec normalization
- readiness and health probes
- stop semantics for a spawned service
- backend manifest publication
- endpoint descriptor production
That keeps the kernel generic while proving both ownership shapes on real backends.
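To illustrate that ownership split, a concrete backend package might look like the sketch below. The callback names and return shapes here are assumptions for illustration only; the real contract is defined by SelfHostedInferenceCore.Backend (see guides/backend_packages.md).

```elixir
defmodule MyBackend do
  # Illustrative sketch of a spawned-backend package plugging into the
  # kernel. Every callback name and return shape here is an assumption;
  # the authoritative contract is SelfHostedInferenceCore.Backend.
  # @behaviour SelfHostedInferenceCore.Backend

  # Boot-spec normalization: turn user options into the arguments the
  # spawned service needs.
  def boot_spec(opts) do
    %{port: Map.get(opts, :port, 8080), model: Map.fetch!(opts, :model)}
  end

  # Readiness probe: a real backend would hit the service's health
  # endpoint; this sketch only validates the spec shape.
  def probe_ready(%{port: port}) when is_integer(port), do: :ok

  # Stop semantics for a spawned service.
  def stop(_instance), do: :ok

  # Backend manifest: declares the execution surfaces the kernel is
  # allowed to publish for this backend.
  def manifest do
    %{
      runtime_kind: :service,
      management_mode: :jido_managed,
      protocols: [:openai_chat_completions]
    }
  end
end
```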
self_hosted_inference_core treats startup topology as an explicit part of the
contract:
- :spawned - BEAM-managed service lifecycle; maps to management_mode: :jido_managed
- :attach_existing_service - externally managed daemon lifecycle; maps to management_mode: :externally_managed
Both paths use the same northbound endpoint and lease contracts. The kernel validates that backends keep startup kind, management mode, and transport ownership truthful. It also rejects execution surfaces that are not declared in the backend manifest.
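The startup-kind to management-mode mapping above can be sketched as a pure function. This is illustrative only: the kernel performs this validation internally, and the error shape below is an assumption.

```elixir
# Illustrative mapping from startup kind to management mode, mirroring
# the documented contract. The kernel owns the real validation; this
# sketch only encodes the two documented pairs.
defmodule StartupKind do
  def management_mode(:spawned), do: {:ok, :jido_managed}
  def management_mode(:attach_existing_service), do: {:ok, :externally_managed}
  def management_mode(other), do: {:error, {:unknown_startup_kind, other}}
end
```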
Add the package to your dependency list:

```elixir
def deps do
  [
    {:self_hosted_inference_core, "~> 0.1.0"}
  ]
end
```

Concrete backends register themselves against the kernel by implementing
SelfHostedInferenceCore.Backend.
See guides/backend_packages.md for how the
kernel expects concrete backend packages to attach.
See guides/ollama_attach.md for the built-in
attached-local backend.
Define a backend or attach adapter, register it, and ensure a northbound endpoint for a request:

```elixir
alias SelfHostedInferenceCore.ConsumerManifest

:ok = SelfHostedInferenceCore.register_backend(MyBackend)

consumer =
  ConsumerManifest.new!(
    consumer: :jido_integration_req_llm,
    accepted_runtime_kinds: [:service],
    accepted_management_modes: [:jido_managed, :externally_managed],
    accepted_protocols: [:openai_chat_completions],
    required_capabilities: %{streaming?: true},
    optional_capabilities: %{},
    constraints: %{},
    metadata: %{adapter: :req_llm}
  )

request = %{
  request_id: "req-123",
  target_preference: %{
    target_class: "self_hosted_endpoint",
    backend: "my_backend",
    backend_options: %{model_identity: "demo-model"}
  }
}

context = %{
  run_id: "run-123",
  attempt_id: "run-123:1",
  boundary_ref: "boundary-123",
  observability: %{trace_id: "trace-123"}
}

{:ok, endpoint, compatibility} =
  SelfHostedInferenceCore.ensure_endpoint(
    request,
    consumer,
    context,
    owner_ref: "run-123",
    ttl_ms: 30_000
  )

endpoint.base_url
endpoint.lease_ref
compatibility.reason
```

See examples/README.md for runnable demos covering both
:spawned and :attach_existing_service.
HexDocs includes:
- architecture and stack-boundary guidance
- built-in Ollama attach guidance
- concrete backend package guidance
- the northbound endpoint contract used by jido_integration
- runtime registry and lease semantics
- startup-kind guidance for spawned and attached services
- runnable examples
Released under the MIT License. See LICENSE.