std.cluster
std.cluster currently exposes the local supervised-actor runtime for the :cluster profile tracer bullet.
This page documents what exists now. Local grains have a source-level activation
shell, a local activation registry that enforces one live writer per durable
identity, a local namespace lookup layer, and explicit GrainStore-backed
lifecycle callbacks. Compiler-generated state serializers, placement,
membership, migration, and remote transport remain future :cluster work.
The current facade is local-only:
- LocalActorSystem owns one cluster-budget Nursery and one Supervisor.
- Actors run as nursery tasks.
- Supervisor strategies are one_for_one, one_for_all, and rest_for_one.
- Restart policies are permanent, transient, and temporary.
- Restart budgets and pledge-violation restart opt-in are exposed.
- Child status snapshots expose lifecycle, actor id, task id, task state, last exit reason, and restart count.
- Actor tombstones can be counted, classified for repeated deterministic patterns, and mirrored to a caller-provided sink.
- Grain declarations with durable identity syntax lower through the same local supervised activation shell as actors.
- Local grain lookup/start maps (grain_type, grain_id) to one stable local actor reference, so duplicate activation attempts reuse the existing live activation instead of creating a second mutator.
- Local namespace lookup maps (grain_type, namespace) to an internal durable grain id, then routes duplicate lookups through the same single-writer activation registry.
- Persistent local grain start can invoke caller-provided load/store callbacks backed by GrainStoreBytes after setup, after message/timeout boundaries, and before teardown.
Janus actor start
Define a message protocol with message. The declaration uses tagged
variants:

    message CounterMsg {
        Tick,
        Stop,
    }

Attach the protocol to an actor with actor Name(msg: Msg). The payload
binding name is part of the header; the current generated handler still
receives the raw i64 tag as __msg.

    @mailbox(capacity: 4)
    actor Counter(msg: CounterMsg) do
        var count: u64 = 0

        receive do
            count += __msg
        end
    end

For a source-level Janus actor, the compiler emits the supervised start
wrappers:

    ActorName_start_supervised(system: u64, slot: u64, policy: u32) -> u64
    ActorName_start_supervised_ref(system: u64, slot: u64, policy: u32) -> u64

ActorName_start_supervised returns the transient ActorId.
ActorName_start_supervised_ref starts the actor and returns a stable
local actor reference for the supervised (system, slot) identity. Use
the _ref form for production send and observation paths that should
survive supervisor restarts.
    {.profile: cluster.}

    use std.cluster.local as cluster

    message CounterMsg {
        Tick,
        Stop,
    }

    @mailbox(capacity: 4)
    actor Counter(msg: CounterMsg) do
        var count: u64 = 0

        receive do
            count += __msg
        end
    end

    pub func main() -> i32 do
        let system = cluster.local_new(
            1 as u64,
            cluster.STRATEGY_ONE_FOR_ONE,
            1 as u64,
        )
        if system == 0 as u64 do return 1 end

        let counter = Counter_start_supervised_ref(
            system,
            0 as u64,
            cluster.POLICY_PERMANENT,
        )
        if counter == 0 as u64 do return 2 end

        if cluster.local_ref_mailbox_capacity(counter) != 4 as i64 do return 3 end
        if cluster.local_ref_try_send(counter, 1 as i64) != 1 as i32 do return 4 end
        if cluster.local_shutdown(system) != 1 as i32 do return 5 end
        if cluster.local_destroy(system) != 1 as i32 do return 6 end
        return 0
    end

The wrapper hides the setup/handler/destroy runtime-entry plumbing. Public
Janus APIs accept typed callables or generated actor/grain starters; callable
addresses are not ordinary u64 values on the language surface.
Local grain activation registry
The compiler accepts the final local-persistent grain header shape and emits the same local supervised start wrapper used by actors:

    @persist(via: GrainStoreBytes)
    @lifecycle(activation: .lazy)
    grain User(id: u64, msg: UserMsg) do
        var count: u64 = 0

        receive do
            UserMsg.Ping => do count += 1 end,
            UserMsg.Stop => do return 0 end,
        end
    end

For the compiler slice, User_start_supervised(system, slot, policy) remains an
activation shell over the local actor runtime. For the runtime registry slice,
use std.cluster.local to locate or start a grain activation by durable numeric
identity:
    let user_ref = cluster.local_grain_lookup_or_start(
        system,
        100 as u64, // grain type id
        42 as u64, // durable grain id
        0 as u64, // local supervisor slot
        cluster.POLICY_PERMANENT,
        4 as u64, // mailbox capacity
        user_setup,
        user_handler,
        user_destroy,
    )

If another call uses the same (grain_type, grain_id), the runtime returns the
same stable local reference while the activation is live. This pins the first
grain runtime invariant: one durable identity has one active local writer.
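
The single-writer invariant can be modeled outside the runtime. The following Python sketch is an illustration only, not the runtime's implementation: the LocalGrainRegistry class and its integer refs are hypothetical, and it assumes 0 is reserved as a failure value, as the start examples on this page suggest.

```python
class LocalGrainRegistry:
    """Toy model: at most one live activation per (grain_type, grain_id)."""

    def __init__(self):
        self._live = {}      # (grain_type, grain_id) -> ref
        self._next_ref = 1   # assumption: 0 is reserved for failure

    def lookup_or_start(self, grain_type, grain_id):
        key = (grain_type, grain_id)
        ref = self._live.get(key)
        if ref is None:
            # First lookup starts the activation and records its ref.
            ref = self._next_ref
            self._next_ref += 1
            self._live[key] = ref
        # Duplicate lookups reuse the existing live activation.
        return ref

    def active_count(self):
        return len(self._live)


registry = LocalGrainRegistry()
first = registry.lookup_or_start(100, 42)
second = registry.lookup_or_start(100, 42)
other = registry.lookup_or_start(100, 43)
```

In this model, both lookups for (100, 42) yield the same ref, while (100, 43) gets its own activation.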
The local namespace layer resolves human-readable namespace keys to internal durable grain ids before entering the same activation registry:
    let user_ref = cluster.local_grain_lookup_or_start_namespace(
        system,
        100 as u64, // grain type id
        "users/alice", // local namespace key
        0 as u64, // local supervisor slot
        cluster.POLICY_PERMANENT,
        4 as u64, // mailbox capacity
        user_setup,
        user_handler,
        user_destroy,
    )

local_grain_namespace_lookup returns the mapped internal id, or 0 when the
namespace is unbound. local_grain_lookup_or_start_namespace derives and stores
an internal id on first lookup, then returns the same live activation ref for
duplicate namespace lookups. local_grain_namespace_bind can bind aliases to an
existing id; rebinding an existing namespace to a different id is rejected.
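
The namespace rules above can be sketched as a small Python model. This is an illustration under stated assumptions, not the runtime's code: the NamespaceTable class is hypothetical, ids are derived from a simple counter (the real derivation is not described here), and bind is assumed to return 1 on success and 0 on rejection.

```python
class NamespaceTable:
    """Toy model of (grain_type, namespace) -> internal durable grain id."""

    def __init__(self):
        self._ids = {}
        self._next_id = 1  # assumption: ids come from a counter in this model

    def lookup(self, grain_type, namespace):
        # 0 means the namespace is unbound.
        return self._ids.get((grain_type, namespace), 0)

    def bind(self, grain_type, namespace, grain_id):
        # Assumption: 1 signals success, 0 signals rejection.
        existing = self.lookup(grain_type, namespace)
        if existing not in (0, grain_id):
            return 0  # rebinding to a different id is rejected
        self._ids[(grain_type, namespace)] = grain_id
        return 1

    def lookup_or_derive(self, grain_type, namespace):
        existing = self.lookup(grain_type, namespace)
        if existing != 0:
            return existing
        derived = self._next_id
        self._next_id += 1
        self._ids[(grain_type, namespace)] = derived
        return derived


table = NamespaceTable()
unbound = table.lookup(100, "users/alice")
alice = table.lookup_or_derive(100, "users/alice")
again = table.lookup_or_derive(100, "users/alice")
alias_ok = table.bind(100, "users/al", alice)
rebind = table.bind(100, "users/al", alice + 1)
```

The alias "users/al" resolves to the same id as "users/alice", and the attempt to point it at a different id is refused.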
For local persistence, use the persistent lookup/start variant and pass lifecycle callbacks:
    let user_ref = cluster.local_grain_lookup_or_start_persistent(
        system,
        100 as u64,
        42 as u64,
        0 as u64,
        cluster.POLICY_PERMANENT,
        4 as u64,
        user_setup,
        user_handler,
        user_destroy,
        store_ctx as u64,
        load,
        store,
    )

The load/store callbacks use this shape:

    pub func load(ctx: u64, grain_type: u64, grain_id: u64, state: u64) -> i32 do
        // Return >= 0 for a valid cold miss or restore, negative for fatal load.
    end

    pub func store(ctx: u64, grain_type: u64, grain_id: u64, state: u64) -> i32 do
        // Return 1 when durable state was committed, 0 on failure.
    end

ctx is the caller-provided store context, commonly a pointer to a
GrainStoreBytes facade. The runtime calls load after setup returns a state
pointer, calls store after message and timeout handlers, and calls store
again before teardown. Store failure turns the handler boundary into a stop so
the activation does not continue pretending volatile mutation was committed.
Use local_grain_persistence_load_failures(system) and
local_grain_persistence_store_failures(system) to inspect persistence callback
failures observed by the local runtime. The counters are scoped to the local
actor system handle and increment only when a user-provided load callback returns
a negative value or a store callback returns anything other than 1.
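
The counting rule for the two failure counters can be modeled in a few lines of Python. This is a sketch of the rule as stated above, not the runtime's code; the PersistenceCounters class and its method names are hypothetical.

```python
class PersistenceCounters:
    """Toy model of the load/store failure counters for one local system."""

    def __init__(self):
        self.load_failures = 0
        self.store_failures = 0

    def record_load(self, rc):
        # Any non-negative value is a valid cold miss or restore.
        if rc < 0:
            self.load_failures += 1
            return False
        return True

    def record_store(self, rc):
        # Only exactly 1 counts as a committed store.
        if rc != 1:
            self.store_failures += 1
            return False
        return True


counters = PersistenceCounters()
results = [
    counters.record_load(0),    # cold miss: fine
    counters.record_load(-1),   # fatal load
    counters.record_store(1),   # committed
    counters.record_store(0),   # failure
    counters.record_store(2),   # also a failure: not exactly 1
]
```

Note the asymmetry: load accepts any non-negative value, while store accepts only 1.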
The current registry and namespace layer are still local and partly in-memory. They do not yet provide compiler-generated GrainStore serializers, passivation, migration, remote routing, cross-node placement, or durable namespace persistence. Those are separate runtime layers.
The local grain registry helpers are:
    cluster.local_grain_lookup_or_start(system, grain_type, grain_id, slot, policy, capacity, setup, handler, destroy) -> u64
    cluster.local_grain_lookup_or_start_persistent(system, grain_type, grain_id, slot, policy, capacity, setup, handler, destroy, ctx, load, store) -> u64
    cluster.local_grain_ref_try_send(grain_ref, msg) -> i32
    cluster.local_grain_active_count(system) -> u64
    cluster.local_grain_persistence_load_failures(system) -> u64
    cluster.local_grain_persistence_store_failures(system) -> u64
    cluster.local_grain_namespace_lookup(system, grain_type, namespace) -> u64
    cluster.local_grain_namespace_bind(system, grain_type, namespace, grain_id) -> i32
    cluster.local_grain_lookup_or_start_namespace(system, grain_type, namespace, slot, policy, capacity, setup, handler, destroy) -> u64
    cluster.local_grain_lookup_or_start_namespace_persistent(system, grain_type, namespace, slot, policy, capacity, setup, handler, destroy, ctx, load, store) -> u64

Stable local actor references are scalar handles. They encode the local system
handle, child slot, and slot generation, not the runtime ActorId, so a
permanent or transient child keeps the same reference after restart. If you stop
a child and reuse the slot for a different child, the old reference becomes
invalid instead of aliasing the replacement. The current ref helpers are:
    cluster.local_actor_ref(system, slot) -> u64
    cluster.local_ref_try_send(actor_ref, msg) -> i32
    cluster.local_ref_child_actor_id(actor_ref) -> i32
    cluster.local_ref_child_lifecycle(actor_ref) -> i32
    cluster.local_ref_child_task_state(actor_ref) -> i32
    cluster.local_ref_child_last_exit(actor_ref) -> i32
    cluster.local_ref_mailbox_len(actor_ref) -> i64
    cluster.local_ref_mailbox_capacity(actor_ref) -> i64
    cluster.local_ref_stop_child(actor_ref, reason) -> i32

Capability-gated callers use the same reference shape with explicit
ClusterLocalCap authority:
    cluster.local_actor_ref_cap(cap, system, slot) -> u64
    cluster.local_ref_try_send_cap(cap, actor_ref, msg) -> i32
    cluster.local_ref_child_actor_id_cap(cap, actor_ref) -> i32
    cluster.local_ref_child_lifecycle_cap(cap, actor_ref) -> i32
    cluster.local_ref_child_task_state_cap(cap, actor_ref) -> i32
    cluster.local_ref_child_last_exit_cap(cap, actor_ref) -> i32
    cluster.local_ref_mailbox_len_cap(cap, actor_ref) -> i64
    cluster.local_ref_mailbox_capacity_cap(cap, actor_ref) -> i64
    cluster.local_ref_stop_child_cap(cap, actor_ref, reason) -> i32

Use ActorRef[Msg] for compile-time message protocol checks on direct
spawned actors. Use the scalar local actor reference above for the
supervised local bridge path.
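
To see why a reference built from (system, slot, generation) survives restarts but not slot reuse, consider the following Python model. It is a sketch only: the LocalSystemModel class is hypothetical, refs are modeled as tuples rather than the scalar u64 handles the runtime uses, and the real bit layout is not described on this page.

```python
class LocalSystemModel:
    """Toy model of slot generations behind stable local actor references."""

    def __init__(self, handle, slot_count):
        self.handle = handle
        self.generations = [0] * slot_count
        self.actor_ids = [None] * slot_count
        self._next_actor_id = 1

    def _new_actor(self, slot):
        self.actor_ids[slot] = self._next_actor_id
        self._next_actor_id += 1

    def start_child(self, slot):
        # A fresh child in a slot begins a new generation.
        self.generations[slot] += 1
        self._new_actor(slot)
        return (self.handle, slot, self.generations[slot])

    def restart_child(self, slot):
        # A restart replaces the actor id but keeps the generation.
        self._new_actor(slot)

    def stop_child(self, slot):
        self.actor_ids[slot] = None

    def resolve(self, ref):
        handle, slot, generation = ref
        if handle != self.handle or generation != self.generations[slot]:
            return None  # stale or foreign reference
        return self.actor_ids[slot]


system = LocalSystemModel(handle=7, slot_count=1)
ref = system.start_child(0)
before = system.resolve(ref)
system.restart_child(0)
after = system.resolve(ref)      # same ref, new actor id
system.stop_child(0)
new_ref = system.start_child(0)  # slot reused by a different child
stale = system.resolve(ref)      # old ref no longer resolves
```

Because the transient actor id is not part of the reference, the restart is invisible to the holder of ref; the generation bump on reuse is what prevents the old reference from aliasing the replacement.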
Inside receive, you can either write normal statements against __msg or
write bare match arms. Bare arms desugar to match __msg { ... }:
    receive do
        0 => do count += 1 end,
        1 => do return 0 end,
        else => do count = count end,
    end

For typed message protocols, receive arms can match named variants, destructure payload fields, guard on destructured bindings, and include a timeout arm:

    message CounterMsg {
        Tick,
        Set { value: u64 },
        Stop,
    }

    receive do
        CounterMsg.Tick => do count += 1 end,
        CounterMsg.Set { value } when value >= 0 as u64 => do count += value end,
        CounterMsg.Stop => do return 0 end,
        else => do count = count end,
        after 0 => do count = count end,
    end

The shorthand { value } binds the payload field named value into the arm
scope. Message payload fields must be SBI-conformant; pointer-typed fields are
rejected at declaration time with E2530.
For compiler-generated supervised actors, an after N => ... arm is wired into
the local runtime. The compiler emits an ActorName_timeout(actor) helper and
the generated ActorName_start_supervised* wrappers register it with the
mailbox timeout. Delivered messages still call ActorName_handler(actor, msg);
an empty mailbox at the timeout boundary calls ActorName_timeout(actor).
Direct spawned actors can use typed actor references:
    pub func send_tick(ref: ActorRef[CounterMsg]) -> i32 do
        ref.send(CounterMsg.Tick)
        return 0
    end

    pub func spawn_counter() -> ActorRef[CounterMsg] do
        return spawn Counter()
    end

ActorRef[Msg] is a compile-time protocol witness over the current actor
handle ABI. The compiler checks direct ref.send(Msg.UnitVariant) calls,
typed local bindings, and direct return spawn Actor() expressions. Unit
variants lower to their i64 tag. Payload-carrying variants are now
supported: fields transfer through boxed slot arrays, and receive arms can
destructure them with Msg.Variant { field } patterns. All message fields
must be SBI-conformant (owned, by-value, no pointers) — the compiler
rejects non-conformant declarations with E2530.
Sendability
SPEC-029 sendability is enforced before actor payload delivery ships. For proven actor, channel, and mailbox send boundaries:
- ref T payloads are rejected with E2801.
- iso T payloads are accepted and the binding is consumed.
- Reading a consumed iso binding emits E2802.
- val T and tag T payloads are sendable.
This is a type check, not a serialization trait check. Janus does not require
a Serialize trait for actor messages. Wire-ready message payloads must use
SBI-compatible layout when the distributed transport path lands.
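
The sendability rules can be pictured as a tiny checker. The Python sketch below models only the four rules listed in this section; it is not the compiler's implementation, the Binding class and function names are hypothetical, and treating a second send of a consumed iso binding as a read (E2802) is an assumption.

```python
class Binding:
    """A named value with a reference capability: ref, iso, val, or tag."""

    def __init__(self, name, capability):
        self.name = name
        self.capability = capability
        self.consumed = False


def check_read(binding):
    # Reading a consumed iso binding is an error.
    return "E2802" if binding.consumed else None


def check_send(binding):
    """Return a diagnostic code, or None when the send is accepted."""
    # Assumption: sending a binding first reads it.
    read_error = check_read(binding)
    if read_error is not None:
        return read_error
    if binding.capability == "ref":
        return "E2801"
    if binding.capability == "iso":
        binding.consumed = True  # ownership moves to the receiver
    return None


buf = Binding("buf", "iso")
shared = Binding("shared", "val")
alias = Binding("alias", "ref")

iso_send = check_send(buf)        # accepted, buf is now consumed
iso_read = check_read(buf)        # E2802
val_send = check_send(shared)     # accepted, still readable
val_read = check_read(shared)
ref_send = check_send(alias)      # E2801
```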
Explicit child stop
Use local_stop_child when a caller wants to stop a live child without
applying its restart policy:

    let stopped = cluster.local_stop_child(
        system,
        0 as u64,
        cluster.STOP_REASON_SHUTDOWN,
    )

Shutdown and normal stop reasons do not create tombstones. Abnormal,
killed, and pledge-violation stop reasons do create tombstones, but
still do not restart the child. local_handle_crash and
local_handle_exit remain the restart-policy paths for simulated or
observed actor exits.
Mailbox backpressure
The local actor mailbox is bounded. Actors without @mailbox use the
runtime channel default: one pending handoff slot. The public send surface
is non-blocking:

    let sent = cluster.local_try_send(system, 0 as u64, 42 as i64)
    let sent_ref = cluster.local_ref_try_send(actor_ref, 42 as i64)

Return codes are stable for the current tracer bullet:
- 1: the message was accepted.
- 0: the child slot is empty or the mailbox is full.
- -1: the mailbox channel is closed.
Use @mailbox(capacity: N) on a compiler-generated actor to set the
supervised actor mailbox capacity. The compiler also uses the same value for
direct spawn Actor() mailboxes. The overflow argument is parsed for the
source surface, but the current runtime path enforces capacity only.
Production callers should treat 0 as backpressure or missing-child
rejection and retry, drop, or escalate according to their actor protocol.
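
One way a caller might react to the three return codes is a bounded retry loop. The Python sketch below models that policy against a toy mailbox; it is not the runtime's mailbox, the BoundedMailbox class and send_with_retry helper are hypothetical, and it covers only the full-mailbox meaning of 0 (not the empty-slot case).

```python
from collections import deque

ACCEPTED, REJECTED, CLOSED = 1, 0, -1


class BoundedMailbox:
    """Toy bounded mailbox with the documented try_send return codes."""

    def __init__(self, capacity=1):
        self.capacity = capacity
        self.queue = deque()
        self.closed = False

    def try_send(self, msg):
        if self.closed:
            return CLOSED
        if len(self.queue) >= self.capacity:
            return REJECTED
        self.queue.append(msg)
        return ACCEPTED


def send_with_retry(mailbox, msg, attempts, on_full):
    """Retry on backpressure; stop at once on acceptance or a closed channel."""
    for _ in range(attempts):
        rc = mailbox.try_send(msg)
        if rc != REJECTED:
            return rc
        on_full(mailbox)  # e.g. yield so the receiver can drain
    return REJECTED


mailbox = BoundedMailbox(capacity=1)
first = mailbox.try_send(42)
second = mailbox.try_send(43)  # mailbox is full
# Simulate the receiver draining one message between attempts.
retried = send_with_retry(mailbox, 43, attempts=3,
                          on_full=lambda m: m.queue.popleft())
mailbox.closed = True
after_close = mailbox.try_send(44)
```

A closed channel is not retried in this sketch, since -1 does not clear up by waiting; whether to drop or escalate at that point is up to the actor protocol.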
Mailbox pressure is observable through scalar status accessors:
    let pending = cluster.local_child_mailbox_len(system, 0 as u64)
    let slots = cluster.local_child_mailbox_capacity(system, 0 as u64)

The default reports slots == 1. An actor declared with
@mailbox(capacity: 4) reports slots == 4. Both functions return -1
when the slot has no live child.
Janus Status Accessors
The Janus facade exposes local supervisor and child status without exposing actor state:

    let supervisor_state = cluster.local_supervisor_state(system)
    let lifecycle = cluster.local_child_lifecycle(system, 0 as u64)
    let task_state = cluster.local_child_task_state(system, 0 as u64)
    let last_exit = cluster.local_child_last_exit(system, 0 as u64)

local_supervisor_state returns:
- SUPERVISOR_STATE_RUNNING
- SUPERVISOR_STATE_STOPPED
- SUPERVISOR_STATE_FAILED
- -1 for an invalid handle

local_child_lifecycle returns:
- CHILD_LIFECYCLE_UNCONFIGURED
- CHILD_LIFECYCLE_CONFIGURED
- CHILD_LIFECYCLE_RUNNING
- CHILD_LIFECYCLE_STOPPED
- CHILD_LIFECYCLE_FAILED
- -1 for an invalid handle or slot
local_child_task_state returns TASK_STATE_READY,
TASK_STATE_RUNNING, TASK_STATE_BLOCKED,
TASK_STATE_BUDGET_EXHAUSTED, TASK_STATE_COMPLETED,
TASK_STATE_CANCELLED, or -1 when no live task is present.
local_child_last_exit returns the same STOP_REASON_* codes used
by local_handle_exit, or -1 when no exit is recorded.
Every status accessor has a _cap form that consumes
ClusterLocalCap. These accessors report lifecycle and pressure only;
they do not expose actor-local variables or grain-owned state.
LocalActorSystem
LocalActorSystem is the ergonomic root for the local tracer bullet. It keeps callers on the public std.cluster path instead of reaching into runtime internals.

    const cluster = @import("std_cluster");

    var system = try cluster.LocalActorSystem.init(
        allocator,
        1, // nursery id
        "root", // supervisor id
        .one_for_one,
        2, // child slots
    );
    defer system.deinit();

Starting Children
Children are started from ChildSpec values. A child start function receives the actor-system nursery and the allocator owned by the supervisor.
    fn startWorker(nursery: *cluster.Nursery, allocator: std.mem.Allocator) !cluster.SupervisedChild {
        const actor = try allocator.create(cluster.Actor);
        errdefer allocator.destroy(actor);

        actor.* = try cluster.Actor.init(allocator, 1, 1);
        errdefer actor.deinit();

        const task = cluster.spawn(nursery, actor, workerHandler) orelse return error.ActorSpawnRejected;
        return .{ .actor = actor, .task = task };
    }

    _ = try system.startChild(0, .{
        .id = "worker",
        .start_fn = startWorker,
        .restart = .permanent,
    });

You can also configure children first and start them later:
    try system.configureChild(0, .{
        .id = "worker",
        .start_fn = startWorker,
        .restart = .permanent,
    });

    const started = try system.startConfiguredChildren();

Handling Exits
Use handleCrash for ordinary abnormal actor failure:
    try system.handleCrash(0);

Use handleExit when the caller knows the exact stop reason:

    try system.handleExit(0, .pledge_violated);

The Janus facade exposes the same path with stable STOP_REASON_*
codes:

    if cluster.local_handle_exit(
        system,
        0 as u64,
        cluster.STOP_REASON_PLEDGE_VIOLATED,
    ) != 1 as i32 do return 1 end

Use handleExitAt for deterministic restart-window tests or runtime loops that already have a timestamp:

    try system.handleExitAt(0, .abnormal, 100);
    const status = system.statusAt(100);

Actor Tombstones
Abnormal terminal exits now produce actor tombstones. Normal exits and shutdown exits are intentionally skipped; tombstones are for failure classes that may need replay, audit, or repair.
The local runtime keeps the existing bounded in-memory tombstone index and can also mirror each tombstone to a caller-provided sink.
The current low-level sink hook is a raw-address bridge API:
local_set_tombstone_sink_addr(system, ctx_addr, append_addr). Treat it as an
internal bridge surface, not a user-facing Janus callback API. User .jan
documentation must not teach @intFromPtr or raw function-address conversion.
The public typed callback surface belongs to the std.exec/cluster cleanup
work.
The callback receives an opaque context pointer and a callback-scoped
record pointer. Copy or persist the record during the callback; do not
retain record_raw.
Sink counters are exposed for monitoring:
    let stored = cluster.local_tombstone_sink_appends(system)
    let failed = cluster.local_tombstone_sink_failures(system)

Stable stop-reason codes are available as STOP_REASON_NORMAL,
STOP_REASON_SHUTDOWN, STOP_REASON_ABNORMAL, STOP_REASON_KILLED,
STOP_REASON_PLEDGE_VIOLATED, and STOP_REASON_MIGRATION_ABORTED.
Tombstone Classification
The supervisor hot index can classify the latest tombstone against prior tombstones with the same deterministic pattern: child slot, spec id, stop reason, code version, and input digest. Janus exposes scalar accessors for the current local runtime:

    let matches = cluster.local_tombstone_classify_match_count(
        system,
        now_seconds,
        3 as u32,
        60 as i64,
    )

    let deadly = cluster.local_tombstone_classify_deadly(
        system,
        now_seconds,
        3 as u32,
        60 as i64,
    )

    let oldest = cluster.local_tombstone_classify_oldest_sequence(
        system,
        now_seconds,
        3 as u32,
        60 as i64,
    )

matches is the number of hot-index tombstones matching the latest
pattern inside the window. deadly returns 1 when matches reaches the
threshold. oldest returns the oldest matching tombstone sequence, or 0
when no latest tombstone exists. Each function also has a _cap form that
consumes ClusterLocalCap.
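
The three classification results can be reproduced with a small Python model. This is a sketch of the described behavior, not the hot index itself: the Tombstone tuple and classify function are hypothetical, the window is assumed to be inclusive of its boundary, and the latest tombstone is assumed to count toward its own match total.

```python
from collections import namedtuple

Tombstone = namedtuple(
    "Tombstone",
    "sequence child spec_id reason code_version input_digest timestamp",
)


def pattern(t):
    # The deterministic pattern: everything except sequence and timestamp.
    return (t.child, t.spec_id, t.reason, t.code_version, t.input_digest)


def classify(tombstones, now_seconds, threshold, window_seconds):
    """Return (matches, deadly, oldest_sequence) for the latest tombstone."""
    if not tombstones:
        return (0, 0, 0)
    latest = max(tombstones, key=lambda t: t.sequence)
    matching = [
        t for t in tombstones
        if pattern(t) == pattern(latest)
        and now_seconds - t.timestamp <= window_seconds
    ]
    matches = len(matching)
    deadly = 1 if matches >= threshold else 0
    oldest = min((t.sequence for t in matching), default=0)
    return (matches, deadly, oldest)


history = [
    Tombstone(1, 0, "worker", "abnormal", 1, 0xAB, 5),   # outside the window
    Tombstone(2, 0, "worker", "abnormal", 1, 0xAB, 40),
    Tombstone(3, 0, "worker", "killed", 1, 0xAB, 50),    # different reason
    Tombstone(4, 0, "worker", "abnormal", 1, 0xAB, 70),  # latest
]
strict = classify(history, 70, 3, 60)
loose = classify(history, 70, 2, 60)
empty = classify([], 70, 3, 60)
```

With a 60-second window at time 70, only sequences 2 and 4 match the latest pattern, so a threshold of 3 is not reached while a threshold of 2 is.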
Tombstones To STL
std.cluster.tombstones converts callback records into canonical STL
events. The adapter keeps cluster supervision and STL storage separate:
the sink copies scalar tombstone fields, builds an ActorTombstone, and
appends through an std.stl.lsm_store.LSMStore.
    use std.cluster.local as cluster
    use std.cluster.tombstones as tombstones
    use std.db.lsm as lsm
    use std.stl.lsm_store as lsm_store
    use std.stl.store as store

    pub func tombstone_sink(ctx: u64, record_raw: u64) -> i32 do
        let gs = as[*lsm.GrainStoreBytes](ctx)
        var stl = lsm_store.make_store(gs)

        var t = tombstones.zero()
        t.sequence = cluster.tombstone_sequence(record_raw)
        t.child = cluster.tombstone_child(record_raw)
        t.reason = cluster.tombstone_reason(record_raw)
        t.attempt_count = cluster.tombstone_attempt_count(record_raw)
        t.timestamp_seconds = cluster.tombstone_timestamp_seconds(record_raw)

        if tombstones.append_lsm(&stl, &t) != store.STORE_OK do return 0 end
        return 1
    end

The sink context should point at the borrowed GrainStoreBytes. The
callback creates a short-lived LSMStore wrapper over that same store;
fresh wrappers can rescan LSM truth later for count, rank lookup, and
flush.
Task Completion Routing
The local actor system can route a completed nursery task back to the supervised child slot:

    const task = system.childTaskAt(1) orelse return error.MissingTask;
    task.markCompleted(5);

    const restarted_idx = try system.handleTaskCompleteByTask(task);

Stale task handles are rejected. This matters after a restart, because the old task pointer must not be allowed to affect the replacement child.
Restart Controls
Restart budgets are opt-in:

    system.setRestartLimit(2, 60);

From Janus:

    _ = cluster.local_set_restart_limit(system, 2 as u32, 60 as i64)

The limit is counted per restart window. When the budget is exhausted, the supervisor moves to failed, records the failed child and reason, and stops remaining active children according to the implemented supervisor failure cleanup.
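
A restart budget of this shape can be modeled as a counter over a time window. The Python sketch below is an assumption-laden illustration, not the supervisor's code: the RestartBudget class is hypothetical, and it assumes a sliding window measured back from the current timestamp, which this page does not confirm.

```python
class RestartBudget:
    """Toy model: at most max_restarts restarts per window_seconds."""

    def __init__(self, max_restarts, window_seconds):
        self.max_restarts = max_restarts
        self.window_seconds = window_seconds
        self.restarts = []      # timestamps of granted restarts
        self.exhausted = False

    def allow_restart(self, now):
        # Assumption: only restarts inside the sliding window count.
        self.restarts = [
            t for t in self.restarts if now - t < self.window_seconds
        ]
        if len(self.restarts) >= self.max_restarts:
            self.exhausted = True  # the supervisor would move to failed
            return False
        self.restarts.append(now)
        return True


burst = RestartBudget(2, 60)
burst_decisions = [burst.allow_restart(t) for t in (0, 10, 20)]

spaced = RestartBudget(2, 60)
spaced_decisions = [spaced.allow_restart(t) for t in (0, 10, 100)]
```

Three crashes within 20 seconds exhaust a budget of 2 per 60 seconds, while the same number of crashes spread over 100 seconds does not.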
Janus callers can test exhaustion through:
    let exhausted = cluster.local_restart_limit_exhausted(system)

Pledge violations do not restart by default. This is intentional because pledge failure is a capability boundary event, not an ordinary crash. Local systems can explicitly opt in:

    system.setRestartPledgeViolations(true);

From Janus:

    _ = cluster.local_set_restart_pledge_violations(system, 1 as u32)

Lifecycle
Use stopChild, stopChildren, or shutdown for explicit lifecycle control:

    try system.stopChild(0, .shutdown);
    _ = try system.stopChildren(.killed);
    system.shutdown();

shutdown stops active children and moves the supervisor to stopped.
Status Inspection
The facade exposes supervisor and child snapshots:

    const supervisor_status = system.status();
    const child_status = system.childStatus(0);
    const failure = system.failure();

SupervisorStatus includes:
- strategy and state
- slot count
- configured, active, stopped, and failed child counts
- total restarts
- restart exhaustion metadata
- restart limit and remaining restarts
ChildStatus includes:
- lifecycle
- configured spec id and restart policy
- actor id and task id when running
- task state when available
- last exit reason
- restart count
Current Limits
- Local runtime only.
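- No distributed grain API. Grain support is limited to the local activation registry, namespace lookup, and explicit persistence callbacks.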
- No placement, membership, gossip, or remote send.
- No automatic actor registry integration.
- No hot reload.
- No persistence for actor state. Actor tombstones can be persisted to STL; live actor state replay remains future work.
- Slot type is u64. Heterogeneous typed state and non-u64 payload fields remain future work.
The current goal is a correct local supervised-actor tracer bullet. Distributed :cluster features build on this surface later.