std.cluster

std.cluster currently exposes the local supervised-actor runtime for the :cluster profile tracer bullet.

This page documents what exists now. Local grains have a source-level activation shell, lifecycle hook syntax, deactivation-policy metadata, a local activation registry that enforces one live writer per durable identity, a local namespace lookup layer, and explicit GrainStore-backed lifecycle callbacks. Source on_activate/on_deactivate hooks execute over the current local scalar grain state slot. Source supervisor ... child ... end declarations lower to local Name_start_link(node_id) helpers for actor children.

Persistent source grains that import std.cluster.persist as persist also get the narrow generated Name_lookup_or_start_persistent_state0_u64(...) helper for slot-0 u64 state over GrainStoreBytes. Local idle passivation is executable through explicit activity touch and deterministic sweep calls. Local namespace mappings can now be persisted in GrainStoreBytes and restored into a fresh local actor system before activation. std.cluster.persist also exposes explicit arbitrary-slot u64 helpers for nonzero scalar state slots, and the compiler can generate all-slot scalar persistence helpers for numeric and namespace grain starts.

Heterogeneous typed state serializers, non-u64 state slots, schema evolution, placement, membership, migration, distributed namespace synchronization, distributed supervision aggregation, and remote transport remain future :cluster work.

Scope

The current facade is local-only:

LocalActorSystem owns one cluster-budget Nursery and one Supervisor.
Actors run as nursery tasks.
Supervisor strategies are one_for_one, one_for_all, and rest_for_one.
Restart policies are permanent, transient, and temporary.
Restart budgets and pledge-violation restart opt-in are exposed.
Child status snapshots expose lifecycle, actor id, task id, task state, last exit reason, and restart count.
Actor tombstones can be counted, inspected as bounded latest-record scalar metadata, classified for repeated deterministic patterns, mirrored to a caller-provided sink, and used to drive explicit local quarantine.
Grain declarations with durable identity syntax lower through the same local supervised activation shell as actors.
Grain lifecycle metadata accepts @lifecycle(activation: .lazy, deactivation: .idle_timeout(ms)).
Grain bodies accept on_activate(stored: T) -> T do ... end and on_deactivate(state: T) -> T do ... end as the stable source contract.
Local grain lookup/start maps (grain_type, grain_id) to one stable local actor reference, so duplicate activation attempts reuse the existing live activation instead of creating a second mutator.
Local namespace lookup maps (grain_type, namespace) to an internal durable grain id, then routes duplicate lookups through the same single-writer activation registry.
std.cluster.grainstore exposes explicit local namespace persistence helpers: bind_local_namespace_u64(...) persists and mirrors a binding into the local runtime, and restore_local_namespace_u64(...) restores a persisted binding into a fresh local actor system before activation.
Persistent local grain start can invoke caller-provided load/store callbacks backed by GrainStoreBytes after setup, after message/timeout boundaries, and before teardown.
Persistent source grains that import std.cluster.persist as persist can use generated Name_lookup_or_start_persistent_state0_u64(system, grain_id, slot, policy, ctx) helpers for the current scalar slot-0 u64 state runtime. The helper wires canonical std.cluster.persist load/store callbacks and deterministic (grain_type, grain_id, slot) GrainStoreBytes keys.
The compiler also emits Name_lookup_or_start_persistent_slots_u64(system, grain_id, slot, policy, ctx) for the current scalar u64 state-slot runtime. It wires generated per-grain callbacks that persist every scalar state slot through GrainStoreBytes.
The compiler also emits Name_lookup_or_start_namespace_persistent_slots_u64(system, namespace, slot, policy, ctx) for namespace-addressed local grains. It resolves the local namespace binding first, then uses the same generated all-slot scalar callbacks.
std.cluster.persist also exposes get_slot_u64, put_slot_u64, load_slot_u64, store_slot_u64, load_slots_u64, and store_slots_u64 for explicit scalar state slots and generated all-slot callbacks.
Generated grain lifecycle hooks run inside that boundary: on_activate runs after a persistence load and before the first message; on_deactivate runs before teardown and before the final persistent store.
Local idle passivation is explicit and deterministic: local_grain_touch(ref, now_ms) records an activity boundary, and local_grain_passivate_idle(system, idle_timeout_ms, now_ms, reason) passivates elapsed grains through the same on_deactivate/store boundary.
A source grain with @lifecycle(activation: .lazy, deactivation: .idle_timeout(ms)) also emits Name_passivate_idle(system, now_ms, reason). The helper uses the source timeout literal and keeps the clock sample and stop reason explicit.
Source supervisors create local systems through generated SupervisorName_start_link(node_id) helpers and start declared actor children through generated supervised refs.
Compiler-generated local actor/grain starts forward @arena(max_bytes: N) into the runtime. The current executable boundary enforces that limit for generated scalar u64 state-slot allocation and exposes the configured ceiling through explicit local observation helpers.

Janus actor start

Define a message protocol with message. The declaration uses tagged variants:

message CounterMsg {
    Tick,
    Stop,
}

Attach the protocol to an actor with actor Name(msg: Msg). The payload binding name is part of the header; the current generated handler still receives the raw i64 tag as __msg.

@mailbox(capacity: 4)
actor Counter(msg: CounterMsg) do
    var count: u64 = 0

    receive do
        count += __msg
    end
end

For a source-level Janus actor, the compiler emits the supervised start wrappers:

ActorName_start_supervised(system: u64, slot: u64, policy: u32) -> u64
ActorName_start_supervised_ref(system: u64, slot: u64, policy: u32) -> u64

ActorName_start_supervised returns the transient ActorId. ActorName_start_supervised_ref starts the actor and returns a stable local actor reference for the supervised (system, slot) identity. Use the _ref form for production send and observation paths that should survive supervisor restarts.

{.profile: cluster.}

use std.cluster.local as cluster

message CounterMsg {
    Tick,
    Stop,
}

@mailbox(capacity: 4)
actor Counter(msg: CounterMsg) do
    var count: u64 = 0

    receive do
        count += __msg
    end
end

pub func main() -> i32 do
    let system = cluster.local_new(
        1 as u64,
        cluster.STRATEGY_ONE_FOR_ONE,
        1 as u64,
    )
    if system == 0 as u64 do return 1 end

    let counter = Counter_start_supervised_ref(
        system,
        0 as u64,
        cluster.POLICY_PERMANENT,
    )
    if counter == 0 as u64 do return 2 end

    if cluster.local_ref_mailbox_capacity(counter) != 4 as i64 do
        return 3
    end
    if cluster.local_ref_try_send(counter, 1 as i64) != 1 as i32 do
        return 4
    end
    if cluster.local_shutdown(system) != 1 as i32 do return 5 end
    if cluster.local_destroy(system) != 1 as i32 do return 6 end
    return 0
end

The wrapper hides the setup/handler/destroy runtime-entry plumbing. Public Janus APIs accept typed callables or generated actor/grain starters; callable addresses are not ordinary u64 values on the language surface.

Source supervisor start

supervisor declarations are now executable in the local v1 runtime when their children are source actors with generated supervised-ref helpers:

{.profile: cluster.}

use std.cluster.local as cluster

actor Worker do
    receive do
        __msg
    end
end

actor Scratch do
    receive do
        __msg
    end
end

supervisor Root, strategy: .one_for_one,
    restart_pledge_violations: true do
    child Worker, restart: .permanent
    child Scratch, restart: .temporary
end

pub func main() -> i32 do
    let system = Root_start_link(1 as u64)
    if system == 0 as u64 do return 1 end
    if cluster.local_destroy(system) != 1 as i32 do return 2 end
    return 0
end

The compiler emits:

Root_start_link(node_id: u64) -> u64

The helper lowers deterministically to the local runtime:

cluster_local_new(node_id, strategy, child_count) creates the system.
restart_pledge_violations: true calls cluster_local_set_restart_pledge_violations(system, 1).
Each child Actor, restart: .policy calls Actor_start_supervised_ref(system, slot, policy).
If system creation or any child start fails, the helper returns 0; child start failure also destroys the partially created system.
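The lowering order above can be modelled outside Janus. The Python sketch below is an illustration only, not the compiler's output: FakeRuntime and its methods are hypothetical stand-ins for the runtime calls named above, and assigning slots in declaration order is an assumption.

```python
class FakeRuntime:
    """Hypothetical stand-in for the local runtime; not the Janus implementation."""

    def __init__(self, failing_children=()):
        self.failing_children = set(failing_children)
        self.live_systems = set()
        self.pledge_restart = {}
        self.started = []
        self._next_handle = 1

    def local_new(self, node_id, strategy, child_count):
        handle = self._next_handle
        self._next_handle += 1
        self.live_systems.add(handle)
        return handle

    def set_restart_pledge_violations(self, system, flag):
        self.pledge_restart[system] = flag

    def start_supervised_ref(self, name, system, slot, policy):
        if name in self.failing_children:
            return 0  # 0 models a failed child start
        self.started.append((name, system, slot, policy))
        return (system << 16) | (slot + 1)  # arbitrary nonzero ref for the model

    def local_destroy(self, system):
        self.live_systems.discard(system)
        return 1


def start_link(rt, node_id, strategy, restart_pledge_violations, children):
    """Create the system, configure it, start children, clean up on failure."""
    system = rt.local_new(node_id, strategy, len(children))
    if system == 0:
        return 0
    if restart_pledge_violations:
        rt.set_restart_pledge_violations(system, 1)
    for slot, (name, policy) in enumerate(children):  # assumed: slot = declaration index
        if rt.start_supervised_ref(name, system, slot, policy) == 0:
            rt.local_destroy(system)  # do not leak a partially created system
            return 0
    return system
```

The point of the model is the failure path: a caller that receives 0 never has to destroy anything, because the helper already tore down the partially created system.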

The current source helper intentionally covers local actor children. Child argument lists, grain identity arguments, distributed supervisor aggregation, and cross-node placement remain runtime layers below the same source doctrine.

Local grain activation registry

The compiler accepts the final local-persistent grain header shape and emits the same local supervised start wrapper used by actors:

@persist(via: GrainStoreBytes)
@lifecycle(activation: .lazy, deactivation: .idle_timeout(300_000))
@requires(cap: [.network])
@reload(boundary: .message, state: UserState, migrate: user_v1_to_v2)
@reductions(limit: 128)
@arena(scope: .grain, reset: .on_deactivate)
@observe(mailbox: .summary, state: .none)
@tombstone(digest_includes: [.payload], retention_window: 60_000, deadly_threshold: 3)
@behaviour(.worker)
grain User(id: u64, msg: UserMsg) do
    var count: u64 = 0

    on_activate(stored: u64) -> u64 do
        return stored
    end

    on_deactivate(state: u64) -> u64 do
        return state
    end

    receive do
        UserMsg.Ping => do
            count += 1
        end,
        UserMsg.Stop => do
            return 0
        end,
    end
end

For the compiler slice, User_start_supervised(system, slot, policy) remains an activation shell over the local actor runtime. Lifecycle hooks now execute for the current local scalar state-slot implementation. Local idle passivation uses explicit caller-supplied milliseconds; no hidden clock or scheduler is implied. When the source declares .idle_timeout(ms), the compiler emits User_passivate_idle(system, now_ms, reason) so callers do not duplicate the timeout literal. The caller still supplies now_ms and reason explicitly, so the clock sample and stop reason stay visible at the call site. For the runtime registry slice, std.cluster.local locates or starts a grain activation by durable numeric identity; the local_grain_lookup_or_start call appears after the source-contract rules that follow.

@persist and @lifecycle are now checked as part of the grain source contract during janus build. @persist is valid only on grains and, in the v1 local runtime, must spell via: GrainStoreBytes; missing via, unknown fields, actor use, or future store names fail with E_CLUSTER_PERSIST. @lifecycle is valid only on grains, requires activation: .lazy, and treats omitted deactivation metadata as the current .never default. If deactivation is present, it must be .never or .idle_timeout(ms) with a positive compile-time millisecond literal. Invalid lifecycle metadata fails with E_CLUSTER_LIFECYCLE.

@requires(cap: [...]) is already enforced by janus build for calls inside the grain body. The compiler maps source symbols to the current Cap* call-graph requirements. .network covers CapNetRead and CapNetWrite; .storage_nvme covers filesystem-style storage requirements; .stdout, .stderr, and .alloc cover their matching runtime powers. If a grain body calls a function requiring CapNetRead without declaring .network, the build fails with E_CAP_MISSING. The annotation itself is closed over the canonical cap: [...] field; missing cap, an empty list, or invented fields such as caps fail with E_CLUSTER_REQUIRES.

func read_socket() requires CapNetRead do
end

@requires(cap: [.storage_nvme])
grain StorageOnly(msg: UserMsg) do
    receive do
        UserMsg.Ping => do
            read_socket() // E_CAP_MISSING during janus build
        end,
        else => do
        end,
    end
end

This compile-time check is separate from runtime placement. NodeManifest matching, migration refusal, and remote routing still belong to the NexusOS cluster runtime.

Memory tags have the same source-contract discipline. The live Phase-B checks are compile-time source rules:

alloc[Local.Shared](...) is rejected inside a grain with E_CLUSTER_MEMTAG. A grain owns its state and mutates it through its protocol; shared mutable local memory is a rival authority path.
alloc[Volatile.Ephemeral](...) is rejected inside a grain unless the grain declares reconstruct(). Ephemeral grain state is allowed only when the source shows how the grain rebuilds it after activation, migration, or passivation boundaries.
@replicate(scope: .wing | .cluster | .swarm, protocol: .pbft) is validated as replication source metadata. scope is required; unknown fields, unsupported scopes, or unsupported protocols fail with E_CLUSTER_REPLICATE. Runtime replication, membership, and consensus execution remain runtime work.

grain BadStore(msg: UserMsg) do
    receive do
        UserMsg.Ping => do
            let slot = alloc[Local.Shared](0 as u64)
            _ = slot
        end,
        else => do
        end,
    end
end

Use Local.Exclusive, Session.Replicated, Session.Consistent, or Volatile.Ephemeral according to the migration contract. For Volatile.Ephemeral, declare reconstruct() next to the receive loop:

grain ScratchStore(msg: UserMsg) do
    reconstruct() do
        // Rebuild dropped caches or scratch state from durable state.
    end

    receive do
        UserMsg.Ping => do
            let scratch = alloc[Volatile.Ephemeral](0 as u64)
            _ = scratch
        end,
        else => do
        end,
    end
end

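Full runtime replication/passivation behavior remains below the same source surface.

At the runtime registry level, std.cluster.local locates or starts a grain activation by durable numeric identity: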

let user_ref = cluster.local_grain_lookup_or_start(
    system,
    100 as u64,        // grain type id
    42 as u64,         // durable grain id
    0 as u64,          // local supervisor slot
    cluster.POLICY_PERMANENT,
    4 as u64,          // mailbox capacity
    user_setup,
    user_handler,
    user_destroy,
)

If another call uses the same (grain_type, grain_id), the runtime returns the same stable local reference while the activation is live. This pins the first grain runtime invariant: one durable identity has one active local writer.
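The single-writer invariant can be pictured with a small Python model. This is a sketch under stated assumptions, not the runtime's data structure: ActivationRegistry, its methods, and the integer refs are hypothetical names chosen for illustration.

```python
class ActivationRegistry:
    """Toy model of one live writer per durable identity."""

    def __init__(self):
        self._live = {}       # (grain_type, grain_id) -> ref
        self._next_ref = 1    # refs here are arbitrary nonzero integers
        self.activations = 0  # how many activations were actually created

    def lookup_or_start(self, grain_type, grain_id):
        key = (grain_type, grain_id)
        ref = self._live.get(key)
        if ref is None:
            # First lookup for this identity: create exactly one activation.
            ref = self._next_ref
            self._next_ref += 1
            self._live[key] = ref
            self.activations += 1
        return ref  # duplicates reuse the live activation

    def deactivate(self, grain_type, grain_id):
        """Drop the live activation so a later lookup starts a fresh one."""
        return self._live.pop((grain_type, grain_id), None) is not None

    def active_count(self):
        return len(self._live)
```

Two lookups for (100, 42) return the same ref and create one activation; a lookup for (100, 43) creates a second, independent one.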

The local namespace layer resolves human-readable namespace keys to internal durable grain ids before entering the same activation registry:

let user_ref = cluster.local_grain_lookup_or_start_namespace(
    system,
    100 as u64,        // grain type id
    "users/alice",     // local namespace key
    0 as u64,          // local supervisor slot
    cluster.POLICY_PERMANENT,
    4 as u64,          // mailbox capacity
    user_setup,
    user_handler,
    user_destroy,
)

local_grain_namespace_lookup returns the mapped internal id, or 0 when the namespace is unbound. local_grain_lookup_or_start_namespace derives and stores an internal id on first lookup, then returns the same live activation ref for duplicate namespace lookups. local_grain_namespace_bind can bind aliases to an existing id; rebinding an existing namespace to a different id is rejected.
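The binding rules in the paragraph above can be modelled in a few lines of Python. The sketch is illustrative only: NamespaceTable is a hypothetical name, sequential id derivation is an assumption (the text does not say how the runtime derives internal ids), and the 1/0 return values of bind are assumed rather than documented.

```python
class NamespaceTable:
    """Toy model of the local namespace layer."""

    def __init__(self):
        self._bindings = {}  # (grain_type, namespace) -> internal grain id
        self._next_id = 1    # assumed: sequential nonzero ids

    def lookup(self, grain_type, namespace):
        # 0 means the namespace is unbound.
        return self._bindings.get((grain_type, namespace), 0)

    def bind(self, grain_type, namespace, grain_id):
        key = (grain_type, namespace)
        current = self._bindings.get(key)
        if current is not None and current != grain_id:
            return 0  # rebinding to a different id is rejected
        self._bindings[key] = grain_id
        # Keep derived ids from colliding with explicitly bound ones.
        self._next_id = max(self._next_id, grain_id + 1)
        return 1

    def resolve_or_derive(self, grain_type, namespace):
        """First lookup derives and stores an id; later lookups reuse it."""
        grain_id = self.lookup(grain_type, namespace)
        if grain_id == 0:
            grain_id = self._next_id
            self._next_id += 1
            self._bindings[(grain_type, namespace)] = grain_id
        return grain_id
```

The resolved id is what then enters the activation registry, which is why duplicate namespace lookups end up at the same live activation.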

For local persistence, use the persistent lookup/start variant and pass lifecycle callbacks:

let user_ref = cluster.local_grain_lookup_or_start_persistent(
    system,
    100 as u64,
    42 as u64,
    0 as u64,
    cluster.POLICY_PERMANENT,
    4 as u64,
    user_setup,
    user_handler,
    user_destroy,
    store_ctx as u64,
    load,
    store,
)

The load/store callbacks use this shape:

pub func load(ctx: u64, grain_type: u64, grain_id: u64, state: u64) -> i32 do
    // Return >= 0 for a valid cold miss or restore, negative for fatal load.
    return 0
end

pub func store(ctx: u64, grain_type: u64, grain_id: u64, state: u64) -> i32 do
    // Return 1 when durable state was committed, 0 on failure.
    return 1
end

ctx is the caller-provided store context, commonly a pointer to a GrainStoreBytes facade. The runtime calls load after setup returns a state pointer, calls store after message and timeout handlers, and calls store again before teardown. Store failure turns the handler boundary into a stop so the activation does not continue pretending volatile mutation was committed.

Use local_grain_persistence_load_failures(system) and local_grain_persistence_store_failures(system) to inspect persistence callback failures observed by the local runtime. The counters are scoped to the local actor system handle and increment only when a user-provided load callback returns a negative value or a store callback returns anything other than 1.
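The callback boundaries and failure counters can be traced with a Python model. This is a sketch, not the runtime: run_activation and the event names are hypothetical, state is a plain dict rather than a state pointer, and skipping the final store after a store-failure stop is an assumption the text does not confirm.

```python
def run_activation(messages, load, store, handler, counters):
    """Drive one activation and return the ordered list of boundary events."""
    events = ["setup"]
    state = {"value": 0}
    if load(state) < 0:                 # a negative load result is fatal
        counters["load_failures"] += 1
        events.append("load_failed")
        return events
    events.append("load")
    stopped_by_store = False
    for msg in messages:
        handler(state, msg)
        events.append("handler")
        if store(state) != 1:           # anything other than 1 is a failure
            counters["store_failures"] += 1
            events.append("stop")       # the handler boundary becomes a stop
            stopped_by_store = True
            break
        events.append("store")
    if not stopped_by_store:
        # Assumption: the final store runs only on a clean teardown path.
        if store(state) == 1:
            events.append("final_store")
        else:
            counters["store_failures"] += 1
    events.append("teardown")
    return events
```

With two messages and a healthy store, the trace is setup, load, handler, store, handler, store, final_store, teardown. A store that returns 0 after the first message yields setup, load, handler, stop, teardown and increments the store-failure counter once.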

For source-declared grains with scalar u64 state slots, the compiler also emits callback-free helpers:

use std.cluster.persist as persist

let user_ref = User_lookup_or_start_persistent_state0_u64(
    system,
    42 as u64,
    0 as u64,
    cluster.POLICY_PERMANENT,
    store_ctx as u64,
)

let full_user_ref = User_lookup_or_start_persistent_slots_u64(
    system,
    42 as u64,
    0 as u64,
    cluster.POLICY_PERMANENT,
    store_ctx as u64,
)

let named_user_ref = User_lookup_or_start_namespace_persistent_slots_u64(
    system,
    "users/alice",
    0 as u64,
    cluster.POLICY_PERMANENT,
    store_ctx as u64,
)

These helpers still expose the cost: the caller passes the persistence context explicitly, and the runtime performs load/store at the same boundaries. The state0 helper preserves the original single-slot convenience path; the numeric and namespace slots helpers persist every generated scalar u64 state slot. The namespace helper is still local: callers must persist or restore namespace bindings explicitly when they need durable names across systems.

If a source grain declares @lifecycle(..., deactivation: .idle_timeout(ms)), the compiler also emits:

User_passivate_idle(system, now_ms, reason) -> u64

This helper lowers to cluster.local_grain_passivate_idle with the source timeout literal. It does not read a clock and does not install a hidden timer; the scheduler or caller remains responsible for choosing when to sweep.
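The touch-and-sweep contract can be modelled in Python to show why no clock is involved. The sketch is hypothetical: IdleTracker is an invented name, the "elapsed >= timeout" comparison is an assumption, and returning the number of passivated grains is a guess at what the u64 result means.

```python
class IdleTracker:
    """Toy model of explicit activity touch plus deterministic idle sweep."""

    def __init__(self):
        self._last_touch = {}  # grain ref -> last activity timestamp in ms

    def touch(self, ref, now_ms):
        self._last_touch[ref] = now_ms
        return 1

    def passivate_idle(self, idle_timeout_ms, now_ms, on_deactivate):
        """Passivate grains idle for at least idle_timeout_ms; return how many."""
        expired = sorted(
            ref for ref, touched in self._last_touch.items()
            if now_ms - touched >= idle_timeout_ms  # assumed comparison
        )
        for ref in expired:
            on_deactivate(ref)  # stands in for on_deactivate plus the final store
            del self._last_touch[ref]
        return len(expired)
```

Because now_ms is always an argument, the same sequence of touch and sweep calls produces the same passivations on every run, which is what makes the sweep deterministic and testable.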

The current registry and namespace layer are still local. They do not yet provide heterogeneous typed GrainStore serializers, non-u64 state slots, typed-state schema evolution, hidden scheduler-owned sweep loops, migration, remote routing, cross-node placement, or distributed namespace synchronization. Those are separate runtime layers.

The local grain registry helpers are:

cluster.local_grain_lookup_or_start(system, grain_type, grain_id, slot, policy, capacity, setup, handler, destroy) -> u64
cluster.local_grain_lookup_or_start_lifecycle(system, grain_type, grain_id, slot, policy, capacity, setup, handler, destroy, activate, deactivate) -> u64
cluster.local_grain_lookup_or_start_persistent(system, grain_type, grain_id, slot, policy, capacity, setup, handler, destroy, ctx, load, store) -> u64
cluster.local_grain_lookup_or_start_persistent_lifecycle(system, grain_type, grain_id, slot, policy, capacity, setup, handler, destroy, ctx, load, store, activate, deactivate) -> u64
cluster.local_grain_ref_try_send(grain_ref, msg) -> i32
cluster.local_grain_touch(grain_ref, now_ms) -> i32
cluster.local_grain_passivate_idle(system, idle_timeout_ms, now_ms, reason) -> u64
cluster.local_grain_active_count(system) -> u64
cluster.local_grain_persistence_load_failures(system) -> u64
cluster.local_grain_persistence_store_failures(system) -> u64
cluster.local_grain_namespace_lookup(system, grain_type, namespace) -> u64
cluster.local_grain_namespace_bind(system, grain_type, namespace, grain_id) -> i32
cluster.local_grain_lookup_or_start_namespace(system, grain_type, namespace, slot, policy, capacity, setup, handler, destroy) -> u64
cluster.local_grain_lookup_or_start_namespace_lifecycle(system, grain_type, namespace, slot, policy, capacity, setup, handler, destroy, activate, deactivate) -> u64
cluster.local_grain_lookup_or_start_namespace_persistent(system, grain_type, namespace, slot, policy, capacity, setup, handler, destroy, ctx, load, store) -> u64
cluster.local_grain_lookup_or_start_namespace_persistent_lifecycle(system, grain_type, namespace, slot, policy, capacity, setup, handler, destroy, ctx, load, store, activate, deactivate) -> u64
cluster.local_arena_max_bytes(system, slot) -> u64

Stable local actor references are scalar handles. They encode the local system handle, child slot, and slot generation, not the runtime ActorId, so a permanent or transient child keeps the same reference after restart. If you stop a child and reuse the slot for a different child, the old reference becomes invalid instead of aliasing the replacement. The current ref helpers are:

cluster.local_actor_ref(system, slot) -> u64
cluster.local_ref_try_send(actor_ref, msg) -> i32
cluster.local_ref_child_actor_id(actor_ref) -> i32
cluster.local_ref_child_lifecycle(actor_ref) -> i32
cluster.local_ref_child_task_state(actor_ref) -> i32
cluster.local_ref_child_last_exit(actor_ref) -> i32
cluster.local_ref_mailbox_len(actor_ref) -> i64
cluster.local_ref_mailbox_capacity(actor_ref) -> i64
cluster.local_ref_arena_max_bytes(actor_ref) -> u64
cluster.local_ref_stop_child(actor_ref, reason) -> i32
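The restart-stable, reuse-invalidating behaviour of these references can be modelled in Python. This is only a sketch: the field widths, the packing order, and the Slots class are assumptions made for the example, since the text does not specify the actual handle layout.

```python
SLOT_BITS = 16  # assumed field widths, chosen only for this sketch
GEN_BITS = 16


def make_ref(system, slot, generation):
    return (system << (SLOT_BITS + GEN_BITS)) | (slot << GEN_BITS) | generation


class Slots:
    """Toy model of supervisor slots with a per-slot generation counter."""

    def __init__(self, system):
        self.system = system
        self.generation = {}  # slot -> current generation, starting at 1
        self.actor_id = {}    # slot -> transient actor id
        self._next_actor = 1

    def _spawn(self, slot):
        self.actor_id[slot] = self._next_actor
        self._next_actor += 1

    def start(self, slot):
        """Place a new child in a slot; bumps the generation."""
        self.generation[slot] = self.generation.get(slot, 0) + 1
        self._spawn(slot)
        return make_ref(self.system, slot, self.generation[slot])

    def restart(self, slot):
        """Supervisor restart: new actor id, same generation, refs stay valid."""
        self._spawn(slot)

    def resolve(self, ref):
        """Return the current actor id for a ref, or 0 if the ref is stale."""
        generation = ref & ((1 << GEN_BITS) - 1)
        slot = (ref >> GEN_BITS) & ((1 << SLOT_BITS) - 1)
        system = ref >> (SLOT_BITS + GEN_BITS)
        if system != self.system or self.generation.get(slot) != generation:
            return 0
        return self.actor_id[slot]
```

A ref taken before a restart resolves to the new actor id afterwards, while a ref taken before the slot was reused resolves to 0 instead of aliasing the replacement child.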

Capability-gated callers use the same reference shape with explicit ClusterLocalCap authority:

cluster.local_actor_ref_cap(cap, system, slot) -> u64
cluster.local_ref_try_send_cap(cap, actor_ref, msg) -> i32
cluster.local_ref_child_actor_id_cap(cap, actor_ref) -> i32
cluster.local_ref_child_lifecycle_cap(cap, actor_ref) -> i32
cluster.local_ref_child_task_state_cap(cap, actor_ref) -> i32
cluster.local_ref_child_last_exit_cap(cap, actor_ref) -> i32
cluster.local_ref_mailbox_len_cap(cap, actor_ref) -> i64
cluster.local_ref_mailbox_capacity_cap(cap, actor_ref) -> i64
cluster.local_ref_arena_max_bytes_cap(cap, actor_ref) -> u64
cluster.local_ref_stop_child_cap(cap, actor_ref, reason) -> i32
cluster.local_arena_max_bytes_cap(cap, system, slot) -> u64

Grain @requires is declaration-level metadata; capability-token facade calls remain expression-level authority. Use both when both are true: the grain declares what kind of node/API authority it needs, and a specific runtime call passes the concrete token that authorizes the operation.

Use ActorRef[Msg] for compile-time message protocol checks on direct spawned actors. Use the scalar local actor reference above for the supervised local bridge path. Local GrainRef[Msg] uses the same protocol check and boxed payload send ABI for local grain activations; the test-cluster-grain-payload gate proves payload delivery by resolving a typed Promise[T] from inside the grain receive arm.

Inside receive, you can either write normal statements against __msg or write bare match arms. Bare arms desugar to match __msg { ... }:

receive do
    0 => do
        count += 1
    end,
    1 => do
        return 0
    end,
    else => do
        count = count
    end,
end

For typed message protocols, receive arms can match named variants, destructure payload fields, guard on destructured bindings, and include a timeout arm:

message CounterMsg {
    Tick,
    Set { value: u64 },
    Stop,
}

receive do
    CounterMsg.Tick => do
        count += 1
    end,
    CounterMsg.Set { value } when value >= 0 as u64 => do
        count += value
    end,
    CounterMsg.Stop => do
        return 0
    end,
    else => do
        count = count
    end,
    after 0 => do
        count = count
    end,
end

The shorthand { value } binds the payload field named value into the arm scope. Message payload fields must be SBI-conformant; pointer-typed fields are rejected at declaration time with E2530.

For compiler-generated supervised actors, an after N => ... arm is wired into the local runtime. The compiler emits an ActorName_timeout(actor) helper and the generated ActorName_start_supervised* wrappers register it with the mailbox timeout. Delivered messages still call ActorName_handler(actor, msg); an empty mailbox at the timeout boundary calls ActorName_timeout(actor).

Direct spawned actors can use typed actor references:

pub func send_tick(ref: ActorRef[CounterMsg]) -> i32 do
    ref.send(CounterMsg.Tick)
    return 0
end

pub func spawn_counter() -> ActorRef[CounterMsg] do
    return spawn Counter()
end

ActorRef[Msg] is a compile-time protocol witness over the current actor handle ABI. The compiler checks direct ref.send(Msg.UnitVariant) calls, typed local bindings, and direct return spawn Actor() expressions. Unit variants lower to their i64 tag. Payload-carrying variants are now supported: fields transfer through boxed slot arrays, and receive arms can destructure them with Msg.Variant { field } patterns. All message fields must be SBI-conformant (owned, by-value, no pointers); the compiler rejects non-conformant declarations with E2530.

Local GrainRef[Msg] follows the same boxed payload ABI for source-level .send(...) calls. The local runtime still activates grains through the node-local actor substrate, but the source witness is grain-shaped and protocol-checked independently from ActorRef[Msg].

Sendability

SPEC-029 sendability is enforced before actor payload delivery ships. For proven actor, channel, and mailbox send boundaries:

ref T payloads are rejected with E2801.
iso T payloads are accepted and the binding is consumed.
Reading a consumed iso binding emits E2802.
val T and tag T payloads are sendable.

This is a type check, not a serialization trait check. Janus does not require a Serialize trait for actor messages. Wire-ready message payloads must use SBI-compatible layout when the distributed transport path lands.
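The four rules above amount to a small decision procedure, modelled here in Python. The sketch is illustrative and is not how the Janus type checker is implemented: check_send and the bindings table are hypothetical, and only the error codes come from the rules above.

```python
def check_send(bindings, name):
    """Return an error code, or None when sending binding `name` is allowed.

    `bindings` maps a name to (capability, consumed); sending an iso
    binding marks it consumed.
    """
    kind, consumed = bindings[name]
    if consumed:
        return "E2802"  # reading a consumed iso binding
    if kind == "ref":
        return "E2801"  # ref payloads are rejected
    if kind == "iso":
        bindings[name] = (kind, True)  # ownership moves to the receiver
    return None  # iso, val, and tag payloads are sendable
```

Sending the same iso binding twice reports E2802 on the second attempt, while val and tag bindings can be sent repeatedly.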

Explicit child stop

Use local_stop_child when a caller wants to stop a live child without applying its restart policy:

let stopped = cluster.local_stop_child(
    system,
    0 as u64,
    cluster.STOP_REASON_SHUTDOWN,
)

Shutdown and normal stop reasons do not create tombstones. Abnormal, killed, and pledge-violation stop reasons do create tombstones, but still do not restart the child. local_handle_crash and local_handle_exit remain the restart-policy paths for simulated or observed actor exits.

Mailbox backpressure

The local actor mailbox is bounded. Actors without @mailbox use the runtime channel default: one pending handoff slot. The public send surface is non-blocking:

let sent = cluster.local_try_send(system, 0 as u64, 42 as i64)
let sent_ref = cluster.local_ref_try_send(actor_ref, 42 as i64)

Return codes are stable for the current tracer bullet:

1: the message was accepted.
0: the child slot is empty or the mailbox is full.
-1: the mailbox channel is closed.

Use @mailbox(capacity: N) or @mailbox(capacity: N, overflow: .reject) on a compiler-generated actor to set the supervised actor mailbox capacity. The compiler also uses the same value for direct spawn Actor() mailboxes. In the v1 local runtime, omitted overflow means .reject: send returns 0 when the mailbox is full. overflow: .drop_oldest, .drop_newest, and .block_sender are rejected by janus build with E_CLUSTER_MAILBOX until those runtime policies are executable. Unknown @mailbox fields are also rejected: the canonical v1 shape is capacity plus optional overflow. Production callers should treat 0 as backpressure or missing-child rejection and retry, drop, or escalate according to their actor protocol.
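A caller-side retry policy over these return codes might look like the following Python model. It is a sketch under assumptions: BoundedMailbox and send_with_retry are hypothetical names, the model covers only the .reject overflow behaviour, and it does not distinguish an empty child slot from a full mailbox because both return 0.

```python
from collections import deque

ACCEPTED, REJECTED, CLOSED = 1, 0, -1


class BoundedMailbox:
    """Toy bounded mailbox with reject-on-full semantics."""

    def __init__(self, capacity=1):  # default models one pending handoff slot
        self.capacity = capacity
        self.queue = deque()
        self.closed = False

    def try_send(self, msg):
        if self.closed:
            return CLOSED
        if len(self.queue) >= self.capacity:
            return REJECTED  # backpressure: the caller decides what to do
        self.queue.append(msg)
        return ACCEPTED


def send_with_retry(mailbox, msg, attempts, drain):
    """Retry on backpressure up to `attempts` times.

    `drain` is a caller-supplied hook that lets the receiver make progress
    between attempts; a closed mailbox is returned immediately.
    """
    for _ in range(attempts):
        code = mailbox.try_send(msg)
        if code != REJECTED:
            return code
        drain(mailbox)
    return REJECTED
```

Retrying is only one of the three options named above; a protocol that prefers dropping or escalating would branch on the same codes without the loop.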

@arena policy is also checked at build time. If present, it must describe an executable actor/grain allocator-domain contract:

@arena(scope: .actor, reset: .on_restart, max_bytes: 4096)
actor Worker(msg: WorkMsg) do
    receive do
        WorkMsg.Ping => do end,
    end
end

@arena(scope: .grain, reset: .on_deactivate)
grain User(id: u64, msg: UserMsg) do
    receive do
        UserMsg.Ping => do end,
    end
end

The scope must match the declaration kind. reset must be one of .on_stop, .on_restart, .on_deactivate, .generation, or .manual. reset: .manual requires explicit reason metadata. Optional max_bytes currently must be a positive compile-time integer literal. Invalid arena metadata fails with E_CLUSTER_ARENA.

For compiler-generated local actors and grains, max_bytes is forwarded into the generated start helper. The current runtime allocation for generated scalar state slots uses u64 slots; setup fails before activation when slot_count * 8 is greater than the configured byte ceiling. The configured limit is visible by ref or by raw local slot:

let actor_limit = cluster.local_ref_arena_max_bytes(actor_ref)
let slot_limit = cluster.local_arena_max_bytes(system, 0 as u64)

Capability-gated callers use local_ref_arena_max_bytes_cap and local_arena_max_bytes_cap. Full allocator-domain accounting for arbitrary actor-local allocations remains future runtime work.
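The setup-time ceiling check is simple arithmetic, shown here as a Python model. The function name is hypothetical, and treating a missing max_bytes as "no ceiling" is an assumption; only the slot_count * 8 comparison comes from the text.

```python
SLOT_BYTES = 8  # each generated scalar state slot is a u64


def setup_allowed(slot_count, max_bytes=None):
    """True when generated scalar state fits the configured @arena ceiling."""
    if max_bytes is None:  # assumed: no max_bytes means nothing to enforce
        return True
    # Setup fails only when the slots need more bytes than the ceiling.
    return slot_count * SLOT_BYTES <= max_bytes
```

For max_bytes: 4096, 512 slots need exactly 4096 bytes and fit, while 513 slots need 4104 bytes and fail before activation.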

@replicate validates the source shape for replicated or consistent session state:

@replicate(scope: .wing)
var threat_map = alloc[Session.Replicated](0 as u64)

@replicate(scope: .swarm, protocol: .pbft)
var engagement_rules = alloc[Session.Consistent](0 as u64)

Allowed scopes are .wing, .cluster, and .swarm. The only v1 protocol metadata accepted today is .pbft, and it may be omitted. Invalid replication metadata fails with E_CLUSTER_REPLICATE; this is compile-time contract validation, not runtime replication execution.

@reductions metadata uses one canonical shape:

@reductions(limit: 128, yield: .loop_backedge)
actor Worker(msg: WorkMsg) do
    receive do
        WorkMsg.Ping => do end,
    end
end

limit is required and must be a positive compile-time integer literal. yield may be omitted; if present in the current v1 surface, it must be .loop_backedge. The old budget spelling is not a synonym and fails with E_CLUSTER_REDUCTIONS.

For compiler-generated local actors and grains, the accepted limit is now forwarded into the local runtime. The current executable surface counts handler-boundary reductions: each delivered message or timeout consumes one local reduction unit, and the runtime exposes the configured limit, remaining budget, and yield-marker count.

let limit = cluster.local_ref_reduction_limit(actor_ref)
let remaining = cluster.local_ref_reduction_remaining(actor_ref)
let yields = cluster.local_ref_reduction_yields(actor_ref)

The same counters are available by raw system slot:

let limit = cluster.local_reduction_limit(system, 0 as u64)
let yields = cluster.local_reduction_yields(system, 0 as u64)

This is deliberately narrower than the final scheduler contract. Function-entry checks, loop-backedge checks, selective-receive scan costs, send/reply costs, and blocking-call reduction costs remain future compiler/runtime injection work under the same source annotation.

@reload metadata is checked as dispatch-table source contract:

@reload(boundary: .message, state: UserState, migrate: user_v1_to_v2)
grain User(id: u64, msg: UserMsg) do
    receive do
        UserMsg.Ping => do end,
    end
end

boundary is required and must be .message, .idle, .supervised_restart, or .forbidden. state and migrate must be declared together. Unknown fields and non-executable boundaries fail with E_CLUSTER_RELOAD. This is metadata validation only; signed module loading, ABI/state hash comparison, dispatch-entry swap, and hot-reload authorization remain runtime work.

@observe metadata also has one source shape:

@observe(mailbox: .summary, state: .none, current_message: .type_only)
actor Worker(msg: WorkMsg) do
    receive do
        WorkMsg.Ping => do end,
    end
end

mailbox may be .summary or .none. state may be .none, .redacted, or .full. current_message may be .none, .type_only, .redacted, or .full. The old events field is not canonical and fails with E_CLUSTER_OBSERVE; activation/deactivation events belong to lifecycle or tombstone streams, not observation-level metadata.

The local v1 runtime exposes the .summary registry through capability-gated packed snapshots:

let summary = cluster.local_observe_ref_summary_cap(cap, actor_ref)
if cluster.local_observe_is_present(summary) do
    let lifecycle = cluster.local_observe_lifecycle(summary)
    let pending = cluster.local_observe_mailbox_len(summary)
    let restarts = cluster.local_observe_restart_count(summary)
end

let reductions = cluster.local_observe_ref_reductions_cap(cap, actor_ref)
if cluster.local_observe_is_present(reductions) do
    let limit = cluster.local_observe_reduction_limit(reductions)
    let remaining = cluster.local_observe_reduction_remaining(reductions)
    let yields = cluster.local_observe_reduction_yields(reductions)
end

let reason = cluster.local_ref_schedule_reason_cap(cap, actor_ref)

Use local_observe_child_summary_cap(cap, system, slot) when the caller has a system handle and slot rather than a stable ref. Use local_observe_child_reductions_cap(cap, system, slot) for the equivalent packed reduction counters.

The status summary exposes only status metadata: lifecycle, task state, last exit reason, mailbox length, mailbox capacity, and restart count. The reduction summary exposes configured limit, remaining budget, and yield markers. Observation summaries return 0 for absent or stale refs.

local_ref_schedule_reason_cap and local_schedule_reason_cap expose the last local scheduling reason as one of SCHEDULE_REASON_NONE, SCHEDULE_REASON_MESSAGE, or SCHEDULE_REASON_REDUCTION_YIELD.

State snapshots, payload snapshots, and cross-node aggregation remain future observation levels.

@tombstone metadata uses explicit hot-index policy fields:

@tombstone(enabled: true, digest_includes: [.payload], retention_window: 60_000, deadly_threshold: 3)
actor Worker(msg: WorkMsg) do
    receive do
        WorkMsg.Ping => do end,
    end
end

enabled must be true or false. digest_includes may list .payload and .state; state digests still require a redacted observation or serialization contract. retention_window and deadly_threshold must be positive compile-time integer literals. The old classifier field is not canonical and fails with E_CLUSTER_TOMBSTONE.

@behaviour metadata validates common actor/grain shapes:

@behaviour(.server)
actor Worker(msg: WorkMsg) do
    init(start: i64) -> i64 do
        return start
    end

    receive do
        WorkMsg.Ping => do end,
    end
end

The v1 compiler accepts exactly one positional behaviour symbol. Known symbols are .server, .worker, .event_handler, .state_machine, and .supervisor. .server currently requires an init hook so the state shape is visible. .supervisor belongs to supervisor ... end syntax, not an actor/grain annotation. Shape mismatches fail with CL-E1413.

Mailbox pressure is observable through scalar status accessors:

let pending = cluster.local_child_mailbox_len(system, 0 as u64)
let slots = cluster.local_child_mailbox_capacity(system, 0 as u64)

An actor with the default mailbox reports slots == 1. An actor declared with @mailbox(capacity: 4) reports slots == 4. Both functions return -1 when the slot has no live child.

Janus Status Accessors

The Janus facade exposes local supervisor and child status without exposing actor state:

let supervisor_state = cluster.local_supervisor_state(system)
let lifecycle = cluster.local_child_lifecycle(system, 0 as u64)
let task_state = cluster.local_child_task_state(system, 0 as u64)
let last_exit = cluster.local_child_last_exit(system, 0 as u64)

local_supervisor_state returns:

SUPERVISOR_STATE_RUNNING
SUPERVISOR_STATE_STOPPED
SUPERVISOR_STATE_FAILED
-1 for an invalid handle

local_child_lifecycle returns:

CHILD_LIFECYCLE_UNCONFIGURED
CHILD_LIFECYCLE_CONFIGURED
CHILD_LIFECYCLE_RUNNING
CHILD_LIFECYCLE_STOPPED
CHILD_LIFECYCLE_FAILED
-1 for an invalid handle or slot

local_child_task_state returns TASK_STATE_READY, TASK_STATE_RUNNING, TASK_STATE_BLOCKED, TASK_STATE_BUDGET_EXHAUSTED, TASK_STATE_COMPLETED, TASK_STATE_CANCELLED, or -1 when no live task is present.

local_child_last_exit returns the same STOP_REASON_* codes used by local_handle_exit, or -1 when no exit is recorded.

Prefer local_observe_ref_summary_cap or local_observe_child_summary_cap for the canonical capability-gated status snapshot. The individual accessors remain low-level local bridge tools and compatibility probes.

Every status accessor has a _cap form that consumes ClusterLocalCap. These accessors report lifecycle and pressure only; they do not expose actor-local variables or grain-owned state.

Reduction accessors follow the same local-only rule. Prefer local_observe_ref_reductions_cap or local_observe_child_reductions_cap when the caller is already using the observation registry. The lower-level local_ref_reduction_* helpers accept a stable actor or grain ref, and local_reduction_* accepts a system handle plus child slot. The values are counters, not scheduler authority; code that changes reduction policy or forces preemption still belongs behind Cap.cluster.preempt.

Scheduling reason accessors are also local-only. local_ref_schedule_reason_* accepts a stable actor or grain ref, and local_schedule_reason_* accepts a system handle plus child slot. The current reason codes identify no observed dispatch, ordinary message dispatch, or reduction-budget yield marker.

LocalActorSystem

LocalActorSystem is the ergonomic root for the local tracer bullet. It keeps callers on the public std.cluster path instead of reaching into runtime internals.

const std = @import("std");
const cluster = @import("std_cluster");

var system = try cluster.LocalActorSystem.init(
    allocator,
    1,              // nursery id
    "root",         // supervisor id
    .one_for_one,
    2,              // child slots
);
defer system.deinit();

Starting Children

Children are started from ChildSpec values. A child start function receives the actor-system nursery and the allocator owned by the supervisor.

fn startWorker(nursery: *cluster.Nursery, allocator: std.mem.Allocator) !cluster.SupervisedChild {
    const actor = try allocator.create(cluster.Actor);
    errdefer allocator.destroy(actor);

    actor.* = try cluster.Actor.init(allocator, 1, 1);
    errdefer actor.deinit();

    const task = cluster.spawn(nursery, actor, workerHandler) orelse return error.ActorSpawnRejected;
    return .{ .actor = actor, .task = task };
}

_ = try system.startChild(0, .{
    .id = "worker",
    .start_fn = startWorker,
    .restart = .permanent,
});

You can also configure children first and start them later:

try system.configureChild(0, .{
    .id = "worker",
    .start_fn = startWorker,
    .restart = .permanent,
});

const started = try system.startConfiguredChildren();

Handling Exits

Use handleCrash for ordinary abnormal actor failure:

try system.handleCrash(0);

Use handleExit when the caller knows the exact stop reason:

try system.handleExit(0, .pledge_violated);

The Janus facade exposes the same path with stable STOP_REASON_* codes:

if cluster.local_handle_exit(
    system,
    0 as u64,
    cluster.STOP_REASON_PLEDGE_VIOLATED,
) != 1 as i32 do return 1 end

Use handleExitAt for deterministic restart-window tests or runtime loops that already have a timestamp:

try system.handleExitAt(0, .abnormal, 100);
const status = system.statusAt(100);

Actor Tombstones

Abnormal terminal exits now produce actor tombstones. Normal exits and shutdown exits are intentionally skipped; tombstones are for failure classes that may need replay, audit, or repair.

The local runtime keeps the existing bounded in-memory tombstone index and can also mirror each tombstone to a caller-provided sink.

Use the typed Janus sink hook: local_set_tombstone_sink(system, ctx_addr, append_callback). The callback is a top-level func(u64, u64) -> i32; the compiler lowers it to internal bridge plumbing. The legacy _addr hook remains a bridge-only compatibility surface and must not be taught as the public callback API.

The callback receives an opaque context pointer and a callback-scoped record pointer. Copy or persist the record during the callback; do not retain record_raw.

Sink counters are exposed for monitoring:

let stored = cluster.local_tombstone_sink_appends(system)
let failed = cluster.local_tombstone_sink_failures(system)

Stable stop-reason codes are available as STOP_REASON_NORMAL, STOP_REASON_SHUTDOWN, STOP_REASON_ABNORMAL, STOP_REASON_KILLED, STOP_REASON_PLEDGE_VIOLATED, and STOP_REASON_MIGRATION_ABORTED.

Tombstone Classification

The supervisor hot index can classify the latest tombstone against prior tombstones with the same deterministic pattern: child slot, spec id, stop reason, code version, and input digest. Janus exposes scalar accessors for the current local runtime:

let matches = cluster.local_tombstone_classify_match_count(
    system,
    now_seconds,
    3 as u32,
    60 as i64,
)

let deadly = cluster.local_tombstone_classify_deadly(
    system,
    now_seconds,
    3 as u32,
    60 as i64,
)

let oldest = cluster.local_tombstone_classify_oldest_sequence(
    system,
    now_seconds,
    3 as u32,
    60 as i64,
)

matches is the number of hot-index tombstones matching the latest pattern inside the window. deadly returns 1 when matches reaches the threshold. oldest returns the oldest matching tombstone sequence, or 0 when no latest tombstone exists. Each function also has a _cap form that consumes ClusterLocalCap.
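The three results can be modeled together in a few lines of Python. This is a sketch under stated assumptions, not the supervisor's code: the Tombstone record and classify function are hypothetical, the latest tombstone is assumed to count toward its own match total, and "inside the window" is read as a timestamp no older than now_seconds minus the window.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tombstone:
    sequence: int            # nonzero and increasing in this model
    child: int
    spec_id: str
    reason: int
    code_version: int
    input_digest: int
    timestamp_seconds: int

    def pattern(self):
        # The deterministic pattern named in the text.
        return (self.child, self.spec_id, self.reason,
                self.code_version, self.input_digest)


def classify(index, now_seconds, threshold, window_seconds):
    """Return (matches, deadly, oldest_sequence) for the latest tombstone."""
    if not index:
        return 0, 0, 0
    latest = max(index, key=lambda t: t.sequence)
    cutoff = now_seconds - window_seconds
    hits = [t for t in index
            if t.pattern() == latest.pattern() and t.timestamp_seconds >= cutoff]
    matches = len(hits)
    deadly = 1 if matches >= threshold else 0
    oldest = min((t.sequence for t in hits), default=0)
    return matches, deadly, oldest


index = [
    Tombstone(1, 0, "worker", 2, 7, 0xABC, 10),
    Tombstone(2, 0, "worker", 2, 7, 0xABC, 50),
    Tombstone(3, 1, "other", 2, 7, 0xDEF, 60),
    Tombstone(4, 0, "worker", 2, 7, 0xABC, 90),
]
narrow = classify(index, now_seconds=100, threshold=3, window_seconds=60)
wide = classify(index, now_seconds=100, threshold=3, window_seconds=100)
```

With a 60-second window only sequences 2 and 4 match, so the pattern is not deadly; widening the window to 100 seconds admits sequence 1 and reaches the threshold of 3.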

The latest hot-index tombstone can also be observed as bounded scalar metadata:

let seq = cluster.local_latest_tombstone_sequence_cap(cap, system)
if seq != 0 as u64 do
    let child = cluster.local_latest_tombstone_child_cap(cap, system)
    let reason = cluster.local_latest_tombstone_reason_cap(cap, system)
    let code = cluster.local_latest_tombstone_code_version_cap(cap, system)
    let digest = cluster.local_latest_tombstone_input_digest_cap(cap, system)
    let has_replay = cluster.local_latest_tombstone_replay_token_present_cap(cap, system)
    let attempt = cluster.local_latest_tombstone_attempt_count_cap(cap, system)
    _ = child
    _ = reason
    _ = code
    _ = digest
    _ = has_replay
    _ = attempt
end

local_latest_tombstone_sequence_cap is the presence check. When it returns 0, the child accessor also returns 0; callers should not treat that as a real child slot without a nonzero sequence. Replay-token observation is a presence flag only. The token value is not exposed by this surface because replay is a separate diagnostic authority.

Tombstone Quarantine

The local supervisor can suppress deterministic-deadly restart loops before the restart budget is exhausted. Quarantine is explicit local runtime policy:

let cap = caps.unsafe_forge_cluster_local_cap()

_ = cluster.local_set_tombstone_quarantine_config_cap(
    cap,
    system,
    3 as u32,
    60 as i64,
)
_ = cluster.local_set_tombstone_quarantine_cap(cap, system, 1 as u32)

let quarantined = cluster.local_child_quarantined_cap(cap, system, 0 as u64)
let total = cluster.local_quarantined_children_cap(cap, system)
let first = cluster.local_first_quarantined_child_cap(cap, system)

local_child_lifecycle_cap returns CHILD_LIFECYCLE_QUARANTINED for a configured child that the local tombstone classifier has suppressed. local_clear_tombstone_quarantine_cap clears the local mark for a slot; it does not restart the child, delete tombstones, replay payloads, or affect distributed placement policy. Cross-node quarantine gossip and placement aggregation remain runtime/operator work.

Tombstones To STL

std.cluster.tombstones converts callback records into canonical STL events. The adapter keeps cluster supervision and STL storage separate: the sink copies scalar tombstone fields, builds an ActorTombstone, and appends through an std.stl.lsm_store.LSMStore.

use std.cluster.local as cluster
use std.cluster.tombstones as tombstones
use std.db.lsm as lsm
use std.stl.lsm_store as lsm_store
use std.stl.store as store

pub func tombstone_sink(ctx: u64, record_raw: u64) -> i32 do
    // ctx is the opaque sink context: the borrowed GrainStoreBytes.
    let gs = as[*lsm.GrainStoreBytes](ctx)
    var stl = lsm_store.make_store(gs)

    // Copy scalar fields now; record_raw must not be retained after the callback.
    var t = tombstones.zero()
    t.sequence = cluster.tombstone_sequence(record_raw)
    t.child = cluster.tombstone_child(record_raw)
    t.reason = cluster.tombstone_reason(record_raw)
    t.attempt_count = cluster.tombstone_attempt_count(record_raw)
    t.timestamp_seconds = cluster.tombstone_timestamp_seconds(record_raw)

    // Return 0 when the STL append fails, 1 when it succeeds.
    if tombstones.append_lsm(&stl, &t) != store.STORE_OK do
        return 0
    end
    return 1
end

The sink context should point at the borrowed GrainStoreBytes. The callback creates a short-lived LSMStore wrapper over that same store; fresh wrappers can rescan LSM truth later for count, rank lookup, and flush.

Task Completion Routing

The local actor system can route a completed nursery task back to the supervised child slot:

const task = system.childTaskAt(1) orelse return error.MissingTask;
task.markCompleted(5);

const restarted_idx = try system.handleTaskCompleteByTask(task);

Stale task handles are rejected. This matters after a restart, because the old task pointer must not be allowed to affect the replacement child.
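Why stale handles matter can be seen in a toy Python model. Everything here is hypothetical (the Task and ChildSlots classes, identity comparison as the staleness check, and an unconditional restart on completion); the document does not describe how the runtime detects a stale handle.

```python
class Task:
    """Stand-in for a nursery task handle."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.exit_code = None

    def mark_completed(self, exit_code):
        self.exit_code = exit_code


class ChildSlots:
    """Maps child slots to their current task (model only)."""

    def __init__(self, slot_count):
        self.tasks = [None] * slot_count
        self.next_task_id = 1

    def start(self, slot):
        task = Task(self.next_task_id)
        self.next_task_id += 1
        self.tasks[slot] = task
        return task

    def handle_task_complete(self, task):
        # Only the task currently installed in a slot may drive a restart.
        for slot, current in enumerate(self.tasks):
            if current is task:
                self.start(slot)  # assumed: the child is always restarted
                return slot
        raise LookupError("stale task handle")


slots = ChildSlots(2)
old_task = slots.start(1)
old_task.mark_completed(5)
restarted_slot = slots.handle_task_complete(old_task)
replacement = slots.tasks[1]

try:
    slots.handle_task_complete(old_task)  # old handle after the restart
    stale_rejected = False
except LookupError:
    stale_rejected = True
```

The second call uses the pre-restart handle and is rejected, so the replacement task in slot 1 is left untouched.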

Restart Controls

Restart budgets are opt-in:

system.setRestartLimit(2, 60);

From Janus:

_ = cluster.local_set_restart_limit(system, 2 as u32, 60 as i64)

The limit is counted per restart window. When the budget is exhausted, the supervisor moves to failed, records the failed child and reason, and stops the remaining active children through the supervisor's failure cleanup path.
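A restart budget of this kind can be sketched in Python as follows. The sketch rests on assumptions the text does not confirm: that the two arguments mean (maximum restarts, window in seconds), that the window slides rather than resets at fixed intervals, and that exhaustion is sticky. RestartBudget and its methods are hypothetical names.

```python
from collections import deque


class RestartBudget:
    """Sliding-window restart budget (model only, not the supervisor)."""

    def __init__(self, max_restarts, window_seconds):
        self.max_restarts = max_restarts
        self.window_seconds = window_seconds
        self.restarts = deque()   # timestamps of restarts still in the window
        self.exhausted = False

    def _expire(self, now):
        # Drop restarts that have aged out of the window.
        while self.restarts and self.restarts[0] <= now - self.window_seconds:
            self.restarts.popleft()

    def remaining(self, now):
        self._expire(now)
        return max(0, self.max_restarts - len(self.restarts))

    def try_restart(self, now):
        """Record a restart at `now`; False means the budget is exhausted."""
        if self.exhausted:
            return False
        self._expire(now)
        if len(self.restarts) >= self.max_restarts:
            self.exhausted = True
            return False
        self.restarts.append(now)
        return True


burst = RestartBudget(2, 60)
burst_results = [burst.try_restart(t) for t in (0, 10, 20)]

spread = RestartBudget(2, 60)
spread_results = [spread.try_restart(t) for t in (0, 10, 65)]
```

Three crashes within 20 seconds exhaust a budget of 2 per 60 seconds, while the same three crashes spread so that the first has aged out by the third stay within budget.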

Janus callers can test exhaustion through:

let exhausted = cluster.local_restart_limit_exhausted(system)

Pledge violations do not restart by default. This is intentional because pledge failure is a capability boundary event, not an ordinary crash. Local systems can explicitly opt in:

system.setRestartPledgeViolations(true);

From Janus:

_ = cluster.local_set_restart_pledge_violations(system, 1 as u32)

At the source supervisor surface, the same explicit choice is loud. A declaration with restart_pledge_violations: true builds, but emits PU-W007 so the opt-in is visible during review.

Lifecycle

Use stopChild, stopChildren, or shutdown for explicit lifecycle control:

try system.stopChild(0, .shutdown);
_ = try system.stopChildren(.killed);
system.shutdown();

shutdown stops active children and moves the supervisor to stopped.

Status Inspection

The facade exposes supervisor and child snapshots:

const supervisor_status = system.status();
const child_status = system.childStatus(0);
const failure = system.failure();

SupervisorStatus includes:

strategy and state
slot count
configured, active, stopped, and failed child counts
total restarts
restart exhaustion metadata
restart limit and remaining restarts

ChildStatus includes:

lifecycle
configured spec id and restart policy
actor id and task id when running
task state when available
last exit reason
restart count

Promise[T] and request/response (SPEC-236)

Local :cluster request/response uses std.cluster.promise, not oneshot channels. A Promise[T] is a typed handle to a single future result. Creating a promise also creates one affine PromiseResolver[T] right.

Manual path (Phase A)

use std.cluster.promise as promise

message QueryMsg {
    Get { reply: PromiseResolver[u64] },
    Stop,
}

// Caller
let p: Promise[u64] = promise.new[u64]()
let reply: PromiseResolver[u64] = promise.resolver[u64](p)
_ = typed_ref.send(QueryMsg.Get { reply: reply })
_ = cluster.local_ref_drive_once(actor_ref)
let answer = promise.await_or[u64](p, 0 as u64, 0 as u64)

State transitions: Pending → Resolved | Failed | Cancelled. A promise makes exactly one transition, and the three target states are terminal. Inspect with promise.state, failure_kind, cancel_reason. Signal with signal_fail / signal_cancel.
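The single-transition rule can be modeled in Python as below. This is a sketch of the state machine only: the class, its method names, and the non-blocking value_or helper are hypothetical and do not mirror the std.cluster.promise signatures.

```python
from enum import Enum


class State(Enum):
    PENDING = "pending"
    RESOLVED = "resolved"
    FAILED = "failed"
    CANCELLED = "cancelled"


class PromiseModel:
    """One-shot result cell: it leaves PENDING at most once."""

    def __init__(self):
        self.state = State.PENDING
        self.detail = None  # value, failure kind, or cancel reason

    def _settle(self, state, detail):
        if self.state is not State.PENDING:
            return False    # already terminal, later signals are ignored
        self.state = state
        self.detail = detail
        return True

    def resolve(self, value):
        return self._settle(State.RESOLVED, value)

    def fail(self, kind):
        return self._settle(State.FAILED, kind)

    def cancel(self, reason):
        return self._settle(State.CANCELLED, reason)

    def value_or(self, default):
        # Non-blocking stand-in for an await with a fallback value.
        return self.detail if self.state is State.RESOLVED else default


p = PromiseModel()
first = p.resolve(42)
second = p.fail("late failure")   # ignored: the promise is already terminal

q = PromiseModel()
q.cancel("caller gave up")
```

After the first resolve succeeds, the later failure signal has no effect, and a cancelled promise falls back to the caller's default.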

Compile-time gates:

Code	Rule
`P236-E003`	`Promise[ref T]` / `PromiseResolver[ref T]` cannot cross actor/grain boundaries
`P236-E002`	`PromiseResolver[T]` is affine — no copy

Await currently yields via nanosleep backoff ([PROM:7.3]); a true condvar park is planned for v0.2.

.call() sugar (Phase B)

When a message variant declares a PromiseResolver[T] field (conventionally reply), typed refs support:

let p: Promise[u64] = typed_ref.call(QueryMsg.Get {})
// desugars to promise.new + inject reply + send; value is the Promise handle

Works on ActorRef[Msg] and GrainRef[Msg].
Do not pass reply: yourself — the compiler injects it.
Fire-and-forget unit variants (Ping, Stop) still use .send().
Variants without a result type are rejected with P236-E007.

Code	Meaning
`P236-E007`	`.call()` needs a `PromiseResolver[T]` reply field, or typed ref receiver

Verification smokes:

./scripts/zb test-cluster-promise-call-sugar
./scripts/zb test-cluster-promise-call-no-result-reject
./scripts/zb test-cluster-promise-request-response

PromiseId + schema fingerprints (Phase C)

Local Promise[T] values are packed PromiseIds:

[63:48] authority_scope | [47:32] epoch | [31:0] slot

Decode with id_slot / id_epoch / id_scope. A stale epoch or scope fails closed ([PROM:6.2]). runtime_reset clears the table and advances the process epoch; it is intended for tests and restart simulation.
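The bit layout above can be checked with plain integer arithmetic. The Python below is an illustration of the layout only: pack_promise_id and is_current are hypothetical helpers, and the id_* functions are Python stand-ins that share names with the documented decoders rather than bindings to them.

```python
SLOT_MASK = 0xFFFF_FFFF   # bits [31:0]
FIELD_MASK = 0xFFFF       # 16-bit epoch and authority_scope fields


def pack_promise_id(scope, epoch, slot):
    """Pack [63:48] authority_scope | [47:32] epoch | [31:0] slot."""
    if not 0 <= scope <= FIELD_MASK:
        raise ValueError("scope does not fit in 16 bits")
    if not 0 <= epoch <= FIELD_MASK:
        raise ValueError("epoch does not fit in 16 bits")
    if not 0 <= slot <= SLOT_MASK:
        raise ValueError("slot does not fit in 32 bits")
    return (scope << 48) | (epoch << 32) | slot


def id_slot(pid):
    return pid & SLOT_MASK


def id_epoch(pid):
    return (pid >> 32) & FIELD_MASK


def id_scope(pid):
    return (pid >> 48) & FIELD_MASK


def is_current(pid, scope, epoch):
    # Fail closed: any scope or epoch mismatch makes the id unusable.
    return id_scope(pid) == scope and id_epoch(pid) == epoch


pid = pack_promise_id(scope=1, epoch=2, slot=3)
```

An id minted in epoch 2 no longer passes is_current once the process epoch has advanced to 3, which is the fail-closed behavior the text describes for stale ids.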

Optional result schema gate ([PROM:6.3]):

let p = promise.new_with_schema[u64](0xA11CE001 as u64)
let r = promise.resolver[u64](p)
// mismatch → Failed(SchemaMismatch), returns 0
_ = promise.resolve_with_schema[u64](r, 1 as u64, 0xB0B00002 as u64)

new / resolve remain the unchecked path. Fingerprints are caller-supplied u64 until the compiler emits full SBI type CIDs (SPEC-039).

./scripts/zb test-cluster-promise-sbi-id
./scripts/zb test-cluster-promise-sbi-fingerprint

Local pipeline graph (Phase D v0.1)

Dependent promises form a node-local segment graph:

_ = promise.pipeline_link[u64, u64](root, segment)
// root fail/cancel/pending-destroy → segment Failed(ERR_UPSTREAM_FAILED)

Success on the root does not auto-resolve children.
Optional set_tombstone_token / tombstone_token for producer replay ids.
Modes: PIPELINE_MODE_SEQUENTIAL (default) and reserved PIPELINE_MODE_ENVELOPE. Requesting envelope before wire support exists still records the segment and increments pipeline_downgrade_count().

./scripts/zb test-cluster-promise-pipeline-cascade
./scripts/zb test-cluster-promise-pipeline-downgrade
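The cascade rule for linked promises can be modeled in a few lines of Python. This is a sketch, not the runtime: Segment and its methods are hypothetical, ERR_UPSTREAM_FAILED is a string stand-in for the runtime constant, and propagating the failure through more than one level of links is an assumption.

```python
ERR_UPSTREAM_FAILED = "ERR_UPSTREAM_FAILED"  # stand-in for the runtime code


class Segment:
    """A promise node that can have dependent segments linked to it."""

    def __init__(self):
        self.state = "pending"
        self.error = None
        self.dependents = []

    def link(self, segment):
        self.dependents.append(segment)

    def resolve(self):
        # Success does not auto-resolve dependents; they stay pending.
        if self.state == "pending":
            self.state = "resolved"

    def fail(self, error):
        if self.state != "pending":
            return
        self.state = "failed"
        self.error = error
        self._cascade()

    def cancel(self):
        if self.state != "pending":
            return
        self.state = "cancelled"
        self._cascade()

    def _cascade(self):
        # Upstream fail or cancel marks every pending dependent as failed.
        for segment in self.dependents:
            segment.fail(ERR_UPSTREAM_FAILED)


root, first, second = Segment(), Segment(), Segment()
root.link(first)
first.link(second)
root.cancel()

ok_root, ok_child = Segment(), Segment()
ok_root.link(ok_child)
ok_root.resolve()
```

Cancelling the root fails both downstream segments with the upstream error, while resolving a root leaves its dependent pending until its own producer settles it.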

Not yet: multi-hop SBI pipeline envelopes, Promise[ActorRef[T]].call compiler lowering ([PROM:8.2]), peer handshake bits.

Release notes: Phase A, Phase B, Phase C, Phase D v0.1.

Current Limits

Local runtime only for Promise and .call().
No placement, membership, gossip, or remote send.
No automatic actor registry integration.
No hot reload.
No persistence for actor state. Actor tombstones can be persisted to STL; live actor state replay remains future work.
Slot type is u64. Heterogeneous typed state and non-u64 payload fields remain future work.
Promise payloads in v0.1 are SBI scalars (typically u64 handles).
Promise schema fingerprints in v0.1 are explicit u64 values, not yet automatic BLAKE3 type CIDs.

The current goal is a correct local supervised-actor tracer bullet. Distributed :cluster features build on this surface later.
