
:cluster — The Sanctum

“Fault tolerance by design.”

:cluster is where Janus becomes a system for building software that endures. It includes everything from :service, plus the actor model, supervised lifecycles, local grain activation shells, the local grain activation registry, local grain namespace lookup, and the future distribution layers.

The current compiler/runtime slice covers local supervised actors, plus the first grain source-contract shell, a local single-writer activation registry, a local namespace lookup layer, and explicit GrainStore-backed lifecycle callbacks. Compiler-generated state serializers, remote placement, first-class supervisor declarations, migration, durable namespace persistence, and distributed registries are roadmap work. Where they appear below, they are marked as future sketches.


Actors — Concurrent Entities with Mailboxes


The canonical shape today is actor X do var ... receive do match __msg { ... } end end. Each actor compiles to a setup/handler/destroy triple, which the generated X_start_supervised(system, slot, policy) wrapper threads into the local supervisor. Messages are i64 values, and the handler dispatches on each value with an explicit match __msg.

actor Counter do
    var count: i64 = 0
    receive do
        match __msg {
            0 => do
                count = count + 1
            end,
            1 => do
                return 0    # stop the actor
            end,
            _ => do
                count = count    # ignore unknown messages
            end,
        }
    end
end
  • Isolated state — No shared memory; each var is a private slot.
  • Auto-supervised — Counter_setup / Counter_handler / Counter_destroy and Counter_start_supervised are auto-emitted alongside the spawn-form __Counter_loop.
  • No locks — Message passing is the only way actors communicate.

Walk through it hands-on in the Stateful Actors tutorial.

Typed message protocols are now live local actor syntax, not just a sketch. message declarations may include payload variants, ActorRef[Msg] checks sends against the protocol, and receive arms can destructure local boxed payload messages:

message Cmd {
    Tick,
    Set { value: u64 },
    Stop,
}

actor Counter(msg: Cmd) do
    var count: u64 = 0
    receive do
        Cmd.Tick => do
            count += 1
        end,
        Cmd.Set { value } when value >= 0 as u64 => do
            count += value
        end,
        Cmd.Stop => do
            return 0
        end,
        after 30_000 => do
            count = count    # timeout arm: runs when no message arrives in time
        end,
    end
end

This is still the node-local actor path. Guards and receive-loop timeouts are live, and supervised actors register their after arms as local mailbox timeouts. Distributed payload wire formats remain future :cluster work.

Local Grain Shell — Virtual Identity Shape

message UserMsg {
    Ping,
    Stop,
}

@persist(via: GrainStoreBytes)
@lifecycle(activation: .lazy)
grain User(id: u64, msg: UserMsg) do
    var count: u64 = 0
    receive do
        UserMsg.Ping => do
            count += 1
        end,
        UserMsg.Stop => do
            return 0
        end,
    end
end
  • Live now — the parser accepts grain Name(id: Id, msg: Msg), @persist, @lifecycle, state slots, and receive arms, and emits a local supervised start wrapper.
  • Live now — cluster.local_grain_lookup_or_start(...) maps a numeric (grain_type, grain_id) to one stable local activation ref while that activation is live.
  • Live now — cluster.local_grain_lookup_or_start_namespace(...) maps a local (grain_type, namespace) key to an internal durable id, then reuses the same single-writer activation registry.
  • Live now — cluster.local_grain_lookup_or_start_persistent(...) invokes explicit load/store callbacks that can restore and commit state through GrainStoreBytes.
  • Live now — local grain persistence exposes per-system load/store failure counters, so operators can detect callback failures instead of inferring them from stopped activations.
  • Not live yet — compiler-generated GrainStore serializers, durable namespace persistence, passivation, migration, and remote routing.
  • Rule — a grain is a virtual identity with owned state. The current shell proves the source shape, and the local registry enforces the single-writer identity invariant.
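The single-writer registry invariant and the namespace layer can be sketched as a plain in-memory table. This Python model is an illustration built on assumptions, not the cluster.local_grain_* API. LocalGrainRegistry, Activation, stop, and the load/store callback signatures are all hypothetical. As in the current local layer, the namespace map lives only in memory. Numeric ids and namespace-derived ids use separate key spaces so they cannot collide, which is a design choice of this sketch.

```python
import itertools
from dataclasses import dataclass, field
from typing import Callable, Optional

Key = tuple  # (grain_type, kind, id): kind is "id" for numeric ids, "ns" for namespaces


@dataclass
class Activation:
    key: Key
    state: dict = field(default_factory=dict)
    live: bool = True


class LocalGrainRegistry:
    """Hypothetical model of a node-local, single-writer grain activation registry."""

    def __init__(self,
                 load: Optional[Callable[[Key], Optional[dict]]] = None,
                 store: Optional[Callable[[Key, dict], bool]] = None) -> None:
        self._live: dict[Key, Activation] = {}
        self._namespaces: dict[tuple[str, str], int] = {}
        self._ns_ids = itertools.count(1)
        self._load, self._store = load, store
        self.load_failures = 0   # counters make callback failures visible
        self.store_failures = 0

    def _activate(self, key: Key) -> Activation:
        act = self._live.get(key)
        if act is not None and act.live:
            return act  # single writer: at most one live activation per key
        act = Activation(key)
        if self._load is not None:
            try:
                restored = self._load(key)
                if restored is not None:
                    act.state = dict(restored)
            except Exception:
                self.load_failures += 1  # start with empty state, but count the failure
        self._live[key] = act
        return act

    def lookup_or_start(self, grain_type: str, grain_id: int) -> Activation:
        return self._activate((grain_type, "id", grain_id))

    def lookup_or_start_namespace(self, grain_type: str, namespace: str) -> Activation:
        # Map the namespace to a stable internal id, then reuse the same registry.
        ns_key = (grain_type, namespace)
        if ns_key not in self._namespaces:
            self._namespaces[ns_key] = next(self._ns_ids)
        return self._activate((grain_type, "ns", self._namespaces[ns_key]))

    def stop(self, act: Activation) -> None:
        # Commit state through the store callback, then retire the activation.
        if self._store is not None:
            try:
                ok = self._store(act.key, dict(act.state))
            except Exception:
                ok = False
            if not ok:
                self.store_failures += 1
        act.live = False
        self._live.pop(act.key, None)


durable: dict = {}


def load(key):
    return durable.get(key)


def store(key, state):
    durable[key] = state
    return True


reg = LocalGrainRegistry(load, store)
a = reg.lookup_or_start("User", 42)
assert reg.lookup_or_start("User", 42) is a  # one live activation per key
a.state["count"] = 3
reg.stop(a)                                   # commits {"count": 3} via store
b = reg.lookup_or_start("User", 42)           # new activation, state restored via load
print(b is a, b.state["count"])               # False 3
```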
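Supervision Trees (Future Sketch)

First-class supervisor declarations are roadmap work. Today, actors join the local supervisor through their generated X_start_supervised wrappers. The intended shape:

supervisor GameServerSupervisor do
    strategy: one_for_one
    child LobbyManager        # Restart on crash
    child MatchMaker          # Restart on crash
    child MetricsCollector    # Restart on crash
end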
  • one_for_one — Restart only the crashed child
  • one_for_all — Restart all children if any one of them crashes
  • rest_for_one — Restart the crashed child and every child started after it
  • Exponential backoff — Space out repeated restarts to prevent death spirals

Beyond supervision, :cluster provides:

  • Memory sovereignty tags — Local.Exclusive, Session.Replicated, Volatile.Ephemeral
  • Typed message protocols — message declarations, ActorRef[Msg], local payload sends, guarded receive-arm payload destructuring, and direct receive-loop timeout arms are live for node-local actors
  • Location transparency — Same syntax for local and remote actors (roadmap: remote routing is not live yet)
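The restart strategies and backoff listed above reduce to two small functions. This is a hypothetical Python sketch. children_to_restart, backoff_delay, and the base/cap defaults (0.1 s and 30 s) are illustrative choices, not Janus defaults.

```python
def children_to_restart(strategy: str, children: list[str], crashed: str) -> list[str]:
    """Return the children to restart, in start order, after `crashed` fails."""
    position = children.index(crashed)  # raises ValueError for an unknown child
    if strategy == "one_for_one":
        return [crashed]
    if strategy == "one_for_all":
        return list(children)
    if strategy == "rest_for_one":
        return children[position:]  # the crashed child plus every later sibling
    raise ValueError(f"unknown strategy: {strategy}")


def backoff_delay(attempt: int, base: float = 0.1, cap: float = 30.0) -> float:
    """Delay before restart number `attempt` (0-based): base * 2**attempt, capped."""
    exponent = min(attempt, 62)  # clamp so huge attempt counts stay cheap and finite
    return min(cap, base * (2 ** exponent))


children = ["LobbyManager", "MatchMaker", "MetricsCollector"]
print(children_to_restart("rest_for_one", children, "MatchMaker"))
# ['MatchMaker', 'MetricsCollector']
print([backoff_delay(n) for n in range(4)])
```

The cap keeps a long crash loop from producing unbounded waits, while the doubling keeps a flapping child from restarting in a tight loop.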

Excluded from :cluster, and where to find them instead:

  • Tensors and GPU — available in :compute
  • Raw pointers and unsafe code — available in :sovereign

Perfect for:

  • Game servers handling thousands of concurrent connections
  • Chat systems and real-time messaging
  • Distributed databases and key-value stores
  • Metaverse infrastructure and virtual world backends
  • Any system where a node crash should not take down the service
  • Stateful services that need to persist across restarts

The rule: If it needs to stay up when hardware fails, :cluster is your home.


The following examples show the intended destination for grains, remote message payloads, and first-class supervisor declarations. They are not the current local actor tracer bullet.

message ChatMsg {
    Join { user_id: UserId, reply: Reply[void] },
    Send { user_id: UserId, text: String, reply: Reply[void] },
    Leave { user_id: UserId },
    History { count: i32, reply: Reply[[Message]] },
}

actor ChatRoom(room_id: RoomId) implements ChatMsg do
    var members: Set[UserId] = Set.new()
    var messages: [Message] = []
    receive do
        | Join { user_id, reply } => do
            members.insert(user_id)
            reply.send(void.ok())
        end
        | Send { user_id, text, reply } => do
            if not members.contains(user_id) do
                reply.send(Error.not_a_member())
                return
            end
            messages.push(Message{user_id, text, now()})
            reply.send(void.ok())
        end
        | Leave { user_id } => do
            members.remove(user_id)
        end
        # History handler omitted for brevity
    end
end

supervisor DatabaseCluster do
    strategy: one_for_all
    child ConnectionPool(max: 10)
    child QueryProcessor
    child MetricsExporter
    # one_for_all: if any child (for example ConnectionPool) crashes, ALL children restart,
    # so the whole group comes back from a consistent state
end

message KVStoreMsg {
    Get { key: String, reply: Reply[Option[Bytes]] },
    Set { key: String, value: Bytes, reply: Reply[void] },
    Delete { key: String, reply: Reply[void] },
    Range { start_key: String, end_key: String, reply: Reply[[(String, Bytes)]] },
}

@requires(cap: [.storage_nvme, .network_infiniband])
grain KVNode(node_id: NodeId) implements KVStoreMsg do
    var data: HashMap[String, Bytes] = HashMap.new()
    receive do
        | Get { key, reply } => do
            reply.send(data.get(key))
        end
        | Set { key, value, reply } => do
            data.set(key, value)
            # Replicate to other nodes
            replicate(key, value)
            reply.send(void.ok())
        end
        # Delete and Range handlers omitted for brevity
    end
end

vs. Erlang/OTP:

  • Types — Static message types instead of Erlang’s dynamic typing
  • Generics — No more boilerplate for different message types
  • Single language — Everything in Janus, not a separate DSL

vs. Akka (Scala/Java):

  • Lighter — No JVM overhead
  • Better interop — Native Zig bindings via graft
  • Simpler — No implicit state machines

vs. Go + etcd:

  • Supervision built-in — etcd is external, here it’s native
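  • Location transparency — Go needs separate service discovery; Janus plans to build it in (remote routing is roadmap work)
  • Grain migration — Go services can’t move between nodes automatically; Janus grains are designed to (migration is roadmap work)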


Build systems that endure.