M:N Scheduler & Fibers

The Janus runtime provides a Capability-Budgeted Cooperative M:N Scheduler (CBC-MN) — lightweight fibers multiplexed onto OS threads with structured concurrency guarantees via nurseries.

| Primitive | Purpose |
|---|---|
| Scheduler | Manages worker threads, distributes tasks via work-stealing |
| Nursery | Structured concurrency scope; all children complete before exit |
| Task | Lightweight fiber with dedicated stack and budget |
| SpawnOpts | Per-spawn configuration (stack size, priority) |
| Budget | Resource limits per nursery/task (ops, memory, spawns, syscalls) |
| CancelToken | Cooperative cancellation propagated through nursery tree |

const sched = @import("janus_sched");

// Initialize scheduler with 4 workers
var scheduler = try sched.Scheduler.init(allocator, 4);
defer scheduler.deinit();
try scheduler.start();
defer scheduler.stop();

// Create a nursery (structured concurrency scope)
var nursery = scheduler.createNursery(sched.Budget.serviceDefault());
defer nursery.deinit();

// Spawn tasks: all must complete before the nursery exits
_ = nursery.spawn(&myTask, @ptrCast(&args));
_ = nursery.spawn(&myTask, @ptrCast(&args2));

// Await all children (yields in fiber context, polls on main thread)
const result = nursery.awaitAll();

All task entry points use the C calling convention and take a single opaque argument pointer:

fn myTask(arg: ?*anyopaque) callconv(.c) i64 {
    const ctx: *MyContext = @ptrCast(@alignCast(arg.?));
    // ... do work ...
    return 0; // >= 0 success, < 0 error code
}

Return convention:

  • >= 0 — success (value stored in TaskResult.success)
  • < 0 — error (value stored in TaskResult.error_code, triggers nursery cancellation)
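The return convention can be sketched as a small classification step. This is an illustration only: the document names the `TaskResult.success` and `TaskResult.error_code` fields, but the tagged-union layout and the helpers `classify` and `triggersCancel` below are assumptions, not the runtime's actual code.

```zig
// Hypothetical shape for TaskResult; only the two field names come from
// the documentation, the union layout is assumed.
const TaskResult = union(enum) {
    success: i64,
    error_code: i64,
};

/// Map a raw task return value onto the documented convention:
/// >= 0 is success, < 0 is an error code.
fn classify(ret: i64) TaskResult {
    return if (ret >= 0)
        TaskResult{ .success = ret }
    else
        TaskResult{ .error_code = ret };
}

/// An error result is what would trigger nursery cancellation.
fn triggersCancel(result: TaskResult) bool {
    return switch (result) {
        .success => false,
        .error_code => true,
    };
}
```

Note that zero counts as success, so a task that has nothing to report can simply return 0.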

Fiber stack sizes are configurable per spawn, with profile-gated defaults. This follows the Janus principle of mechanism over policy: the scheduler provides the mechanism (configurable stacks), profiles provide the policy (default sizes), and developers override the defaults when they know better.

Each Janus profile has a stack size calibrated to its workload:

| Profile | Default Stack | Rationale |
|---|---|---|
| :core | 64 KB | Compute-focused, minimal I/O |
| :service | 256 KB | Real systems work with Zig stdlib interop |
| :cluster | 256 KB | Actors + supervisors |
| :sovereign | 512 KB | Crypto operations, proof chains, DID resolution |

These sizes were measured empirically from Graf — the first production consumer of Janus fibers:

  • Dir.iterate() allocates a 2 KB internal reader buffer on the stack
  • dirOpenDirPosix() allocates a PATH_MAX (4 KB) buffer on the stack
  • std.sort.block() allocates a [512]T cache on the stack; for large structs (~96 bytes each), that is ~48 KB

The :service default of 256 KB covers all known Zig stdlib stack usage while remaining 1/32 of a typical default OS thread stack (8 MB).
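The arithmetic behind these figures can be checked at compile time. The sketch below is illustrative: `Record` is a hypothetical 96-byte struct standing in for the "large struct" case, `sortCacheBytes` is a made-up helper, and summing the three buffers assumes a conservative worst case in which all of them are live on the same fiber stack at once.

```zig
const std = @import("std");

const KiB: usize = 1024;

// Hypothetical 96-byte element type (12 * 8 bytes).
const Record = extern struct {
    fields: [12]u64,
};

/// Stack bytes used by a [512]T cache, as described for std.sort.block().
fn sortCacheBytes(comptime T: type) usize {
    return 512 * @sizeOf(T);
}

const service_stack: usize = 256 * KiB;
const thread_stack: usize = 8 * 1024 * KiB; // 8 MB

comptime {
    std.debug.assert(@sizeOf(Record) == 96);
    // 512 * 96 bytes = 49152 bytes = 48 KB
    std.debug.assert(sortCacheBytes(Record) == 48 * KiB);
    // 8 MB / 256 KB = 32
    std.debug.assert(thread_stack / service_stack == 32);
    // Reader buffer + PATH_MAX buffer + sort cache still fit comfortably.
    std.debug.assert(2 * KiB + 4 * KiB + sortCacheBytes(Record) < service_stack);
}
```

If any of the assertions were false, compilation would fail, which makes this a cheap way to keep the documented numbers honest.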

When the profile default isn’t right for a specific task, override it:

const sched = @import("janus_sched");

// Default: uses nursery's profile default (e.g., 256KB for :service)
_ = nursery.spawn(&normalTask, arg);

// Override: this task needs extra stack for crypto operations
_ = nursery.spawnWithOpts(&heavyTask, arg, .{
    .stack_size = 512 * 1024, // 512KB
});

pub const SpawnOpts = struct {
    /// Stack size for the spawned fiber (null = use nursery default)
    stack_size: ?usize = null,
    /// Priority hint (null = Normal)
    priority: ?Priority = null,
};
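The "null means use the nursery default" rule maps naturally onto Zig's `orelse`. The following is a minimal sketch of how such a fallback could be resolved; `effectiveStackSize` is a hypothetical helper, and the struct below mirrors only the `stack_size` field of the documented `SpawnOpts`.

```zig
// Reduced copy of SpawnOpts for illustration (priority field omitted).
const SpawnOpts = struct {
    /// Stack size for the spawned fiber (null = use nursery default)
    stack_size: ?usize = null,
};

/// Hypothetical helper: pick the per-spawn override when present,
/// otherwise fall back to the nursery's default.
fn effectiveStackSize(opts: SpawnOpts, nursery_default: usize) usize {
    return opts.stack_size orelse nursery_default;
}
```

With a 256 KB nursery default, `effectiveStackSize(.{}, 256 * 1024)` yields the default, while passing `.{ .stack_size = 512 * 1024 }` yields the override.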

Available via Task.StackDefaults:

pub const StackDefaults = struct {
    pub const CORE: usize = 64 * 1024; // 64 KB
    pub const SERVICE: usize = 256 * 1024; // 256 KB
    pub const CLUSTER: usize = 256 * 1024; // 256 KB
    pub const SOVEREIGN: usize = 512 * 1024; // 512 KB
};

The scheduler provides two nursery creation methods:

// Uses SERVICE profile default (256KB stacks)
var default_nursery = scheduler.createNursery(budget);

// Explicit stack size for all children in this nursery
var small_nursery = scheduler.createNurseryWithStackSize(budget, 64 * 1024);

// Profile-aware helper
var sovereign_nursery = scheduler.createNurseryWithStackSize(
    budget,
    sched.profileStackSize(.sovereign), // 512KB
);
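A profile-to-stack-size lookup of this kind could be as simple as a `switch` over an enum. The sketch below is an assumption about the shape of such a helper, not the library's implementation: the `Profile` enum is hypothetical (the document only shows the `.sovereign` literal), and the constants are copied from `StackDefaults` above.

```zig
// Hypothetical profile enum; only `.sovereign` appears in the documented call.
const Profile = enum { core, service, cluster, sovereign };

// Values copied from the documented StackDefaults.
const StackDefaults = struct {
    pub const CORE: usize = 64 * 1024;
    pub const SERVICE: usize = 256 * 1024;
    pub const CLUSTER: usize = 256 * 1024;
    pub const SOVEREIGN: usize = 512 * 1024;
};

/// Sketch of a profile-aware lookup. The exhaustive switch means adding
/// a new profile without a stack size becomes a compile error.
fn profileStackSize(profile: Profile) usize {
    return switch (profile) {
        .core => StackDefaults.CORE,
        .service => StackDefaults.SERVICE,
        .cluster => StackDefaults.CLUSTER,
        .sovereign => StackDefaults.SOVEREIGN,
    };
}
```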

Nurseries enforce a no-orphan rule: every spawned fiber belongs to a nursery, and the nursery does not exit until every child has completed, failed, or been cancelled.

Tasks can create sub-nurseries for hierarchical concurrency:

supervisor nursery
+-- agent fiber 1
|   +-- agent nursery
|       +-- scanner fiber (subdir A)
|       +-- scanner fiber (subdir B)
+-- agent fiber 2
    +-- agent nursery
        +-- scanner fiber (subdir C)
Cancellation propagates transitively: cancelling the supervisor cancels all agents, which cancels all scanners.

// Cancel nursery — all children receive cancellation
nursery.cancel();
// Check cancellation in task code
if (nursery.isCancelled()) return -1;
// Token-based cancellation for fine-grained control
const token = nursery.getToken();
if (token.is_cancelled()) return -1;

Failure semantics: When any child fails (returns negative), the nursery’s cancel token is triggered, signaling siblings to check for cancellation at their next yield point.
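Transitive, cooperative cancellation can be modelled with tokens that link to their parent. The sketch below is not the Janus `CancelToken`; `Token`, its fields, and the `scanner` function are hypothetical, and a plain loop iteration stands in for a yield point. It shows the two ideas from the text: a child observes cancellation of any ancestor, and a task only stops when it checks.

```zig
const std = @import("std");

// Hypothetical token: a flag plus an optional link to the parent scope.
const Token = struct {
    cancelled: std.atomic.Value(bool) = std.atomic.Value(bool).init(false),
    parent: ?*const Token = null,

    fn cancel(self: *Token) void {
        self.cancelled.store(true, .release);
    }

    /// True if this token or any ancestor has been cancelled.
    fn isCancelled(self: *const Token) bool {
        var cur: ?*const Token = self;
        while (cur) |t| : (cur = t.parent) {
            if (t.cancelled.load(.acquire)) return true;
        }
        return false;
    }
};

/// Task body following the documented return convention: it checks for
/// cancellation before each unit of work and returns a negative code
/// when cancelled, otherwise the number of items processed.
fn scanner(token: *const Token, work_items: u32) i64 {
    var done: u32 = 0;
    while (done < work_items) : (done += 1) {
        if (token.isCancelled()) return -1;
        // ... process one item ...
    }
    return done;
}
```

Walking the parent chain on every check keeps `cancel()` cheap at the cost of a slightly slower check; a real implementation might instead push the flag down to children when cancelling.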

Every nursery and task has a budget that limits resource consumption:

const Budget = struct {
    ops: u64, // Operation count
    memory: u64, // Memory allocation limit
    spawn_count: u64, // Maximum child spawns
    syscalls: u64, // System call limit
};

// Profile defaults
const service_budget = Budget.serviceDefault(); // Generous limits for services
const child_budget = Budget.childDefault(); // Per-task budget slice
const no_budget = Budget.zero(); // No budget (for :core profile)

When a task exhausts its budget, it transitions to the BudgetExhausted state, from which a supervisor can recharge it.
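As a rough illustration of the exhaust-and-recharge cycle, the sketch below charges operations against a budget and reports a state. Only the `Budget` fields and the `BudgetExhausted` name come from this document; the `State` enum, `chargeOps`, and `recharge` are hypothetical, and the real scheduler's accounting may differ.

```zig
const Budget = struct {
    ops: u64,
    memory: u64,
    spawn_count: u64,
    syscalls: u64,
};

// Hypothetical two-state view of a task for this sketch.
const State = enum { Running, BudgetExhausted };

/// Deduct `cost` operations. If the budget cannot cover the cost,
/// drain it and report exhaustion instead of underflowing.
fn chargeOps(budget: *Budget, cost: u64) State {
    if (budget.ops < cost) {
        budget.ops = 0;
        return .BudgetExhausted;
    }
    budget.ops -= cost;
    return .Running;
}

/// What a supervisor might do to make the task runnable again.
fn recharge(budget: *Budget, ops: u64) void {
    budget.ops +|= ops; // saturating add, cannot overflow
}
```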

Instead of channels, use pre-allocated result slots: each fiber writes to its own exclusive slot, and awaitAll() provides the memory barrier.

const results = try allocator.alloc(CID, count);
defer allocator.free(results);
const successes = try allocator.alloc(bool, count);
defer allocator.free(successes);
const args = try allocator.alloc(TaskArgs, count);
defer allocator.free(args);

// Give each task pointers to its own exclusive slots
for (0..count) |i| {
    args[i] = .{
        .result_cid = &results[i],
        .success = &successes[i],
        // ... other fields ...
    };
}

var nursery = runtime.createNursery(Budget.serviceDefault());
defer nursery.deinit();

for (0..count) |i| {
    _ = nursery.spawn(&taskFn, @ptrCast(&args[i]));
}
_ = nursery.awaitAll();
// Results are now safe to read: awaitAll is the barrier
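
The same slot pattern can be tried without the Janus runtime. The sketch below substitutes plain OS threads from Zig's standard library for fibers, with `join()` playing the role that `awaitAll()` plays above; `TaskArgs`, `worker`, and `runAll` are illustrative names, and squaring the input is a stand-in for real work.

```zig
const std = @import("std");

// Hypothetical argument record; each worker gets pointers to its own slots.
const TaskArgs = struct {
    input: u64,
    result: *u64, // exclusive output slot for this worker
    success: *bool, // exclusive status slot for this worker
};

fn worker(args: *TaskArgs) void {
    args.result.* = args.input * args.input;
    args.success.* = true;
}

/// Runs one worker per input and returns once every started worker
/// has been joined, so the caller can read the slots without locks.
fn runAll(
    comptime count: usize,
    inputs: [count]u64,
    results: *[count]u64,
    successes: *[count]bool,
) !void {
    var args: [count]TaskArgs = undefined;
    var threads: [count]std.Thread = undefined;
    var started: usize = 0;

    // Join every thread that was started, even if a later spawn fails.
    // Joining is the barrier: afterwards all slot writes are visible.
    defer {
        for (threads[0..started]) |t| t.join();
    }

    for (0..count) |i| {
        args[i] = .{
            .input = inputs[i],
            .result = &results[i],
            .success = &successes[i],
        };
        threads[i] = try std.Thread.spawn(.{}, worker, .{&args[i]});
        started += 1;
    }
}
```

No two workers share a slot, so no synchronization is needed between them; the only ordering requirement is that the reader waits for the join.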

Source layout:

scheduler.zig                          # Sovereign Index: re-exports all types
+-- scheduler/
    +-- budget.zig                     # Budget types and costs
    +-- task.zig                       # Task struct, StackDefaults, state machine
    +-- nursery.zig                    # Nursery, SpawnOpts, structured concurrency
    +-- worker.zig                     # Worker thread loop, yield, work-stealing
    +-- deque.zig                      # Chase-Lev work-stealing deque
    +-- continuation.zig               # Fiber context setup
    +-- context_switch.s               # x86_64 assembly context switch
    +-- context_switch_aarch64.s       # aarch64 assembly context switch
    +-- cancel_token.zig               # Cooperative cancellation

Related specifications:

  • SPEC-021: Capability-Budgeted Cooperative M:N Scheduler
  • SPEC-019: Cancellation Tokens and Structured Failure Propagation
  • SPEC-022: Scheduling Capabilities