M:N Scheduler & Fibers

The Janus runtime provides a Capability-Budgeted Cooperative M:N Scheduler (CBC-MN): lightweight fibers are multiplexed onto OS threads, and nurseries provide structured concurrency guarantees.

Core Primitives

Primitive	Purpose
Scheduler	Manages worker threads, distributes tasks via work-stealing
Nursery	Structured concurrency scope — all children complete before exit
Task	Lightweight fiber with dedicated stack and budget
SpawnOpts	Per-spawn configuration (stack size, priority)
Budget	Resource limits per nursery/task (ops, memory, spawns, syscalls)
CancelToken	Cooperative cancellation propagated through nursery tree

Quick Start

const sched = @import("janus_sched");

// Initialize the scheduler with 4 workers (allocator is any std.mem.Allocator)
var scheduler = try sched.Scheduler.init(allocator, 4);
defer scheduler.deinit();
try scheduler.start();
defer scheduler.stop();

// Create a nursery (structured concurrency scope)
var nursery = scheduler.createNursery(sched.Budget.serviceDefault());
defer nursery.deinit();

// Spawn tasks — all must complete before nursery exits
_ = nursery.spawn(&myTask, @ptrCast(&args));
_ = nursery.spawn(&myTask, @ptrCast(&args2));

// Await all children (yields in fiber context, polls on main thread)
const result = nursery.awaitAll();
_ = result; // inspect the outcome here; Zig rejects unused locals

Task Functions

All task entry points use the C calling convention and take an opaque argument pointer:

fn myTask(arg: ?*anyopaque) callconv(.c) i64 {
    const ctx: *MyContext = @ptrCast(@alignCast(arg.?));
    // ... do work ...
    return 0;  // >= 0 success, < 0 error code
}

Return convention:

>= 0 — success (value stored in TaskResult.success)
< 0 — error (value stored in TaskResult.error_code, triggers nursery cancellation)
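
To make the signature and return convention concrete, here is a minimal sketch of a task that doubles a number, plus a helper that maps the raw return value to a result. DoubleCtx, doubleTask, classify, and the local TaskResult union are hypothetical names used only for illustration; the runtime's real TaskResult may be shaped differently.

```zig
const std = @import("std");

// Hypothetical context type; any struct can travel through the opaque pointer.
const DoubleCtx = struct {
    input: i64,
    output: i64 = 0,
};

// Hypothetical stand-in for the runtime's TaskResult (assumed shape).
const TaskResult = union(enum) {
    success: i64,
    error_code: i64,
};

// Task entry point: C calling convention, opaque argument, i64 return.
fn doubleTask(arg: ?*anyopaque) callconv(.c) i64 {
    // Return an error code instead of panicking on a null argument.
    const ctx: *DoubleCtx = @ptrCast(@alignCast(arg orelse return -1));
    ctx.output = ctx.input * 2;
    return 0;
}

// Apply the return convention: >= 0 is success, < 0 is an error code.
fn classify(ret: i64) TaskResult {
    if (ret >= 0) return TaskResult{ .success = ret };
    return TaskResult{ .error_code = ret };
}
```

Handling a null argument with orelse keeps a misuse of the entry point inside the documented error path rather than turning it into a panic.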

Configurable Fiber Stacks

Fiber stack sizes are configurable per spawn, with profile-gated defaults. This follows the Janus principle of mechanism over policy: the scheduler provides the mechanism (configurable stacks), profiles provide the policy (default sizes), and developers override the default when they know a task needs something different.

Profile Defaults

Each Janus profile has a stack size calibrated to its workload:

Profile	Default Stack	Rationale
:core	64 KB	Compute-focused, minimal I/O
:service	256 KB	Real systems work with Zig stdlib interop
:cluster	256 KB	Actors + supervisors
:sovereign	512 KB	Crypto operations, proof chains, DID resolution

These sizes come from stack usage measured in Graf, the first production consumer of Janus fibers:

Dir.iterate() allocates a 2 KB internal reader buffer on the stack
dirOpenDirPosix() allocates a PATH_MAX (4 KB) buffer on the stack
std.sort.block() allocates a [512]T cache; for large structs (~96 bytes each), that is ~48 KB

The :service default of 256 KB covers all Zig stdlib stack usage observed so far while remaining 1/32 of a typical 8 MB default thread stack.
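
The arithmetic behind these figures can be checked at compile time. The sketch below uses only the numbers quoted above; the constant names are illustrative and not part of the scheduler.

```zig
const std = @import("std");

const KB: usize = 1024;
const MB: usize = 1024 * KB;

// Figures quoted in the text above.
const sort_cache_len: usize = 512; // [512]T cache in std.sort.block()
const large_struct_size: usize = 96; // approximate element size in bytes
const service_stack: usize = 256 * KB;
const thread_stack: usize = 8 * MB;

pub const sort_cache_bytes = sort_cache_len * large_struct_size;
pub const stack_ratio = thread_stack / service_stack;

comptime {
    // 512 * 96 = 49152 bytes, which is exactly 48 KB.
    std.debug.assert(sort_cache_bytes == 48 * KB);
    // 8 MB / 256 KB = 32, so a :service fiber stack is 1/32 of the thread stack.
    std.debug.assert(stack_ratio == 32);
}
```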

SpawnOpts — Per-Spawn Override

When the profile default isn’t right for a specific task, override it:

const sched = @import("janus_sched");

// Default: uses nursery's profile default (e.g., 256KB for :service)
_ = nursery.spawn(&normalTask, arg);

// Override: this task needs extra stack for crypto operations
_ = nursery.spawnWithOpts(&heavyTask, arg, .{
    .stack_size = 512 * 1024,  // 512KB
});

SpawnOpts Struct

pub const SpawnOpts = struct {
    /// Stack size for the spawned fiber (null = use nursery default)
    stack_size: ?usize = null,
    /// Priority hint (null = Normal)
    priority: ?Priority = null,
};
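
The null-means-default rule maps naturally onto Zig's orelse. The following is a sketch of how the effective stack size could be resolved, assuming the nursery default is passed in as a plain usize; resolveStackSize is a hypothetical helper, and the local SpawnOpts copy keeps only the stack_size field.

```zig
const std = @import("std");

// Local copy with the stack-size field only; priority is omitted for brevity.
const SpawnOpts = struct {
    stack_size: ?usize = null,
};

// Assumed resolution rule: an explicit size wins, otherwise the nursery default applies.
fn resolveStackSize(opts: SpawnOpts, nursery_default: usize) usize {
    return opts.stack_size orelse nursery_default;
}
```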

StackDefaults Constants

Available via Task.StackDefaults:

pub const StackDefaults = struct {
    pub const CORE: usize = 64 * 1024;      // 64 KB
    pub const SERVICE: usize = 256 * 1024;   // 256 KB
    pub const CLUSTER: usize = 256 * 1024;   // 256 KB
    pub const SOVEREIGN: usize = 512 * 1024; // 512 KB
};

Setting Nursery Defaults

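The scheduler provides two nursery creation methods, createNursery and createNurseryWithStackSize, plus a profileStackSize helper that maps a profile to its default stack size: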

// Uses SERVICE profile default (256KB stacks)
var default_nursery = scheduler.createNursery(budget);

// Explicit stack size for all children in this nursery
var small_nursery = scheduler.createNurseryWithStackSize(budget, 64 * 1024);

// Profile-aware helper
var sovereign_nursery = scheduler.createNurseryWithStackSize(
    budget,
    sched.profileStackSize(.sovereign),  // 512KB
);
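
A profile-to-size helper of this kind can be a single switch over the StackDefaults constants. The sketch below shows one plausible shape; the Profile enum and the function body are assumptions, since only the helper's name and the .sovereign argument appear above.

```zig
const std = @import("std");

const StackDefaults = struct {
    pub const CORE: usize = 64 * 1024;
    pub const SERVICE: usize = 256 * 1024;
    pub const CLUSTER: usize = 256 * 1024;
    pub const SOVEREIGN: usize = 512 * 1024;
};

// Hypothetical enum; the real profile type may be named or shaped differently.
const Profile = enum { core, service, cluster, sovereign };

// Map each profile to its documented default stack size.
fn profileStackSize(profile: Profile) usize {
    return switch (profile) {
        .core => StackDefaults.CORE,
        .service => StackDefaults.SERVICE,
        .cluster => StackDefaults.CLUSTER,
        .sovereign => StackDefaults.SOVEREIGN,
    };
}
```

Because the switch is exhaustive, adding a profile to the enum without choosing a stack size for it becomes a compile error.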

Structured Concurrency

Nurseries guarantee that no task is orphaned. Every spawned fiber belongs to a nursery, and the nursery does not exit until every child has completed, failed, or been cancelled.

Nested Nurseries

Tasks can create sub-nurseries for hierarchical concurrency:

supervisor nursery
  +-- agent fiber 1
  |     +-- agent nursery
  |           +-- scanner fiber (subdir A)
  |           +-- scanner fiber (subdir B)
  +-- agent fiber 2
        +-- agent nursery
              +-- scanner fiber (subdir C)

Cancellation propagates transitively: cancelling the supervisor cancels all agents, which cancels all scanners.
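
One way to get transitive propagation is to give each token a pointer to its parent and treat a token as cancelled when any ancestor is. The sketch below illustrates that idea only; Token and its methods are hypothetical and not the runtime's CancelToken, and the lowercase atomic orderings assume a recent Zig release.

```zig
const std = @import("std");

// Hypothetical token: cancelled if its own flag or any ancestor's flag is set.
const Token = struct {
    parent: ?*const Token = null,
    flag: std.atomic.Value(bool) = std.atomic.Value(bool).init(false),

    fn cancel(self: *Token) void {
        self.flag.store(true, .release);
    }

    fn isCancelled(self: *const Token) bool {
        // Walk from this token up to the root of the nursery tree.
        var cur: ?*const Token = self;
        while (cur) |t| : (cur = t.parent) {
            if (t.flag.load(.acquire)) return true;
        }
        return false;
    }
};
```

With this layout, cancelling the supervisor's token is a single store, and each descendant pays for the walk only when it checks.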

Cancellation

// Cancel nursery — all children receive cancellation
nursery.cancel();

// Check cancellation in task code
if (nursery.isCancelled()) return -1;

// Token-based cancellation for fine-grained control
const token = nursery.getToken();
if (token.is_cancelled()) return -1;

Failure semantics: When any child fails (returns a negative value), the nursery’s cancel token is triggered. Cancellation is cooperative, so each sibling observes it at its next yield point or explicit check rather than being stopped immediately.
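
The failure rule can be modelled in a few lines. In this sketch, a negative return from one child trips a shared flag, and a sibling that checks the flag before each unit of work stops early. MiniNursery, onChildExit, and sibling are hypothetical names; this is a single-threaded model of the rule, not the scheduler's implementation.

```zig
const std = @import("std");

// Hypothetical model of a nursery holding only its cancellation flag.
const MiniNursery = struct {
    cancelled: std.atomic.Value(bool) = std.atomic.Value(bool).init(false),

    // Record a child's return value; a negative value trips the flag.
    fn onChildExit(self: *MiniNursery, ret: i64) void {
        if (ret < 0) self.cancelled.store(true, .release);
    }
};

// Sibling task body: checks for cancellation before each unit of work.
fn sibling(n: *const MiniNursery, units: usize) i64 {
    var done: usize = 0;
    while (done < units) : (done += 1) {
        if (n.cancelled.load(.acquire)) return -1; // observed cancellation
    }
    return @intCast(done);
}
```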

Budgets

Every nursery and task has a budget that limits resource consumption:

const Budget = struct {
    ops: u64,          // Operation count
    memory: u64,       // Memory allocation limit
    spawn_count: u64,  // Maximum child spawns
    syscalls: u64,     // System call limit
};

// Profile defaults
const service_budget = Budget.serviceDefault();  // Generous limits for services
const child_budget = Budget.childDefault();      // Per-task budget slice
const zero_budget = Budget.zero();               // No budget (for :core profile)

When a task exhausts its budget, it transitions to the BudgetExhausted state, from which a supervisor can recharge it.
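
The exhaust-and-recharge cycle can be pictured as a small state machine. The sketch below tracks only the ops counter; MiniTask, charge, and recharge are hypothetical names, and the real runtime may account for costs differently.

```zig
const std = @import("std");

const State = enum { running, budget_exhausted };

// Hypothetical task that tracks only its remaining ops.
const MiniTask = struct {
    ops_left: u64,
    state: State = .running,

    // Charge `cost` ops; returns false and parks the task if the budget cannot cover it.
    fn charge(self: *MiniTask, cost: u64) bool {
        if (cost > self.ops_left) {
            self.state = .budget_exhausted;
            return false;
        }
        self.ops_left -= cost;
        return true;
    }

    // Supervisor-side recharge: add ops and make the task runnable again.
    fn recharge(self: *MiniTask, ops: u64) void {
        self.ops_left +|= ops; // saturating add avoids overflow
        self.state = .running;
    }
};
```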

Result Collection Pattern

Instead of channels, use pre-allocated result slots: each fiber writes only to its own slot, and awaitAll() provides the memory barrier that makes those writes visible to the parent:

const results = try allocator.alloc(CID, count);
defer allocator.free(results);
const successes = try allocator.alloc(bool, count);
defer allocator.free(successes);
const args = try allocator.alloc(TaskArgs, count);
defer allocator.free(args);

for (0..count) |i| {
    args[i] = .{
        .result_cid = &results[i],
        .success = &successes[i],
        // ... other fields ...
    };
}

var nursery = scheduler.createNursery(Budget.serviceDefault());
defer nursery.deinit();

for (0..count) |i| {
    _ = nursery.spawn(&taskFn, @ptrCast(&args[i]));
}
_ = nursery.awaitAll();
// Results are now safe to read — awaitAll is the barrier
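
The same slot pattern can be tried without the Janus runtime by swapping fibers for OS threads, with join() playing the role that awaitAll() plays above. This is an analogy using std.Thread, not the scheduler's API; square and squaresInParallel are hypothetical names.

```zig
const std = @import("std");

// Each worker writes only to the slot it was handed.
fn square(slot: *usize, n: usize) void {
    slot.* = n * n;
}

fn squaresInParallel(comptime count: usize) ![count]usize {
    var results: [count]usize = undefined;
    var threads: [count]std.Thread = undefined;
    var spawned: usize = 0;
    // If a later spawn fails, still wait for the threads already started.
    errdefer for (threads[0..spawned]) |t| t.join();

    for (0..count) |i| {
        threads[i] = try std.Thread.spawn(.{}, square, .{ &results[i], i });
        spawned += 1;
    }

    // join() is the barrier here: after it returns, every slot is safe to read.
    for (threads) |t| t.join();
    return results;
}
```

No locks are needed because no two workers share a slot, and the parent reads only after the barrier.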

Architecture

scheduler.zig              # Sovereign Index — re-exports all types
+-- scheduler/
    +-- budget.zig         # Budget types and costs
    +-- task.zig           # Task struct, StackDefaults, state machine
    +-- nursery.zig        # Nursery, SpawnOpts, structured concurrency
    +-- worker.zig         # Worker thread loop, yield, work-stealing
    +-- deque.zig          # Chase-Lev work-stealing deque
    +-- continuation.zig   # Fiber context setup
    +-- context_switch.s   # x86_64 assembly context switch
    +-- context_switch_aarch64.s  # aarch64 assembly context switch
    +-- cancel_token.zig   # Cooperative cancellation

Specifications

SPEC-021: Capability-Budgeted Cooperative M:N Scheduler
SPEC-019: Cancellation Tokens and Structured Failure Propagation
SPEC-022: Scheduling Capabilities