Testing With Katana

Time: 55 minutes
Level: Beginner to intermediate
Prerequisites: Error handling, modules, and the janus CLI
What you’ll learn: Write tests that fail usefully, test errors as values, avoid hidden authority, and use the Katana runner from the command line.

Katana is the Janus test harness. It is small on purpose:

test syntax is test "..." do ... end;
assertions return errors;
try is how tests fail;
expected values come before actual values;
resource authority is explicit outside script ergonomics.

Step 1: Write A First Test

Create tests/counter_test.jan:

use std.testing

func increment(value: usize) -> usize do
    return value + 1
end

test "counter increments once" do
    const actual = increment(0)
    try testing.expect_equal[usize](1, actual)
end

Run it:

janus test tests/counter_test.jan

Expected output:

PASS T0001 counter increments once

Test Summary:
  Passed: 1
  Failed: 0

The important part is not the green line. The important part is the call shape:

try testing.expect_equal[usize](expected, actual)

The assertion returns TestError!void. On a mismatch, try propagates that error to the runner, which records the failure. No panic is required.
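The same shape can be modeled outside Janus. The Python sketch below is an illustration only, not Katana's implementation: MismatchError, expect_equal, and run_test are hypothetical stand-ins. The assertion returns an error value or None, the test body hands any error back to its caller (the role try plays), and the runner turns it into a FAIL line without raising.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class MismatchError:
    """Hypothetical stand-in for TestError; the fields are assumptions."""
    message: str
    expected: object
    actual: object


def expect_equal(expected, actual) -> Optional[MismatchError]:
    # Expected comes first, matching the tutorial's argument order.
    if expected == actual:
        return None
    return MismatchError("value mismatch", expected, actual)


def increment(value: int) -> int:
    return value + 1


def counter_increments_once() -> Optional[MismatchError]:
    # Handing the error back to the caller plays the role of `try`.
    err = expect_equal(1, increment(0))
    if err is not None:
        return err
    return None


def run_test(name: str, body: Callable[[], Optional[MismatchError]]) -> str:
    # The runner converts a returned error into a FAIL line; nothing is raised.
    err = body()
    return f"PASS {name}" if err is None else f"FAIL {name}: {err.message}"


print(run_test("counter increments once", counter_increments_once))
```

Because the failure is an ordinary value, the runner can attach the expected and actual fields to its report instead of unwinding a stack.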

Step 2: Read A Failure

Break the expectation:

test "counter increments once" do
    const actual = increment(0)
    try testing.expect_equal[usize](2, actual)
end

The runner reports source location and the mismatch:

FAIL T0001 counter increments once

Failures:
  "counter increments once": at tests/counter_test.jan:9
value mismatch
  expected: 2
  actual:   1

That is the Katana standard: the failure should be small enough to act on.

Step 3: Test Errors As Values

Janus does not use exception trapping as the normal failure path. Test the error union:

use std.testing

error ParseError {
    Empty,
}

func parse_count(input: []const u8) -> ParseError!usize do
    if input.len == 0 do
        fail ParseError.Empty
    end
    return input.len
end

test "empty input is rejected" do
    const result = parse_count("")
    try testing.expect_error[ParseError, usize](ParseError.Empty, result)
end

test "non-empty input returns length" do
    const length = try testing.expect_no_error[ParseError, usize](parse_count("janus"))
    try testing.expect_equal[usize](5, length)
end

Use expect_panic only for panic boundaries: FFI panic quarantine, compiler traps, and invariant checks. Invalid user input should normally be an error value.
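For readers who want the idea without Janus syntax, here is a rough Python model of testing an error as a value. Ok, Err, and this expect_error are hypothetical stand-ins rather than Katana's API; the point is only that the test inspects a returned result instead of catching an exception.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Ok:
    value: int


@dataclass(frozen=True)
class Err:
    error: str


Result = Union[Ok, Err]


def parse_count(data: bytes) -> Result:
    # Mirrors the tutorial's function: empty input is an error value, not a raise.
    if len(data) == 0:
        return Err("Empty")
    return Ok(len(data))


def expect_error(expected_error: str, result: Result) -> Optional[str]:
    # Returns None on success, or a description of what went wrong.
    if isinstance(result, Ok):
        return f"expected error {expected_error}, got value {result.value}"
    if result.error != expected_error:
        return f"expected error {expected_error}, got error {result.error}"
    return None


print(expect_error("Empty", parse_count(b"")))       # None: the error matched
print(expect_error("Empty", parse_count(b"janus")))  # a mismatch description
```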

Step 4: Compare Slices

Use expect_equal_slices for byte strings and arrays:

test "formatter preserves text" do
    const actual = "janus"
    try testing.expect_equal_slices[u8]("janus", actual)
end

When a slice differs, the runner reports both lengths and the first differing index. For example, if actual held x (byte 120) at index 1 where a (byte 97) was expected:

slice mismatch
  length: expected 5, actual 5
  first differing index: 1
  expected[1]: 97
  actual[1]:   120

That is usually enough to find the bad byte without dumping an entire file.
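The report boils down to a first-difference scan. A minimal Python sketch of that scan follows; it is not Katana's code, the function names are invented, and the input b"jxnus" is an example chosen so the bytes line up with the report above (97 is a, 120 is x). How a pure length mismatch is reported is an assumption.

```python
from typing import Optional, Sequence


def first_difference(expected: Sequence[int], actual: Sequence[int]) -> Optional[int]:
    """Return the first index where the sequences differ, or None if equal.

    Assumption: when one sequence is a strict prefix of the other, the
    index reported is the length of the shorter one.
    """
    for index, (e, a) in enumerate(zip(expected, actual)):
        if e != a:
            return index
    if len(expected) != len(actual):
        return min(len(expected), len(actual))
    return None


def describe_mismatch(expected: bytes, actual: bytes) -> str:
    index = first_difference(expected, actual)
    if index is None:
        return "slices equal"
    lines = [
        "slice mismatch",
        f"  length: expected {len(expected)}, actual {len(actual)}",
        f"  first differing index: {index}",
    ]
    # Only print elements that exist on each side.
    if index < len(expected):
        lines.append(f"  expected[{index}]: {expected[index]}")
    if index < len(actual):
        lines.append(f"  actual[{index}]:   {actual[index]}")
    return "\n".join(lines)


print(describe_mismatch(b"janus", b"jxnus"))
```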

Step 5: Use Subtests For Tables

Subtests make table-style tests selectable:

test "parse integer cases" do
    var t = testing.context()

    try t.subtest("zero", do
        try testing.expect_equal[i64](0, parse_i64("0"))
    end)

    try t.subtest("negative", do
        try testing.expect_equal[i64](-7, parse_i64("-7"))
    end)
end

Run only one case:

janus test tests/parser_test.jan --only "parse integer cases/negative"

Use slash paths as stable names. Do not hide important behavior behind random test generators unless the generated seed and shrink path are reported.
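One way to picture slash-path selection is as a prefix rule over test paths. The Python sketch below is a guess at reasonable semantics, not a description of what --only actually does; should_run and its rules are assumptions made for illustration.

```python
from typing import Optional


def should_run(path: str, only: Optional[str]) -> bool:
    """Decide whether the test or subtest at `path` runs under a filter.

    Assumed rules (not confirmed by the tutorial):
    - no filter runs everything;
    - an exact match runs;
    - a parent of the selected path runs so the subtest can be reached;
    - anything nested under the selected path runs.
    """
    if only is None or path == only:
        return True
    return only.startswith(path + "/") or path.startswith(only + "/")


only = "parse integer cases/negative"
for path in [
    "parse integer cases",
    "parse integer cases/zero",
    "parse integer cases/negative",
]:
    print(path, should_run(path, only))
```

Stable names matter under any such rule: renaming a subtest silently changes what a saved filter selects.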

Step 6: Make Authority Explicit

In :service and stricter profiles, resource helpers need test-scoped authority:

{.profile: service.}

use std.testing

test "reads through explicit filesystem cap" do
    var t = testing.context()
    let fs = t.fs_readonly("/tmp")
    const data = testing.read_file(fs, "/tmp/input.txt")
    try testing.expect(data.len >= 0)
end

This shape is deliberately wrong:

{.profile: service.}

use std.testing

test "ambient read" do
    const data = testing.read_file("/tmp/input.txt")
    _ = data
end

The runner and compiler should not smuggle filesystem authority through a path-only helper.
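The difference between the two shapes is visible in the helper's signature. A rough Python sketch of the capability-passing shape follows; ReadOnlyFs and this read_file are hypothetical, and Python cannot enforce the rule the way a profile-aware compiler could, so treat it as an illustration of the API shape only.

```python
import tempfile
from pathlib import Path


class ReadOnlyFs:
    """Hypothetical capability granting read access beneath one root directory."""

    def __init__(self, root: str) -> None:
        self._root = Path(root).resolve()

    def checked(self, path: str) -> Path:
        candidate = Path(path).resolve()
        if candidate != self._root and self._root not in candidate.parents:
            raise PermissionError(f"{path} is outside the granted root")
        return candidate


def read_file(fs: ReadOnlyFs, path: str) -> bytes:
    # There is deliberately no path-only overload: the capability is required.
    return fs.checked(path).read_bytes()


with tempfile.TemporaryDirectory() as root:
    target = Path(root) / "input.txt"
    target.write_bytes(b"janus")
    fs = ReadOnlyFs(root)
    print(read_file(fs, str(target)))
    try:
        read_file(fs, str(Path(root).parent / "elsewhere.txt"))
    except PermissionError as exc:
        print("denied:", exc)
```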

Step 7: Check Allocator Leaks

TestingAllocator keeps allocation accounting visible:

test "balanced allocation" do
    var alloc = testing.allocator()
    testing.record_alloc(&alloc)
    testing.record_free(&alloc)
    try testing.expect_no_leaks(&alloc)
end

The runner also checks for leaks when each test ends. This test fails even though its only assertion passes:

test "leak is reported" do
    var alloc = testing.allocator()
    testing.record_alloc(&alloc)
    try testing.expect(true)
end

Output:

leak detected
  allocations: 1
  frees: 0
  outstanding: 1

This closes a common testing hole: a leak fails the test even when the test forgets to check for one.
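The gate can be modeled as a check the runner performs after the test body returns. The Python sketch below is a simplified model, not TestingAllocator itself: CountingAllocator and run_with_leak_gate are invented names, and the runner hands the allocator to the body instead of the body requesting one.

```python
from typing import Callable


class CountingAllocator:
    """Hypothetical model of allocation accounting."""

    def __init__(self) -> None:
        self.allocations = 0
        self.frees = 0

    def record_alloc(self) -> None:
        self.allocations += 1

    def record_free(self) -> None:
        self.frees += 1

    @property
    def outstanding(self) -> int:
        return self.allocations - self.frees


def run_with_leak_gate(body: Callable[[CountingAllocator], None]) -> str:
    alloc = CountingAllocator()
    body(alloc)
    # The gate runs after the body whether or not the body checked for leaks.
    if alloc.outstanding != 0:
        return (
            "leak detected\n"
            f"  allocations: {alloc.allocations}\n"
            f"  frees: {alloc.frees}\n"
            f"  outstanding: {alloc.outstanding}"
        )
    return "ok"


def balanced(alloc: CountingAllocator) -> None:
    alloc.record_alloc()
    alloc.record_free()


def leaky(alloc: CountingAllocator) -> None:
    alloc.record_alloc()


print(run_with_leak_gate(balanced))
print(run_with_leak_gate(leaky))
```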

Step 8: Write Compile-Fail Tests

Compiler features need negative tests:

test "bad message payload is rejected" do
    try testing.compile_fails(testing.CompileFailCase {
        source: "message Bad { Ref { x: *u8 } }",
        error_code: "E2530",
        message_contains: "non-SBI-conformant",
        span_contains: "Ref",
    })
end

Use error codes and required fragments. A whole diagnostic blob is too brittle as the primary contract.
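To see why fragments hold up better than a full blob, consider a matcher that checks only the code and the required substrings. This Python sketch is illustrative: the Diagnostic type, the matches function, and both diagnostic messages are invented here, and only the field names of CompileFailCase come from the example above.

```python
from dataclasses import dataclass


@dataclass
class Diagnostic:
    """Hypothetical compiler diagnostic; the real structure is not shown in the tutorial."""
    code: str
    message: str
    span_text: str


@dataclass
class CompileFailCase:
    error_code: str
    message_contains: str
    span_contains: str


def matches(case: CompileFailCase, diag: Diagnostic) -> bool:
    # The contract is the code plus two fragments, not the whole rendered text.
    return (
        diag.code == case.error_code
        and case.message_contains in diag.message
        and case.span_contains in diag.span_text
    )


case = CompileFailCase("E2530", "non-SBI-conformant", "Ref")

# Two invented wordings of the same diagnostic; both satisfy the case.
old = Diagnostic("E2530", "payload is non-SBI-conformant", "Ref { x: *u8 }")
new = Diagnostic("E2530", "non-SBI-conformant payload: pointer field", "Ref { x: *u8 }")

print(matches(case, old), matches(case, new))
```

A test that compared the full text would break on the rewording even though the compiler still rejects the same construct for the same reason.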

Step 9: Add A Golden

Goldens are source artifacts, not a silent cache:

test "formatter output" do
    const actual = format_module(source)
    try testing.expect_golden("tests/golden/formatter/basic.out", actual)
end

Run normally to compare:

janus test tests/formatter_test.jan

Update only when you mean it:

janus test tests/formatter_test.jan --update-golden

The runner prints every changed path.
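The compare-or-update split can be sketched in a few lines of Python. This is a model under stated assumptions, not the runner's code: check_golden is an invented name, and treating a missing golden as a failure on a normal run is a guess.

```python
import tempfile
from pathlib import Path
from typing import List


def check_golden(path: Path, actual: str, update: bool, changed: List[Path]) -> bool:
    """Return True if actual matches the golden, or if the golden was updated."""
    expected = path.read_text(encoding="utf-8") if path.exists() else None
    if expected == actual:
        return True
    if not update:
        return False  # a normal run never rewrites the golden
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(actual, encoding="utf-8")
    changed.append(path)  # every rewritten path is recorded so it can be printed
    return True


with tempfile.TemporaryDirectory() as root:
    golden = Path(root) / "golden" / "basic.out"
    changed: List[Path] = []
    print(check_golden(golden, "module a\n", update=False, changed=changed))  # False
    print(check_golden(golden, "module a\n", update=True, changed=changed))   # True
    print(check_golden(golden, "module a\n", update=False, changed=changed))  # True
    print([p.name for p in changed])
```

Since the golden file lives in the repository, an update shows up in review as an ordinary diff.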

Step 10: Add A Benchmark

Benchmarks run only with --bench:

bench "parse small module" do
    var b = testing.benchmark_context()
    const source = b.read_fixture("tests/fixtures/small.jan")

    while b.keep_running() do
        _ = parse_module(source)
    end
end

Run:

janus test tests/parser_bench.jan --bench

Benchmark output includes median, p95, p99, allocation count, and bytes when available. A slow benchmark is not automatically a failing test; performance policy belongs in a separate gate.
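For readers unfamiliar with the reported statistics, here is a small Python sketch of median and percentile computation over timing samples. It uses the nearest-rank definition of a percentile, which is an assumption; the tutorial does not say which definition the runner uses, and the sample data is synthetic.

```python
import statistics
from typing import List


def percentile(samples: List[int], p: int) -> int:
    """Nearest-rank percentile: the smallest sample with at least p% at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in 1..100")
    ordered = sorted(samples)
    # Integer ceiling of p * n / 100 avoids floating-point rounding surprises.
    rank = (p * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Synthetic per-iteration timings in nanoseconds: 1, 2, ..., 100.
timings_ns = list(range(1, 101))

print("median:", statistics.median(timings_ns))  # 50.5
print("p95:", percentile(timings_ns, 95))        # 95
print("p99:", percentile(timings_ns, 99))        # 99
```

Tail percentiles are reported alongside the median because a few slow iterations can hide behind a healthy median.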

Checklist

Before a test lands:

The test name describes behavior, not implementation.
Assertions use expected-first order.
Error behavior uses expect_error or expect_no_error.
Resource use passes a TestCtx or explicit capability.
Any allocation accounting is balanced, and the runner leak gate is green.
Compile-fail tests assert diagnostic structure, not a fragile full blob.
Golden updates require --update-golden and list changed paths.
Benchmarks are opt-in and do not pretend to be correctness tests.

The goal is not to maximize test count. The goal is to make every failure actionable.