
Testing With Katana

Time: 55 minutes
Level: Beginner to intermediate
Prerequisites: Error handling, modules, and the janus CLI
What you’ll learn: Write tests that fail usefully, prove errors as values, avoid hidden authority, and use the Katana runner from the command line.

Katana is the Janus test harness. It is small on purpose:

  • test syntax is test "..." do ... end;
  • assertions return errors;
  • try is how tests fail;
  • expected values come before actual values;
  • resource authority is passed explicitly, except where script-style ergonomics apply.

Create tests/counter_test.jan:

use std.testing
func increment(value: usize) -> usize do
    return value + 1
end
test "counter increments once" do
    const actual = increment(0)
    try testing.expect_equal[usize](1, actual)
end

Run it:

janus test tests/counter_test.jan

Expected output:

PASS T0001 counter increments once
Test Summary:
Passed: 1
Failed: 0

The important part is not the green line. The important part is the call shape:

try testing.expect_equal[usize](expected, actual)

The assertion returns TestError!void. try propagates that error to the runner. No panic is required.
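The idea of an assertion that returns its failure instead of panicking can be modeled outside Janus. The following is a minimal Python sketch, not Katana's implementation: `Failure`, `run_test`, and the output format are hypothetical names chosen for illustration, and returning the assertion result stands in for what `try` does in a Janus test body.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Failure:
    """A failed assertion carried as a plain value (stand-in for TestError)."""
    message: str
    expected: object
    actual: object


def expect_equal(expected, actual) -> Optional[Failure]:
    """Expected-first comparison: None on success, a Failure value on mismatch."""
    if expected == actual:
        return None
    return Failure("value mismatch", expected, actual)


def run_test(name: str, body: Callable[[], Optional[Failure]]) -> str:
    """Hypothetical runner: the body hands its first failure back as a return value."""
    err = body()
    if err is None:
        return f"PASS {name}"
    return f"FAIL {name}: {err.message} (expected: {err.expected}, actual: {err.actual})"


def increment(value: int) -> int:
    return value + 1


def counter_increments_once() -> Optional[Failure]:
    # Returning the assertion result plays the role of `try`:
    # a failure travels to the runner without any panic or exception.
    return expect_equal(1, increment(0))
```

Nothing in this sketch raises. The runner decides what a failure means, which is the property the call shape above is meant to protect.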

Break the expectation:

test "counter increments once" do
    const actual = increment(0)
    try testing.expect_equal[usize](2, actual)
end

The runner reports source location and the mismatch:

FAIL T0001 counter increments once
Failures:
"counter increments once": at tests/counter_test.jan:8
value mismatch
expected: 2
actual: 1

That is the Katana standard: the failure should be small enough to act on.

Janus does not use exception trapping as the normal failure path. Test the error union:

use std.testing
error ParseError {
    Empty,
}
func parse_count(input: []const u8) -> ParseError!usize do
    if input.len == 0 do
        fail ParseError.Empty
    end
    return input.len
end
test "empty input is rejected" do
    const result = parse_count("")
    try testing.expect_error[ParseError, usize](ParseError.Empty, result)
end
test "non-empty input returns length" do
    const length = try testing.expect_no_error[ParseError, usize](parse_count("janus"))
    try testing.expect_equal[usize](5, length)
end

Use expect_panic only for panic boundaries: FFI panic quarantine, compiler traps, and invariant checks. Invalid user input should normally be an error value.

Use expect_equal_slices for byte strings and arrays:

test "formatter preserves text" do
    const actual = "janus"
    try testing.expect_equal_slices[u8]("janus", actual)
end

When two slices differ, the runner reports both lengths and the first differing index:

slice mismatch
length: expected 5, actual 5
first differing index: 1
expected[1]: 97
actual[1]: 120

That is usually enough to find the bad byte without dumping an entire file.
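To make the report format concrete, here is a small Python sketch that produces the same shape of output for two byte sequences. It is an illustration under assumptions, not Katana's code: the function name `describe_slice_mismatch` is hypothetical, and the behavior when only the lengths differ (reporting the index just past the shorter input) is a guess at a reasonable policy.

```python
from typing import Optional, Sequence


def describe_slice_mismatch(expected: Sequence[int], actual: Sequence[int]) -> Optional[str]:
    """Return None when the sequences match, else a short report like the one above."""
    if list(expected) == list(actual):
        return None
    lines = ["slice mismatch", f"length: expected {len(expected)}, actual {len(actual)}"]
    shared = min(len(expected), len(actual))
    # First index where the shared prefix disagrees; if there is none,
    # the inputs differ only in length and the index is `shared`.
    index = next((i for i in range(shared) if expected[i] != actual[i]), shared)
    lines.append(f"first differing index: {index}")
    # Only print an element when that side actually has one at this index.
    if index < len(expected):
        lines.append(f"expected[{index}]: {expected[index]}")
    if index < len(actual):
        lines.append(f"actual[{index}]: {actual[index]}")
    return "\n".join(lines)
```

Calling `describe_slice_mismatch(b"janus", b"jxnus")` reproduces the five-line report shown above, because `a` is byte 97 and `x` is byte 120.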

Subtests make table-style tests selectable:

test "parse integer cases" do
    var t = testing.context()
    try t.subtest("zero", do
        try testing.expect_equal[i64](0, parse_i64("0"))
    end)
    try t.subtest("negative", do
        try testing.expect_equal[i64](-7, parse_i64("-7"))
    end)
end

Run only one case:

janus test tests/parser_test.jan --only "parse integer cases/negative"

Use slash paths as stable names. Do not hide important behavior behind random test generators unless the generated seed and shrink path are reported.
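One way to think about slash-path selection is as a prefix rule over path segments. The Python sketch below shows one plausible rule and is not a description of how Katana matches `--only`: the function name `should_run` and the decision to run ancestors and descendants of the selected path are assumptions made for illustration.

```python
from typing import Optional


def should_run(path: str, only: Optional[str]) -> bool:
    """Decide whether the test or subtest at `path` runs under an --only style filter."""
    if only is None or path == only:
        return True
    # Descendants of the selected path run, and ancestors run so that the
    # selected subtest can be reached. Matching is per segment, so "parse"
    # does not select "parse integer cases".
    return path.startswith(only + "/") or only.startswith(path + "/")
```

Under this rule, selecting `parse integer cases/negative` runs the enclosing test and the `negative` subtest, and skips `zero`.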

In :service and stricter profiles, resource helpers need test-scoped authority:

{.profile: service.}
use std.testing
test "reads through explicit filesystem cap" do
    var t = testing.context()
    let fs = t.fs_readonly("/tmp")
    const data = testing.read_file(fs, "/tmp/input.txt")
    try testing.expect(data.len >= 0)
end

This shape is deliberately wrong:

{.profile: service.}
use std.testing
test "ambient read" do
    const data = testing.read_file("/tmp/input.txt")
    _ = data
end

The runner and compiler should not smuggle filesystem authority through a path-only helper.
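The difference between the two shapes is easier to see in a small model. The Python sketch below is an analogy under assumptions, not the Janus capability system: `ReadOnlyFs` is a hypothetical class, and the containment check is one simple way to scope reads to a root directory.

```python
from pathlib import Path


class ReadOnlyFs:
    """Hypothetical capability: grants read access beneath one root directory only."""

    def __init__(self, root: str) -> None:
        self._root = Path(root).resolve()

    def read(self, path: str) -> bytes:
        target = Path(path).resolve()
        # Refuse anything that resolves outside the granted root,
        # including paths that escape through "..".
        if target != self._root and self._root not in target.parents:
            raise PermissionError(f"{path} is outside {self._root}")
        return target.read_bytes()


def read_file(fs: ReadOnlyFs, path: str) -> bytes:
    """The helper holds no authority of its own; it can only use what the caller passes."""
    return fs.read(path)
```

A path-only `read_file(path)` would have to reach for ambient access to the whole filesystem. With the capability as a parameter, the test's reach is visible at the call site.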

TestingAllocator keeps allocation accounting visible:

test "balanced allocation" do
    var alloc = testing.allocator()
    testing.record_alloc(&alloc)
    testing.record_free(&alloc)
    try testing.expect_no_leaks(&alloc)
end

The runner also checks for leaks when each test ends. This test fails even though its only assertion passes:

test "leak is reported" do
    var alloc = testing.allocator()
    testing.record_alloc(&alloc)
    try testing.expect(true)
end

Output:

leak detected
allocations: 1
frees: 0
outstanding: 1

This closes a common testing hole: forgetting to check leaks is itself a test failure.
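The end-of-test gate can be modeled with a counter and a wrapper around the test body. This Python sketch illustrates the idea only: `CountingAllocator` and `run_with_leak_gate` are hypothetical names, the allocator merely counts events, and double frees are out of scope.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class CountingAllocator:
    """Hypothetical stand-in for TestingAllocator: it only counts events."""
    allocations: int = 0
    frees: int = 0

    def record_alloc(self) -> None:
        self.allocations += 1

    def record_free(self) -> None:
        self.frees += 1

    @property
    def outstanding(self) -> int:
        return self.allocations - self.frees


def run_with_leak_gate(body: Callable[[CountingAllocator], None]) -> List[str]:
    """Run a test body, then check the allocator even if the body never did."""
    alloc = CountingAllocator()
    body(alloc)
    if alloc.outstanding <= 0:
        return []  # balanced; this sketch does not diagnose double frees
    return [
        "leak detected",
        f"allocations: {alloc.allocations}",
        f"frees: {alloc.frees}",
        f"outstanding: {alloc.outstanding}",
    ]
```

Because the check lives in the wrapper rather than in the body, a test that never asks about leaks is still held to the same standard.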

Compiler features need negative tests:

test "bad message payload is rejected" do
    try testing.compile_fails(testing.CompileFailCase {
        source: "message Bad { Ref { x: *u8 } }",
        error_code: "E2530",
        message_contains: "non-SBI-conformant",
        span_contains: "Ref",
    })
end

Use error codes and required fragments. A whole diagnostic blob is too brittle as the primary contract.
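The reason fragments hold up better than a full diagnostic blob shows in how the match is computed. The Python sketch below assumes a simplified diagnostic with three fields. `Diagnostic`, its field names, and `matches` are hypothetical, and the real Katana matcher may compare more than this.

```python
from dataclasses import dataclass


@dataclass
class Diagnostic:
    """Hypothetical diagnostic: error code, message text, and the source text under the span."""
    code: str
    message: str
    span_text: str


@dataclass
class CompileFailCase:
    """The three expectations used in the test above (the source field is omitted here)."""
    error_code: str
    message_contains: str
    span_contains: str


def matches(case: CompileFailCase, diag: Diagnostic) -> bool:
    """Exact match on the code, substring match on the message and on the span text."""
    return (
        diag.code == case.error_code
        and case.message_contains in diag.message
        and case.span_contains in diag.span_text
    )
```

Rewording the rest of the message, or adding notes and hints to the diagnostic, leaves such a test green as long as the code, the key phrase, and the highlighted span survive.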

Goldens are source artifacts, not a silent cache:

test "formatter output" do
    const actual = format_module(source)
    try testing.expect_golden("tests/golden/formatter/basic.out", actual)
end

Run normally to compare:

janus test tests/formatter_test.jan

Update only when you mean it:

janus test tests/formatter_test.jan --update-golden

The runner prints every changed path.
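The compare-or-update behavior can be sketched in a few lines. This Python model assumes the simplest possible policy and is not Katana's implementation: `check_golden` and its `changed` list are hypothetical, and a missing golden file is treated as a mismatch.

```python
from pathlib import Path
from typing import List


def check_golden(path: Path, actual: str, update: bool, changed: List[Path]) -> bool:
    """Compare `actual` with the golden file; rewrite it only when `update` is set."""
    current = path.read_text(encoding="utf-8") if path.exists() else None
    if current == actual:
        return True
    if update:
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(actual, encoding="utf-8")
        changed.append(path)  # collected so every rewritten file can be printed
        return True
    # Without the update flag a mismatch is a failure and the file is left untouched.
    return False
```

The `changed` list is what keeps an update from being silent: the caller can print each path after the run, and a reviewer sees the same paths again in the diff.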

Benchmarks run only with --bench:

bench "parse small module" do
    var b = testing.benchmark_context()
    const source = b.read_fixture("tests/fixtures/small.jan")
    while b.keep_running() do
        _ = parse_module(source)
    end
end

Run:

janus test tests/parser_bench.jan --bench

Benchmark output includes median, p95, p99, allocation count, and bytes when available. A slow benchmark is not automatically a failing test; performance policy belongs in a separate gate.
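For readers who want to see what median, p95, and p99 mean for a list of timings, here is a short Python sketch using the nearest-rank definition. How Katana computes its percentiles is not stated here, so the method, the `summarize` helper, and the nanosecond unit are all assumptions.

```python
import math
from typing import Dict, Sequence


def percentile(sorted_samples: Sequence[float], p: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not sorted_samples:
        raise ValueError("no samples")
    rank = max(1, math.ceil(p * len(sorted_samples) / 100))
    return sorted_samples[rank - 1]


def summarize(samples_ns: Sequence[float]) -> Dict[str, float]:
    """Summarize per-iteration timings (assumed to be nanoseconds)."""
    ordered = sorted(samples_ns)
    return {
        "median": percentile(ordered, 50),
        "p95": percentile(ordered, 95),
        "p99": percentile(ordered, 99),
    }
```

A summary like this only describes the run. Deciding whether a p99 is acceptable is the job of the separate performance gate.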

Before a test lands:

  • The test name describes behavior, not implementation.
  • Assertions use expected-first order.
  • Error behavior uses expect_error or expect_no_error.
  • Resource use passes a TestCtx or explicit capability.
  • Any allocation accounting is balanced, and the runner leak gate is green.
  • Compile-fail tests assert diagnostic structure, not a fragile full blob.
  • Golden updates require --update-golden and list changed paths.
  • Benchmarks are opt-in and do not pretend to be correctness tests.

The goal is not to maximize test count. The goal is to make every failure actionable.