std.compute.vector

std.compute.vector is the vector-kernel layer for the :compute profile. It is the scalar/reference surface for Janus vector search: pure Janus kernels, deterministic behavior, caller-owned query/result buffers, and explicit retained storage.

The positional TurboIndex surface provides in-memory search over retained packed codes. IdMapIndex wraps it with stable external u64 IDs and allowlist-before-heap search. Header encode/decode is available for both .jvi and .jvim. Positional .jvi indexes save and load packed-code payloads through caller-owned byte buffers or files; stable-ID .jvim indexes do the same and add the external-ID sidecar. Save computes BLAKE3 payload CIDs; load verifies them and rejects tampered payloads with VectorStatus.cid_mismatch. Scalar quantization uses fixed Lloyd-Max b2/b4 tables. Target-feature-gated neon, avx2, and avx512 grafted scoring paths are available on matching build targets.

TurboQuant and TurboVec are first-class façade types: TurboQuant makes the quantization policy (b2 or b4) explicit, while TurboVec provides a stable-ID ANN API over the same IdMapIndex core.

Import

use std.compute.vector

The aggregate std.compute module also re-exports vector.

Current Scope

The shipped Phase 1 API covers:

Area	Functions and types
Scalar metrics	`dot`, `norm_f32`, `l2`, `cosine`, `normalize_f32`
Rotation	`rotate_f32`, `unrotate_f32`
Bit packing	`pack_b4`, `unpack_b4`, `pack_b2`, `unpack_b2`
Quantization	`quantize_b4`, `dequantize_b4`, `quantize_vec_b4`, `dequantize_vec_b4`, `quantize_b2`, `dequantize_b2`, `quantize_vec_b2`, `dequantize_vec_b2`, `TurboQuant`, `TurboQuantConfig`, `turbo_quant_new`, `turbo_quant_quantize`, `turbo_quant_dequantize`
Top-k heap	`topk_offer`, `topk_finalize`
Benchmark oracle	`exact_search`, `recall_at_k_milli`
Positional index	`index_init`, `index_add`, `index_search`, `index_search_report`, `index_remove`, `index_deinit`, `index_backend`
Stable-ID index	`idmap_init`, `idmap_deinit`, `idmap_add_with_ids`, `idmap_remove`, `idmap_search`, `idmap_search_report`, `idmap_search_allowlist`
TurboVec façade	`TurboVec`, `turbo_vec_init`, `turbo_vec_deinit`, `turbo_vec_as_idmap`, `turbo_vec_add`, `turbo_vec_remove`, `turbo_vec_search`, `turbo_vec_search_allowlist`, `turbo_vec_search_report`, `turbo_vec_persisted_bytes`, `turbo_vec_save_bytes`, `turbo_vec_load_bytes`, `turbo_vec_save_file`, `turbo_vec_load_file`
Search reporting	`SearchReport`
Persistence	`VectorHeader`, `VectorFileKind`, `encode_vector_header`, `decode_vector_header`, `index_persisted_bytes`, `index_save_bytes`, `index_load_bytes`, `index_save_file`, `index_load_file`, `idmap_persisted_bytes`, `idmap_save_bytes`, `idmap_load_bytes`, `idmap_save_file`, `idmap_load_file`

Metrics, rotation, packing, quantization, and top-k operate on caller-owned buffers. TurboIndex retains packed-code storage allocated during index_init through the explicit parent allocator in IndexConfig. IdMapIndex adds one retained u64 sidecar for stable external IDs. Search calls do not allocate and write results into caller-owned buffers.

Metrics

use std.compute.vector

func main() -> i32 do
    let a = [_]f32{1.0, 2.0, 3.0}
    let b = [_]f32{4.0, 5.0, 6.0}

    if vector.dot(a, b, 3) != 32.0 do return 1 end

    let v = [_]f32{3.0, 4.0}
    if vector.norm_f32(v, 2) != 5.0 do return 2 end

    let mut out = [_]f32{0.0, 0.0}
    vector.normalize_f32(v, 2, out)

    return 0
end

norm_f32 uses a local Babylonian square root implementation, not a C libm graft. That keeps the reference path sovereign and deterministic.
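
As a language-neutral illustration of the Babylonian method mentioned above, here is a minimal Python sketch. The function names, the starting guess, and the fixed iteration count are assumptions made for this example; the actual norm_f32 kernel may choose them differently.

```python
def babylonian_sqrt(x: float, iterations: int = 64) -> float:
    """Approximate sqrt(x) with a fixed number of Babylonian (Newton) steps.

    A fixed step count keeps the result reproducible across runs. It is
    enough for moderate magnitudes; it is not tuned for extreme inputs.
    """
    if x < 0.0:
        raise ValueError("square root of a negative number")
    if x == 0.0:
        return 0.0
    guess = x if x >= 1.0 else 1.0  # any positive start converges
    for _ in range(iterations):
        guess = 0.5 * (guess + x / guess)  # average the guess with x / guess
    return guess


def norm(values: list[float]) -> float:
    """Euclidean norm built on the square root above."""
    return babylonian_sqrt(sum(v * v for v in values))
```

Because every step is plain arithmetic on the input, the result does not depend on a platform math library, which is the property the reference path relies on.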

Rotation

let mut embedding = [_]f32{0.5, -0.25, 0.75, 1.5}

vector.rotate_f32(embedding, 4, 12345, 4)
vector.unrotate_f32(embedding, 4, 12345, 4)

Rotation is seeded and reproducible. The implementation applies deterministic Householder reflections derived from the seed and round number. The transform is norm-preserving and storage-free: unrotate_f32 regenerates the same reflection sequence in reverse.
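
The following Python sketch illustrates why a seeded Householder sequence needs no stored state. How the reflection vector is derived from the seed and round number is an assumption here (Python's random.Random stands in for whatever generator the module uses), and the helper names are hypothetical.

```python
import math
import random


def _reflection_vector(dim: int, seed: int, round_index: int) -> list[float]:
    """Derive a unit vector from (seed, round). The PRNG is an assumption."""
    rng = random.Random(seed * 1_000_003 + round_index)
    v = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    length = math.sqrt(sum(c * c for c in v))
    return [c / length for c in v]


def _reflect(x: list[float], v: list[float]) -> None:
    """Apply H = I - 2 v v^T in place. H is orthogonal and its own inverse."""
    scale = 2.0 * sum(a * b for a, b in zip(x, v))
    for i in range(len(x)):
        x[i] -= scale * v[i]


def rotate(x: list[float], seed: int, rounds: int) -> None:
    """Apply the reflections for rounds 0, 1, ..., rounds - 1."""
    for r in range(rounds):
        _reflect(x, _reflection_vector(len(x), seed, r))


def unrotate(x: list[float], seed: int, rounds: int) -> None:
    """Regenerate the same reflections and apply them in reverse order."""
    for r in reversed(range(rounds)):
        _reflect(x, _reflection_vector(len(x), seed, r))
```

Each reflection is orthogonal, so the vector norm is unchanged, and each reflection undoes itself, so replaying the sequence backwards restores the input up to floating-point rounding.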

Bit Packing

let codes = [_]u8{1, 2, 15, 0, 7, 8}
var packed: [3]u8 = undefined
var unpacked: [6]u8 = undefined

if vector.pack_b4(codes, 6, packed) != 3 do return 1 end
vector.unpack_b4(packed, 6, unpacked)

b4 packs two 4-bit codes per byte. b2 packs four 2-bit codes per byte. Tail bytes are zero-padded, and unpacking reads exactly the requested code count.
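
A minimal Python sketch of the two layouts follows. The bit order within a byte (first code in the lowest bits) is an assumption for illustration; the text above does not state which order the module uses.

```python
def pack_b4(codes: list[int]) -> bytes:
    """Pack 4-bit codes two per byte; an odd tail leaves the high nibble zero."""
    out = bytearray((len(codes) + 1) // 2)
    for i, code in enumerate(codes):
        out[i // 2] |= (code & 0x0F) << (4 * (i % 2))
    return bytes(out)


def unpack_b4(packed: bytes, count: int) -> list[int]:
    """Read exactly `count` 4-bit codes, ignoring any padding bits."""
    return [(packed[i // 2] >> (4 * (i % 2))) & 0x0F for i in range(count)]


def pack_b2(codes: list[int]) -> bytes:
    """Pack 2-bit codes four per byte; unused tail bits stay zero."""
    out = bytearray((len(codes) + 3) // 4)
    for i, code in enumerate(codes):
        out[i // 4] |= (code & 0x03) << (2 * (i % 4))
    return bytes(out)


def unpack_b2(packed: bytes, count: int) -> list[int]:
    """Read exactly `count` 2-bit codes, ignoring any padding bits."""
    return [(packed[i // 4] >> (2 * (i % 4))) & 0x03 for i in range(count)]
```

Under this assumed bit order, the six codes 1, 2, 15, 0, 7, 8 pack into the three bytes 0x21, 0x0F, 0x87.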

Quantization

let qv = [_]f32{-0.5, 0.0, 0.5, 0.95}
var codes: [4]u8 = undefined
let mut decoded = [_]f32{0.0, 0.0, 0.0, 0.0}

vector.quantize_vec_b4(qv, 4, codes)
vector.dequantize_vec_b4(codes, 4, decoded)

The current quantizer uses fixed Lloyd-Max scalar tables for b4 and b2. quantize_b4 maps a scalar to one of 16 reconstruction levels; quantize_b2 maps to one of 4. The tables are deterministic compile-time constants, so the scalar reference stays training-free and reproducible.
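
To make the table lookup concrete, here is a small Python sketch of scalar quantization against a fixed reconstruction table. The four levels below are the classic Lloyd-Max levels for a unit-variance Gaussian and are used purely as a hypothetical stand-in; the tables compiled into std.compute.vector may use different values or scaling.

```python
# Hypothetical 4-level (b2) table: Lloyd-Max levels for a unit Gaussian.
# Assumption for illustration only; not the module's actual constants.
LEVELS_B2 = (-1.510, -0.4528, 0.4528, 1.510)


def quantize(value: float, levels: tuple = LEVELS_B2) -> int:
    """Return the index of the reconstruction level nearest to `value`."""
    return min(range(len(levels)), key=lambda i: abs(value - levels[i]))


def dequantize(code: int, levels: tuple = LEVELS_B2) -> float:
    """Map a code back to its reconstruction level."""
    return levels[code]


def quantize_vec(values: list[float]) -> list[int]:
    """Quantize each component independently; no training data is needed."""
    return [quantize(v) for v in values]
```

A b4 quantizer would work the same way with a 16-entry table. Since the table is a constant, the same input always yields the same code.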

TurboQuant exposes explicit quantization policy under a typed façade:

use std.compute.vector

var qcfg = vector.TurboQuantConfig { kind: vector.TurboQuantKind.b4 }
let q = vector.turbo_quant_new(&qcfg)
let vector_data = [_]f32{-0.5, 0.0, 0.5, 0.95}
var codes: [4]u8 = undefined
var decoded: [4]f32 = undefined

vector.turbo_quant_quantize(&q, vector_data, 4, codes)
vector.turbo_quant_dequantize(&q, codes, 4, decoded)

Top-k

var scores: [3]f32 = undefined
var ids: [3]u64 = undefined
var len: u32 = 0

len = vector.topk_offer(scores, ids, len, 3, 0.5, 10)
len = vector.topk_offer(scores, ids, len, 3, 0.875, 20)
len = vector.topk_offer(scores, ids, len, 3, 0.125, 30)
len = vector.topk_offer(scores, ids, len, 3, 0.75, 40)

vector.topk_finalize(scores, ids, len)

The heap keeps the largest score values. Callers using L2 distance should pass -distance as the score key so “larger is better” remains true internally. After topk_finalize, the arrays are sorted in descending score order and are no longer a heap.
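
The offer/finalize pattern can be sketched in Python with a bounded min-heap. This is an illustration of the technique, not the module's implementation: it uses a Python list instead of the caller-owned score/ID arrays, and how ties are broken is left unspecified.

```python
import heapq


def topk_offer(heap: list, k: int, score: float, item_id: int) -> None:
    """Keep at most k entries; the heap root is the weakest kept score."""
    if len(heap) < k:
        heapq.heappush(heap, (score, item_id))
    elif score > heap[0][0]:
        heapq.heapreplace(heap, (score, item_id))  # evict the weakest entry


def topk_finalize(heap: list) -> list:
    """Return entries in descending score order; the result is not a heap."""
    return sorted(heap, key=lambda entry: entry[0], reverse=True)


heap: list = []
for score, item_id in [(0.5, 10), (0.875, 20), (0.125, 30), (0.75, 40)]:
    topk_offer(heap, 3, score, item_id)
best = topk_finalize(heap)
```

With the same four offers as the Janus example and k = 3, `best` is `[(0.875, 20), (0.75, 40), (0.5, 10)]`: the 0.125 entry is evicted when 0.75 arrives. For L2 distance, offer `-distance` as the score so the same comparison applies.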

Benchmark Oracle

exact_search runs a full-precision top-k scan over caller-owned raw [count][dim]f32 storage. It is not an index; it is the correctness oracle for recall measurements and small benchmark harnesses.

let vectors = [_]f32{
    1.0, 0.0,
    0.0, 1.0,
    0.75, 0.75,
}
let query = [_]f32{1.0, 0.0}
var exact_scores: [2]f32 = undefined
var exact_ids: [2]u64 = undefined

let exact_found = vector.exact_search(
    vectors,
    3,
    2,
    vector.Metric.cosine,
    query,
    2,
    exact_scores,
    exact_ids,
)

recall_at_k_milli compares approximate result IDs against exact baseline IDs as sets within the first k entries and returns milli-units: 1000 means recall 1.0, 500 means recall 0.5. The helper is allocation-free and order-insensitive; ranking quality beyond membership remains a benchmark-harness concern.
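
A short Python sketch of the oracle-plus-recall workflow is given below. It assumes cosine scoring over a flat row-major buffer and assumes recall is computed as hits divided by the number of exact baseline IDs considered; the module's exact denominator and tie handling are not stated above, so treat both as assumptions.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; zero-norm inputs score 0.0 in this sketch."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0.0 and nb > 0.0 else 0.0


def exact_search(vectors: list[float], dim: int, query: list[float], k: int) -> list[int]:
    """Full scan over a flat [count][dim] buffer; returns the top-k row IDs."""
    count = len(vectors) // dim
    scored = [(cosine(vectors[i * dim:(i + 1) * dim], query), i) for i in range(count)]
    scored.sort(key=lambda entry: (-entry[0], entry[1]))
    return [i for _, i in scored[:k]]


def recall_at_k_milli(approx_ids: list[int], exact_ids: list[int], k: int) -> int:
    """Set overlap within the first k entries, in milli-units (1000 == 1.0)."""
    exact = set(exact_ids[:k])
    if not exact:
        return 0
    hits = len(exact & set(approx_ids[:k]))
    return hits * 1000 // len(exact)


vectors = [1.0, 0.0, 0.0, 1.0, 0.75, 0.75]
exact_ids = exact_search(vectors, 2, [1.0, 0.0], 2)
```

For the three vectors used in the Janus example, the exact top-2 for query (1, 0) is rows 0 and 2. An approximate result of `[2, 0]` scores 1000 because order is ignored, while `[0, 1]` scores 500.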

Positional TurboIndex

TurboIndex stores packed quantized vectors and returns positional slot IDs. Slots are not stable across removal: index_remove uses swap_remove, so the last vector can move into the removed position.

use std.compute.vector
use std.alloc.page_allocator

func main() -> i32 do
    var pa = page_allocator.init()
    var cfg = vector.IndexConfig {
        dim: 2,
        bit_width: vector.BitWidth.b4,
        metric: vector.Metric.cosine,
        backend: vector.Backend.cpu_auto,
        seed: 99,
        capacity: 4,
        rounds: 0,
        allocator: &pa,
    }

    var idx: vector.TurboIndex = undefined
    if vector.index_init(&idx, &cfg) != vector.VectorStatus.ok do
        return 1
    end

    let vectors = [_]f32{
        1.0, 0.0,
        0.0, 1.0,
        0.75, 0.75,
    }
    if vector.index_add(&idx, vectors, 3) != vector.VectorStatus.ok do
        vector.index_deinit(&idx)
        return 2
    end

    let query = [_]f32{1.0, 0.0}
    var scores: [2]f32 = undefined
    var ids: [2]u64 = undefined

    let found = vector.index_search(&idx, query, 2, scores, ids)

    vector.index_deinit(&idx)
    return 0
end

IndexConfig.capacity preallocates retained packed-code storage using IndexConfig.allocator. rounds controls the seeded Householder rotation count; 0 is valid for deterministic test cases that isolate index semantics. The smoke harness asserts that retained-storage accounting returns to zero after index_deinit, and that a second index_deinit call remains inert.

index_backend reports the selected backend. cpu_auto selects the best compiled target-supported grafted scorer in this order: avx512, avx2, neon, then scalar. Explicit neon, avx2, and avx512 indexes use grafted Zig vector-lane scoring kernels for dot/L2 candidate scoring when the compiled target advertises the matching feature; otherwise index_init returns VectorStatus.backend_unavailable. The smoke harness builds matching cpu_auto, explicit scalar, and target-available explicit SIMD indexes and asserts result count, IDs, and scores match, keeping accelerated backends tied to the scalar oracle.
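
The selection rule described above can be summarized in a few lines of Python. The function name, the string labels, and the idea of passing target features as a set are assumptions for this sketch; only the preference order and the backend_unavailable outcome come from the text.

```python
# Preference order for cpu_auto as described above; scalar is the fallback.
PREFERENCE = ("avx512", "avx2", "neon")


def select_backend(requested: str, target_features: set[str]) -> str:
    """Resolve a requested backend against the compiled target's features."""
    if requested == "cpu_auto":
        for name in PREFERENCE:
            if name in target_features:
                return name
        return "scalar"
    if requested == "scalar" or requested in target_features:
        return requested
    return "backend_unavailable"  # explicit SIMD request the target lacks
```

An explicit request never silently degrades: asking for avx512 on a target that only advertises avx2 yields backend_unavailable, while cpu_auto on the same target resolves to avx2.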

SearchReport is the stable reporting payload for benchmark harnesses. index_search_report and idmap_search_report run the same search as their plain counterparts, then fill backend, metric, bit width, dimension, vector count, visited count, requested k, found count, code payload bytes, raw [dim]f32 payload bytes, and compression_ratio_milli. Pair these reports with exact_search and recall_at_k_milli to produce recall@k numbers against the scalar full-precision oracle. Latency remains caller-measured so the stdlib does not fabricate a clock abstraction.
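
As a worked example of the payload arithmetic, the Python sketch below assumes that each vector's codes are padded to a whole byte, that the raw payload is 4 bytes per f32 component, and that compression_ratio_milli is raw bytes over code bytes in milli-units. These formulas are assumptions inferred from the field names, not a statement of how SearchReport computes them.

```python
def code_payload_bytes(count: int, dim: int, bits: int) -> int:
    """Packed-code bytes, assuming per-vector padding to a whole byte."""
    bytes_per_vector = (dim * bits + 7) // 8
    return count * bytes_per_vector


def raw_payload_bytes(count: int, dim: int) -> int:
    """Full-precision bytes: one 4-byte f32 per component."""
    return count * dim * 4


def compression_ratio_milli(count: int, dim: int, bits: int) -> int:
    """Raw-to-code size ratio in milli-units (8000 means 8.0x)."""
    code = code_payload_bytes(count, dim, bits)
    if code == 0:
        return 0
    return raw_payload_bytes(count, dim) * 1000 // code
```

Under these assumptions, 42 vectors of dimension 1536 take 258048 raw bytes and 32256 bytes at b4, a ratio of 8000; at b2 the ratio is 16000.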

Stable-ID IdMapIndex

IdMapIndex keeps external u64 IDs stable across deletion. Internally it uses the same positional packed-code store, then repairs the ID sidecar after swap_remove.

use std.compute.vector
use std.alloc.page_allocator

func main() -> i32 do
    var pa = page_allocator.init()
    var cfg = vector.IndexConfig {
        dim: 2,
        bit_width: vector.BitWidth.b4,
        metric: vector.Metric.cosine,
        backend: vector.Backend.scalar,
        seed: 99,
        capacity: 4,
        rounds: 0,
        allocator: &pa,
    }

    var idx: vector.IdMapIndex = undefined
    if vector.idmap_init(&idx, &cfg) != vector.VectorStatus.ok do
        return 1
    end

    let vectors = [_]f32{
        1.0, 0.0,
        0.0, 1.0,
        0.75, 0.75,
    }
    let external_ids = [_]u64{101, 202, 303}
    if vector.idmap_add_with_ids(&idx, vectors, external_ids, 3) != vector.VectorStatus.ok do
        vector.idmap_deinit(&idx)
        return 2
    end

    let query = [_]f32{0.75, 0.75}
    let allow = [_]u64{101, 303}
    var scores: [2]f32 = undefined
    var ids: [2]u64 = undefined

    let found = vector.idmap_search_allowlist(&idx, query, 2, allow, 2, scores, ids)

    vector.idmap_deinit(&idx)
    return 0
end

idmap_search and idmap_search_allowlist return external IDs, never positional slots. idmap_search_allowlist filters candidates before they are offered to the top-k heap, so a disallowed candidate cannot evict an allowed result. idmap_add_with_ids rejects duplicate external IDs with VectorStatus.duplicate_id; stable IDs are unique, not a multimap.
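
The difference between filtering before the heap and filtering afterwards is easy to see in a small Python sketch. The candidate list, function names, and scores here are hypothetical; the point is only the ordering of the filter relative to the top-k heap.

```python
import heapq


def search(scored: list, k: int, allow=None) -> list[int]:
    """Top-k over (score, external_id) pairs with an optional early allowlist."""
    heap: list = []
    for score, ext_id in scored:
        if allow is not None and ext_id not in allow:
            continue  # rejected before it can occupy a heap slot
        if len(heap) < k:
            heapq.heappush(heap, (score, ext_id))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, ext_id))
    ranked = sorted(heap, key=lambda entry: entry[0], reverse=True)
    return [ext_id for _, ext_id in ranked]


def search_then_filter(scored: list, k: int, allow: set) -> list[int]:
    """The weaker alternative: take the top-k first, filter afterwards."""
    return [ext_id for ext_id in search(scored, k) if ext_id in allow]


candidates = [(0.9, 202), (0.8, 404), (0.7, 101), (0.6, 303)]
early = search(candidates, 2, allow={101, 303})
late = search_then_filter(candidates, 2, {101, 303})
```

Here `early` is `[101, 303]`, while `late` is empty because the two disallowed candidates filled both heap slots before the filter ran.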

IdMapIndex stores the ID sidecar beside the inner positional index. Its lifecycle follows the same rule: initialize with idmap_init, release with idmap_deinit. The vector smoke harness verifies that both the inner index and the sidecar clear retained-storage accounting after idmap_deinit, and that a second idmap_deinit call does not free again.
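
The swap_remove repair that keeps external IDs stable can be illustrated with two parallel lists in Python. The class, the linear ID lookup, and the "not_found" status are hypothetical simplifications; only the swap-then-shrink behavior and the duplicate_id rejection are taken from the description above.

```python
class IdMapSketch:
    """Parallel arrays: positional codes plus an external-ID sidecar."""

    def __init__(self) -> None:
        self.codes: list[bytes] = []  # packed codes, one entry per slot
        self.ids: list[int] = []      # external ID stored at the same slot

    def add(self, code: bytes, ext_id: int) -> str:
        if ext_id in self.ids:
            return "duplicate_id"  # stable IDs are unique
        self.codes.append(code)
        self.ids.append(ext_id)
        return "ok"

    def remove(self, ext_id: int) -> str:
        if ext_id not in self.ids:
            return "not_found"  # hypothetical status for this sketch
        slot = self.ids.index(ext_id)
        last = len(self.ids) - 1
        # swap_remove: move the last entry into the vacated slot, in BOTH
        # arrays, so the moved code still maps to its own external ID.
        self.codes[slot] = self.codes[last]
        self.ids[slot] = self.ids[last]
        self.codes.pop()
        self.ids.pop()
        return "ok"
```

After adding IDs 101, 202, 303 and removing 101, slot 0 holds the code that belonged to 303 and the sidecar reads `[303, 202]`. The slot changed, but the external ID attached to each code did not.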

TurboVec façade

TurboVec is a first-class, stable-ID wrapper over IdMapIndex for ANN-style storage and querying:

use std.compute.vector
use std.alloc.page_allocator

var pa = page_allocator.init()
var cfg = vector.IndexConfig {
    dim: 2,
    bit_width: vector.BitWidth.b4,
    metric: vector.Metric.cosine,
    backend: vector.Backend.scalar,
    seed: 99,
    capacity: 4,
    rounds: 0,
    allocator: &pa,
}

var tv: vector.TurboVec = undefined
if vector.turbo_vec_init(&tv, &cfg) != vector.VectorStatus.ok do
    return 1
end

let vectors = [_]f32{
    1.0, 0.0,
    0.0, 1.0,
    0.75, 0.75,
}
let ids = [_]u64{101, 202, 303}
vector.turbo_vec_add(&tv, vectors, ids, 3)

let query = [_]f32{0.75, 0.75}
var scores: [2]f32 = undefined
var result_ids: [2]u64 = undefined
let found = vector.turbo_vec_search(&tv, query, 2, scores, result_ids)

vector.turbo_vec_remove(&tv, 101)
vector.turbo_vec_deinit(&tv)

turbo_vec_search_allowlist preserves the allowlist-before-heap filtering of idmap_search_allowlist, and turbo_vec_as_idmap exposes the underlying IdMapIndex for lower-level interop when needed.

Persistence Headers

.jvi is the positional vector-index format. .jvim is the stable-ID vector index map format; it carries the same base header plus idmap_payload_cid for the external-ID sidecar payload.

var header = vector.VectorHeader {
    kind: vector.VectorFileKind.jvim,
    version: vector.VECTOR_FORMAT_VERSION,
    dim: 1536,
    bit_width: vector.BitWidth.b4,
    metric: vector.Metric.cosine,
    backend_hint: vector.Backend.scalar,
    vector_count: 42,
    rotation_seed: 0x0123456789ABCDEF,
    rotation_rounds: 6,
    quantizer_kind: vector.VECTOR_QUANTIZER_TURBOQUANT,
    code_payload_cid: code_cid,
    norms_payload_cid: norms_cid,
    idmap_payload_cid: idmap_cid,
}

var bytes: [144]u8 = undefined
let status = vector.encode_vector_header_raw(bytes, 144, &header)

VECTOR_HEADER_BYTES is 144. Multi-byte scalars are little-endian. The header includes rotation_rounds so loaded indexes reproduce the original rotation.
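
To show what little-endian means for a multi-byte header scalar, the Python sketch below encodes the rotation_seed value from the example above as a u64. It does not describe field offsets or the overall 144-byte layout, which are not specified here; the helper names are hypothetical.

```python
import struct


def encode_u64_le(value: int) -> bytes:
    """Encode an unsigned 64-bit integer, least significant byte first."""
    return struct.pack("<Q", value)


def decode_u64_le(data: bytes) -> int:
    """Decode 8 little-endian bytes back into an unsigned 64-bit integer."""
    (value,) = struct.unpack("<Q", data)
    return value


seed_bytes = encode_u64_le(0x0123456789ABCDEF)
print(seed_bytes.hex())  # efcdab8967452301
```

The least significant byte (0xEF) comes first, so a reader on any host reconstructs the same seed by decoding with the same byte order.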

For positional indexes, index_persisted_bytes, index_save_bytes, and index_load_bytes save and reload the .jvi header plus packed-code payload in caller-owned byte buffers. index_save_bytes computes the packed-code payload CID from the serialized bytes; index_load_bytes recomputes and verifies it before allocating the reopened index. The byte APIs use explicit pointer+length arguments, for example index_save_bytes(&idx, bytes, cap) and index_load_bytes(&idx, &pa, bytes, len).

index_save_file and index_load_file provide the same .jvi contract over a filesystem path. File I/O failures return VectorStatus.io_error; malformed or wrong-kind contents still return format_mismatch, and CID mismatches still return cid_mismatch.

For stable-ID indexes, idmap_persisted_bytes, idmap_save_bytes, and idmap_load_bytes save and reload the .jvim header, packed-code payload, and little-endian u64 ID sidecar in caller-owned byte buffers. idmap_save_bytes computes CIDs for both the code payload and the ID sidecar; idmap_load_bytes verifies both before allocation. The byte APIs use explicit pointer+length arguments, for example idmap_save_bytes(&idx, bytes, cap) and idmap_load_bytes(&idx, &pa, bytes, len).

idmap_save_file and idmap_load_file provide the same .jvim contract over a filesystem path.
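
The save-computes, load-verifies contract can be sketched in Python as follows. Two substitutions are made loudly: Python's standard library has no BLAKE3, so BLAKE2b-256 from hashlib stands in for the digest, and a dictionary stands in for the real header and payload layout. The function names are hypothetical.

```python
import hashlib
import struct


def payload_cid(payload: bytes) -> bytes:
    """Stand-in content ID: BLAKE2b-256, NOT the BLAKE3 the module uses."""
    return hashlib.blake2b(payload, digest_size=32).digest()


def save(codes: bytes, external_ids: list[int]) -> dict:
    """Serialize the sidecar as little-endian u64s and record both CIDs."""
    sidecar = b"".join(struct.pack("<Q", i) for i in external_ids)
    return {
        "code_payload_cid": payload_cid(codes),
        "idmap_payload_cid": payload_cid(sidecar),
        "codes": codes,
        "sidecar": sidecar,
    }


def load(blob: dict) -> str:
    """Recompute both digests before anything is allocated or trusted."""
    if payload_cid(blob["codes"]) != blob["code_payload_cid"]:
        return "cid_mismatch"
    if payload_cid(blob["sidecar"]) != blob["idmap_payload_cid"]:
        return "cid_mismatch"
    return "ok"
```

Flipping a single byte in either the code payload or the sidecar changes the recomputed digest, so `load` reports cid_mismatch instead of reopening a tampered index.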

Verification

The proof gate is the AOT smoke harness:

cd janus
./scripts/zb test-vector

The broad test step also depends on this harness:

cd janus
./scripts/zb test

The harness lives at std/compute/vector_smoke.jan and covers metrics, normalization, b2/b4 packing, Lloyd-Max quantization, seeded rotation/unrotation, top-k ordering, exact full-precision search, recall@k milli scoring, positional TurboIndex add/search/remove behavior, and stable IdMapIndex add/search/remove/allowlist behavior.

Persistence coverage round-trips .jvi and .jvim headers through encode_vector_header / decode_vector_header, then saves and reloads positional .jvi and stable-ID .jvim payloads through both caller-owned byte buffers and files, and proves search still works after reload. The harness also corrupts one .jvi code byte and one .jvim ID-sidecar byte and expects VectorStatus.cid_mismatch in both cases.

Lifecycle coverage checks retained-storage accounting before and after index_deinit / idmap_deinit, so the no-leak/no-double-free contract is covered by the same proof gate.

Backend coverage currently proves automatic cpu_auto dispatch/reporting: supported targets select avx512, avx2, or neon before falling back to scalar, and unavailable explicit backends report backend_unavailable. The harness compares cpu_auto and target-available explicit SIMD results against an explicit scalar index, so accelerated scoring remains tied to the scalar oracle. Bridge compile-only checks cover the AVX512 and NEON target-feature builds where this host cannot execute those instructions.

Reporting coverage checks SearchReport fields for positional and stable-ID search, including backend, visited count, result count, and compression-ratio payload data.

Limits

Not shipped yet:

Native Janus simd[T; N] reimplementation, pending SPEC-040.
GPU/NPU/device vector backends.

Treat the current module as the stable scalar reference layer, not as a complete vector database.