# :compute — The Foundry

> "Hardware, meet your match."
:compute is where Janus meets the metal. Everything from :core — plus first-class support for tensors, device streams, and hardware acceleration. Run local LLMs, do scientific computing, process video in real time. All with the same language you use everywhere else.
## What :compute Gives You

### Tensors — N-Dimensional Arrays

```janus
let weights := tensor<f32, [4096, 4096]>.load("model.bin")
let input := tensor<f32, [1, 4096]>.on(.vram)

let result := matmul(input, weights)
    .quantize(.qvl)
    .on(.npu)

print("Inference done in ${result.latency_ms}ms")
```

- Shape inference — the type tracks tensor dimensions
- Device targeting — `.on(.cpu)`, `.on(.gpu)`, `.on(.npu)`
- Quantization — QVL, INT8, FP16, BF16 support
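Because the dimensions are part of the tensor type, a mismatched `matmul` is rejected before the program runs. A minimal sketch of what that catches (the shapes here are illustrative):

```janus
let a := tensor<f32, [1, 4096]>.random()
let b := tensor<f32, [2048, 512]>.random()

# Inner dimensions disagree (4096 vs 2048), so this line is a
# compile-time type error rather than a runtime shape exception.
let c := matmul(a, b)
```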
## Memory Spaces

```janus
func process_batch(data: tensor<f32, [32, 1024]>.on(.vram)) do
    # This runs on the GPU
    let result := data
        |> layer_norm()
        |> attention()
        |> feed_forward()

    # Copy back to CPU for output
    return result.on(.cpu)
end
```

- `.on(.sram)` — fast on-chip SRAM (embedded)
- `.on(.dram)` — main system memory
- `.on(.vram)` — GPU/accelerator memory
- `.on(.shared)` — unified memory (when available)
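Since every transfer is an explicit `.on(...)` call, data movement stays visible at the call site. A small sketch under that assumption (the file name and staging pattern are illustrative):

```janus
# Stage in main memory, compute in accelerator memory, copy back.
let samples := tensor<f32, [1024]>.load("samples.bin").on(.dram)
let gpu_result := samples.on(.vram) |> layer_norm()   # copy to VRAM, run there
let host_result := gpu_result.on(.dram)               # explicit copy back for I/O
```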
## Device Streams

```janus
# Async GPU operations
let stream := DeviceStream.on(.gpu)
let output := tensor<f32, [1024]>.on(.gpu)

stream.launch(kernel_1024, blocks: 64, threads: 256)
stream.synchronize()
```
## J-IR Graph Extraction

```janus
# Extract the compute graph before lowering
let graph := extract_jir(my_inference_func)
let optimized := graph.optimize(.fusion, .constant_fold)
```

- Optimize before hitting hardware
- Fuse operations for maximum throughput
- Constant-fold at compile time
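As a sketch of how extraction composes with the tensor API above (the `small_net` function and the single-kernel outcome are assumptions about the optimizer, not guaranteed behavior):

```janus
func small_net(x: tensor<f32, [1, 512]>) do
    return x |> layer_norm() |> feed_forward()
end

let graph := extract_jir(small_net)
let fused := graph.optimize(.fusion)
# With .fusion applied, adjacent stages can lower as one
# kernel launch instead of two.
```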
## What :compute Excludes

| Excluded | Available In |
|---|---|
| Actors and grains | :cluster |
| Supervision trees | :cluster |
| Effects system | :sovereign |
| Raw pointers | :sovereign |
## When to Use :compute

Perfect for:
- AI inference (local LLMs, image classification, voice)
- Scientific computing (physics, chemistry, climate models)
- Signal processing and DSP
- GPU compute shaders
- Real-time video/image processing
- Matrix operations at scale
The rule: If you’re doing the same operation on thousands of data points, :compute is your friend.
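A minimal example of that rule, normalizing a million readings in one expression (the file name and the `mean`/`std` reductions are illustrative):

```janus
# One logical operation applied across a million data points at once.
let readings := tensor<f32, [1000000]>.load("readings.bin").on(.gpu)
let normalized := (readings - readings.mean()) / readings.std()
```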
## Code Examples

### Local LLM Inference

```janus
func main() do
    let model := LLModel.load("llama-7b-q4.bin")
        .quantize(.q4_0)
        .on(.npu)

    let prompt := "Write a haiku about sovereignty"
    let tokens := model.tokenize(prompt)

    let result := model.generate(tokens, max_tokens: 100)
        .temperature(0.7)
        .on(.npu)

    print(model.decode(result.tokens))
end
```

### Image Processing Pipeline
```janus
func process_images(input_dir: String, output_dir: String) do
    let images := glob("${input_dir}/*.png")

    let batch := images
        |> load_batch(32)
        |> normalize(0.0, 255.0)
        .on(.gpu)

    let features := batch
        |> resize(224, 224)
        |> apply_model(resnet50)
        .on(.gpu)

    let embeddings := features
        |> flatten()
        .on(.cpu)

    save_embeddings(embeddings, "${output_dir}/features.npy")
end
```

### Scientific Simulation
```janus
func simulate_particles(count: usize) tensor<f32, [count, 3]> do
    let dt := 0.01  # integration time step
    let positions := tensor<f32, [count, 3]>.random_uniform(-10.0, 10.0)
    let velocities := tensor<f32, [count, 3]>.zeros()

    for step in 0..1000 do
        # Compute forces
        let forces := compute_forces(positions)

        # Update velocities
        velocities = velocities + forces * dt

        # Update positions
        positions = positions + velocities * dt

        # Boundary conditions
        positions = clamp(positions, -10.0, 10.0)
    end

    return positions
end
```

### Matrix Operations
```janus
func main() do
    let a := tensor<f32, [1024, 2048]>.random()
    let b := tensor<f32, [2048, 512]>.random()

    # Matrix multiplication on GPU
    let c := matmul(a, b).on(.gpu)

    # Element-wise operations
    let d := c * 2.0 + 1.0

    # Reduction
    let sum := d.sum()
    print("Sum: ${sum}")
end
```

## Why :compute Wins
vs. Python (NumPy/PyTorch):
- Compile-time shape checking — catch dimension mismatches before running
- Zero-copy where possible — no unnecessary data movement
- Single deployment — no Python runtime, no CUDA dependencies
- Single language — same Janus for data loading, processing, serving
vs. CUDA/C++:
- Productivity — write kernels in high-level Janus
- Safety — memory spaces prevent illegal accesses
- Portability — same code targets CPU, GPU, NPU
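The portability point can be made concrete with a device parameter; the `Device` type and the `softmax` kernel here are assumptions made for the sake of the sketch:

```janus
# One function body; callers choose the target hardware.
func normalize_on(x: tensor<f32, [1, 1000]>, dev: Device) do
    return x.on(dev) |> softmax()
end

let logits := tensor<f32, [1, 1000]>.random()
normalize_on(logits, .cpu)
normalize_on(logits, .npu)
```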
vs. Julia:
- Stability — Janus compiles, Julia optimizes
- Deployment — static binary, no JIT warmup
- Ecosystem — same package manager as everything else
## Next Steps

- Move to :core — For CPU-only workloads
- Move to :sovereign — For custom kernels
- Reference: Tensors — API details
Make the metal sing.