What If You Could Compile a Model to a $4 Microcontroller?

Most AI deployment assumes a GPU somewhere in the chain. Cloud inference: GPU server. Edge inference: Jetson or Raspberry Pi with a USB accelerator. Desktop inference: consumer GPU or CPU fast enough to run quantised weights. The assumed compute primitive is always a processor running matrix multiplications in floating point.

There are scenarios where that assumption breaks. An FPGA running at the edge of a power grid where latency matters more than flexibility. An analog circuit where power consumption needs to be measured in microwatts. A swarm of RP2040 microcontrollers deployed to a location where replacing them means a physical visit.

tpt-crucible is a compiler for those scenarios. It takes standard model formats — GGUF, ONNX, PyTorch, TensorFlow — and compiles them onto hardware that GPUs were never designed for.

Three Hardware Targets

FPGAs. The Fusion module generates synthesisable Verilog via Amaranth HDL, then drives Yosys for logic synthesis and Nextpnr for place-and-route. It currently targets Xilinx Alveo boards. The output is a bitstream that can be loaded directly onto the board.

Analog circuits. The Element module maps model weights to physical components — resistors, memristors, op-amps — and produces SPICE netlists for simulation via Xyce and PySpice. A lightweight PyTorch drift predictor (trained on Xyce outputs) provides fast estimates for weight drift due to temperature and component aging, without running the full SPICE simulation each time.
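To make the weight-to-component mapping concrete, here is a minimal sketch of one common approach: encoding each signed weight as a pair of conductances whose difference is proportional to the weight, then writing the pair out as SPICE resistor lines. This is an illustration only, not Element's actual mapping; the function names, node names, and the conductance range are assumptions.

```python
def weight_to_conductances(w, w_max, g_min=1e-6, g_max=1e-4):
    """Map a signed weight to a (G_plus, G_minus) pair in siemens.

    The difference G_plus - G_minus is proportional to w. The range
    [g_min, g_max] is an assumed device range, not a Crucible default.
    """
    if w_max <= 0:
        raise ValueError("w_max must be positive")
    w = max(-w_max, min(w_max, w))  # clip to the representable range
    span = (g_max - g_min) * abs(w) / w_max
    if w >= 0:
        return g_min + span, g_min
    return g_min, g_min + span


def crossbar_netlist(weights):
    """Emit SPICE resistor lines for weights[i][j] (input i to output j)."""
    w_max = max(abs(w) for row in weights for w in row) or 1.0
    lines = ["* differential resistive crossbar (illustrative)"]
    for i, row in enumerate(weights):
        for j, w in enumerate(row):
            g_pos, g_neg = weight_to_conductances(w, w_max)
            # SPICE resistor syntax: Rname node1 node2 value_in_ohms
            lines.append(f"RP_{i}_{j} in{i} outp{j} {1.0 / g_pos:.6g}")
            lines.append(f"RN_{i}_{j} in{i} outn{j} {1.0 / g_neg:.6g}")
    lines.append(".end")
    return "\n".join(lines)


print(crossbar_netlist([[0.5, -1.0]]))
```

A subtracting stage (for example an op-amp difference amplifier) would then recover the signed sum from the outp and outn rails; that stage is left out of the sketch.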

Microcontroller swarms. The Alloy module partitions a model graph across a swarm of microcontrollers using METIS/KaFFPa graph partitioning. It generates per-node C++ or Rust firmware and flashes it via PlatformIO or Zephyr RTOS. Supported targets include ESP32, RP2040, and RISC-V boards.
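The idea behind partitioning can be shown with a much simpler stand-in than METIS or KaFFPa. The sketch below splits an ordered chain of layers into contiguous groups of roughly equal cost and counts how many graph edges end up crossing between nodes, which is the quantity a real partitioner tries to minimise. The function names and the cost model are hypothetical.

```python
def partition_chain(costs, n_nodes):
    """Assign each layer (in order) to a node so loads are roughly balanced.

    A greedy stand-in for a real graph partitioner: it only handles a
    linear ordering and does not optimise the edge cut.
    """
    if n_nodes < 1:
        raise ValueError("n_nodes must be at least 1")
    target = sum(costs) / n_nodes  # ideal load per node
    assignment = []
    node, load = 0, 0.0
    for cost in costs:
        # Move on once the current node has reached its share.
        if load >= target and node < n_nodes - 1:
            node += 1
            load = 0.0
        assignment.append(node)
        load += cost
    return assignment


def edge_cut(edges, assignment):
    """Count edges whose endpoints live on different nodes."""
    return sum(1 for a, b in edges if assignment[a] != assignment[b])


layers = [4, 4, 4, 4]                      # per-layer cost, arbitrary units
edges = [(0, 1), (1, 2), (2, 3), (0, 3)]   # (0, 3) models a skip connection
plan = partition_chain(layers, n_nodes=2)
print(plan, edge_cut(edges, plan))
```

Every cut edge corresponds to data that must travel between microcontrollers at inference time, which is why a proper partitioner weighs balance against communication.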

The Module Architecture

Every compilation starts with Catalyst, the ingestion layer. Catalyst reads the input model format (GGUF, ONNX, PyTorch, TensorFlow) and emits TPT-IR — a hardware-agnostic MLIR dialect. Quantisation metadata is preserved: a Q4 GGUF model's weight distribution is carried through to the hardware target, so INT4 MACs are used on targets that support them.
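As a rough illustration of why preserving quantisation metadata matters, the sketch below quantises a block of weights to signed 4-bit integers with a single scale, then computes a dot product using integer multiply-accumulates and applies the scales once at the end. It is a simplified scheme, not GGUF's actual on-disk block layout and not Catalyst's code; both function names are hypothetical.

```python
def quantize_block(weights):
    """Symmetric 4-bit quantisation: returns (scale, ints in [-8, 7])."""
    peak = max(abs(w) for w in weights)
    scale = peak / 7.0 if peak > 0 else 1.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return scale, q


def int4_dot(scale, q_weights, q_inputs, input_scale):
    """Dot product with integer MACs; scales are applied once at the end."""
    acc = sum(w * x for w, x in zip(q_weights, q_inputs))  # integer accumulate
    return acc * scale * input_scale


scale, q = quantize_block([7.0, -3.0, 0.0, 1.0])
print(scale, q)
print(int4_dot(scale, q, [1, 2, 3, 4], input_scale=0.5))
```

If the scale and integer values were discarded at ingestion, a backend would have to re-derive them or fall back to wider arithmetic.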

From TPT-IR, compilation routes to Alloy, Fusion, or Element depending on the target. The Mosaic module enables hybrid compilation — a single model spanning multiple hardware types simultaneously, with different layers mapped to different targets.

Every successful compilation produces a .tptpkg file: a ZIP container bundling the hardware-agnostic TPT-IR, compiled artifacts for each target (firmware binaries, FPGA bitstreams, SPICE netlists), a pre-flight compatibility report, quantisation profiles, and partition plans.
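Because a .tptpkg is a ZIP container, it can be inspected with ordinary ZIP tooling. The sketch below lists the entries of a package using Python's standard zipfile module. The entry names in the demo are made up for illustration; the real internal layout is not specified here.

```python
import io
import zipfile


def list_package(source):
    """Return sorted (entry name, uncompressed size) pairs for a ZIP package.

    `source` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as pkg:
        return sorted((info.filename, info.file_size) for info in pkg.infolist())


# Build a tiny stand-in package in memory; these entry names are hypothetical.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as pkg:
    pkg.writestr("ir/model.mlir", "module {}")
    pkg.writestr("report/preflight.json", "{}")
buf.seek(0)

for name, size in list_package(buf):
    print(name, size)
```

With a real package on disk, the same function would be called with its path, for example list_package("model.tptpkg").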

TPT Observer: Real-Time Visualisation

tpt-crucible ships with TPT Observer, a monitoring dashboard built with a Go backend and a Next.js + Three.js frontend. It connects to hardware targets over WebSocket and streams telemetry in real time.

For microcontroller swarms, Observer renders a live 3D visualisation of the swarm topology — which nodes are active, what each node is computing, and how data is flowing between them. For FPGA targets, it shows utilisation maps. For analog targets, it shows component operating points and drift estimates.

Observer also includes an IR graph editor built on React Flow. You can inspect the TPT-IR graph produced by Catalyst, rearrange partitions, and re-trigger compilation from the UI without touching the command line.

LLM-Assisted Design via Spark IPC

Crucible includes optional LLM assistance for hardware-specific tasks: generating driver code for new boards, suggesting model topology changes to fit a target's constraints, and providing RTL synthesis hints for the Fusion module.

The default local LLM backend is TPT Spark. When Spark is running headlessly on the same machine, Crucible connects to it over a local socket. No API key, no network call, no data leaving the machine. Cloud providers (OpenRouter, Anthropic) are available as explicit opt-in fallbacks for users without compatible hardware.

Mosaic: One Model, All Hardware

The most unusual feature of tpt-crucible is Mosaic — the hybrid orchestrator that enables a single model to span FPGA, MCU swarm, and analog hardware simultaneously.

The practical use case is latency-optimised deployment: the attention layers (small, fast, relatively high precision) run on the FPGA. The feed-forward layers (large, power-hungry) run on the analog circuit. The embedding and output layers run on the MCU swarm. Mosaic handles the inter-target communication, the synchronisation, and the hardware-specific data format conversions.
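The split described above can be pictured as a lookup from layer kind to target, plus a list of the points where activations have to cross from one kind of hardware to another. The sketch below is purely illustrative: the table, the target labels, and both function names are assumptions, not Mosaic's API.

```python
# Hypothetical placement table mirroring the split described in the text.
PLACEMENT = {
    "attention": "fpga",
    "feed_forward": "analog",
    "embedding": "mcu_swarm",
    "output": "mcu_swarm",
}


def place_layers(layers):
    """layers: list of (name, kind). Returns a list of (name, target)."""
    return [(name, PLACEMENT[kind]) for name, kind in layers]


def crossings(placed):
    """Consecutive layer pairs that sit on different targets."""
    return [(a[0], b[0]) for a, b in zip(placed, placed[1:]) if a[1] != b[1]]


model = [
    ("embed", "embedding"),
    ("attn0", "attention"),
    ("ffn0", "feed_forward"),
    ("out", "output"),
]
placed = place_layers(model)
print(placed)
print(crossings(placed))
```

Each crossing is a place where data format conversion and synchronisation are needed, so the number of crossings gives a first feel for the orchestration overhead of a hybrid plan.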

This is genuinely experimental. SPICE simulation is slow; the analog target's drift predictor exists specifically because running a full simulation at each inference step isn't practical. But for scenarios where a GPU is not an option, Crucible offers an end-to-end path from model file to deployed hardware.

Getting Started

git clone https://github.com/PhillipC05/tpt-crucible
cd tpt-crucible
pip install -e ".[dev]"
# For Alloy (MCU swarm): also install PlatformIO
# For Fusion (FPGA): also install Yosys and Nextpnr

Full setup instructions, including hardware-specific toolchain requirements, are in the README.
