Three Tools, One Ecosystem

Most AI compute tooling is designed around a single hardware target. PyTorch assumes GPUs. Ollama assumes a local machine with enough RAM. Verilog toolchains assume you already know what hardware you're targeting. If you want to move a model from a datacenter GPU to a laptop to an FPGA, you're stitching together a different toolchain at each step.

The TPT AI compute suite takes a different approach: three open-source projects, each focused on a distinct hardware tier, designed from the start to interoperate.

tpt-gpu handles GPU compute — a vendor-agnostic platform that runs on NVIDIA, AMD, and Intel hardware without PyTorch's 2,000-operation surface area or CUDA lock-in. It includes TPT Script, a statically-typed language designed specifically so LLMs can reason over it without truncation.

tpt-spark handles the desktop — a lean Rust and Tauri LLM runtime that runs GGUF-quantised models locally with zero daemon overhead, no HTTP layer, and no background processes. A single binary, fully offline.

tpt-crucible handles the rest — an AI compiler that takes standard model formats (GGUF, ONNX, PyTorch, TensorFlow) and compiles them onto FPGAs, analog circuits, and microcontroller swarms. The hardware targets that GPUs were never designed for.

How They Connect

The three tools are independent — each one works without the others — but they share design decisions that make them composable.

All three work with GGUF-quantised models, the format in which Llama 3, Mistral, and Phi-3 are commonly distributed for local inference. Models downloaded in Spark are available to Crucible's compiler without re-downloading; the suite is designed to share a single model directory at ~/.tpt/models/.
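To make the shared directory concrete, here is a minimal sketch of how any tool could discover models in it. Only the ~/.tpt/models/ path comes from the suite's design; the function names, the recursive search, and the idea of validating files by their header are assumptions for illustration. The one firm fact it relies on is that every GGUF file begins with the four ASCII bytes "GGUF".

```python
from pathlib import Path

GGUF_MAGIC = b"GGUF"  # the first four bytes of every GGUF file

# The shared directory named above; the layout inside it is an assumption.
DEFAULT_MODEL_DIR = Path.home() / ".tpt" / "models"


def is_gguf(path: Path) -> bool:
    """Return True if the file starts with the GGUF magic bytes."""
    try:
        with path.open("rb") as f:
            return f.read(4) == GGUF_MAGIC
    except OSError:
        return False


def list_models(model_dir: Path = DEFAULT_MODEL_DIR) -> list[Path]:
    """List valid GGUF files under model_dir, sorted by path.

    Hypothetical helper: none of the TPT tools is known to expose this.
    """
    if not model_dir.is_dir():
        return []
    return sorted(
        p for p in model_dir.rglob("*.gguf") if p.is_file() and is_gguf(p)
    )
```

Checking the header rather than trusting the file extension means a half-finished download or a renamed file is skipped instead of being handed to a compiler or runtime.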

Crucible uses Spark as its local LLM backend for AI-assisted design features: driver generation, RTL synthesis hints, and topology advising. When Spark is running headlessly, Crucible connects to it over a local socket rather than calling an external API. Inference stays on your machine; the data stays on your machine.
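As an illustration of what "a local socket rather than an external API" can look like, here is a rough client sketch. Everything specific in it is hypothetical: the socket path, the newline-delimited JSON framing, and the "prompt" and "text" field names are stand-ins, not Spark's actual wire protocol.

```python
import json
import socket


def connect_local(path: str) -> socket.socket:
    """Open a Unix domain socket to a local backend.

    The path is supplied by the caller; no real Spark socket path is assumed.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.connect(path)
    return sock


def request_completion(sock: socket.socket, prompt: str) -> str:
    """Send one request and return the reply text.

    Hypothetical framing: one JSON object per line in each direction,
    {"prompt": ...} out and {"text": ...} back.
    """
    sock.sendall(json.dumps({"prompt": prompt}).encode("utf-8") + b"\n")
    buf = b""
    while not buf.endswith(b"\n"):
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError("backend closed the connection mid-reply")
        buf += chunk
    return json.loads(buf)["text"]
```

The point of the sketch is the shape of the exchange: the request never leaves the machine, and there is no HTTP server or API key anywhere in the path.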

Both tpt-gpu and tpt-crucible use MLIR-based intermediate representations. The roadmap for both tools converges on a single shared TPTIR dialect, at which point a model compiled once to IR can be routed to GPU, FPGA, MCU swarm, or analog — without recompilation.

The Missing Layer

There has always been a gap between "run a model" and "deploy a model anywhere." The tools that run models well (Ollama, llama.cpp, vLLM) are tied to specific hardware. The tools that target exotic hardware (Vitis HLS, PlatformIO, SPICE simulators) don't know what a transformer is.

The TPT suite exists in that gap. It doesn't replace llama.cpp for CPU inference or PyTorch for training. It targets the scenarios where those tools don't apply: the laptop that needs privacy-first local inference without daemon overhead, the FPGA board running inference at the edge of a power grid, the microcontroller swarm in an environment where GPUs are impractical.

What's Also Missing (And Not Built Yet)

The suite does not currently include a meta-CLI orchestrator — a single tpt run command that dispatches to the right tool based on a --target flag. That would be a natural fourth piece, but it's not built yet. The three tools are each complete without it.
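For a sense of how small that fourth piece could be, here is a sketch of the dispatch step only. Since the orchestrator does not exist, every detail is an assumption: the target names, the mapping, and the resolve function are hypothetical, and the sketch stops short of invoking any tool because the individual CLIs are not described here.

```python
import argparse

# Hypothetical mapping from --target values to the tool that would handle them.
TARGET_TO_TOOL = {
    "gpu": "tpt-gpu",
    "desktop": "tpt-spark",
    "fpga": "tpt-crucible",
    "analog": "tpt-crucible",
    "mcu": "tpt-crucible",
}


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for an imagined `tpt run <model> --target <tier>`."""
    parser = argparse.ArgumentParser(prog="tpt")
    sub = parser.add_subparsers(dest="command", required=True)
    run = sub.add_parser("run", help="dispatch a model to one hardware tier")
    run.add_argument("model", help="path or name of the model to run")
    run.add_argument("--target", required=True, choices=sorted(TARGET_TO_TOOL))
    return parser


def resolve(argv: list[str]) -> tuple[str, str]:
    """Return (tool, model) for the given arguments without running anything."""
    args = build_parser().parse_args(argv)
    return TARGET_TO_TOOL[args.target], args.model
```

Under these assumptions, resolve(["run", "model.gguf", "--target", "fpga"]) picks tpt-crucible, and an unknown target is rejected by argparse before any tool is touched.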

There is also no unified telemetry dashboard spanning all three. Crucible has Observer (a Go backend with a Next.js frontend and Three.js swarm visualisation). Spark has no monitoring. A shared telemetry schema across the three tools would enable cross-hardware benchmark comparisons, but that's a future project.
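A shared schema could start as little more than one record type that every tool emits. The sketch below is purely illustrative: the BenchmarkRecord name, its fields, and the tokens-per-joule metric are assumptions, not part of Observer or any existing TPT tool.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class BenchmarkRecord:
    """One benchmark sample in a hypothetical cross-tool telemetry schema."""

    tool: str                 # e.g. "tpt-gpu", "tpt-spark", "tpt-crucible"
    target: str               # free-form hardware label chosen by the tool
    model: str                # model file name
    tokens_per_second: float  # measured throughput
    watts: float              # average power draw during the run

    @property
    def tokens_per_joule(self) -> float:
        """Throughput divided by power: tokens generated per joule."""
        if self.watts <= 0:
            raise ValueError("watts must be positive")
        return self.tokens_per_second / self.watts

    def to_json(self) -> str:
        """Serialise the raw fields as a JSON object with sorted keys."""
        return json.dumps(asdict(self), sort_keys=True)
```

A normalised metric such as tokens per joule is what would make a GPU run and an FPGA run comparable on one chart, which raw throughput alone does not.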

Where to Start

If you want GPU compute without vendor lock-in: tpt-gpu

If you want local LLM inference without Ollama overhead: tpt-spark

If you want to run a model on hardware that isn't a GPU: tpt-crucible

Each project has its own README with setup instructions. The individual posts below go deeper on each one.