TPT Spark — Project Forge
A lean, native, cross-platform LLM runtime. No daemons. No HTTP overhead. No proprietary AI drivers.
TPT Spark is an open-source desktop application for running Large Language Models locally.
It compiles to a single binary and runs on Windows, macOS, and Linux using standard display
drivers via wgpu — no CUDA or ROCm required.
Architecture
| Layer | Technology | Purpose |
|---|---|---|
| UI / Frontend | TypeScript + Vite | Chat interface rendered by OS WebView (WebView2 / WebKit) |
| App Core / Bridge | Rust + Tauri v2 | Window management, OS integration, IPC |
| Async Runtime | Tokio | Non-blocking, zero-copy weight streaming |
| Compute Backend | wgpu / Vulkan / Metal / DirectX 12 | GPU dispatch without CUDA |
| AI Math Engine | llama-cpp-rs (V1) / candle (V2) | Optimized GGUF inference |
Data Flow
User prompt → Tauri IPC → Rust tokenizer → wgpu GPUBuffer
→ Vulkan / Metal compute shaders → predicted token
→ IPC stream → TS frontend (word-by-word)
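The flow above can be sketched in plain Rust as a producer thread feeding a channel. This is a minimal illustration under stated assumptions, not the project's code. `StreamEvent`, `spawn_mock_engine`, and `stream_to_frontend` are hypothetical names. The engine is replaced by a word splitter, and the `emit` callback stands in for the IPC message the real app would send to the frontend. Only `std` is used, so no Tokio or Tauri APIs appear here.

```rust
use std::sync::mpsc;
use std::thread;

/// Messages the (hypothetical) engine thread sends toward the UI.
#[derive(Debug, PartialEq)]
enum StreamEvent {
    Token(String),
    Done,
}

/// Stand-in for the inference engine: emits one piece per "predicted token".
/// A real backend would run the model here instead of splitting a string.
fn spawn_mock_engine(prompt: String, tx: mpsc::Sender<StreamEvent>) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        let reply = format!("You said: {prompt}");
        for word in reply.split_inclusive(' ') {
            // Stop early if the receiving (UI) side has gone away.
            if tx.send(StreamEvent::Token(word.to_string())).is_err() {
                return;
            }
        }
        let _ = tx.send(StreamEvent::Done);
    })
}

/// Stand-in for the IPC layer: forwards each token to `emit` as soon as it
/// arrives (which is what lets a frontend render word by word) and returns
/// the full reply once the engine signals completion.
fn stream_to_frontend(prompt: &str, mut emit: impl FnMut(&str)) -> String {
    let (tx, rx) = mpsc::channel();
    let handle = spawn_mock_engine(prompt.to_string(), tx);
    let mut full = String::new();
    for event in rx {
        match event {
            StreamEvent::Token(t) => {
                emit(t.as_str());
                full.push_str(&t);
            }
            StreamEvent::Done => break,
        }
    }
    handle.join().expect("engine thread panicked");
    full
}

fn main() {
    let text = stream_to_frontend("hello world", |tok| print!("{tok}"));
    println!();
    assert_eq!(text, "You said: hello world");
}
```

Keeping the engine on its own thread means a slow token never blocks the receiving side; each piece is forwarded the moment it is produced rather than after the whole reply is ready.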
Getting Started
Prerequisites
- Rust 1.77+
- Node.js 18+
- Platform system libraries:
  - Linux: libgtk-3-dev, libwebkit2gtk-4.1-dev, librsvg2-dev
  - macOS: Xcode Command Line Tools
  - Windows: WebView2 (ships with Windows 11)
Run in development
npm install
npm run tauri dev
Build a release binary
npm run tauri build
Release bundles (installers and app packages) are written to src-tauri/target/release/bundle/; the standalone executable is in src-tauri/target/release/.
Adding Models
- Open the app and note the Models directory shown in the sidebar.
- Copy any .gguf model file into that directory.
- Click ⟳ Refresh in the sidebar.
- Select the model and click Load.
Popular GGUF sources: HuggingFace
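A Refresh action like the one above might come down to a directory scan. The sketch below is hypothetical: `scan_models` and `has_gguf_magic` are illustrative names, not functions from the project. It keeps files with a .gguf extension (case-insensitive) and additionally checks the four ASCII bytes "GGUF" that open every GGUF file, so a renamed text file is not offered for loading.

```rust
use std::fs::{self, File};
use std::io::{self, Read};
use std::path::{Path, PathBuf};

/// GGUF files begin with the four ASCII bytes "GGUF".
const GGUF_MAGIC: &[u8; 4] = b"GGUF";

/// Returns true if the file at `path` starts with the GGUF magic bytes.
fn has_gguf_magic(path: &Path) -> io::Result<bool> {
    let mut header = [0u8; 4];
    let mut file = File::open(path)?;
    match file.read_exact(&mut header) {
        Ok(()) => Ok(&header == GGUF_MAGIC),
        // Files shorter than 4 bytes cannot be GGUF models.
        Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => Ok(false),
        Err(e) => Err(e),
    }
}

/// Lists `.gguf` files in `dir` (extension matched case-insensitively),
/// skipping any whose header is not GGUF, sorted by path.
fn scan_models(dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut models = Vec::new();
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        let is_gguf_ext = path
            .extension()
            .and_then(|ext| ext.to_str())
            .is_some_and(|ext| ext.eq_ignore_ascii_case("gguf"));
        if path.is_file() && is_gguf_ext && has_gguf_magic(&path)? {
            models.push(path);
        }
    }
    models.sort();
    Ok(models)
}

fn main() -> io::Result<()> {
    // Hypothetical usage: scan the directory given as the first argument, or ".".
    let dir = std::env::args().nth(1).unwrap_or_else(|| ".".to_string());
    for model in scan_models(Path::new(&dir))? {
        println!("{}", model.display());
    }
    Ok(())
}
```

Reading only the first four bytes keeps the scan cheap even when the directory holds multi-gigabyte models.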
Engine Backends
| Feature flag | Backend | Status |
|---|---|---|
| engine-stub (default) | Mock streaming | Compiles everywhere, no native deps |
| engine-llama | llama.cpp via llama-cpp-2 | Full GGUF inference |
Enable real inference:
npm run tauri build -- --features engine-llama
Requires cmake and a C++ compiler for the llama.cpp native build.
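The feature-flag split in the table above could be modeled with a trait plus `#[cfg(feature = "...")]`. This is a hypothetical sketch, not the project's code: the `Engine` trait, `StubEngine`, and `default_engine` are illustrative names, and the llama.cpp branch is deliberately left unimplemented rather than guessing at the llama-cpp-2 API. Built without `--features engine-llama`, it behaves like the mock streaming stub.

```rust
/// Common interface both backends could implement (hypothetical trait).
trait Engine {
    fn name(&self) -> &'static str;
    /// Generates a reply, calling `on_token` once per produced piece.
    fn generate(&self, prompt: &str, on_token: &mut dyn FnMut(&str));
}

/// Mock backend in the spirit of engine-stub: no native dependencies,
/// it just streams the prompt back one word at a time.
struct StubEngine;

impl Engine for StubEngine {
    fn name(&self) -> &'static str {
        "stub"
    }

    fn generate(&self, prompt: &str, on_token: &mut dyn FnMut(&str)) {
        for word in prompt.split_inclusive(' ') {
            on_token(word);
        }
    }
}

/// Default build: the stub backend is selected at compile time.
#[cfg(not(feature = "engine-llama"))]
fn default_engine() -> Box<dyn Engine> {
    Box::new(StubEngine)
}

/// With the engine-llama feature, a llama.cpp-backed engine would be built
/// here; its API is intentionally not reproduced in this sketch.
#[cfg(feature = "engine-llama")]
fn default_engine() -> Box<dyn Engine> {
    unimplemented!("llama.cpp backend not shown in this sketch")
}

fn main() {
    let engine = default_engine();
    let mut out = String::new();
    engine.generate("hello from the stub", &mut |tok: &str| out.push_str(tok));
    println!("[{}] {}", engine.name(), out);
}
```

Selecting the backend with `cfg` keeps the default build free of native dependencies: the llama.cpp code is not compiled at all unless the feature is enabled.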
Roadmap
- [x] Phase 1 — Foundation: Tauri v2 project, Rust backend, IPC streaming
- [ ] Phase 2 — Engine Integration: llama-cpp-rs CPU inference
- [ ] Phase 3 — GPU Acceleration: wgpu Vulkan / Metal dispatch, VRAM management
- [ ] Phase 4 — Polish: model download manager, bundle size optimisation, v1.0 release
Why not Ollama?
Ollama wraps llama.cpp in a Go HTTP server that runs as a background daemon. TPT Spark calls llama.cpp directly from Rust, removing the network stack, the daemon process, and their roughly 100 MB of RAM overhead, while keeping the same GGUF model support and llama.cpp's CPU/GPU inference speed.
License
MIT