tpt-gpu
RustHardware-agnostic, full-stack GPU compute platform — TPT Script AI-native language, TPTIR compiler, LLM inference engine with GGUF support, and optimized GEMM/Attention primitives. Runs on CUDA, ROCm, Metal, and native TPT ISA.
Languages
TPT GPU — Hardware-Agnostic Full-Stack GPU Compute Platform
TPT GPU is an open-source, hardware-agnostic, full-stack GPU compute platform designed for AI/ML workloads. It features TPT Script — an AI-native programming language with a minimal, orthogonal API surface that LLMs can reason over without truncation.
What's New in v1.0
- Complete Standard Library — 200+ orthogonal operations covering tensors, neural networks, optimization, and distributed computing
- Production-Ready Compiler — Lexer, parser, type checker with tensor shape inference, and dual codegen (Rust + TPTIR)
- LLM Inference Runtime —
GpuInferenceEnginewith arch-template dispatch (LLaMA 3, Mistral, Qwen2, Phi-3, Gemma 2), sliding-window KV cache, and automatic vendor routing (CUDA → ROCm → Metal → TPTIR) - Shared Model Registry — GGUF models stored once in
~/.tpt/models/and shared across all TPT tools - IDE Support — Full LSP server, VS Code extension, formatter, and linter
- Framework Integration — PyTorch and JAX backends with seamless dispatch
- AI-Assisted Kernel Generation — Automated kernel optimization and generation tools
- Comprehensive Documentation — 17 tutorials, complete language spec, and API reference
Quick Start
Installation
# Clone the repository
git clone https://github.com/PhillipC05/tpt-gpu.git
cd tpt-gpu
# Build the TPT Script compiler
cd layer7_tptb
cargo build --release -p tptb-cli
# The binary is at: target/release/tpt
Your First TPT Script
Create hello.tpts:
import tpt
@doc("Compute the ReLU activation function")
fn relu(x: Tensor[f32, n]) -> Tensor[f32, n] {
return tpt.relu(x)
}
Compile and check:
# Type-check
tpt check hello.tpts
# Compile to Rust + TPTIR
tpt compile hello.tpts -o output/
# List all available operations
tpt ops
# Get docs for an operation
tpt docs matmul
Building
Prerequisites
- Rust toolchain >= 1.75 (
rustup update) - Cargo workspace support
- Optional: VS Code for IDE features
Build Commands
# Build all Rust layers
cargo build --release
# Build specific components
cd layer7_tptb
cargo build --release -p tptb-cli # CLI tool
cargo build --release -p tptb-lsp # LSP server
cargo build --release -p tptb-format # Formatter/linter
# Run tests
cargo test --workspace
# Build with simulation mode (no hardware required)
cd layer5_tptp
cargo build --features sim
Key Features
TPT Script Language
- Statically typed with tensor shape inference
- Minimal API — ~200 orthogonal operations (vs PyTorch's ~2000)
- AI-native — Every operation has machine-readable metadata (
@doc,@constraint,@complexity) - Dual compilation — Host functions → Rust, GPU kernels → TPTIR
- Rich annotations —
@requires_gpu,@distributed,@deploy, and more
LLM Inference
- Architecture-agnostic dispatch — Add new model architectures by registering one
ArchTemplate - Supported architectures — LLaMA 3, Mistral, Qwen2, Phi-3, Gemma 2 (GGUF format)
- Sliding-window KV cache — Autoregressive decoding with overflow eviction
- Automatic vendor routing — CUDA → ROCm → Metal → TPTIR fallback
- Shared model registry — Models downloaded once to
~/.tpt/models/via HuggingFace
Compiler Infrastructure
- Fast compilation — Parallel Rust implementation
- Structured errors — Error codes, locations, suggestions, and auto-fixes
- Introspection API —
tpt.introspect.list_operations(),get_schema(),validate_code() - LSP support — Completions, hover, diagnostics, go-to-definition
Runtime & Primitives
- Three-tier allocator — Slab → Buddy → Fallback
- Priority scheduler — With aging to prevent starvation
- Optimized kernels — GEMM, Attention, Conv2D, Conv3D, normalization layers
- AI-guided optimization — Automated kernel tuning and generation
Framework Integration
- PyTorch dispatch — Seamless backend integration
- JAX integration — XLA-compatible primitives
- HuggingFace support — Model loading and inference
- Distributed training — FSDP and pipeline parallelism
Architecture
TPT GPU is organized into 7 independent layers with well-defined FFI/API boundaries:
layer1_isa/ SystemVerilog ISA — 32-bit fixed-length, 9-stage SIMT pipeline
layer2_tptd/ Kernel drivers — Linux DRM, Windows WDM, macOS DriverKit
layer3_tptc/ TPTIR compiler — MLIR-compatible dialect (C++ + Rust)
layer4_tptr/ GPU runtime — allocator, scheduler, kernel launch, LLM inference (Rust)
layer5_tptp/ GPU primitives — GEMM, Attention, Conv2D (TPTIR + Rust)
layer6_tptf/ Framework backends — PyTorch, JAX integration (Python + Rust)
layer7_tptb/ TPT Script compiler — lexer → parser → type checker → codegen (Rust)
Development flow: TPT Script (L7) → TPTIR (L3) → TPT ISA (L1) via Runtime (L4)
Tools
| Tool | Description | Command |
|---|---|---|
tpt | CLI compiler | tpt check, tpt compile, tpt run |
tptb-lsp | Language Server | IDE integration |
tptb-format | Formatter/Linter | tptb-format fmt, tptb-format lint |
model-registry | Shared GGUF model registry | tpt-model-registry add/list/fetch |
kernel-generator | AI-assisted kernel gen | Spec → TPTIR → validate → benchmark |
kernel-optimizer | Auto-tuning | Grid → hill-climb → AI-guided search |
Crates.io Publishing
TPT GPU components are published to crates.io for easy integration:
[dependencies]
tptb-core = "1.0" # TPT Script compiler
tptp-core = "1.0" # GPU primitives
tptr-core = "1.0" # Runtime
Publish commands:
cd layer7_tptb/tptb-core && cargo publish
cd layer5_tptp/tptp-core && cargo publish
cd layer4_tptr/tptr-core && cargo publish
Roadmap
v1.0 (Current)
- Complete standard library
- Production-ready compiler
- LLM inference runtime with KV cache
- Shared GGUF model registry
- IDE support (LSP, VS Code extension)
- Framework integration (PyTorch, JAX)
- AI-assisted kernel generation
v1.1 (Next)
- REPL for interactive development
- Enhanced error recovery
- Performance profiling tools
- Expanded hardware support
v2.0 (Future)
- Custom silicon support
- Advanced distributed computing
- Web-based compiler playground
- TPT Script as recommended API
Contributing
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
Quick Start for Contributors
- Fork the repository
- Create a feature branch (
git checkout -b feature/amazing-feature) - Make your changes
- Run tests (
cargo test --workspace) - Format code (
cargo fmt) - Commit (
git commit -m 'Add amazing feature') - Push (
git push origin feature/amazing-feature) - Open a Pull Request
Development Areas
- Layer 1 (ISA): SystemVerilog simulation and verification
- Layer 2 (Driver): Kernel module development (Linux, Windows, macOS)
- Layer 3 (Compiler): TPTIR optimization passes and codegen
- Layer 4 (Runtime): Memory management, scheduling, LLM inference
- Layer 5 (Primitives): GPU kernel development and optimization
- Layer 6 (Frameworks): PyTorch/JAX integration
- Layer 7 (TPT Script): Language features and tooling
Documentation
| Document | Description |
|---|---|
| User Guide | Complete TPT Script language reference |
| Language Spec | Formal language specification (51KB) |
| Tutorials | 17 hands-on tutorials from basics to advanced |
| Architecture | Developer guide and build instructions |
| Model Registry | Shared GGUF model registry format |
Security
Please see SECURITY.md for security policies and reporting vulnerabilities.
License
TPT GPU is licensed under the Apache License 2.0 with LLVM Exception.
See LICENSE for the full license text.
Acknowledgments
- Rust Community — For the amazing ecosystem and tooling
- MLIR Project — For compiler infrastructure inspiration
- Open Source Contributors — For making this project possible