tpt-gpu

Rust

Hardware-agnostic, full-stack GPU compute platform — TPT Script AI-native language, TPTIR compiler, LLM inference engine with GGUF support, and optimized GEMM/Attention primitives. Runs on CUDA, ROCm, Metal, and native TPT ISA.

0 stars0 forks0 watchers
ai-acceleratorcompilergpu-computinggpu-kernelshardware-agnosticllm-inferencemachine-learningmlirrusttptir

Languages

Rust67.3%Python10.0%SystemVerilog7.8%C4.6%C++3.7%MLIR3.2%HTML1.5%Tcl0.7%Assembly0.3%PowerShell0.2%Makefile0.2%JavaScript0.1%TypeScript0.1%Stata0.1%Shell0.0%CMake0.0%
README

TPT GPU — Hardware-Agnostic Full-Stack GPU Compute Platform

License Rust Docs CI

TPT GPU is an open-source, hardware-agnostic, full-stack GPU compute platform designed for AI/ML workloads. It features TPT Script — an AI-native programming language with a minimal, orthogonal API surface that LLMs can reason over without truncation.


What's New in v1.0

  • Complete Standard Library — 200+ orthogonal operations covering tensors, neural networks, optimization, and distributed computing
  • Production-Ready Compiler — Lexer, parser, type checker with tensor shape inference, and dual codegen (Rust + TPTIR)
  • LLM Inference RuntimeGpuInferenceEngine with arch-template dispatch (LLaMA 3, Mistral, Qwen2, Phi-3, Gemma 2), sliding-window KV cache, and automatic vendor routing (CUDA → ROCm → Metal → TPTIR)
  • Shared Model Registry — GGUF models stored once in ~/.tpt/models/ and shared across all TPT tools
  • IDE Support — Full LSP server, VS Code extension, formatter, and linter
  • Framework Integration — PyTorch and JAX backends with seamless dispatch
  • AI-Assisted Kernel Generation — Automated kernel optimization and generation tools
  • Comprehensive Documentation — 17 tutorials, complete language spec, and API reference

Quick Start

Installation

# Clone the repository
git clone https://github.com/PhillipC05/tpt-gpu.git
cd tpt-gpu

# Build the TPT Script compiler
cd layer7_tptb
cargo build --release -p tptb-cli

# The binary is at: target/release/tpt

Your First TPT Script

Create hello.tpts:

import tpt

@doc("Compute the ReLU activation function")
fn relu(x: Tensor[f32, n]) -> Tensor[f32, n] {
    return tpt.relu(x)
}

Compile and check:

# Type-check
tpt check hello.tpts

# Compile to Rust + TPTIR
tpt compile hello.tpts -o output/

# List all available operations
tpt ops

# Get docs for an operation
tpt docs matmul

Building

Prerequisites

  • Rust toolchain >= 1.75 (rustup update)
  • Cargo workspace support
  • Optional: VS Code for IDE features

Build Commands

# Build all Rust layers
cargo build --release

# Build specific components
cd layer7_tptb
cargo build --release -p tptb-cli      # CLI tool
cargo build --release -p tptb-lsp      # LSP server
cargo build --release -p tptb-format   # Formatter/linter

# Run tests
cargo test --workspace

# Build with simulation mode (no hardware required)
cd layer5_tptp
cargo build --features sim

Key Features

TPT Script Language

  • Statically typed with tensor shape inference
  • Minimal API — ~200 orthogonal operations (vs PyTorch's ~2000)
  • AI-native — Every operation has machine-readable metadata (@doc, @constraint, @complexity)
  • Dual compilation — Host functions → Rust, GPU kernels → TPTIR
  • Rich annotations@requires_gpu, @distributed, @deploy, and more

LLM Inference

  • Architecture-agnostic dispatch — Add new model architectures by registering one ArchTemplate
  • Supported architectures — LLaMA 3, Mistral, Qwen2, Phi-3, Gemma 2 (GGUF format)
  • Sliding-window KV cache — Autoregressive decoding with overflow eviction
  • Automatic vendor routing — CUDA → ROCm → Metal → TPTIR fallback
  • Shared model registry — Models downloaded once to ~/.tpt/models/ via HuggingFace

Compiler Infrastructure

  • Fast compilation — Parallel Rust implementation
  • Structured errors — Error codes, locations, suggestions, and auto-fixes
  • Introspection APItpt.introspect.list_operations(), get_schema(), validate_code()
  • LSP support — Completions, hover, diagnostics, go-to-definition

Runtime & Primitives

  • Three-tier allocator — Slab → Buddy → Fallback
  • Priority scheduler — With aging to prevent starvation
  • Optimized kernels — GEMM, Attention, Conv2D, Conv3D, normalization layers
  • AI-guided optimization — Automated kernel tuning and generation

Framework Integration

  • PyTorch dispatch — Seamless backend integration
  • JAX integration — XLA-compatible primitives
  • HuggingFace support — Model loading and inference
  • Distributed training — FSDP and pipeline parallelism

Architecture

TPT GPU is organized into 7 independent layers with well-defined FFI/API boundaries:

layer1_isa/      SystemVerilog ISA — 32-bit fixed-length, 9-stage SIMT pipeline
layer2_tptd/     Kernel drivers — Linux DRM, Windows WDM, macOS DriverKit
layer3_tptc/     TPTIR compiler — MLIR-compatible dialect (C++ + Rust)
layer4_tptr/     GPU runtime — allocator, scheduler, kernel launch, LLM inference (Rust)
layer5_tptp/     GPU primitives — GEMM, Attention, Conv2D (TPTIR + Rust)
layer6_tptf/     Framework backends — PyTorch, JAX integration (Python + Rust)
layer7_tptb/     TPT Script compiler — lexer → parser → type checker → codegen (Rust)

Development flow: TPT Script (L7) → TPTIR (L3) → TPT ISA (L1) via Runtime (L4)


Tools

ToolDescriptionCommand
tptCLI compilertpt check, tpt compile, tpt run
tptb-lspLanguage ServerIDE integration
tptb-formatFormatter/Lintertptb-format fmt, tptb-format lint
model-registryShared GGUF model registrytpt-model-registry add/list/fetch
kernel-generatorAI-assisted kernel genSpec → TPTIR → validate → benchmark
kernel-optimizerAuto-tuningGrid → hill-climb → AI-guided search

Crates.io Publishing

TPT GPU components are published to crates.io for easy integration:

[dependencies]
tptb-core = "1.0"    # TPT Script compiler
tptp-core = "1.0"    # GPU primitives
tptr-core = "1.0"    # Runtime

Publish commands:

cd layer7_tptb/tptb-core && cargo publish
cd layer5_tptp/tptp-core && cargo publish
cd layer4_tptr/tptr-core && cargo publish

Roadmap

v1.0 (Current)

  • Complete standard library
  • Production-ready compiler
  • LLM inference runtime with KV cache
  • Shared GGUF model registry
  • IDE support (LSP, VS Code extension)
  • Framework integration (PyTorch, JAX)
  • AI-assisted kernel generation

v1.1 (Next)

  • REPL for interactive development
  • Enhanced error recovery
  • Performance profiling tools
  • Expanded hardware support

v2.0 (Future)

  • Custom silicon support
  • Advanced distributed computing
  • Web-based compiler playground
  • TPT Script as recommended API

Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

Quick Start for Contributors

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Make your changes
  4. Run tests (cargo test --workspace)
  5. Format code (cargo fmt)
  6. Commit (git commit -m 'Add amazing feature')
  7. Push (git push origin feature/amazing-feature)
  8. Open a Pull Request

Development Areas

  • Layer 1 (ISA): SystemVerilog simulation and verification
  • Layer 2 (Driver): Kernel module development (Linux, Windows, macOS)
  • Layer 3 (Compiler): TPTIR optimization passes and codegen
  • Layer 4 (Runtime): Memory management, scheduling, LLM inference
  • Layer 5 (Primitives): GPU kernel development and optimization
  • Layer 6 (Frameworks): PyTorch/JAX integration
  • Layer 7 (TPT Script): Language features and tooling

Documentation

DocumentDescription
User GuideComplete TPT Script language reference
Language SpecFormal language specification (51KB)
Tutorials17 hands-on tutorials from basics to advanced
ArchitectureDeveloper guide and build instructions
Model RegistryShared GGUF model registry format

Security

Please see SECURITY.md for security policies and reporting vulnerabilities.


License

TPT GPU is licensed under the Apache License 2.0 with LLVM Exception.

See LICENSE for the full license text.


Acknowledgments

  • Rust Community — For the amazing ecosystem and tooling
  • MLIR Project — For compiler infrastructure inspiration
  • Open Source Contributors — For making this project possible