A 3.6B-parameter LLM running entirely on your GPU via WebGPU. No compiler, no WASM runtime, no server. Just 10 hand-written WGSL compute shaders, 792 lines of GPU code in total, replacing the 85 auto-generated shaders that TVM/WebLLM normally need.
Chrome or Edge only: all 10 shaders use `enable f16`, which Safari and Firefox don't support yet.
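
Because every shader depends on f16, a page can check for it before downloading anything. The following is a minimal sketch, not this project's actual startup code: `requestF16Device` and the `GpuLike`/`AdapterLike` types are hypothetical names introduced here so the snippet compiles without `@webgpu/types`. The `"shader-f16"` feature name and the `requestAdapter`/`requestDevice` calls come from the standard WebGPU API.

```typescript
// Minimal structural types (assumptions for this sketch) mirroring the
// parts of the WebGPU API that the check needs.
interface AdapterLike {
  features: { has(name: string): boolean };
  requestDevice(desc?: { requiredFeatures?: string[] }): Promise<unknown>;
}
interface GpuLike {
  requestAdapter(): Promise<AdapterLike | null>;
}

type F16Result =
  | { ok: true; device: unknown }
  | { ok: false; reason: string };

// Hypothetical helper: returns a device with shader-f16 enabled,
// or a human-readable reason why one could not be created.
async function requestF16Device(gpu: GpuLike | undefined): Promise<F16Result> {
  if (!gpu) {
    return { ok: false, reason: "WebGPU is not available in this browser" };
  }
  const adapter = await gpu.requestAdapter();
  if (!adapter) {
    return { ok: false, reason: "No suitable GPU adapter found" };
  }
  // WGSL `enable f16` only compiles if the device was created with this feature.
  if (!adapter.features.has("shader-f16")) {
    return { ok: false, reason: "Adapter does not expose the shader-f16 feature" };
  }
  const device = await adapter.requestDevice({ requiredFeatures: ["shader-f16"] });
  return { ok: true, device };
}
```

In a browser this would presumably be called as `requestF16Device(navigator.gpu)`, showing `reason` to the user when `ok` is false.
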
~2 GB model download on first load, cached after that.
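
How the weights are cached is not stated here. One common option is the browser Cache API, and the sketch below assumes that approach purely for illustration. `fetchWeightsCached`, the `"model-weights-v1"` cache name, and the `*Like` types are all hypothetical; the storage and fetch function are passed in as parameters so the logic can be exercised outside a browser.

```typescript
// Minimal structural types (assumptions for this sketch) mirroring the
// parts of Response, Cache, and CacheStorage that are used below.
interface ResponseLike {
  ok: boolean;
  status: number;
  clone(): ResponseLike;
  arrayBuffer(): Promise<ArrayBuffer>;
}
interface CacheLike {
  match(url: string): Promise<ResponseLike | undefined>;
  put(url: string, res: ResponseLike): Promise<void>;
}
interface CacheStorageLike {
  open(name: string): Promise<CacheLike>;
}

// Hypothetical helper: return the bytes at `url`, downloading them only
// when no cached copy exists.
async function fetchWeightsCached(
  url: string,
  storage: CacheStorageLike,
  fetchFn: (url: string) => Promise<ResponseLike>,
  cacheName = "model-weights-v1", // assumed name, not taken from this project
): Promise<ArrayBuffer> {
  const cache = await storage.open(cacheName);

  // Later loads: serve the stored response and skip the network.
  const hit = await cache.match(url);
  if (hit) return hit.arrayBuffer();

  // First load: download, store a clone, then hand back the bytes.
  const res = await fetchFn(url);
  if (!res.ok) throw new Error(`Download failed: HTTP ${res.status}`);
  await cache.put(url, res.clone());
  return res.arrayBuffer();
}
```

In a browser, the global `caches` and `fetch` would be the natural arguments for `storage` and `fetchFn`.
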

| | WebLLM (TVM) | This project |
|---|---|---|
| Unique shaders | 85 | 10 |
| WGSL lines | 12,962 | 792 |
| JS bundle | 6.0 MB | 14 KB |
| Runtime | TVM + WASM | TypeScript |