Hardware LLM Performance Simulator
Realistic TTFT, decode speed, memory fit, and hardware behavior.
Device Presets
Model Presets
Backend
Manual Model Config
Simulation Results
Total VRAM
6.0 GB
Model VRAM required
7.7 GB
Fits in VRAM
No
Fits in RAM
Yes
Execution mode
CPU
Decode speed
2.3 tok/s
Prefill speed
3.4 tok/s
TTFT
1.80 s
Max context
8192
Perceived speed
Very slow
Paragraph length
120 tokens
Perceived speed
2.3 tok/s (0–80 scale)
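The fit flags and execution mode above are consistent with a simple comparison of required memory against VRAM and RAM. The sketch below reproduces that comparison and estimates how long the 120-token paragraph would take at the reported speeds. The fallback order (GPU, then CPU, then no fit) and the function names are assumptions for illustration, not the simulator's actual logic. The 16 GB RAM figure comes from the profile snapshot, and all of it is treated as available to the model.

```python
def execution_mode(model_gb: float, vram_gb: float, ram_gb: float) -> str:
    """Pick where the model runs. The fallback order is an assumption."""
    if model_gb <= vram_gb:
        return "GPU"
    if model_gb <= ram_gb:
        return "CPU"
    return "Does not fit"


def paragraph_seconds(ttft_s: float, decode_tok_s: float, tokens: int) -> float:
    """Time to first token plus steady-state decode time for the paragraph."""
    return ttft_s + tokens / decode_tok_s


# Values taken from the simulation results and the profile snapshot.
mode = execution_mode(model_gb=7.7, vram_gb=6.0, ram_gb=16.0)
seconds = paragraph_seconds(ttft_s=1.80, decode_tok_s=2.3, tokens=120)
print(mode, round(seconds, 1))
```

Under these assumptions a single paragraph takes close to a minute, which matches the "Very slow" rating.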
Live Response Simulation
User: How fast will this model feel on my hardware?
Assistant: Run a simulation to see a live response.
Compare Any Two Devices
Select two hardware presets and compare their performance for the same model and backend.
Left device
Model + backend
Right device
Desktop — RTX 4090 + i9-13900K
TTFT: 0.86 s | Decode: 83.0 tok/s | Prefill: 124.4 tok/s | Fits VRAM: ✔ | Fits RAM: ✔ | Mode: GPU | Context: 49152
Desktop — RTX 4080 + Ryzen 9 7900X
TTFT: 0.86 s | Decode: 61.7 tok/s | Prefill: 92.5 tok/s | Fits VRAM: ✔ | Fits RAM: ✔ | Mode: GPU | Context: 32768
Benchmark All Devices
Run a full performance simulation across every hardware preset.
Model
Backend
Performance Charts
Visualize how hardware, model size, and backend affect performance.
Charts: Decode Speed vs Model Size; VRAM vs Max Context; TTFT vs Model Size; Backend Comparison
Export / Import Profile
Save your current hardware profile or load a custom one.
Current profile snapshot:
{
  "cpuVendor": "Intel",
  "cpuModel": "Core i7-12700H",
  "cpuCores": 6,
  "cpuThreads": 12,
  "cpuBaseGHz": 2.3,
  "cpuBoostGHz": 4.7,
  "cpuTdpWatts": 45,
  "hasAVX2": true,
  "hasAVX512": false,
  "ramGB": 16,
  "ramType": "DDR5",
  "ramSpeedMT": 4800,
  "gpuVendor": "NVIDIA",
  "gpuModel": "RTX 3060 Laptop",
  "gpuArchitecture": "Ampere",
  "vramGB": 6,
  "vramType": "GDDR6",
  "memoryBandwidthGBs": 192,
  "tflopsFP16": 20,
  "tflopsFP32": 10,
  "tflopsINT8": 40,
  "os": "Windows",
  "isLaptop": true,
  "coolingClass": "Thin",
  "pcieGen": 4
}
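When loading a custom profile, it can help to sanity-check it before running a simulation. The sketch below parses a profile and applies a few basic consistency checks. The set of required keys and the rules themselves are assumptions chosen for illustration from the snapshot's field names, not the simulator's own import validation, and `load_profile` is a hypothetical helper.

```python
import json

# Hypothetical required keys, picked from the snapshot's fields for illustration.
REQUIRED_KEYS = {
    "cpuCores", "cpuThreads", "cpuBaseGHz", "cpuBoostGHz", "ramGB", "vramGB",
}


def load_profile(text: str) -> dict:
    """Parse a profile and apply a few basic sanity checks (assumed rules)."""
    profile = json.loads(text)
    if not isinstance(profile, dict):
        raise ValueError("profile must be a JSON object")
    missing = REQUIRED_KEYS - set(profile)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if profile["cpuThreads"] < profile["cpuCores"]:
        raise ValueError("cpuThreads must be >= cpuCores")
    if profile["cpuBoostGHz"] < profile["cpuBaseGHz"]:
        raise ValueError("cpuBoostGHz must be >= cpuBaseGHz")
    if profile["ramGB"] <= 0 or profile["vramGB"] < 0:
        raise ValueError("ramGB must be positive and vramGB non-negative")
    return profile


# A trimmed copy of the snapshot, using the same values.
snapshot = (
    '{"cpuCores": 6, "cpuThreads": 12, "cpuBaseGHz": 2.3, '
    '"cpuBoostGHz": 4.7, "ramGB": 16, "vramGB": 6}'
)
profile = load_profile(snapshot)
print(profile["vramGB"], profile["ramGB"])
```

Raising `ValueError` with a specific message keeps a bad import from silently producing misleading simulation results.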