# Benchmarks
Real-world inference performance on tested models. Updated as we add support for more architectures.
## Test Configuration

| Setting | Value |
|---|---|
| GPU | NVIDIA B200 |
| Runtime | xCore v0.x |
| Prompt Length | 512 tokens |
| Batch Size | 1 |
| Model | Precision | Params | Decode (tok/s) | TTFT (ms) | VRAM (GB) |
|---|---|---|---|---|---|
| Nemotron-Super-120B | NVFP4 | 120B (12B active) | — | — | — |
| Llama 3.1 70B | FP8 | 70B | — | — | — |
| Llama 3.1 8B | BF16 | 8B | — | — | — |
| Mistral 7B v0.3 | FP8 | 7B | — | — | — |
| Qwen2.5 72B | NVFP4 | 72B | — | — | — |
| Phi-3 Mini 3.8B | BF16 | 3.8B | — | — | — |
Numbers marked with — are pending publication. More models coming soon.
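For reference, the two latency metrics in the table can be measured from any streaming token generator: TTFT is the delay from request start to the first token, and decode throughput is tokens per second over the remaining (post-first-token) stream. The sketch below is a minimal, runtime-agnostic illustration of that measurement, not the harness used to produce these numbers; `measure_stream` and the dummy generator are hypothetical names for illustration.

```python
import time


def measure_stream(token_iter):
    """Measure TTFT and decode throughput over a stream of tokens.

    token_iter: any iterable that yields tokens one at a time
    (e.g. a streaming inference response).
    Returns (ttft_seconds, decode_tokens_per_second).
    """
    start = time.perf_counter()
    first_token_time = None
    count = 0
    for _ in token_iter:
        now = time.perf_counter()
        if first_token_time is None:
            first_token_time = now  # first token marks TTFT
        count += 1
    end = time.perf_counter()

    if first_token_time is None:
        raise ValueError("generator produced no tokens")

    ttft = first_token_time - start
    # Decode rate excludes the first token: it counts steady-state
    # generation after prefill has completed.
    decode_window = end - first_token_time
    decode_tps = (count - 1) / decode_window if count > 1 and decode_window > 0 else 0.0
    return ttft, decode_tps


def dummy_generator(n_tokens=8, delay_s=0.005):
    """Stand-in for a model's streaming output, for demonstration only."""
    for i in range(n_tokens):
        time.sleep(delay_s)
        yield i
```

In a real benchmark, `dummy_generator()` would be replaced by the runtime's streaming API, and each configuration would be run several times with warmup iterations discarded before averaging.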