Benchmarks

Real-world inference performance on tested models. Updated as we add support for more architectures.

Test Configuration

GPU: NVIDIA B200
Runtime: xCore v0.x
Prompt Length: 512 tokens
Batch Size: 1

Model                Precision  Params             Decode (tok/s)  TTFT (ms)  VRAM (GB)
Nemotron-Super-120B  NVFP4      120B (12B active)  —               —          —
Llama 3.1 70B        FP8        70B                —               —          —
Llama 3.1 8B         BF16       8B                 —               —          —
Mistral 7B v0.3      FP8        7B                 —               —          —
Qwen2.5 72B          NVFP4      72B                —               —          —
Phi-3 Mini 3.8B      BF16       3.8B               —               —          —

Numbers marked with — are pending publication. More models coming soon.
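For reference, the two latency metrics in the table can be collected with a simple timing harness around any streaming generation API. The sketch below is not xCore-specific: `fake_stream` is a hypothetical stand-in for a real token stream, used only to show the bookkeeping (TTFT is the delay before the first token; the decode rate counts tokens after the first).

```python
import time

def fake_stream(n_tokens, first_delay=0.02, per_token=0.005):
    """Simulated streaming generator; a stand-in for a real inference API."""
    time.sleep(first_delay)  # prefill phase: delay before the first token
    yield "tok0"
    for i in range(1, n_tokens):
        time.sleep(per_token)  # steady-state decode delay per token
        yield f"tok{i}"

def measure(stream):
    """Return (ttft_ms, decode_tok_s) for a token stream.

    TTFT is measured from request start to the first token; the decode
    rate excludes the first token, matching the Decode (tok/s) column.
    """
    start = time.perf_counter()
    first = last = None
    count = 0
    for _ in stream:
        last = time.perf_counter()
        if first is None:
            first = last
        count += 1
    ttft_ms = (first - start) * 1000.0
    decode_tok_s = (count - 1) / (last - first) if count > 1 else 0.0
    return ttft_ms, decode_tok_s
```

In practice you would discard a few warm-up runs and report a median over many iterations, since both metrics are sensitive to scheduling jitter.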