The fastest way to
run AI locally
xCore is the native runtime for NVIDIA Blackwell + Ampere. Peak performance on every device, from edge to rack. Low-latency inference with unified memory optimization.
$ xcore serve --model nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4
✓ Loaded 50 CuTile kernels (3.1 MB)
✓ NVFP4 weights loaded in xxs (xx GB)
✓ Listening on http://localhost:8080 — xx tok/s decode
✔ Model ready. Metrics:
TOKENS/SEC
xxx
LATENCY (TTFT)
xx.x ms
UTILIZATION
xx.x%
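Once the server reports it is listening, you can send it a completion request. A minimal sketch in Python, assuming xCore exposes an OpenAI-compatible `/v1/completions` route on the port shown above; the endpoint path and request schema are assumptions, not confirmed by the output here:

```python
import json
import urllib.request

# Hypothetical request shape: assumes an OpenAI-compatible
# /v1/completions endpoint (an assumption, not documented above).
payload = {
    "model": "nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4",
    "prompt": "Explain unified memory in one sentence.",
    "max_tokens": 64,
}

# Build the request against the address printed by `xcore serve`.
req = urllib.request.Request(
    "http://localhost:8080/v1/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# With the server running, send it and read the JSON response:
#   body = json.loads(urllib.request.urlopen(req).read())
print(req.full_url, req.get_header("Content-type"))
```

Swap in any HTTP client you prefer; the point is only that the runtime is reachable over plain HTTP once the startup log shows the listening address.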