Now Available: NVIDIA Blackwell Optimized

The fastest way to run AI locally

xCore is the native runtime for NVIDIA Blackwell + Ampere. Peak performance on every device, from edge to rack. Low-latency inference with unified-memory optimization.

xanuedge-cli --stream

$ xcore serve --model nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4

✓ Loaded 50 CuTile kernels (3.1 MB)

✓ NVFP4 weights loaded in xxs (xx GB)

✓ Listening on http://localhost:8080 — xx tok/s decode

✔ Model ready. Metrics:

TOKENS/SEC

xxx

LATENCY (TTFT)

xx.x ms

UTILIZATION

xx.x%

Native GPU Kernels

50+ hand-tuned CuTile kernels compiled directly to GPU machine code. Zero cuBLAS*, zero Python, zero overhead between your model and the silicon.

* cuBLAS-LT bridges NVFP4 until CuTile adds native FP4 support

Run Any Model, Any Precision

BF16, FP8, and NVFP4 weight formats with automatic detection. Every kernel hand-written for the critical path — no generic fallbacks.

Deploy Anywhere

One runtime from a 100W edge device to a pro workstation. OpenAI-compatible API, continuous batching, and streaming — same binary, any Blackwell GPU.
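Because the server speaks the standard OpenAI chat-completions protocol, any OpenAI-style client can talk to it. A minimal sketch, assuming the endpoint and model name shown in the demo above (`http://localhost:8080`, `nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4`) — adjust both for your deployment:

```python
import json

# Assumed from the demo above; change for your own setup.
BASE_URL = "http://localhost:8080"
MODEL = "nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4"

def chat_request(prompt: str, stream: bool = True) -> dict:
    """Build a standard OpenAI-style chat-completions request body."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,  # server-sent token streaming, per the OpenAI spec
    }

if __name__ == "__main__":
    # POST this payload to f"{BASE_URL}/v1/chat/completions"
    # with any HTTP or OpenAI-compatible client.
    print(json.dumps(chat_request("Say hello in five words."), indent=2))
```

With a server running, the same payload works from `curl` or the official OpenAI SDKs by pointing the base URL at `localhost:8080`.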

Ready for the Next Generation?

Join the early access program and experience the speed of Xanuedge in your own environment.