Blueprint by NVIDIA

LLM Inference with SGLang

Serve LLMs with SGLang on DGX Station (Qwen3-8B by default; Qwen3.6 MoE optional): prefix-cached multi-turn chat, structured output, benchmarks, and inference-server guidance.

NVIDIA blueprint, Station, RadixAttention, Structured Output, Blackwell, DGX Station, Inference
