NVIDIA catalog
Models, skills and blueprints for GPU jobs.
Browse NVIDIA workloads inside ICPX before creating a compute job.
nvidia
Active Speaker Detection
Detect and track speaker identities across video frames.
deepmind
alphafold2
Predicts the 3D structure of a protein from its amino acid sequence.
deepmind
alphafold2-multimer
Predicts the 3D structure of a protein from its amino acid sequence.
sqwh1lyrveic
AODT 1.2.1
AODT 1.2.1
sqwh1lyrveic
AODT 1.2.2
AODT 1.2.2
nvidia
Background Noise Removal
Removes unwanted noise from audio, improving speech intelligibility.
nvidia
bevformer
Advanced transformer for multi-frame bird's-eye-view 3D perception in autonomous driving.
baai
bge-m3
Embedding model for text retrieval tasks, excelling in dense, multi-vector, and sparse retrieval.
mit
Boltz-2
Predict complex structures using Boltz-2.
nvidia
canary-1b-asr
Multi-lingual model supporting speech-to-text recognition and translation.
resembleai
chatterbox-multilingual-tts
Natural and expressive voices in 23 languages. For voice agents and brand ambassadors.
nvidia
conformer-ctc-asr
Automatic speech recognition model that transcribes Spanish speech into lowercase text with record-setting accuracy and performance
nvidia
cosmos-reason2-8b
Vision language model that excels in understanding the physical world using structured reasoning on videos or images.
nvidia
cosmos-transfer1-7b
Generates physics-aware video world states for physical AI development using text prompts and multiple spatial control inputs derived from real-world data or simulation.
nvidia
cosmos-transfer2.5-2b
Generates physics-aware video world states for physical AI development using text prompts and multiple spatial control inputs derived from real-world data or simulation.
nvidia
cosmos3-nano
Generates physics-aware videos from text prompts or an image prompt for physical AI development.
nvidia
cosmos3-nano-reasoner
Vision language model that excels in understanding the physical world using structured reasoning on videos or images.
nvidia
cuopt
World-record accuracy and performance for complex route optimization.
deepseek-ai
deepseek-v4-flash
DeepSeek V4 Flash is a 284B MoE model with 1M-token context optimized for fast coding and agents.
deepseek-ai
deepseek-v4-pro
DeepSeek V4 scales to 1M-token context windows with efficient MoE architecture for coding tasks.
mit
diffdock
Predicts the 3D structure of how a molecule interacts with a protein.
diffusiongemma-26b-a4b-it
Diffusion-based 26B parameter LLM enabling parallel token generation for real-time text apps
abacusai
dracarys-llama-3.1-70b-instruct
Fine-tuned Llama 3.1 70B model for code generation, summarization, and multi-language tasks.
meta
esm2-650m
Generates embeddings of proteins from their amino acid sequences.
meta
esmfold
Predicts the 3D structure of a protein from its amino acid sequence.
arc
evo2-40b
Evo 2 is a biological foundation model that is able to integrate information over long genomic sequences while retaining sensitivity to single-nucleotide changes.
nvidia
eyecontact
Estimate gaze angles of a person in a video and redirect to make it frontal.
cadence
fidelity
Run computational fluid dynamics (CFD) simulations
ansys
fluent
Run computational fluid dynamics (CFD) simulations
black-forest-labs
FLUX.1-dev
FLUX.1 is a state-of-the-art suite of image generation models
black-forest-labs
FLUX.1-Kontext-dev
FLUX.1 Kontext is a multimodal model that enables in-context image generation and editing.
black-forest-labs
FLUX.1-schnell
FLUX.1-schnell is a distilled image generation model, producing high quality images at fast speeds
black-forest-labs
flux.2-klein-4b
FLUX.2-klein-4B is a distilled image generation and editing model, producing outputs at lightning speed
nvidia
fourcastnet
FourCastNet predicts global atmospheric dynamics of various weather / climate variables.
gemma-2-2b-it
Advanced small language generative AI model for edge applications
gemma-3n-e2b-it
An edge computing AI model which accepts text, audio and image input, ideal for resource-constrained environments
gemma-3n-e4b-it
An edge computing AI model which accepts text, audio and image input, ideal for resource-constrained environments
gemma-4-31b-it
Dense 31B model delivering frontier reasoning for coding, agentic workflows, and fine-tuning.
nvidia
genmol
Fragment-Based Molecular Generation by Discrete Diffusion.
nvidia
gliner-pii
GLiNER PII detects Personally Identifiable Information in text.
z-ai
glm-5.1
GLM-5.1 is a flagship LLM for agentic workflows, coding, and long-horizon reasoning tasks.
openai
gpt-oss-120b
Mixture of Experts (MoE) reasoning LLM (text-only) designed to fit within 80GB GPU.
openai
gpt-oss-20b
Smaller Mixture of Experts (MoE) text-only LLM for efficient AI reasoning and math
nvidia
ising-calibration-1-35b-a3b
Open VLM for quantum computer calibration chart understanding across a range of qubit modalities.
moonshotai
kimi-k2.6
1T multimodal MoE for long-horizon coding, agentic tool use, and image/video understanding.
nvidia
LipSync
Generative lip dubbing that syncs lips in a video to input audio.
meta
llama-3.1-70b-instruct
Powers complex conversations with superior contextual understanding, reasoning and text generation.
meta
llama-3.1-8b-instruct
Advanced state-of-the-art model with language understanding, superior reasoning, and text generation.
nvidia
llama-3.1-nemoguard-8b-content-safety
Leading content safety model for enhancing the safety and moderation capabilities of LLMs
nvidia
llama-3.1-nemoguard-8b-topic-control
Topic control model to keep conversations focused on approved topics, avoiding inappropriate content.
nvidia
llama-3.1-nemotron-nano-8b-v1
Leading reasoning and agentic AI accuracy model for PC and edge.
nvidia
llama-3.1-nemotron-nano-vl-8b-v1
Multi-modal vision-language model that understands text and images and creates informative responses
nvidia
llama-3.1-nemotron-safety-guard-8b-v3
Leading multilingual content safety model for enhancing the safety and moderation capabilities of LLMs
meta
llama-3.2-11b-vision-instruct
Cutting-edge vision-language model excelling in high-quality reasoning from images.
meta
llama-3.2-1b-instruct
Advanced state-of-the-art small language model with language understanding, superior reasoning, and text generation.
meta
llama-3.2-3b-instruct
Advanced state-of-the-art small language model with language understanding, superior reasoning, and text generation.
meta
llama-3.2-90b-vision-instruct
Cutting-edge vision-language model excelling in high-quality reasoning from images.
meta
llama-3.3-70b-instruct
Advanced LLM for reasoning, math, general knowledge, and function calling
nvidia
llama-3.3-nemotron-super-49b-v1
High efficiency model with leading accuracy for reasoning, tool calling, chat, and instruction following.
nvidia
llama-3.3-nemotron-super-49b-v1.5
High efficiency model with leading accuracy for reasoning, tool calling, chat, and instruction following.
meta
llama-4-maverick-17b-128e-instruct
A general-purpose multimodal, multilingual 128-expert MoE model with 17B active parameters.
meta
llama-guard-4-12b
Multi-modal model to classify safety for input prompts as well as output responses.
nvidia
llama-nemotron-embed-1b-v2
Multilingual, cross-lingual embedding model for long-document QA retrieval, supporting 26 languages.
nvidia
llama-nemotron-embed-vl-1b-v2
Multimodal question-answer retrieval representing user queries as text and documents as images.
nvidia
llama-nemotron-rerank-1b-v2
GPU-accelerated model optimized for providing a probability score that a given passage contains the information to answer a question.
nvidia
llama-nemotron-rerank-vl-1b-v2
GPU-accelerated model optimized for providing a probability score that a given passage contains the information to answer a question.
nvidia
magpie-tts-multilingual
Natural and expressive voices in multiple languages. For voice agents and brand ambassadors.
nvidia
magpie-tts-zeroshot
Expressive and engaging text-to-speech, generated from a short audio sample.
nvidia
megatron-1b-nmt
Enable smooth global interactions in 36 languages.
minimaxai
minimax-m2.7
MiniMax M2.7 is a 230B-parameter text-to-text AI model excelling in coding, reasoning, and office tasks.
minimaxai
minimax-m3
MiniMax M3 Preview is a multimodal MoE vision-language model with strong reasoning, coding, and tool-calling capabilities.
mistralai
ministral-14b-instruct-2512
A general-purpose VLM ideal for chat and instruction-based use cases
mistralai
mistral-large-3-675b-instruct-2512
A state-of-the-art general-purpose MoE VLM ideal for chat, agentic, and instruction-based use cases.
mistralai
mistral-medium-3.5-128b
A high performing model for text generation, coding and agentic use cases
mistralai
mistral-nemotron
Built for agentic workflows, this model excels in coding, instruction following, and function calling
mistralai
mistral-small-4-119b-2603
Hybrid MoE model unifying instruct, reasoning, and coding with multimodal input and 256k context
mistralai
mixtral-8x7b-instruct-v0.1
An MoE LLM that follows instructions, completes requests, and generates creative text.
nvidia
molmim
MolMIM performs controlled generation, finding molecules with the right properties.
colabfold
msa-search
Generates a multiple sequence alignment from a query sequence and a protein sequence database search.
nvidia
nemoguard-jailbreak-detect
Industry leading jailbreak classification model for protection from adversarial attempts
nvidia
nemoretriever-ocr
Powerful OCR model for fast, accurate real-world image text extraction, layout, and structure analysis.
nvidia
nemoretriever-page-elements-v2
Model for object detection, fine-tuned to detect charts, tables, and titles in documents.
nvidia
nemoretriever-parse
Cutting-edge vision-language model excelling in retrieving text and metadata from images.
nvidia
nemotron-3-content-safety
Multilingual, multimodal model for detecting unsafe and toxic content.
nvidia
nemotron-3-nano-30b-a3b
Open, efficient MoE model with 1M context, excelling in coding, reasoning, instruction following, tool calling, and more
nvidia
nemotron-3-nano-omni-30b-a3b-reasoning
Nemotron 3 Nano Omni is an omni-modal reasoning model that understands images, video, speech, text.
nvidia
nemotron-3-super-120b-a12b
Open, efficient hybrid Mamba-Transformer MoE with 1M context, excelling in agentic reasoning, coding, planning, tool calling, and more
nvidia
nemotron-3-ultra-550b-a55b
Open, efficient hybrid Mamba-Transformer MoE with 1M context, excelling in agentic reasoning, coding, planning, tool calling, and more
nvidia
nemotron-3.5-content-safety
Multilingual, multimodal model for detecting unsafe and toxic content.
nvidia
nemotron-asr-streaming
Real-time speech recognition for English
nvidia
nemotron-content-safety-reasoning-4b
A context-aware safety model that applies reasoning to enforce domain-specific policies.
nvidia
nemotron-graphic-elements-v1
Model for object detection, fine-tuned to detect charts, tables, and titles in documents.
nvidia
nemotron-mini-4b-instruct
Optimized SLM for on-device inference and fine-tuned for roleplay, RAG and function calling
nvidia
nemotron-nano-12b-v2-vl
Nemotron Nano 12B v2 VL enables multi-image and video understanding, along with visual Q&A and summarization capabilities.
nvidia
nemotron-ocr-v1
Powerful OCR model for fast, accurate real-world image text extraction, layout, and structure analysis.
nvidia
nemotron-page-elements-v3
Model for object detection, fine-tuned to detect charts, tables, and titles in documents.
nvidia
nemotron-parse
Cutting-edge vision-language model excelling in retrieving text and metadata from images.
nvidia
nemotron-table-structure-v1
Model for object detection, fine-tuned to detect charts, tables, and titles in documents.
nvidia
nemotron-voicechat
Nemotron 3 Voicechat
nvidia
nv-embed-v1
Generates high-quality numerical embeddings from text inputs.
nvidia
nv-embedcode-7b-v1
The NV-EmbedCode model is a 7B Mistral-based embedding model optimized for code retrieval, supporting text, code, and hybrid queries.
nvidia
nv-embedqa-e5-v5
English text embedding model for question-answering retrieval.
nvidia
nv-yolox-page-elements-v1
Model for object detection, fine-tuned to detect charts, tables, and titles in documents.
nvidia
nvidia-nemotron-nano-9b-v2
High-efficiency LLM with hybrid Transformer-Mamba design, excelling in reasoning and agentic tasks.
openfold
openfold2
Predicts the 3D structure of a protein from its amino acid sequence, multiple sequence alignments, and templates.
openfold
openfold3
OpenFold3 is a third-generation biomolecular foundation model that predicts the three-dimensional structures of molecular complexes (proteins, DNA, RNA, ligands)
baidu
paddleocr
Model for table extraction that receives an image as input, runs OCR on the image, and returns the text within the image and its bounding boxes.
paligemma
Vision language model adept at comprehending text and visual inputs to produce informative responses
nvidia
parakeet-1.1b-rnnt-multilingual-asr
High accuracy and optimized performance for transcription in 25 languages
nvidia
parakeet-ctc-0.6b-asr
State-of-the-art accuracy and speed for English transcriptions.
nvidia
parakeet-ctc-0.6b-es
Accurate and optimized Spanish-English transcriptions with punctuation and word timestamps.
nvidia
parakeet-ctc-0.6b-vi
Accurate and optimized Vietnamese-English transcriptions with punctuation and word timestamps.
nvidia
parakeet-ctc-0.6b-zh-cn
Record-setting accuracy and performance for Mandarin-English transcriptions.
nvidia
parakeet-ctc-0.6b-zh-tw
Record-setting accuracy and performance for Taiwanese Mandarin-English transcriptions.
nvidia
parakeet-ctc-1.1b-asr
Record-setting accuracy and performance for English transcription.
nvidia
parakeet-tdt-0.6b-v2
Accurate and optimized English transcriptions with punctuation and word timestamps
microsoft
phi-4-mini-instruct
Lightweight multilingual LLM powering AI applications in latency-bound, memory- and compute-constrained environments
microsoft
phi-4-multimodal-instruct
Cutting-edge open multimodal model excelling in high-quality reasoning from image and audio inputs.
ipd
proteinmpnn
ProteinMPNN is a deep learning model for predicting amino acid sequences for protein backbones.
qwen
qwen-image
Qwen-Image is a text-to-image foundation model with advanced multilingual text rendering.
qwen
qwen-image-edit
Qwen-Image-Edit is an image editing model with multilingual text editing and strong subject consistency.
qwen
qwen3-next-80b-a3b-instruct
Qwen3-Next Instruct blends hybrid attention, sparse MoE, and stability boosts for ultra-long context AI.
qwen
qwen3.5-122b-a10b
122B MoE LLM (10B active) for coding, reasoning, multimodal chat. Agent-ready.
qwen
qwen3.5-397b-a17b
Next-gen Qwen 3.5 VLM (400B MoE) brings advanced vision, chat, RAG, and agentic capabilities.
nvidia
Relighting
Re-illuminate people in video to match target lighting from a 360 HDRI environment map.
nvidia
rerank-qa-mistral-4b
GPU-accelerated model optimized for providing a probability score that a given passage contains the information to answer a question.
ipd
rfdiffusion
A generative model of protein backbones for protein binder design.
nvidia
riva-translate-1.6b
Enable smooth global interactions in 36 languages.
nvidia
riva-translate-4b-instruct-v1_1
Translation model for 12 languages with support for few-shot example prompts.
sarvamai
sarvam-m
Multilingual, hybrid-reasoning model optimized for Indian language tasks, programming, and mathematical reasoning.
bytedance
seed-oss-36b-instruct
ByteDance open-source LLM with long-context, reasoning, and agentic intelligence.
siemens
simcenter-star-ccm+
Run computational fluid dynamics (CFD) simulations
upstage
solar-10.7b-instruct
Excels in NLP tasks, particularly in instruction-following, reasoning, and mathematics.
nvidia
sparsedrive
End-to-end autonomous driving stack integrating perception, prediction, and planning with sparse scene representations for efficiency and safety.
cadence
spectre-x
Run large-scale electronics and chip design verification simulations
stabilityai
stable-diffusion-3.5-large
Stable Diffusion 3.5 is a popular text-to-image generation model
stepfun-ai
step-3.5-flash
200B open-source reasoning engine with sparse MoE powering frontier agentic AI.
stepfun-ai
step-3.7-flash
A sparse MoE multimodal reasoning model good for enterprise, agentic and coding tasks.
stockmark
stockmark-2-100b-instruct
Japanese-specialized large-language-model for enterprises to read and understand complex business documents.
nvidia
streampetr
StreamPETR offers efficient 3D object detection for autonomous driving by propagating sparse object queries temporally.
nvidia
Studio Voice
Enhance input speech recorded with low-quality microphones in noisy or reverberant environments, producing studio-quality speech.
nvidia
synthetic-video-detector
NVIDIA Synthetic Video Detector is an AI-powered microservice for detecting AI-generated (synthetic) videos.
0615409268808334
test_endpoint_20251218_133732_563_ouy_canary
For publishing test
microsoft
TRELLIS
MSFT TRELLIS is a 3D AI model that generates high-quality 3D assets from text or image inputs.
nvidia
vista-3d
VISTA-3D is a specialized interactive foundation model for segmenting and annotating human anatomies.
openai
whisper-large-v3
Robust Speech Recognition via Large-Scale Weak Supervision.
nvidia
Active Speaker Detection
Detect and track speaker identities across video frames.
deepmind
alphafold2
Predicts the 3D structure of a protein from its amino acid sequence.
deepmind
alphafold2-multimer
Predicts the 3D structure of a protein from its amino acid sequence.
sqwh1lyrveic
AODT 1.2.1
AODT 1.2.1
sqwh1lyrveic
AODT 1.2.2
AODT 1.2.2
nvidia
Background Noise Removal
Removes unwanted noises from audio improving speech intelligibility.
nvidia
bevformer
Advanced transformer for multi-frame bird's-eye-view 3D perception in autonomous driving.
baai
bge-m3
Embedding model for text retrieval tasks, excelling in dense, multi-vector, and sparse retrieval.
mit
Boltz-2
Predict complex structures using Boltz-2.
nvidia
canary-1b-asr
Multi-lingual model supporting speech-to-text recognition and translation.
resembleai
chatterbox-multilingual-tts
Natural and expressive voices in 23 languages. For voice agents and brand ambassadors.
nvidia
conformer-ctc-asr
Automatic speech recognition model that transcribes speech in lower case Spanish with record-setting accuracy and performance
nvidia
cosmos-reason2-8b
Vision language model that excels in understanding the physical world using structured reasoning on videos or images.
nvidia
cosmos-transfer1-7b
Generates physics-aware video world states for physical AI development using text prompts and multiple spatial control inputs derived from real-world data or simulation.
nvidia
cosmos-transfer2.5-2b
Generates physics-aware video world states for physical AI development using text prompts and multiple spatial control inputs derived from real-world data or simulation.
nvidia
cosmos3-nano
Generates physics-aware videos from text prompts or an image prompt for physical AI development.
nvidia
cosmos3-nano-reasoner
Vision language model that excels in understanding the physical world using structured reasoning on videos or images.
nvidia
cuopt
World-record accuracy and performance for complex route optimization.
deepseek-ai
deepseek-v4-flash
DeepSeek V4 Flash is a 284B MoE model with 1M-token context optimized for fast coding and agents.
deepseek-ai
deepseek-v4-pro
DeepSeek V4 scales to 1M-token context windows with efficient MoE architecture for coding tasks.
mit
diffdock
Predicts the 3D structure of how a molecule interacts with a protein.
diffusiongemma-26b-a4b-it
Diffusion-based 26B parameter LLM enabling parallel token generation for real-time text apps
abacusai
dracarys-llama-3.1-70b-instruct
Fine-tuned Llama 3.1 70B model for code generation, summarization, and multi-language tasks.
meta
esm2-650m
Generates embeddings of proteins from their amino acid sequences.
meta
esmfold
Predicts the 3D structure of a protein from its amino acid sequence.
arc
evo2-40b
Evo 2 is a biological foundation model that is able to integrate information over long genomic sequences while retaining sensitivity to single-nucleotide changes.
nvidia
eyecontact
Estimate gaze angles of a person in a video and redirect to make it frontal.
cadence
fidelity
Run computational-fluid dynamics (CFD) simulations
ansys
fluent
Run computational-fluid dynamics (CFD) simulations
black-forest-labs
FLUX.1-dev
FLUX.1 is a state-of-the-art suite of image generation models
black-forest-labs
FLUX.1-Kontext-dev
FLUX.1 Kontext is a multimodal model that enables in-context image generation and editing.
black-forest-labs
FLUX.1-schnell
FLUX.1-schnell is a distilled image generation model, producing high quality images at fast speeds
black-forest-labs
flux.2-klein-4b
FLUX.2-klein-4B is a distilled image generation and editing model, producing outputs at lighting speed
nvidia
fourcastnet
FourCastNet predicts global atmospheric dynamics of various weather / climate variables.
gemma-2-2b-it
Advanced small language generative AI model for edge applications
gemma-3n-e2b-it
An edge computing AI model which accepts text, audio and image input, ideal for resource-constrained environments
gemma-3n-e4b-it
An edge computing AI model which accepts text, audio and image input, ideal for resource-constrained environments
gemma-4-31b-it
Dense 31B model delivering frontier reasoning for coding, agentic workflows, and fine-tuning.
nvidia
genmol
Fragment-Based Molecular Generation by Discrete Diffusion.
nvidia
gliner-pii
GLiNER PII detects Personally Identifiable Information in text.
z-ai
glm-5.1
GLM-5.1 is a flagship LLM for agentic workflows, coding, and long-horizon reasoning tasks.
openai
gpt-oss-120b
Mixture of Experts (MoE) reasoning LLM (text-only) designed to fit within 80GB GPU.
openai
gpt-oss-20b
Smaller Mixture of Experts (MoE) text-only LLM for efficient AI reasoning and math
nvidia
ising-calibration-1-35b-a3b
Open VLM for quantum computer calibration chart understanding across a range of qubit modalities.
moonshotai
kimi-k2.6
1T multimodal MoE for long-horizon coding, agentic tool use, and image/video understanding.
nvidia
LipSync
Generative lip dubbing that syncs lips in a video to input audio.
meta
llama-3.1-70b-instruct
Powers complex conversations with superior contextual understanding, reasoning and text generation.
meta
llama-3.1-8b-instruct
Advanced state-of-the-art model with language understanding, superior reasoning, and text generation.
nvidia
llama-3.1-nemoguard-8b-content-safety
Leading content safety model for enhancing the safety and moderation capabilities of LLMs
nvidia
llama-3.1-nemoguard-8b-topic-control
Topic control model to keep conversations focused on approved topics, avoiding inappropriate content.
nvidia
llama-3.1-nemotron-nano-8b-v1
Leading reasoning and agentic AI accuracy model for PC and edge.
nvidia
llama-3.1-nemotron-nano-vl-8b-v1
Multi-modal vision-language model that understands text/img and creates informative responses
nvidia
llama-3.1-nemotron-safety-guard-8b-v3
Leading multilingual content safety model for enhancing the safety and moderation capabilities of LLMs
meta
llama-3.2-11b-vision-instruct
Cutting-edge vision-language model exceling in high-quality reasoning from images.
meta
llama-3.2-1b-instruct
Advanced state-of-the-art small language model with language understanding, superior reasoning, and text generation.
meta
llama-3.2-3b-instruct
Advanced state-of-the-art small language model with language understanding, superior reasoning, and text generation.
meta
llama-3.2-90b-vision-instruct
Cutting-edge vision-Language model exceling in high-quality reasoning from images.
meta
llama-3.3-70b-instruct
Advanced LLM for reasoning, math, general knowledge, and function calling
nvidia
llama-3.3-nemotron-super-49b-v1
High efficiency model with leading accuracy for reasoning, tool calling, chat, and instruction following.
nvidia
llama-3.3-nemotron-super-49b-v1.5
High efficiency model with leading accuracy for reasoning, tool calling, chat, and instruction following.
meta
llama-4-maverick-17b-128e-instruct
A general purpose multimodal, multilingual 128 MoE model with 17B parameters.
meta
llama-guard-4-12b
Multi-modal model to classify safety for input prompts as well output responses.
nvidia
llama-nemotron-embed-1b-v2
Multilingual, cross-lingual embedding model for long-document QA retrieval, supporting 26 languages.
nvidia
llama-nemotron-embed-vl-1b-v2
Multimodal question-answer retrieval representing user queries as text and documents as images.
nvidia
llama-nemotron-rerank-1b-v2
GPU-accelerated model optimized for providing a probability score that a given passage contains the information to answer a question.
nvidia
llama-nemotron-rerank-vl-1b-v2
GPU-accelerated model optimized for providing a probability score that a given passage contains the information to answer a question.
nvidia
magpie-tts-multilingual
Natural and expressive voices in multiple languages. For voice agents and brand ambassadors.
nvidia
magpie-tts-zeroshot
Expressive and engaging text-to-speech, generated from a short audio sample.
nvidia
megatron-1b-nmt
Enable smooth global interactions in 36 languages.
minimaxai
minimax-m2.7
MiniMax M2.7 is a 230B-parameter text-to-text AI model excelling in coding, reasoning, and office tasks.
minimaxai
minimax-m3
MiniMax M3 Preview is a multimodal MoE vision-language model with strong reasoning, coding, and tool-calling capabilities.
mistralai
ministral-14b-instruct-2512
A general purpose VLM ideal for chat and instruction based use cases
mistralai
mistral-large-3-675b-instruct-2512
A state-of-the-art general purpose MoE VLM ideal for chat, agentic and instruction based use cases.
mistralai
mistral-medium-3.5-128b
A high performing model for text generation, coding and agentic use cases
mistralai
mistral-nemotron
Built for agentic workflows, this model excels in coding, instruction following, and function calling
mistralai
mistral-small-4-119b-2603
Hybrid MoE model unifying instruct, reasoning, and coding with multimodal input and 256k context
mistralai
mixtral-8x7b-instruct-v0.1
An MOE LLM that follows instructions, completes requests, and generates creative text.
nvidia
molmim
MolMIM performs controlled generation, finding molecules with the right properties.
colabfold
msa-search
Generates a multiple sequence alignment from a query sequence and a protein sequence database search.
nvidia
nemoguard-jailbreak-detect
Industry leading jailbreak classification model for protection from adversarial attempts
nvidia
nemoretriever-ocr
Powerful OCR model for fast, accurate real-world image text extraction, layout, and structure analysis.
nvidia
nemoretriever-page-elements-v2
Model for object detection, fine-tuned to detect charts, tables, and titles in documents.
nvidia
nemoretriever-parse
Cutting-edge vision-language model exceling in retrieving text and metadata from images.
nvidia
nemotron-3-content-safety
Multilingual, multimodal model for detecting unsafe and toxic content.
nvidia
nemotron-3-nano-30b-a3b
Open, efficient MoE model with 1M context, excelling in coding, reasoning, instruction following, tool calling, and more
nvidia
nemotron-3-nano-omni-30b-a3b-reasoning
Nemotron 3 Nano Omni is an omni-modal reasoning model that understands images, video, speech, text.
nvidia
nemotron-3-super-120b-a12b
Open, efficient hybrid Mamba-Transformer MoE with 1M context, excelling in agentic reasoning, coding, planning, tool calling, and more
nvidia
nemotron-3-ultra-550b-a55b
Open, efficient hybrid Mamba-Transformer MoE with 1M context, excelling in agentic reasoning, coding, planning, tool calling, and more
nvidia
nemotron-3.5-content-safety
Multilingual, multimodal model for detecting unsafe and toxic content.
nvidia
nemotron-asr-streaming
Real-time speech recognition for English
nvidia
nemotron-content-safety-reasoning-4b
A context‑aware safety model that applies reasoning to enforce domain‑specific policies.
nvidia
nemotron-graphic-elements-v1
Model for object detection, fine-tuned to detect charts, tables, and titles in documents.
nvidia
nemotron-mini-4b-instruct
Optimized SLM for on-device inference and fine-tuned for roleplay, RAG and function calling
nvidia
nemotron-nano-12b-v2-vl
Nemotron Nano 12B v2 VL enables multi-image and video understanding, along with visual Q&A and summarization capabilities.
nvidia
nemotron-ocr-v1
Powerful OCR model for fast, accurate real-world image text extraction, layout, and structure analysis.
nvidia
nemotron-page-elements-v3
Model for object detection, fine-tuned to detect charts, tables, and titles in documents.
nvidia
nemotron-parse
Cutting-edge vision-language model exceling in retrieving text and metadata from images.
nvidia
nemotron-table-structure-v1
Model for object detection, fine-tuned to detect charts, tables, and titles in documents.
nvidia
nemotron-voicechat
Nemotron 3 Voicechat
nvidia
nv-embed-v1
Generates high-quality numerical embeddings from text inputs.
nvidia
nv-embedcode-7b-v1
The NV-EmbedCode model is a 7B Mistral-based embedding model optimized for code retrieval, supporting text, code, and hybrid queries.
nvidia
nv-embedqa-e5-v5
English text embedding model for question-answering retrieval.
nvidia
nv-yolox-page-elements-v1
Model for object detection, fine-tuned to detect charts, tables, and titles in documents.
nvidia
nvidia-nemotron-nano-9b-v2
High‑efficiency LLM with hybrid Transformer‑Mamba design, excelling in reasoning and agentic tasks.
openfold
openfold2
Predicts the 3D structure of a protein from its amino acid sequence, multiple sequence alignments, and templates.
openfold
openfold3
OpenFold3 is a third-generation biomolecular foundation model that predicts the three-dimensional structures of molecular complexes (proteins, DNA, RNA, ligands)
baidu
paddleocr
Model for table extraction that receives an image as input, runs OCR on the image, and returns the text within the image and its bounding boxes.
paligemma
Vision language model adept at comprehending text and visual inputs to produce informative responses
nvidia
parakeet-1.1b-rnnt-multilingual-asr
High accuracy and optimized performance for transcription in 25 languages
nvidia
parakeet-ctc-0.6b-asr
State-of-the-art accuracy and speed for English transcriptions.
nvidia
parakeet-ctc-0.6b-es
Accurate and optimized Spanish English transcriptions with punctuation and word timestamps.
nvidia
parakeet-ctc-0.6b-vi
Accurate and optimized Vietnamese-English transcriptions with punctuation and word timestamps.
nvidia
parakeet-ctc-0.6b-zh-cn
Record-setting accuracy and performance for Mandarin English transcriptions.
nvidia
parakeet-ctc-0.6b-zh-tw
Record-setting accuracy and performance for Mandarin Taiwanese English transcriptions.
nvidia
parakeet-ctc-1.1b-asr
Record-setting accuracy and performance for English transcription.
nvidia
parakeet-tdt-0.6b-v2
Accurate and optimized English transcriptions with punctuation and word timestamps
microsoft
phi-4-mini-instruct
Lightweight multilingual LLM powering AI applications in latency bound, memory/compute constrained environments
microsoft
phi-4-multimodal-instruct
Cutting-edge open multimodal model exceling in high-quality reasoning from image and audio inputs.
ipd
proteinmpnn
ProteinMPNN is a deep learning model for predicting amino acid sequences for protein backbones.
qwen
qwen-image
Qwen-Image is a text-to-image foundation model with advanced multilingual text rendering.
qwen
qwen-image-edit
Qwen-Image-Edit is an image editing model with multilingual text editing and strong subject consistency.
qwen
qwen3-next-80b-a3b-instruct
Qwen3-Next Instruct blends hybrid attention, sparse MoE, and stability boosts for ultra-long context AI.
qwen
qwen3.5-122b-a10b
122B MoE LLM (10B active) for coding, reasoning, multimodal chat. Agent-ready.
qwen
qwen3.5-397b-a17b
Next-gen Qwen 3.5 VLM (400B MoE) brings advanced vision, chat, RAG, and agentic capabilities.
nvidia
Relighting
Re-illuminate people in video to match target lighting from a 360 HDRI environment map.
nvidia
rerank-qa-mistral-4b
GPU-accelerated model optimized for providing a probability score that a given passage contains the information to answer a question.
ipd
rfdiffusion
A generative model of protein backbones for protein binder design.
nvidia
riva-translate-1.6b
Enable smooth global interactions in 36 languages.
nvidia
riva-translate-4b-instruct-v1_1
Translation model for 12 languages with support for few-shot example prompts.
sarvamai
sarvam-m
Multilingual, hybrid-reasoning model optimized for Indian language tasks, programming, and mathematical reasoning.
bytedance
seed-oss-36b-instruct
ByteDance open-source LLM with long-context, reasoning, and agentic intelligence.
siemens
simcenter-star-ccm+
Run computational fluid dynamics (CFD) simulations
upstage
solar-10.7b-instruct
Excels in NLP tasks, particularly in instruction-following, reasoning, and mathematics.
nvidia
sparsedrive
End-to-end autonomous driving stack integrating perception, prediction, and planning with sparse scene representations for efficiency and safety.
cadence
spectre-x
Run large-scale electronics and chip design verification simulations
stabilityai
stable-diffusion-3.5-large
Stable Diffusion 3.5 is a popular text-to-image generation model
stepfun-ai
step-3.5-flash
200B open-source reasoning engine with sparse MoE powering frontier agentic AI.
stepfun-ai
step-3.7-flash
A sparse MoE multimodal reasoning model good for enterprise, agentic and coding tasks.
stockmark
stockmark-2-100b-instruct
Japanese-specialized large language model for enterprises to read and understand complex business documents.
nvidia
streampetr
StreamPETR offers efficient 3D object detection for autonomous driving by propagating sparse object queries temporally.
nvidia
Studio Voice
Enhance input speech recorded with low-quality microphones in noisy or reverberant environments, producing studio-quality speech.
nvidia
synthetic-video-detector
NVIDIA Synthetic Video Detector is an AI-powered microservice for detecting AI-generated (synthetic) videos.
0615409268808334
test_endpoint_20251218_133732_563_ouy_canary
For publishing test
microsoft
TRELLIS
MSFT TRELLIS is a 3D AI model that generates high-quality 3D assets from text or image inputs.
nvidia
vista-3d
VISTA-3D is a specialized interactive foundation model for segmenting and annotating human anatomies.
openai
whisper-large-v3
Robust Speech Recognition via Large-Scale Weak Supervision.
nvidia
accelerated-computing-cudf
Official NVIDIA-authored guidance for NVIDIA cuDF GPU DataFrames, pandas acceleration, dask-cuDF, ETL, joins, groupby, CSV/Parquet I/O, nullable semantics, and multi-GPU DataFrame workloads.
nvidia
aiq-deploy
Use when asked to install, deploy, run, validate, troubleshoot, or stop NVIDIA AI-Q Blueprint infrastructure.
nvidia
aiq-research
Use when asked to run deep research or AI-Q research through a reachable NVIDIA AI-Q Blueprint backend.
nvidia
cudaq-guide
CUDA-Q onboarding guide for installation, test programs, GPU simulation, QPU hardware, and quantum applications.
nvidia
cufolio
Use when a user asks to build, optimize, backtest, rebalance, or analyze a stock portfolio with Mean-CVaR, efficient frontiers, scenario generation, or NVIDIA cuOpt.
nvidia
cuopt-developer
Modify, build, test, debug, and contribute to NVIDIA cuOpt (C++/CUDA, Python, server, CI). Use for solver internals, PRs, DCO, and code conventions.
nvidia
cuopt-install
Install cuOpt for Python, C, or server via pip, conda, or Docker; verify the install. For building cuOpt from source, see cuopt-developer.
nvidia
cuopt-numerical-optimization-api-c
LP, MILP, and QP (beta) with cuOpt — C API only. Use when the user is embedding LP, MILP, or QP in C/C++.
nvidia
cuopt-numerical-optimization-api-cli
LP, MILP, and QP (beta) with cuOpt — CLI only (MPS files, cuopt_cli). Use when the user is solving LP, MILP, or QP from MPS via command line.
nvidia
cuopt-numerical-optimization-api-python
Solve LP, MILP, QP (beta) with cuOpt Python API — linear/quadratic objectives, integer variables, scheduling, portfolio, least squares.
nvidia
cuopt-numerical-optimization-formulation
LP, MILP, QP — concepts, problem-text parsing, and formulation patterns (parameters, constraints, decisions, objective). Concepts only; no API.
nvidia
cuopt-routing-api-python
Vehicle routing (VRP, TSP, PDP) with cuOpt — Python API only. Use when the user is building or solving routing in Python.
nvidia
cuopt-routing-formulation
Vehicle routing (VRP, TSP, PDP) — problem types and data requirements. Domain concepts; no API or interface.
nvidia
cuopt-server-api-python
cuOpt REST server — start server, endpoints, Python/curl client examples. Use when the user is deploying or calling the REST API.
nvidia
cuopt-server-common
cuOpt REST server — what it does and how requests flow. Domain concepts; no deploy or client code.
nvidia
cuopt-skill-evolution
After solving a non-trivial problem, detect generalizable learnings and propose skill updates. Always active — applies to every interaction.
nvidia
cuopt-user-rules
Base rules for end users calling NVIDIA cuOpt (routing/LP/MILP/QP/install/server). Not for cuOpt internals — use cuopt-developer for those.
nvidia
cupynumeric-hdf5
Read and write large cuPyNumeric arrays to HDF5 with Legate's parallel, distributed HDF5 I/O (legate.io.hdf5: to_file, from_file, from_file_batched). Use when a developer needs to save a cuPyNumeric array to an .h5/.hdf5 file, load an HDF5 dataset into a
nvidia
cupynumeric-install
Install and verify cuPyNumeric for Python — requirements, commands, verification. Source builds are out of scope.
nvidia
cupynumeric-migration-readiness
Pre-migration readiness assessor for porting NumPy to cuPyNumeric. Use BEFORE substantial porting work begins when the user asks whether code will scale on GPU, whether they should migrate to cuPyNumeric, which NumPy patterns transfer cleanly, what must b
nvidia
cupynumeric-parallel-data-load
Load a sharded, on-disk dataset (sharded .npy, Parquet/Arrow, raw binary, sharded HDF5, custom layouts) into a distributed cuPyNumeric ndarray via a manual partition + leaf @task launch with CPU/OMP/GPU variants. Use when no single-call loader fits, inclu
nvidia
dali-dynamic-mode
DALI imperative dynamic mode (`nvidia.dali.experimental.dynamic`, ndd): use when working on ndd code or migrating pipelines; skip pipeline-only tasks.
nvidia
data-designer
Use when the user wants to create a dataset, generate synthetic data, or build a data generation pipeline.
nvidia
deepstream-dev
NVIDIA DeepStream SDK 9.0 development with Python pyservicemaker API. Use when building video analytics pipelines, GStreamer-based video processing, TensorRT inference integration, object detection/tracking, or Kafka/message broker integration.
nvidia
deepstream-import-vision-model
Use this skill to bring any vision model from HuggingFace or NVIDIA NGC into an NVIDIA DeepStream pipeline with end-to-end automation: ONNX download, SafeTensors export, TRT engine build, custom nvinfer bbox parser, multi-stream benchmark, and PDF report.
nvidia
dicom-metadata-extract
Used for extracting selected metadata from one DICOM file and flagging standard-tag PHI presence. Not for anonymization or clinical use.
nvidia
dicom-series-preflight
Used for header-only preflight of one DICOM series folder before conversion or inference. Not for de-identification or clinical clearance.
nvidia
dicom-series-to-volume
Used for converting one CT DICOM series folder to a HU NIfTI volume with affine evidence. Not for multi-frame DICOM or clinical use.
nvidia
digital-health-clinical-asr-build
Stage 2 of the Clinical ASR Flywheel. Use when curating clinical terms, tagging IPA, and synthesizing a NeMo manifest. NOT for scoring (use /digital-health-clinical-asr-eval).
nvidia
digital-health-clinical-asr-eval
Stage 3 of Clinical ASR Flywheel. Score a NeMo manifest, produce the five-section KER leaderboard (by-ipa_source diagnostic). Not for ASR auth (/riva-asr).
nvidia
digital-health-clinical-asr-finetune
Stage 4 of the Clinical ASR Flywheel. Use when priority KER is above 0.3 to run stock NeMo SFT on Parakeet TDT v2 and offline cycle N+1 re-eval. NOT for generic word boosting (use /finetune-asr).
nvidia
digital-health-clinical-asr-setup
Stage 1 of Clinical ASR Flywheel. Use when bootstrapping a cycle: NVCF+MW disclosure, NVIDIA_API_KEY check, deps install, TTS+ASR smoke test.
nvidia
dynamo-interconnect-check
Validate that a Dynamo deployment's NIXL/UCX/NCCL interconnect is ready for disaggregated serving over RDMA/NVLink. Use after recipe-runner brings a deployment up (especially disagg/multi-node) to confirm the KV transport is correct; use troubleshoot for
nvidia
dynamo-recipe-runner
Select, validate, patch, and deploy existing NVIDIA Dynamo Kubernetes recipes. Use for model/backend/GPU/deployment-mode recipe bring-up; use router-starter for router-only mode work and troubleshoot for broken deployments.
nvidia
dynamo-router-starter
Start or patch Dynamo router modes and run router endpoint smoke checks. Use for round-robin, KV-aware, least-loaded, or device-aware routing setup; use recipe-runner for recipe deployment and troubleshoot for failure diagnosis.
nvidia
dynamo-troubleshoot
Diagnose failed or unhealthy Dynamo deployments. Use when pods, model-cache jobs, PVCs, workers, frontend/router health, endpoints, or benchmark jobs fail; use recipe-runner/router-starter before this for normal bring-up.
nvidia
earth2studio-data-fetch
Fetch weather/climate data via Earth2Studio data sources for specific variables and times. Do NOT use for inference pipelines, model discovery, or installation.
nvidia
earth2studio-deterministic-forecast
Build deterministic forecast scripts with Earth2Studio (model, data source, IO, inference). Do NOT use for ensemble, diagnostics, data-only fetch, or install.
nvidia
earth2studio-discover
Find Earth2Studio models, data sources, and examples for a weather/climate use case. Do NOT use for writing inference code, downloading data, or installation.
nvidia
earth2studio-install
Guide installing Earth2Studio via uv or pip, selecting model extras, and configuring the environment. Do NOT use for writing inference code, choosing models, or PhysicsNeMo questions.
nvidia
holoscan-install-conda
Install Holoscan SDK v4.3+ via Conda in a CUDA 13 environment. Use for Conda installs; redirect CUDA 12 hosts to container/wheel.
nvidia
holoscan-install-container
Install Holoscan SDK via the NGC Docker container. Use for container-based installs; not for native apt/pip/Conda installs.
nvidia
holoscan-install-debian
Install Holoscan SDK natively on Ubuntu via apt. Use for C++ installs on Ubuntu; pair with /holoscan-install-wheel for Python.
nvidia
holoscan-install-source
Build Holoscan SDK from source via the in-tree ./run script. Use only when published packages don't meet the user's needs.
nvidia
holoscan-install-wheel
Install Holoscan SDK Python wheel via pip into a venv. Use for Python installs; not for native C++/apt or Conda installs.
nvidia
holoscan-setup
Guides Holoscan SDK installation: inspects the host, assesses platform compatibility, recommends an install method, and delegates to the matching install skill.
nvidia
hsb-app
Discover and run Holoscan Sensor Bridge example applications on a connected devkit. Filters available apps by the user's platform, HSB software version, board type, and sensors. Supports timed execution, failure analysis, code-edit suggestions, and iterat
nvidia
hsb-flash
Flash the FPGA on an HSB board connected to an NVIDIA devkit. Supports HSB Lattice boards (FPGA versions 2407, 2412, 2507, 2510) and Leopard Imaging VB1940 "all-in-one" cameras (FPGA versions 2507, 2510). Uses release-specific YAML manifests and board-typ
nvidia
hsb-setup
Clone the latest NVIDIA Holoscan Sensor Bridge repo, ask which supported devkit is being used, configure the host per platform, build the correct demo container, run it, and verify HSB connectivity by pinging 192.168.0.2. Use for Holoscan Sensor Bridge se
nvidia
hsb-test
Execute QA test plans on Holoscan Sensor Bridge hardware. Reads a user-provided test document, filters tests by the user's setup, determines which tests can run automatically, executes them with pass/fail evaluation, and produces a structured test results
nvidia
launch-nemo-rl
Playbook for launching, monitoring, stopping, and debugging NeMo-RL recipes on a Kubernetes cluster via the nrl-k8s CLI. Covers ephemeral vs long-lived RayCluster modes, iterating on runs, and debugging hung or failed training jobs.
nvidia
mcore-create-issue
Investigate a failing GitHub Actions run or job and create a GitHub issue for the failure.
nvidia
mcore-linting-and-formatting
Linting and formatting for Megatron-LM. Covers running autoformat.sh, tools (ruff, black, isort, pylint, mypy), and code style rules.
nvidia
mcore-run-on-slurm
How to launch distributed Megatron-LM training jobs on a SLURM cluster. Covers a minimal sbatch skeleton, environment-variable setup for torch.distributed.run, CUDA_DEVICE_MAX_CONNECTIONS rules across hardware and parallelism modes, container conventions,
nvidia
mcore-split-pr
Split a PR into multiple PRs to reduce the number of required CODEOWNERS reviewer groups.
nvidia
mcore-testing
Test system for Megatron-LM. Covers test layout, recipe YAML structure, adding and running unit and functional tests, golden values, marker filters, and CI parity.
nvidia
nemo-automodel-distributed-training
Guide for selecting and configuring distributed training strategies in NeMo AutoModel, including FSDP2, Megatron FSDP, DDP, and parallelism settings.
nvidia
nemo-automodel-launcher-config
Configure NeMo AutoModel job launches for interactive runs, Slurm clusters, and SkyPilot cloud execution.
nvidia
nemo-automodel-model-onboarding
Guide for onboarding new model architectures into NeMo AutoModel, including architecture discovery, implementation patterns, registration, and validation.
nvidia
nemo-automodel-recipe-development
Create and modify NeMo AutoModel training and evaluation recipes, including YAML structure, builders, and execution flow.
nvidia
nemo-data-designer-plugin
Use when the user wants to create a dataset, generate synthetic data, or build a data generation pipeline.
nvidia
nemo-evaluator-plugin
Use when working on the Evaluator plugin CLI, jobs, SDK-backed specs, metric types, or plugin-owned Evaluator skills.
nvidia
nemo-mbridge-mlm-bridge-training
Run Megatron-LM (MLM) and Megatron Bridge training with mock or real data. Covers correlation testing, available recipes, and multi-GPU examples.
nvidia
nemo-mbridge-multi-node-slurm
Convert single-node scripts to multi-node Slurm sbatch jobs and debug common multi-node failures. Covers srun-native vs uv run torch.distributed approaches, container setup, NCCL timeouts, OOM sizing for MoE models, and interactive allocation.
nvidia
nemo-mbridge-perf-activation-recompute
Validate and use selective and full activation recompute in Megatron Bridge to reduce GPU memory usage at the cost of extra compute.
nvidia
nemo-mbridge-perf-cpu-offloading
Validate and use CPU offloading in Megatron Bridge, including layer-level activation offloading and fractional optimizer state offloading with HybridDeviceOptimizer.
nvidia
nemo-mbridge-perf-cuda-graphs
Validate and use CUDA graph capture in Megatron Bridge, including local full-iteration graphs and Transformer Engine scoped graphs for attention, MLP, and MoE modules.
nvidia
nemo-mbridge-perf-expert-parallel-overlap
Validate and use MoE expert-parallel communication overlap in Megatron-Bridge, including overlap_moe_expert_parallel_comm, delay_wgrad_compute, and flex dispatcher backends such as DeepEP and HybridEP.
nvidia
nemo-mbridge-perf-hierarchical-context-parallel
Operational guide for enabling hierarchical context parallelism in Megatron-Bridge, including config knobs, code anchors, pitfalls, and verification.
nvidia
nemo-mbridge-perf-megatron-fsdp
Operational guide for enabling Megatron FSDP in Megatron-Bridge, including config knobs, code anchors, pitfalls, and verification.
nvidia
nemo-mbridge-perf-memory-tuning
Techniques for reducing peak GPU memory in Megatron Bridge — expandable segments, parallelism resizing, activation recompute, CPU offloading constraints, and common OOM fixes.
nvidia
nemo-mbridge-perf-moe-comm-overlap
MoE expert-parallel communication overlap in Megatron Bridge. Covers dispatch/combine overlap, flex dispatcher backends, and expert wgrad scheduling.
nvidia
nemo-mbridge-perf-moe-dispatcher-selection
Choose the right MoE token dispatcher (`alltoall`, DeepEP, or HybridEP) for the hardware, EP degree, and optimization stage. Summarizes patterns from DSV3, Qwen3, Qwen3-Next, and VLM bring-up work.
nvidia
nemo-mbridge-perf-moe-hardware-configs
Representative MoE training playbooks by hardware platform and model family. Summarizes rounded throughput bands, parallelism patterns, and common tuning stacks.
nvidia
nemo-mbridge-perf-moe-long-context
Long-context MoE training guidance for Megatron Bridge. Covers CP sizing, selective recompute, dispatcher choices, and practical patterns from DSV3, Qwen3, and Qwen3-Next long-context experiments.
nvidia
nemo-mbridge-perf-moe-optimization-workflow
Systematic workflow for MoE training optimization in Megatron Bridge, based on the Megatron-Core MoE paper. Covers the Three Walls framework, parallel folding, recompute strategy, dispatcher choice, and CUDA-graph bring-up.
nvidia
nemo-mbridge-perf-moe-vlm-training
Practical guidance for training MoE VLMs in Megatron Bridge. Compares FSDP and 3D-parallel approaches, using rounded lessons from Qwen3-VL, Qwen3-Next, and other multimodal experiments.
nvidia
nemo-mbridge-perf-parallelism-strategies
Operational guide for choosing and combining parallelism strategies in Megatron Bridge, including sizing rules, hardware topology mapping, and combined parallelism configuration.
nvidia
nemo-mbridge-perf-sequence-packing
Validate and use packed sequences and long-context training in Megatron-Bridge, distinguishing offline packed SFT for LLMs from in-batch packing for VLMs, and applying the right CP constraints.
nvidia
nemo-mbridge-perf-tp-dp-comm-overlap
Operational guide for enabling TP, DP, and PP communication overlap in Megatron-Bridge, including config knobs, code anchors, pitfalls, and verification.
nvidia
nemo-mbridge-recipe-recommender
Recommend and customize Megatron Bridge recipes for a user's model, GPU count, and training goal. Indexes library recipes (pretrain/SFT/PEFT) and performance recipes.
nvidia
nemo-mbridge-resiliency
Resiliency features in Megatron Bridge including fault tolerance, straggler detection, in-process restart, preemption, and re-run state machine.
nvidia
nemo-retriever
Use when the user wants to search, query, extract, transcribe, describe, quote, filter, or aggregate across documents — PDFs, scanned forms / images (`.jpg` `.png` `.tiff`), Office (`.docx` `.pptx`), text (`.html` `.txt`), audio (`.mp3` `.wav` `.m4a`), or
nvidia
nemo-rl-auto-research
Autonomous NeMo-RL research agent workflow for directed hypothesis testing and open-ended discovery. Guides agents through the full experiment lifecycle: understanding recipes and environments, wiring RL or NeMo-gym runs, launching reproducible baselines
nvidia
nemo-rl-brev-etiquette
Brev instance operating guidance for NeMo-RL agents working in /home/ubuntu/RL with limited workspace disk, a larger /ephemeral volume, and optional /home/ubuntu/RL/.env secrets. Use when running nemo-rl-auto-research campaigns, experiments, training jobs
nvidia
nemo-rl-docs
Documentation conventions for NeMo-RL. Covers docs/index.md updates and docstring format. Do NOT use for: bug fixes, test fixes, dependency bumps, refactoring, CI/CD changes, performance tuning, or any task that does not involve writing or updating docume
nvidia
nemo-rl-session-memory
Manage durable working-session memory for coding agents. Use when a user asks to preserve or recover agent context across disconnects, VS Code restarts, long-running work, handoffs, or any session where important state should be written periodically under
nvidia
nemoclaw-user-agent-skills
Describes the agent skills shipped with NemoClaw and how to access them by cloning the repository. Use when users ask about AI agent support, coding assistant integration, or the .agents/skills/ directory. Trigger keywords - nemoclaw agent skills, ai codi
nvidia
nemoclaw-user-configure-inference
Connects NemoClaw to a local inference server. Use when setting up Ollama, vLLM, TensorRT-LLM, NIM, or any OpenAI-compatible local model server with NemoClaw. Trigger keywords - nemoclaw local inference, ollama nemoclaw, vllm nemoclaw, local model server,
nvidia
nemoclaw-user-configure-security
Presents a risk framework for every configurable security control in NemoClaw. Use when evaluating security posture, reviewing sandbox security defaults, or assessing control trade-offs. Trigger keywords - nemoclaw security best practices, sandbox securit
nvidia
nemoclaw-user-deploy-remote
Explains how to run NemoClaw on a remote GPU instance, including the deprecated Brev compatibility path and the preferred installer plus onboard flow. Use when deploying NemoClaw to a remote VM, onboarding a Brev instance, or migrating away from the legac
nvidia
nemoclaw-user-get-started
Installs NemoClaw, launches a sandbox, and runs the first agent prompt. Use when onboarding, installing, or launching a NemoClaw sandbox for the first time. Trigger keywords - nemoclaw quickstart, install nemoclaw openclaw sandbox, nemohermes quickstart,
nvidia
nemoclaw-user-manage-policy
Adds, removes, or modifies allowed endpoints in the sandbox policy. Use when customizing network policy, changing egress rules, or configuring sandbox endpoint access. Trigger keywords - customize nemoclaw network policy, sandbox egress policy configurati
nvidia
nemoclaw-user-manage-sandboxes
Explains operational tasks after the quickstart: listing sandboxes, status and health checks, logs, diagnostics, port forwards, multiple sandboxes, credential reset, rebuilds, network presets, upgrades, and uninstall. Trigger keywords - manage nemoclaw sa
nvidia
nemoclaw-user-monitor-sandbox
Inspects sandbox health, traces agent behavior, and diagnoses problems. Use when monitoring a running sandbox, debugging agent issues, or checking sandbox logs. Trigger keywords - monitor nemoclaw sandbox, debug nemoclaw agent issues.
nvidia
nemoclaw-user-overview
Explains how OpenClaw, OpenShell, and NemoClaw form the ecosystem, NemoClaw's position in the stack, what NemoClaw adds beyond the community sandbox, and when to prefer NemoClaw versus integrating OpenShell and OpenClaw directly. Use when users ask about
nvidia
nemoclaw-user-reference
Describes the NemoClaw integration layer and blueprint architecture and how they orchestrate compatible agent sandboxes. Use when looking up architecture, agent integration, plugin structure, or blueprint design. Trigger keywords - nemoclaw architecture,
nvidia
nemotron-customize
Plan, configure, and chain repo-native Nemotron customization steps into single-step or multi-step pipelines: curation, translation, SFT/PEFT (AutoModel or Megatron-Bridge), pretraining/CPT, RL alignment (DPO/RLVR/GRPO/RLHF), BYOB/MCQ benchmarks, checkpoi
nvidia
nemotron-policy-generator
Generates BYO custom safety policies for NVIDIA Nemotron content-safety guardrails — Nemotron-Content-Safety-Reasoning-4B (text) and multimodal Nemotron-3-Content-Safety. Produces a Markdown policy, JSON taxonomy, and drop-in inference prompts. Maps rough
nvidia
nemotron-retrieval-recipes
Use when planning, debugging, tuning, evaluating, exporting, or deploying public Nemotron `embed`/`rerank` retrieval recipes.
nvidia
nemotron-speech
Routes NVIDIA Nemotron Speech (Riva) NIM tasks — deploys, runs, and tests ASR, TTS, and NMT NIMs on build.nvidia.com or self-hosted.
nvidia
nv-generate-ct-rflow
Used for generating synthetic CT volumes and masks with NV-Generate-CTMR rflow-ct. Not for production training data without review.
nvidia
nv-generate-mr
Used for generating synthetic body MRI volumes with NV-Generate-CTMR rflow-mr. Not for paired masks or production training data.
nvidia
nv-generate-mr-brain
Used for generating synthetic brain MRI volumes with NV-Generate-CTMR rflow-mr-brain. Not for production training data.
nvidia
nv-generate-mr-brain-finetune
Used for finetuning NV-Generate-CTMR MR-brain diffusion UNet from a NIfTI datalist. Not for clinical or production data approval.
nvidia
nv-generate-vae-finetune
Used for finetuning the NV-Generate-CTMR MAISI VAE from CT/MRI NIfTI datalists. Not for clinical or production data approval.
nvidia
nv-reason-cxr
Used for command-shape or live NV-Reason-CXR chest X-ray reasoning smoke tests. Not for diagnosis or clinical reporting.
nvidia
nv-segment-ct
Used for running NV-Segment-CT VISTA3D on CT NIfTI volumes and recording label-map evidence.
nvidia
nv-segment-ct-finetune
Used for smoke or dataset finetuning of NV-Segment-CT VISTA3D on CT NIfTI labels. Not for clinical validation.
nvidia
nv-segment-ctmr
Used for running NV-Segment-CTMR on CT or MRI NIfTI volumes and recording label-map evidence. Not for clinical interpretation.
nvidia
omniverse-cad-to-simready
Coordinate the end-to-end CAD/source-asset to SimReady workflow. Use for broad requests such as CAD to SimReady, source asset to simulation-ready USD, or prop packaging that require conversion, material/physics assignment, SimReady conformance, validation
nvidia
omniverse-realtime-viewer
Use as the top-level router for Omniverse Realtime Viewer USD app requests and focused viewer reference documents.
nvidia
omniverse-usd-performance-tuning
Top-level workflow skill for USD performance diagnosis and optimization. Use for slow loading, high memory, low FPS, or 'optimize my scene' requests; delegates auth/runtime setup to Phase 0 owners.
nvidia
physical-ai-defect-image-generation
Use when the user wants to orchestrate defect image generation, run associated setup, or handle outputs on OSMO. The Day 0 path handles cold-start with USD-to-ROI, image-edit augmentation, and AnomalyGen to create initial PCBA datasets. The Day 1 path per
nvidia
physical-ai-infrastructure-setup-and-resilient-scaling
Use when the user wants to set up, scale, validate, or harden NVIDIA physical AI infrastructure for synthetic data generation workflows across local MicroK8s or Azure AKS, including Kubernetes clusters, inference endpoint deployment, OSMO deployment, work
nvidia
physical-ai-neural-reconstruction
Router for NVIDIA NuRec/NRE: USDZ rendering, NCore conversion, 3DGS, gRPC sensor sim, PhysicalAI HF datasets. Do NOT use for SimReady or infra setup.
nvidia
physical-ai-video-data-augmentation
Use when running video data augmentation and auto-labeling workflows on OSMO: flow selection, preflight, submit-time interpolation, monitoring, and output retrieval. Trigger keywords: video data augmentation, data enrichment, auto labeling, VDA demo, OSMO
nvidia
physicsnemo-discover
Official NVIDIA-authored guidance for navigating PhysicsNeMo — pick the model, datapipe, or example for a SciML/AI4Science task (surrogates, forecasting, downscaling, physics-informed, inverse, generative). Points at existing files via live repo search; n
nvidia
rag-blueprint
NVIDIA RAG Blueprint — deploy, configure, troubleshoot, and manage. Handles any RAG action: deploy, install, start, enable, disable, toggle, change, configure, troubleshoot, debug, fix, shutdown, stop, or tear down any RAG feature or service (Agentic RAG,
nvidia
rag-eval
Filesystem RAG benchmarks: corpus/, train.json, evaluate_rag.py (RAGAS quality). Not for prod monitoring, latency/throughput benchmarking (use rag-perf), or evals outside this repo layout.
nvidia
rag-perf
Performance benchmarking for a deployed NVIDIA RAG Blueprint server: profiling pass + aiperf load test driven by a single YAML config. Not for accuracy / RAGAS scoring (use rag-eval) or for deploying / repairing services (use rag-blueprint).
nvidia
skill-card-generator
Use only to generate or update a governance skill card for a specified existing agent skill directory. Do not use for explaining, listing, comparing, or discussing skill capabilities.
nvidia
tao-analyze-changenet-rca
Performs deep Root Cause Analysis (RCA) on NVIDIA TAO Visual ChangeNet classification experiments with image-evidence-driven investigation. Use when analyzing ChangeNet model failures, investigating poor recall / FAR / PASS-NO_PASS metrics, auditing visua
nvidia
tao-analyze-gaps-visual-changenet
Performs gap analysis on NVIDIA TAO Visual ChangeNet (VCN) Classify experiments by invoking the data-services container (`tao_toolkit.data_services` from `versions.yaml`) directly via `docker run … gap_analysis vcn_aoi …` — picks the optimal decision thre
nvidia
tao-analyze-gaps-vlm-bcq
Extract false-positive and false-negative gaps from VLM binary-classification-question (BCQ, yes/no) predictions. Use after running VLM evaluation when you have a predictions JSON and need to identify failure cases for DEFT root cause analysis on a binary
nvidia
tao-convert-dataset-format
Run `tao-daft convert` to convert NVIDIA TAO DAFT datasets between supported formats. Do not use for non-DAFT data. Use when the user asks to convert a DAFT dataset, change DAFT format, change a TAO dataset format, or run `tao-daft convert`.
nvidia
tao-finetune-clip
CLIP vision-language model for image-text retrieval, zero-shot classification, embedding extraction, ONNX export, and TensorRT deployment. Use when fine-tuning or training CLIP, running zero-shot classification, computing image embeddings, or deploying CL
nvidia
tao-finetune-cosmos-embed
Cosmos-Embed1 video-text embedding for text-to-video retrieval, video-to-video search, semantic deduplication, and fine-tuning. Use when the user asks to "fine-tune Cosmos-Embed1", "run cosmos-embed inference", "export Cosmos-Embed1", "embed videos", or "
nvidia
tao-finetune-cosmos-reason
Cosmos-Reason2-8B video QA supervised fine-tuning with FSDP parallelism. Use when training or evaluating video question-answering models, fine-tuning Cosmos-Reason2 with SFT, or working with Cosmos-RL. Trigger phrases include "fine-tune Cosmos-Reason", "C
nvidia
tao-finetune-huggingface-model
Fine-tune any HuggingFace CV / VLM / LLM model on local NVIDIA GPUs inside an NGC PyTorch container. Use when the user wants to fine-tune a HuggingFace model (full or LoRA), train a vision / VLM / LLM model end-to-end, generate a reproducible HF training
nvidia
tao-generate-image-grounding
Two-step image grounding pipeline: extracts referring expressions from (image, caption) pairs and grounds them to pixel-space bounding boxes via a VLM. Use when the user wants to ground captions to bboxes, generate phrase-grounded annotations, auto-label
nvidia
tao-generate-referring-expressions
Four-step image referring-expression pipeline: turns images plus KITTI bounding-box labels into region descriptions, scene captions, grounded referring expressions, and (optionally) verified expressions via VLM distillation. Use when the user wants to gen
nvidia
tao-generate-video-reasoning-annotations
Multi-step video annotation pipeline that turns raw videos into Chain-of-Thought training data — multi-level captions, structured descriptions, and QA pairs (MCQ, binary, open-ended) with reasoning traces, via VLM/LLM distillation. Use when the user wants
nvidia
tao-launch-workflow
Shared launch intake for any TAO workflow or action. Use when the user wants to run TAO AutoML, train, evaluate, infer, export, generate TensorRT engines, or launch DEFT/workflow jobs on an execution platform.
nvidia
tao-list-capabilities
Answer what the TAO Skill Bank plugin can do by generating the response from packaged application, data, model, AutoML, and platform manifests.
nvidia
tao-mine-aoi-images
Runs the DEFT embed-then-mine workflow for VCN AOI iterations — embeds the gap-analysis target parquet, embeds a source pool, and mines nearest-neighbour source images for downstream augmentation. Use as the immediate next step after `tao-route-visual-cha
nvidia
tao-port-huggingface-model
Integrate a HuggingFace Computer Vision model into the NVIDIA TAO Toolkit ecosystem (tao-core config, tao-pytorch trainer, tao-deploy TensorRT pipeline). Use when the user asks to "integrate a HuggingFace model into TAO", "add an HF model to TAO Toolkit",
nvidia
tao-route-visual-changenet-samples
Routes the weakest VCN samples (output of `tao-analyze-gaps-visual-changenet`) into per-augmentation-module subsets — one parquet for k-NN mining, one for AnomalyGen (Cosmos SDG) — based on each module's label eligibility. Use as the immediate next step a
nvidia
tao-run-automl
Run AutoML / hyperparameter optimization (HPO) for NVIDIA TAO networks using AutoMLRunner. Handles algorithm selection (bayesian, hyperband, asha, bohb, llm, hybrid, autoresearch), WandB experiment tracking, job execution on any TAO SDK platform, result i
nvidia
tao-run-automl-deft-pipeline
Run the canonical NVIDIA AOI three-phase training pipeline — Phase 1 AutoML baseline (HPO), Phase 2 DEFT loop (RCA → SDG → mining → plain-train retrain), Phase 3 AutoML refinement on the DEFT-augmented dataset. This is the default entry point for any "run
nvidia
tao-run-deft-aoi
Run the full DEFT AOI improvement loop for NVIDIA TAO VisualChangeNet / ChangeNet PCB inspection models: baseline evaluate, RCA, ingestion of customer-supplied pre-generated AnomalyGen images, k-NN mining, retraining, and deployment gating until FAR / rec
nvidia
tao-run-inference-service
Start, query, and stop a network-specific TAO inference microservice ({network_arch}-inference-microservice) by delegating container execution to the appropriate platform skill. Handles container image resolution, job-payload JSON construction, and the se
nvidia
tao-run-on-brev
Brev managed GPU instances with Docker support. Use when running TAO training, evaluation, or inference on Brev GPU instances, managing Brev deployments, or dispatching TAO jobs through the Brev CLI. Trigger phrases include "run on Brev", "Brev GPU instan
nvidia
tao-run-on-kubernetes
Kubernetes execution platform — submits TAO container jobs as single-pod k8s Jobs with NVIDIA GPU scheduling. Use when running on EKS / GKE / AKS / on-prem clusters with the NVIDIA GPU Operator installed, or when integrating TAO into an existing k8s-nativ
nvidia
tao-run-on-lepton
DGX Cloud Lepton managed GPU compute platform with run/status/cancel interface. Use when submitting TAO jobs to DGX Cloud, dispatching training/eval/inference to Lepton GPU resources, or managing Lepton workspace deployments. Trigger phrases include "run
nvidia
tao-run-on-local-docker
Local Docker execution for TAO SDK job containers using the host Docker daemon and NVIDIA GPU runtime. Use when running TAO jobs on the current machine or a directly attached Docker host. Trigger phrases include "run locally", "local Docker", "use my GPU"
nvidia
tao-run-on-slurm
Remote SLURM GPU cluster execution over SSH with sbatch/srun, Pyxis/Enroot containers, and Lustre-backed results. Use when running TAO training/eval/inference jobs on an on-prem or DGX SLURM cluster. Trigger phrases include "run on SLURM", "submit sbatch"
nvidia
tao-run-platform
TAO Execution SDK for submitting and monitoring GPU training jobs on supported platforms (Lepton, Brev, SLURM, local Docker, Kubernetes). Use when the user wants to run TAO jobs through the SDK, get job tracking, S3 I/O wrapping, multi-node distributed tr
nvidia
tao-setup-nvidia-gpu-host
Host setup for TAO GPU backends. Checks and, after user approval, installs NVIDIA driver branch 580, CUDA Toolkit 13.0, and NVIDIA Container Toolkit 1.19.0 for Docker/local-Docker and Kubernetes GPU worker hosts. The `--check-only` path works on any Linux
nvidia
tao-train-action-recognition
Action recognition from video sequences. Supports RGB, optical flow, and joint (multi-stream) input types for classifying temporal actions in video clips. Use when training, evaluating, exporting, or running inference on a TAO action-recognition model. Tr
nvidia
tao-train-bevfusion
BEVFusion for multi-sensor 3D object detection. Fuses LiDAR point clouds and camera images in bird's-eye-view (BEV) space, used in autonomous driving for robust 3D perception. Use when training, evaluating, or running inference for a TAO BEVFusion model.
nvidia
tao-train-centerpose
CenterPose for keypoint / pose estimation. Detects object centers and regresses keypoint locations for 6-DoF object pose estimation. Use when training, evaluating, exporting, or running inference for a TAO CenterPose model. Trigger phrases include "train
nvidia
tao-train-deformable-detr
Deformable DETR for 2D object detection. Uses deformable attention for efficient multi-scale feature processing, lighter than DINO with competitive accuracy. Use when training, evaluating, exporting, quantizing, or running inference for a TAO Deformable-D
nvidia
tao-train-depth-anything-v2
Monocular depth estimation using Metric Depth Anything v2 or Relative Depth Anything architectures. Predicts per-pixel depth from single RGB images. Use when training, evaluating, exporting, or running inference for a TAO monocular depth model. Trigger ph
nvidia
tao-train-dino
DINO (DETR with Improved DeNoising Anchor Boxes) for 2D object detection. Transformer-based detector with denoising training, multi-scale features, and optional distillation support. Use when training, evaluating, exporting, distilling, quantizing, or run
nvidia
tao-train-fast-foundation-stereo
Real-time stereo depth estimation using FastFoundationStereo (FFS), the distilled bp2 commercial variant of FoundationStereo. Predicts disparity maps from stereo image pairs with ~10× lower latency than full FoundationStereo. Use when training, evaluating
nvidia
tao-train-foundation-stereo
Stereo depth estimation using FoundationStereo. Predicts disparity maps from stereo image pairs for 3D reconstruction. Use when training, evaluating, exporting, or running inference for a TAO FoundationStereo model. Trigger phrases include "train stereo d
nvidia
tao-train-grounding-dino
Grounding DINO for open-set object detection. Combines DINO-style detection with a BERT text encoder for language-guided detection — detects objects described by text prompts without a fixed class vocabulary. Use when training, evaluating, exporting, quan
nvidia
tao-train-image-classification
PyTorch-based TAO image classification. Supports a wide range of backbones (FAN, EfficientNet, ResNet, etc.) with distillation and quantization for deployment. Use when training, evaluating, distilling, quantizing, exporting, or running inference for a TA
nvidia
tao-train-mask-auto-encoder
Masked Auto-Encoder (MAE) for self-supervised pretraining and fine-tuning. Masks random patches and reconstructs them to learn visual representations; supports pretrain and finetune stages. Use when training, evaluating, exporting, or running inference fo
nvidia
tao-train-mask-auto-label
MAL (Mask Auto-Label) for weakly-supervised segmentation. Produces segmentation masks from minimal annotations (point or box annotations) using a ViT-MAE backbone. Use when training, evaluating, or running inference for a TAO MAL model. Trigger phrases in
nvidia
tao-train-mask-grounding-dino
Mask Grounding DINO for grounded instance segmentation. Extends Grounding DINO with a mask-prediction head for open-set segmentation guided by text prompts. Use when training, evaluating, exporting, quantizing, or running inference for a TAO Mask-Groundin
nvidia
tao-train-mask2former
Mask2Former for universal image segmentation (panoptic, instance, and semantic). Transformer-based with masked attention for high-quality segmentation results. Use when training, evaluating, exporting, quantizing, or running inference for a TAO Mask2Forme
nvidia
tao-train-metric-learning-recognition
Metric-learning recognition (ml-recog) for fine-grained visual recognition. Learns embeddings for retrieval-based matching (e.g., retail product recognition) using triplet / contrastive losses. Use when training, evaluating, exporting, or running inferenc
nvidia
tao-train-nvdinov2
NVDINOv2 for self-supervised visual representation learning. Trains vision transformers via self-distillation (teacher-student) without labels and produces general-purpose visual features. Use when training, distilling, exporting, or running inference for
nvidia
tao-train-nvpanoptix3d
NVPanoptix3D for panoptic 3D scene reconstruction from posed RGB images. Produces 3D panoptic segmentation (semantic, instance, and panoptic masks) with occupancy completion. Built on a VGGT backbone with a Mask2Former-style head and 3D frustum reconstruc
nvidia
tao-train-ocdnet
OCDNet for scene text detection. Detects arbitrary-oriented text regions in natural images using a differentiable binarization approach. Use when training, evaluating, exporting, pruning, quantizing, retraining, or running inference for a TAO OCDNet model
nvidia
tao-train-ocrnet
OCRNet for scene text recognition. Recognizes text content from cropped text-region images and supports CTC and attention-based decoders. Use when training, evaluating, exporting, pruning, quantizing, retraining, or running inference for a TAO OCRNet mode
nvidia
tao-train-oneformer
OneFormer for universal image segmentation. Unifies panoptic, instance, and semantic segmentation with a single architecture using task-conditioned queries. Use when training, evaluating, exporting, quantizing, or running inference for a TAO OneFormer mod
nvidia
tao-train-optical-inspection
Optical Inspection for defect detection using Siamese networks. Compares image pairs to detect manufacturing defects, anomalies, or quality issues. Use when training, evaluating, exporting, or running inference for a TAO Optical Inspection model on AOI /
nvidia
tao-train-pointpillars
PointPillars for 3D object detection from LiDAR point clouds. Encodes point clouds into a pseudo-image via a pillar-based representation, then applies 2D detection — used in autonomous driving and robotics. Use when training, evaluating, exporting, prunin
nvidia
tao-train-pose-classification
Pose classification using ST-GCN (Spatial Temporal Graph Convolutional Network). Classifies skeleton sequences into action categories from pose-keypoint data. Use when training, evaluating, exporting, or running inference for a TAO pose-classification mod
nvidia
tao-train-reid
Person re-identification (ReID). Learns discriminative embeddings to match the same person across different camera views, based on metric learning. Use when training, evaluating, exporting, or running inference for a TAO person re-identification model. Tr
nvidia
tao-train-rtdetr
RT-DETR (Real-Time DEtection TRansformer) for 2D object detection. Designed for real-time inference with competitive accuracy and supports distillation and quantization for deployment optimization. Use when training, evaluating, distilling, quantizing, ex
nvidia
tao-train-segformer
SegFormer for semantic segmentation. Lightweight transformer-based architecture with hierarchical feature extraction, efficient for real-time segmentation tasks. Use when training, evaluating, exporting, quantizing, or running inference for a TAO SegForme
nvidia
tao-train-single-step
Standard single-step train/eval/export workflow for any TAO model. Use when training a TAO model on a dataset without iterative data augmentation, AutoML, or DEFT loops. Trigger phrases include "single train run", "train then evaluate then export", "plain
nvidia
tao-train-sparse4d
Sparse4D for multi-camera temporal 3D object detection and tracking. Uses sparse queries with deformable attention across camera views and time for end-to-end 3D perception, with an instance bank for temporal tracking. Use when training, evaluating, expor
nvidia
tao-train-visual-changenet
Visual ChangeNet for binary image classification and segmentation in AOI defect detection. Use when training, evaluating, exporting, or running inference for PCB defect detection or visual inspection, comparing image pairs for PASS/NO_PASS classification,
nvidia
tao-validate-dataset-format
Run `tao-daft validate` to check NVIDIA TAO DAFT datasets for structure, schema, and cross-reference errors. Do not use for non-DAFT formats. Use when the user asks to validate a DAFT dataset, check DAFT schema, validate a TAO dataset format, or run `tao-
nvidia
tilegym-adding-cutile-kernel
Add a new cuTile GPU kernel operator to TileGym. Covers dispatch registration in ops.py, cuTile backend implementation, __init__.py exports, test creation, and benchmark in tests/benchmark. Use when adding, creating, or implementing a new cuTile operator/
nvidia
tilegym-converting-cutile-to-julia
Converts cuTile Python GPU kernels (@ct.kernel) to cuTile.jl Julia equivalents. Handles kernel syntax translation, 0-indexed to 1-indexed conversion, broadcasting differences, memory layout (row-major to column-major), type system mapping, and launch API
nvidia
tilegym-converting-cutile-to-triton
Converts cuTile GPU kernels (@ct.kernel) to Triton (@triton.jit). Handles standard in-repo conversion, debugging (cudaErrorIllegalAddress, shape mismatch, numerical mismatch), and mapping cuTile idioms (ct.load/ct.store, ct.Constant, ct.launch) to Triton
nvidia
tilegym-cutile-autotuning
Use when adding, modifying, optimizing, or debugging cuTile autotuning code. Trigger signals: `exhaustive_search` / `replace_hints` / `hints_fn` / `cuda.tile.tune` in code, `autotune` in filenames, or correctness/performance issues in autotuned cuTile ker
nvidia
tilegym-cutile-python
Expert cuTile programming assistant. Write high-performance GPU kernels using cuTile's tile-based programming model with proper validation and optimization. Supports deep agent orchestration for complex multi-kernel tasks.
nvidia
tilegym-improve-cutile-kernel-perf
Iteratively optimize cuTile kernel performance through systematic profiling, bottleneck analysis, IR comparison, and targeted tuning. Covers tile sizes, occupancy, autotune configs, TMA, latency hints, persistent scheduling, num_ctas, flush_to_zero, and I
nvidia
tilegym-monkey-patch-kernels-to-transformers
Integrate TileGym kernels into Hugging Face `transformers` models by replacing the library's submodule(s) and certain class(es)' implementations, and patching certain class(es)' init/forward/load weight methods prior to instantiating models. Used when the
nvidia
vss-ask-video
Use this skill to ask the VSS agent's video_understanding tool a fresh visual question about a recorded clip. Not for prior tool output, search hits, or metadata-answerable questions.
nvidia
vss-deploy-dense-captioning
Use this skill when deploying standalone RT-VLM dense captioning or calling its REST API (uploads, captions, streams, chat-completions, Kafka). Not for VSS profile deploy or video-search ingestion.
nvidia
vss-deploy-detection-tracking-2d
Use this skill when the user wants to deploy, run, debug, tear down, or call the REST API of the RTVI-CV 2D detection / tracking microservice. Trigger when the user says things like 'deploy rtvi-cv', 'start warehouse 2d', 'add a stream', 'check rtvi-cv he
nvidia
vss-deploy-detection-tracking-3d
Deploy and operate the RTVI-CV-3D microservice as MV3DT (`MODE=mv3dt`): per-camera DeepStream perception plus BEV Fusion over calibrated cameras. Supports the bundled sample dataset, custom video files, and RTSP streams, and chains to `vss-generate-video-
nvidia
vss-deploy-profile
Use to select, configure, deploy, verify, debug, or tear down a VSS profile (base, search, lvs, warehouse, edge). Not for standalone microservices — use the vss-deploy-* skill.
nvidia
vss-deploy-video-embedding
Use this skill when deploying, operating, or integrating the VSS 3.2 GA RT-Embed Video Embedding microservice. Covers Docker Compose bring-up, GPU and storage prerequisites, the `/v1` REST API (file uploads, text and video embeddings, live RTSP streams, h
nvidia
vss-generate-video-calibration
Use to run AutoMagicCalib on local MP4s, RTSP, or the bundled sample dataset, and to deploy vss-auto-calibration when needed. Do not use for non-AMC calibration or runtime analytics.
nvidia
vss-generate-video-report
Use this skill when producing a VSS analysis report — Mode A per-clip VLM, Mode B incident-range via video-analytics. Not for standalone video summarization, real-time alerts, or ad-hoc Q&A.
nvidia
vss-manage-alerts
Use for VSS alert workflows — real-time monitoring, Alert-Bridge subscriptions, Slack notifications, incident queries, camera onboarding. Not for non-alert analytics.
nvidia
vss-manage-video-io-storage
Use to call the VIOS REST API (sensor list, timelines, clip extraction, snapshots, add/delete sensors and streams). Not for VLM inference or search.
nvidia
vss-query-analytics
Use this skill when reading video-analytics metrics, incidents, alerts, and sensor data via the VA-MCP server (port 9901). Not for live VLM or incident-range narrative reports.
nvidia
vss-search-archive
Use to run top-level VSS fusion search on archived video, or to ingest video files / RTSP streams for search. Do NOT use for ad-hoc visual Q&A (use vss-ask-video), live captioning (use vss-deploy-dense-captioning), or video summarization and reports (use
nvidia
vss-setup-behavior-analytics
Use to deploy the vss-behavior-analytics service standalone (entrypoint, config-source, optional calibration). Not for the full warehouse deploy.
nvidia
vss-setup-video-analytics-api
Use to deploy the vss-video-analytics-api REST service standalone (config-source, data-log bind, Elasticsearch, optional Kafka). Not for full warehouse deploy.
nvidia
vss-summarize-video
Use to summarize a recorded video via the LVS summarization microservice (HITL-gated) with a VLM fallback. Not for report generation or live RTSP captioning.
nvidia
accelerated-computing-cudf
Official NVIDIA-authored guidance for NVIDIA cuDF GPU DataFrames, pandas acceleration, dask-cuDF, ETL, joins, groupby, CSV/Parquet I/O, nullable semantics, and multi-GPU DataFrame workloads.
nvidia
aiq-deploy
Use when asked to install, deploy, run, validate, troubleshoot, or stop NVIDIA AI-Q Blueprint infrastructure.
nvidia
aiq-research
Use when asked to run deep research or AI-Q research through a reachable NVIDIA AI-Q Blueprint backend.
nvidia
cudaq-guide
CUDA-Q onboarding guide for installation, test programs, GPU simulation, QPU hardware, and quantum applications.
nvidia
cufolio
Use when a user asks to build, optimize, backtest, rebalance, or analyze a stock portfolio with Mean-CVaR, efficient frontiers, scenario generation, or NVIDIA cuOpt.
nvidia
cuopt-developer
Modify, build, test, debug, and contribute to NVIDIA cuOpt (C++/CUDA, Python, server, CI). Use for solver internals, PRs, DCO, and code conventions.
nvidia
cuopt-install
Install cuOpt for Python, C, or server via pip, conda, or Docker; verify the install. For building cuOpt from source, see cuopt-developer.
nvidia
cuopt-numerical-optimization-api-c
LP, MILP, and QP (beta) with cuOpt — C API only. Use when the user is embedding LP, MILP, or QP in C/C++.
nvidia
cuopt-numerical-optimization-api-cli
LP, MILP, and QP (beta) with cuOpt — CLI only (MPS files, cuopt_cli). Use when the user is solving LP, MILP, or QP from MPS via command line.
nvidia
cuopt-numerical-optimization-api-python
Solve LP, MILP, and QP (beta) problems with the cuOpt Python API: linear/quadratic objectives, integer variables, scheduling, portfolio, least squares.
nvidia
cuopt-numerical-optimization-formulation
LP, MILP, QP — concepts, problem-text parsing, and formulation patterns (parameters, constraints, decisions, objective). Concepts only; no API.
nvidia
cuopt-routing-api-python
Vehicle routing (VRP, TSP, PDP) with cuOpt — Python API only. Use when the user is building or solving routing in Python.
nvidia
cuopt-routing-formulation
Vehicle routing (VRP, TSP, PDP) — problem types and data requirements. Domain concepts; no API or interface.
nvidia
cuopt-server-api-python
cuOpt REST server — start server, endpoints, Python/curl client examples. Use when the user is deploying or calling the REST API.
nvidia
cuopt-server-common
cuOpt REST server — what it does and how requests flow. Domain concepts; no deploy or client code.
nvidia
cuopt-skill-evolution
After solving a non-trivial problem, detect generalizable learnings and propose skill updates. Always active — applies to every interaction.
nvidia
cuopt-user-rules
Base rules for end users calling NVIDIA cuOpt (routing/LP/MILP/QP/install/server). Not for cuOpt internals — use cuopt-developer for those.
nvidia
cupynumeric-hdf5
Read and write large cuPyNumeric arrays to HDF5 with Legate's parallel, distributed HDF5 I/O (legate.io.hdf5: to_file, from_file, from_file_batched). Use when a developer needs to save a cuPyNumeric array to an .h5/.hdf5 file, load an HDF5 dataset into a
nvidia
cupynumeric-install
Install and verify cuPyNumeric for Python — requirements, commands, verification. Source builds are out of scope.
nvidia
cupynumeric-migration-readiness
Pre-migration readiness assessor for porting NumPy to cuPyNumeric. Use BEFORE substantial porting work begins when the user asks whether code will scale on GPU, whether they should migrate to cuPyNumeric, which NumPy patterns transfer cleanly, what must b
nvidia
cupynumeric-parallel-data-load
Load a sharded, on-disk dataset (sharded .npy, Parquet/Arrow, raw binary, sharded HDF5, custom layouts) into a distributed cuPyNumeric ndarray via a manual partition + leaf @task launch with CPU/OMP/GPU variants. Use when no single-call loader fits, inclu
nvidia
dali-dynamic-mode
DALI imperative dynamic mode (`nvidia.dali.experimental.dynamic`, ndd): use when working on ndd code or migrating pipelines; skip pipeline-only tasks.
nvidia
data-designer
Use when the user wants to create a dataset, generate synthetic data, or build a data generation pipeline.
nvidia
deepstream-dev
NVIDIA DeepStream SDK 9.0 development with Python pyservicemaker API. Use when building video analytics pipelines, GStreamer-based video processing, TensorRT inference integration, object detection/tracking, or Kafka/message broker integration.
nvidia
deepstream-import-vision-model
Use this skill to bring any vision model from HuggingFace or NVIDIA NGC into an NVIDIA DeepStream pipeline with end-to-end automation: ONNX download, SafeTensors export, TRT engine build, custom nvinfer bbox parser, multi-stream benchmark, and PDF report.
nvidia
dicom-metadata-extract
Used for extracting selected metadata from one DICOM file and flagging standard-tag PHI presence. Not for anonymization or clinical use.
nvidia
dicom-series-preflight
Used for header-only preflight of one DICOM series folder before conversion or inference. Not for de-identification or clinical clearance.
nvidia
dicom-series-to-volume
Used for converting one CT DICOM series folder to a HU NIfTI volume with affine evidence. Not for multi-frame DICOM or clinical use.
nvidia
digital-health-clinical-asr-build
Stage 2 of the Clinical ASR Flywheel. Use when curating clinical terms, tagging IPA, and synthesizing a NeMo manifest. NOT for scoring (use /digital-health-clinical-asr-eval).
nvidia
digital-health-clinical-asr-eval
Stage 3 of the Clinical ASR Flywheel. Score a NeMo manifest and produce the five-section KER leaderboard (by-ipa_source diagnostic). Not for ASR auth (/riva-asr).
nvidia
digital-health-clinical-asr-finetune
Stage 4 of the Clinical ASR Flywheel. Use when priority KER is above 0.3 to run stock NeMo SFT on Parakeet TDT v2 and offline cycle N+1 re-eval. NOT for generic word boosting (use /finetune-asr).
nvidia
digital-health-clinical-asr-setup
Stage 1 of the Clinical ASR Flywheel. Use when bootstrapping a cycle: NVCF+MW disclosure, NVIDIA_API_KEY check, deps install, TTS+ASR smoke test.
nvidia
dynamo-interconnect-check
Validate that a Dynamo deployment's NIXL/UCX/NCCL interconnect is ready for disaggregated serving over RDMA/NVLink. Use after recipe-runner brings a deployment up (especially disagg/multi-node) to confirm the KV transport is correct; use troubleshoot for
nvidia
dynamo-recipe-runner
Select, validate, patch, and deploy existing NVIDIA Dynamo Kubernetes recipes. Use for model/backend/GPU/deployment-mode recipe bring-up; use router-starter for router-only mode work and troubleshoot for broken deployments.
nvidia
dynamo-router-starter
Start or patch Dynamo router modes and run router endpoint smoke checks. Use for round-robin, KV-aware, least-loaded, or device-aware routing setup; use recipe-runner for recipe deployment and troubleshoot for failure diagnosis.
nvidia
dynamo-troubleshoot
Diagnose failed or unhealthy Dynamo deployments. Use when pods, model-cache jobs, PVCs, workers, frontend/router health, endpoints, or benchmark jobs fail; use recipe-runner/router-starter before this for normal bring-up.
nvidia
earth2studio-data-fetch
Fetch weather/climate data via Earth2Studio data sources for specific variables and times. Do NOT use for inference pipelines, model discovery, or installation.
nvidia
earth2studio-deterministic-forecast
Build deterministic forecast scripts with Earth2Studio (model, data source, IO, inference). Do NOT use for ensemble, diagnostics, data-only fetch, or install.
nvidia
earth2studio-discover
Find Earth2Studio models, data sources, and examples for a weather/climate use case. Do NOT use for writing inference code, downloading data, or installation.
nvidia
earth2studio-install
Guide installing Earth2Studio via uv or pip, selecting model extras, and configuring the environment. Do NOT use for writing inference code, choosing models, or PhysicsNeMo questions.
nvidia
holoscan-install-conda
Install Holoscan SDK v4.3+ via Conda in a CUDA 13 environment. Use for Conda installs; redirect CUDA 12 hosts to container/wheel.
nvidia
holoscan-install-container
Install Holoscan SDK via the NGC Docker container. Use for container-based installs; not for native apt/pip/Conda installs.
nvidia
holoscan-install-debian
Install Holoscan SDK natively on Ubuntu via apt. Use for C++ installs on Ubuntu; pair with /holoscan-install-wheel for Python.
nvidia
holoscan-install-source
Build Holoscan SDK from source via the in-tree ./run script. Use only when published packages don't meet the user's needs.
nvidia
holoscan-install-wheel
Install Holoscan SDK Python wheel via pip into a venv. Use for Python installs; not for native C++/apt or Conda installs.
nvidia
holoscan-setup
Guides Holoscan SDK installation: inspects the host, assesses platform compatibility, recommends an install method, and delegates to the matching install skill.
nvidia
hsb-app
Discover and run Holoscan Sensor Bridge example applications on a connected devkit. Filters available apps by the user's platform, HSB software version, board type, and sensors. Supports timed execution, failure analysis, code-edit suggestions, and iterat
nvidia
hsb-flash
Flash the FPGA on an HSB board connected to an NVIDIA devkit. Supports HSB Lattice boards (FPGA versions 2407, 2412, 2507, 2510) and Leopard Imaging VB1940 "all-in-one" cameras (FPGA versions 2507, 2510). Uses release-specific YAML manifests and board-typ
nvidia
hsb-setup
Clone the latest NVIDIA Holoscan Sensor Bridge repo, ask which supported devkit is being used, configure the host per platform, build the correct demo container, run it, and verify HSB connectivity by pinging 192.168.0.2. Use for Holoscan Sensor Bridge se
nvidia
hsb-test
Execute QA test plans on Holoscan Sensor Bridge hardware. Reads a user-provided test document, filters tests by the user's setup, determines which tests can run automatically, executes them with pass/fail evaluation, and produces a structured test results
nvidia
launch-nemo-rl
Playbook for launching, monitoring, stopping, and debugging NeMo-RL recipes on a Kubernetes cluster via the nrl-k8s CLI. Covers ephemeral vs long-lived RayCluster modes, iterating on runs, and debugging hung or failed training jobs.
nvidia
mcore-create-issue
Investigate a failing GitHub Actions run or job and create a GitHub issue for the failure.
nvidia
mcore-linting-and-formatting
Linting and formatting for Megatron-LM. Covers running autoformat.sh, tools (ruff, black, isort, pylint, mypy), and code style rules.
nvidia
mcore-run-on-slurm
How to launch distributed Megatron-LM training jobs on a SLURM cluster. Covers a minimal sbatch skeleton, environment-variable setup for torch.distributed.run, CUDA_DEVICE_MAX_CONNECTIONS rules across hardware and parallelism modes, container conventions,
nvidia
mcore-split-pr
Split a PR into multiple PRs to reduce the number of required CODEOWNERS reviewer groups.
nvidia
mcore-testing
Test system for Megatron-LM. Covers test layout, recipe YAML structure, adding and running unit and functional tests, golden values, marker filters, and CI parity.
nvidia
nemo-automodel-distributed-training
Guide for selecting and configuring distributed training strategies in NeMo AutoModel, including FSDP2, Megatron FSDP, DDP, and parallelism settings.
nvidia
nemo-automodel-launcher-config
Configure NeMo AutoModel job launches for interactive runs, Slurm clusters, and SkyPilot cloud execution.
nvidia
nemo-automodel-model-onboarding
Guide for onboarding new model architectures into NeMo AutoModel, including architecture discovery, implementation patterns, registration, and validation.
nvidia
nemo-automodel-recipe-development
Create and modify NeMo AutoModel training and evaluation recipes, including YAML structure, builders, and execution flow.
nvidia
nemo-data-designer-plugin
Use when the user wants to create a dataset, generate synthetic data, or build a data generation pipeline.
nvidia
nemo-evaluator-plugin
Use when working on the Evaluator plugin CLI, jobs, SDK-backed specs, metric types, or plugin-owned Evaluator skills.
nvidia
nemo-mbridge-mlm-bridge-training
Run Megatron-LM (MLM) and Megatron Bridge training with mock or real data. Covers correlation testing, available recipes, and multi-GPU examples.
nvidia
nemo-mbridge-multi-node-slurm
Convert single-node scripts to multi-node Slurm sbatch jobs and debug common multi-node failures. Covers srun-native vs uv run torch.distributed approaches, container setup, NCCL timeouts, OOM sizing for MoE models, and interactive allocation.
nvidia
nemo-mbridge-perf-activation-recompute
Validate and use selective and full activation recompute in Megatron Bridge to reduce GPU memory usage at the cost of extra compute.
nvidia
nemo-mbridge-perf-cpu-offloading
Validate and use CPU offloading in Megatron Bridge, including layer-level activation offloading and fractional optimizer state offloading with HybridDeviceOptimizer.
nvidia
nemo-mbridge-perf-cuda-graphs
Validate and use CUDA graph capture in Megatron Bridge, including local full-iteration graphs and Transformer Engine scoped graphs for attention, MLP, and MoE modules.
nvidia
nemo-mbridge-perf-expert-parallel-overlap
Validate and use MoE expert-parallel communication overlap in Megatron Bridge, including overlap_moe_expert_parallel_comm, delay_wgrad_compute, and flex dispatcher backends such as DeepEP and HybridEP.
nvidia
nemo-mbridge-perf-hierarchical-context-parallel
Operational guide for enabling hierarchical context parallelism in Megatron Bridge, including config knobs, code anchors, pitfalls, and verification.
nvidia
nemo-mbridge-perf-megatron-fsdp
Operational guide for enabling Megatron FSDP in Megatron Bridge, including config knobs, code anchors, pitfalls, and verification.
nvidia
nemo-mbridge-perf-memory-tuning
Techniques for reducing peak GPU memory in Megatron Bridge — expandable segments, parallelism resizing, activation recompute, CPU offloading constraints, and common OOM fixes.
nvidia
nemo-mbridge-perf-moe-comm-overlap
MoE expert-parallel communication overlap in Megatron Bridge. Covers dispatch/combine overlap, flex dispatcher backends, and expert wgrad scheduling.
nvidia
nemo-mbridge-perf-moe-dispatcher-selection
Choose the right MoE token dispatcher (`alltoall`, DeepEP, or HybridEP) for the hardware, EP degree, and optimization stage. Summarizes patterns from DSV3, Qwen3, Qwen3-Next, and VLM bring-up work.
nvidia
nemo-mbridge-perf-moe-hardware-configs
Representative MoE training playbooks by hardware platform and model family. Summarizes rounded throughput bands, parallelism patterns, and common tuning stacks.
nvidia
nemo-mbridge-perf-moe-long-context
Long-context MoE training guidance for Megatron Bridge. Covers CP sizing, selective recompute, dispatcher choices, and practical patterns from DSV3, Qwen3, and Qwen3-Next long-context experiments.
nvidia
nemo-mbridge-perf-moe-optimization-workflow
Systematic workflow for MoE training optimization in Megatron Bridge, based on the Megatron-Core MoE paper. Covers the Three Walls framework, parallel folding, recompute strategy, dispatcher choice, and CUDA-graph bring-up.
nvidia
nemo-mbridge-perf-moe-vlm-training
Practical guidance for training MoE VLMs in Megatron Bridge. Compares FSDP and 3D-parallel approaches, using rounded lessons from Qwen3-VL, Qwen3-Next, and other multimodal experiments.
nvidia
nemo-mbridge-perf-parallelism-strategies
Operational guide for choosing and combining parallelism strategies in Megatron Bridge, including sizing rules, hardware topology mapping, and combined parallelism configuration.
nvidia
nemo-mbridge-perf-sequence-packing
Validate and use packed sequences and long-context training in Megatron-Bridge, distinguishing offline packed SFT for LLMs from in-batch packing for VLMs, and applying the right CP constraints.
nvidia
nemo-mbridge-perf-tp-dp-comm-overlap
Operational guide for enabling TP, DP, and PP communication overlap in Megatron-Bridge, including config knobs, code anchors, pitfalls, and verification.
nvidia
nemo-mbridge-recipe-recommender
Recommend and customize Megatron Bridge recipes for a user's model, GPU count, and training goal. Indexes library recipes (pretrain/SFT/PEFT) and performance recipes.
nvidia
nemo-mbridge-resiliency
Resiliency features in Megatron Bridge including fault tolerance, straggler detection, in-process restart, preemption, and re-run state machine.
nvidia
nemo-retriever
Use when the user wants to search, query, extract, transcribe, describe, quote, filter, or aggregate across documents — PDFs, scanned forms / images (`.jpg` `.png` `.tiff`), Office (`.docx` `.pptx`), text (`.html` `.txt`), audio (`.mp3` `.wav` `.m4a`), or
nvidia
nemo-rl-auto-research
Autonomous NeMo-RL research agent workflow for directed hypothesis testing and open-ended discovery. Guides agents through the full experiment lifecycle: understanding recipes and environments, wiring RL or NeMo-gym runs, launching reproducible baselines
nvidia
nemo-rl-brev-etiquette
Brev instance operating guidance for NeMo-RL agents working in /home/ubuntu/RL with limited workspace disk, a larger /ephemeral volume, and optional /home/ubuntu/RL/.env secrets. Use when running nemo-rl-auto-research campaigns, experiments, training jobs
nvidia
nemo-rl-docs
Documentation conventions for NeMo-RL. Covers docs/index.md updates and docstring format. Do NOT use for: bug fixes, test fixes, dependency bumps, refactoring, CI/CD changes, performance tuning, or any task that does not involve writing or updating docume
nvidia
nemo-rl-session-memory
Manage durable working-session memory for coding agents. Use when a user asks to preserve or recover agent context across disconnects, VS Code restarts, long-running work, handoffs, or any session where important state should be written periodically under
nvidia
nemoclaw-user-agent-skills
Describes the agent skills shipped with NemoClaw and how to access them by cloning the repository. Use when users ask about AI agent support, coding assistant integration, or the .agents/skills/ directory. Trigger keywords - nemoclaw agent skills, ai codi
nvidia
nemoclaw-user-configure-inference
Connects NemoClaw to a local inference server. Use when setting up Ollama, vLLM, TensorRT-LLM, NIM, or any OpenAI-compatible local model server with NemoClaw. Trigger keywords - nemoclaw local inference, ollama nemoclaw, vllm nemoclaw, local model server,
nvidia
nemoclaw-user-configure-security
Presents a risk framework for every configurable security control in NemoClaw. Use when evaluating security posture, reviewing sandbox security defaults, or assessing control trade-offs. Trigger keywords - nemoclaw security best practices, sandbox securit
nvidia
nemoclaw-user-deploy-remote
Explains how to run NemoClaw on a remote GPU instance, including the deprecated Brev compatibility path and the preferred installer plus onboard flow. Use when deploying NemoClaw to a remote VM, onboarding a Brev instance, or migrating away from the legac
nvidia
nemoclaw-user-get-started
Installs NemoClaw, launches a sandbox, and runs the first agent prompt. Use when onboarding, installing, or launching a NemoClaw sandbox for the first time. Trigger keywords - nemoclaw quickstart, install nemoclaw openclaw sandbox, nemohermes quickstart,
nvidia
nemoclaw-user-manage-policy
Adds, removes, or modifies allowed endpoints in the sandbox policy. Use when customizing network policy, changing egress rules, or configuring sandbox endpoint access. Trigger keywords - customize nemoclaw network policy, sandbox egress policy configurati
nvidia
nemoclaw-user-manage-sandboxes
Explains operational tasks after the quickstart: listing sandboxes, status and health checks, logs, diagnostics, port forwards, multiple sandboxes, credential reset, rebuilds, network presets, upgrades, and uninstall. Trigger keywords - manage nemoclaw sa
nvidia
nemoclaw-user-monitor-sandbox
Inspects sandbox health, traces agent behavior, and diagnoses problems. Use when monitoring a running sandbox, debugging agent issues, or checking sandbox logs. Trigger keywords - monitor nemoclaw sandbox, debug nemoclaw agent issues.
nvidia
nemoclaw-user-overview
Explains how OpenClaw, OpenShell, and NemoClaw form the ecosystem, NemoClaw's position in the stack, what NemoClaw adds beyond the community sandbox, and when to prefer NemoClaw versus integrating OpenShell and OpenClaw directly. Use when users ask about
nvidia
nemoclaw-user-reference
Describes the NemoClaw integration layer and blueprint architecture and how they orchestrate compatible agent sandboxes. Use when looking up architecture, agent integration, plugin structure, or blueprint design. Trigger keywords - nemoclaw architecture,
nvidia
nemotron-customize
Plan, configure, and chain repo-native Nemotron customization steps into single-step or multi-step pipelines: curation, translation, SFT/PEFT (AutoModel or Megatron-Bridge), pretraining/CPT, RL alignment (DPO/RLVR/GRPO/RLHF), BYOB/MCQ benchmarks, checkpoi
nvidia
nemotron-policy-generator
Generates BYO custom safety policies for NVIDIA Nemotron content-safety guardrails — Nemotron-Content-Safety-Reasoning-4B (text) and multimodal Nemotron-3-Content-Safety. Produces a Markdown policy, JSON taxonomy, and drop-in inference prompts. Maps rough
nvidia
nemotron-retrieval-recipes
Use when planning, debugging, tuning, evaluating, exporting, or deploying public Nemotron `embed`/`rerank` retrieval recipes.
nvidia
nemotron-speech
Routes NVIDIA Nemotron Speech (Riva) NIM tasks — deploys, runs, and tests ASR, TTS, and NMT NIMs on build.nvidia.com or self-hosted.
nvidia
nv-generate-ct-rflow
Used for generating synthetic CT volumes and masks with NV-Generate-CTMR rflow-ct. Not for production training data without review.
nvidia
nv-generate-mr
Used for generating synthetic body MRI volumes with NV-Generate-CTMR rflow-mr. Not for paired masks or production training data.
nvidia
nv-generate-mr-brain
Used for generating synthetic brain MRI volumes with NV-Generate-CTMR rflow-mr-brain. Not for production training data.
nvidia
nv-generate-mr-brain-finetune
Used for finetuning NV-Generate-CTMR MR-brain diffusion UNet from a NIfTI datalist. Not for clinical or production data approval.
nvidia
nv-generate-vae-finetune
Used for finetuning the NV-Generate-CTMR MAISI VAE from CT/MRI NIfTI datalists. Not for clinical or production data approval.
nvidia
nv-reason-cxr
Used for command-shape or live NV-Reason-CXR chest X-ray reasoning smoke tests. Not for diagnosis or clinical reporting.
nvidia
nv-segment-ct
Used for running NV-Segment-CT VISTA3D on CT NIfTI volumes and recording label-map evidence.
nvidia
nv-segment-ct-finetune
Used for smoke or dataset finetuning of NV-Segment-CT VISTA3D on CT NIfTI labels. Not for clinical validation.
nvidia
nv-segment-ctmr
Used for running NV-Segment-CTMR on CT or MRI NIfTI volumes and recording label-map evidence. Not for clinical interpretation.
nvidia
omniverse-cad-to-simready
Coordinate the end-to-end CAD/source-asset to SimReady workflow. Use for broad requests such as CAD to SimReady, source asset to simulation-ready USD, or prop packaging that require conversion, material/physics assignment, SimReady conformance, validation
nvidia
omniverse-realtime-viewer
Use as the top-level router for Omniverse Realtime Viewer USD app requests and focused viewer reference documents.
nvidia
omniverse-usd-performance-tuning
Top-level workflow skill for USD performance diagnosis and optimization. Use for slow loading, high memory, low FPS, or 'optimize my scene' requests; delegates auth/runtime setup to Phase 0 owners.
nvidia
physical-ai-defect-image-generation
Use when the user wants to orchestrate defect image generation, run associated setup, or handle outputs on OSMO. The Day 0 path handles cold-start with USD-to-ROI, image-edit augmentation, and AnomalyGen to create initial PCBA datasets. The Day 1 path per
nvidia
physical-ai-infrastructure-setup-and-resilient-scaling
Use when the user wants to set up, scale, validate, or harden NVIDIA physical AI infrastructure for synthetic data generation workflows across local MicroK8s or Azure AKS, including Kubernetes clusters, inference endpoint deployment, OSMO deployment, work
nvidia
physical-ai-neural-reconstruction
Router for NVIDIA NuRec/NRE: USDZ rendering, NCore conversion, 3DGS, gRPC sensor sim, PhysicalAI HF datasets. Do NOT use for SimReady or infra setup.
nvidia
physical-ai-video-data-augmentation
Use when running video data augmentation and auto-labeling workflows on OSMO: flow selection, preflight, submit-time interpolation, monitoring, and output retrieval. Trigger keywords: video data augmentation, data enrichment, auto labeling, VDA demo, OSMO
nvidia
physicsnemo-discover
Official NVIDIA-authored guidance for navigating PhysicsNeMo — pick the model, datapipe, or example for a SciML/AI4Science task (surrogates, forecasting, downscaling, physics-informed, inverse, generative). Points at existing files via live repo search; n
nvidia
rag-blueprint
NVIDIA RAG Blueprint — deploy, configure, troubleshoot, and manage. Handles any RAG action: deploy, install, start, enable, disable, toggle, change, configure, troubleshoot, debug, fix, shutdown, stop, or tear down any RAG feature or service (Agentic RAG,
nvidia
rag-eval
Filesystem RAG benchmarks: corpus/, train.json, evaluate_rag.py (RAGAS quality). Not for prod monitoring, latency/throughput benchmarking (use rag-perf), or evals outside this repo layout.
nvidia
rag-perf
Performance benchmarking for a deployed NVIDIA RAG Blueprint server: profiling pass + aiperf load test driven by a single YAML config. Not for accuracy / RAGAS scoring (use rag-eval) or for deploying / repairing services (use rag-blueprint).
nvidia
skill-card-generator
Use only to generate or update a governance skill card for a specified existing agent skill directory. Do not use for explaining, listing, comparing, or discussing skill capabilities.
nvidia
tao-analyze-changenet-rca
Performs deep Root Cause Analysis (RCA) on NVIDIA TAO Visual ChangeNet classification experiments with image-evidence-driven investigation. Use when analyzing ChangeNet model failures, investigating poor recall / FAR / PASS-NO_PASS metrics, auditing visua
nvidia
tao-analyze-gaps-visual-changenet
Performs gap analysis on NVIDIA TAO Visual ChangeNet (VCN) Classify experiments by invoking the data-services container (`tao_toolkit.data_services` from `versions.yaml`) directly via `docker run … gap_analysis vcn_aoi …` — picks the optimal decision thre
nvidia
tao-analyze-gaps-vlm-bcq
Extract false-positive and false-negative gaps from VLM binary-classification-question (BCQ, yes/no) predictions. Use after running VLM evaluation when you have a predictions JSON and need to identify failure cases for DEFT root cause analysis on a binary
nvidia
tao-convert-dataset-format
Run `tao-daft convert` to convert NVIDIA TAO DAFT datasets between supported formats. Do not use for non-DAFT data. Use when the user asks to convert a DAFT dataset, change DAFT format, change a TAO dataset format, or run `tao-daft convert`.
nvidia
tao-finetune-clip
CLIP vision-language model for image-text retrieval, zero-shot classification, embedding extraction, ONNX export, and TensorRT deployment. Use when fine-tuning or training CLIP, running zero-shot classification, computing image embeddings, or deploying CL
nvidia
tao-finetune-cosmos-embed
Cosmos-Embed1 video-text embedding for text-to-video retrieval, video-to-video search, semantic deduplication, and fine-tuning. Use when the user asks to "fine-tune Cosmos-Embed1", "run cosmos-embed inference", "export Cosmos-Embed1", "embed videos", or "
nvidia
tao-finetune-cosmos-reason
Cosmos-Reason2-8B video QA supervised fine-tuning with FSDP parallelism. Use when training or evaluating video question-answering models, fine-tuning Cosmos-Reason2 with SFT, or working with Cosmos-RL. Trigger phrases include "fine-tune Cosmos-Reason", "C
nvidia
tao-finetune-huggingface-model
Fine-tune any HuggingFace CV / VLM / LLM model on local NVIDIA GPUs inside an NGC PyTorch container. Use when the user wants to fine-tune a HuggingFace model (full or LoRA), train a vision / VLM / LLM model end-to-end, generate a reproducible HF training
nvidia
tao-generate-image-grounding
Two-step image grounding pipeline: extracts referring expressions from (image, caption) pairs and grounds them to pixel-space bounding boxes via a VLM. Use when the user wants to ground captions to bboxes, generate phrase-grounded annotations, auto-label
nvidia
tao-generate-referring-expressions
Four-step image referring-expression pipeline: turns images plus KITTI bounding-box labels into region descriptions, scene captions, grounded referring expressions, and (optionally) verified expressions via VLM distillation. Use when the user wants to gen
nvidia
tao-generate-video-reasoning-annotations
Multi-step video annotation pipeline that turns raw videos into Chain-of-Thought training data — multi-level captions, structured descriptions, and QA pairs (MCQ, binary, open-ended) with reasoning traces, via VLM/LLM distillation. Use when the user wants
nvidia
tao-launch-workflow
Shared launch intake for any TAO workflow or action. Use when the user wants to run TAO AutoML, train, evaluate, infer, export, generate TensorRT engines, or launch DEFT/workflow jobs on an execution platform.
nvidia
tao-list-capabilities
Answer what the TAO Skill Bank plugin can do by generating the response from packaged application, data, model, AutoML, and platform manifests.
nvidia
tao-mine-aoi-images
Runs the DEFT embed-then-mine workflow for VCN AOI iterations — embeds the gap-analysis target parquet, embeds a source pool, and mines nearest-neighbour source images for downstream augmentation. Use as the immediate next step after `tao-route-visual-cha
nvidia
tao-port-huggingface-model
Integrate a HuggingFace Computer Vision model into the NVIDIA TAO Toolkit ecosystem (tao-core config, tao-pytorch trainer, tao-deploy TensorRT pipeline). Use when the user asks to "integrate a HuggingFace model into TAO", "add an HF model to TAO Toolkit",
nvidia
tao-route-visual-changenet-samples
Routes the weakest VCN samples (output of `tao-analyze-gaps-visual-changenet`) into per-augmentation-module subsets — one parquet for k-NN mining, one for AnomalyGen (Cosmos SDG) — based on each module's label eligibility. Use as the immediate next step a
nvidia
tao-run-automl
Run AutoML / hyperparameter optimization (HPO) for NVIDIA TAO networks using AutoMLRunner. Handles algorithm selection (bayesian, hyperband, asha, bohb, llm, hybrid, autoresearch), WandB experiment tracking, job execution on any TAO SDK platform, result i
nvidia
tao-run-automl-deft-pipeline
Run the canonical NVIDIA AOI three-phase training pipeline — Phase 1 AutoML baseline (HPO), Phase 2 DEFT loop (RCA → SDG → mining → plain-train retrain), Phase 3 AutoML refinement on the DEFT-augmented dataset. This is the default entry point for any "run
nvidia
tao-run-deft-aoi
Run the full DEFT AOI improvement loop for NVIDIA TAO VisualChangeNet / ChangeNet PCB inspection models: baseline evaluate, RCA, ingestion of customer-supplied pre-generated AnomalyGen images, k-NN mining, retraining, and deployment gating until FAR / rec
nvidia
tao-run-inference-service
Start, query, and stop a network-specific TAO inference microservice ({network_arch}-inference-microservice) by delegating container execution to the appropriate platform skill. Handles container image resolution, job-payload JSON construction, and the se
nvidia
tao-run-on-brev
Brev managed GPU instances with Docker support. Use when running TAO training, evaluation, or inference on Brev GPU instances, managing Brev deployments, or dispatching TAO jobs through the Brev CLI. Trigger phrases include "run on Brev", "Brev GPU instan
nvidia
tao-run-on-kubernetes
Kubernetes execution platform — submits TAO container jobs as single-pod k8s Jobs with NVIDIA GPU scheduling. Use when running on EKS / GKE / AKS / on-prem clusters with the NVIDIA GPU Operator installed, or when integrating TAO into an existing k8s-nativ
nvidia
tao-run-on-lepton
DGX Cloud Lepton managed GPU compute platform with run/status/cancel interface. Use when submitting TAO jobs to DGX Cloud, dispatching training/eval/inference to Lepton GPU resources, or managing Lepton workspace deployments. Trigger phrases include "run
nvidia
tao-run-on-local-docker
Local Docker execution for TAO SDK job containers using the host Docker daemon and NVIDIA GPU runtime. Use when running TAO jobs on the current machine or a directly attached Docker host. Trigger phrases include "run locally", "local Docker", "use my GPU"
nvidia
tao-run-on-slurm
Remote SLURM GPU cluster execution over SSH with sbatch/srun, Pyxis/Enroot containers, and Lustre-backed results. Use when running TAO training/eval/inference jobs on an on-prem or DGX SLURM cluster. Trigger phrases include "run on SLURM", "submit sbatch"
nvidia
tao-run-platform
TAO Execution SDK for submitting and monitoring GPU training jobs on supported platforms (Lepton, Brev, SLURM, local Docker, Kubernetes). Use when the user wants to run TAO jobs through the SDK, get job tracking, S3 I/O wrapping, multi-node distributed tr
nvidia
tao-setup-nvidia-gpu-host
Host setup for TAO GPU backends. Checks and, after user approval, installs NVIDIA driver branch 580, CUDA Toolkit 13.0, and NVIDIA Container Toolkit 1.19.0 for Docker/local-Docker and Kubernetes GPU worker hosts. The `--check-only` path works on any Linux
nvidia
tao-train-action-recognition
Action recognition from video sequences. Supports RGB, optical flow, and joint (multi-stream) input types for classifying temporal actions in video clips. Use when training, evaluating, exporting, or running inference on a TAO action-recognition model. Tr
nvidia
tao-train-bevfusion
BEVFusion for multi-sensor 3D object detection. Fuses LiDAR point clouds and camera images in bird's-eye-view (BEV) space, used in autonomous driving for robust 3D perception. Use when training, evaluating, or running inference for a TAO BEVFusion model.
nvidia
tao-train-centerpose
CenterPose for keypoint / pose estimation. Detects object centers and regresses keypoint locations for 6-DoF object pose estimation. Use when training, evaluating, exporting, or running inference for a TAO CenterPose model. Trigger phrases include "train
nvidia
tao-train-deformable-detr
Deformable DETR for 2D object detection. Uses deformable attention for efficient multi-scale feature processing, lighter than DINO with competitive accuracy. Use when training, evaluating, exporting, quantizing, or running inference for a TAO Deformable-D
nvidia
tao-train-depth-anything-v2
Monocular depth estimation using Metric Depth Anything v2 or Relative Depth Anything architectures. Predicts per-pixel depth from single RGB images. Use when training, evaluating, exporting, or running inference for a TAO monocular depth model. Trigger ph
nvidia
tao-train-dino
DINO (DETR with Improved DeNoising Anchor Boxes) for 2D object detection. Transformer-based detector with denoising training, multi-scale features, and optional distillation support. Use when training, evaluating, exporting, distilling, quantizing, or run
nvidia
tao-train-fast-foundation-stereo
Real-time stereo depth estimation using FastFoundationStereo (FFS), the distilled bp2 commercial variant of FoundationStereo. Predicts disparity maps from stereo image pairs with ~10× lower latency than full FoundationStereo. Use when training, evaluating
nvidia
tao-train-foundation-stereo
Stereo depth estimation using FoundationStereo. Predicts disparity maps from stereo image pairs for 3D reconstruction. Use when training, evaluating, exporting, or running inference for a TAO FoundationStereo model. Trigger phrases include "train stereo d
nvidia
tao-train-grounding-dino
Grounding DINO for open-set object detection. Combines DINO-style detection with a BERT text encoder for language-guided detection — detects objects described by text prompts without a fixed class vocabulary. Use when training, evaluating, exporting, quan
nvidia
tao-train-image-classification
PyTorch-based TAO image classification. Supports a wide range of backbones (FAN, EfficientNet, ResNet, etc.) with distillation and quantization for deployment. Use when training, evaluating, distilling, quantizing, exporting, or running inference for a TA
nvidia
tao-train-mask-auto-encoder
Masked Auto-Encoder (MAE) for self-supervised pretraining and fine-tuning. Masks random patches and reconstructs them to learn visual representations; supports pretrain and finetune stages. Use when training, evaluating, exporting, or running inference fo
nvidia
tao-train-mask-auto-label
MAL (Mask Auto-Label) for weakly-supervised segmentation. Produces segmentation masks from minimal annotations (point or box annotations) using a ViT-MAE backbone. Use when training, evaluating, or running inference for a TAO MAL model. Trigger phrases in
nvidia
tao-train-mask-grounding-dino
Mask Grounding DINO for grounded instance segmentation. Extends Grounding DINO with a mask-prediction head for open-set segmentation guided by text prompts. Use when training, evaluating, exporting, quantizing, or running inference for a TAO Mask-Groundin
nvidia
tao-train-mask2former
Mask2Former for universal image segmentation (panoptic, instance, and semantic). Transformer-based with masked attention for high-quality segmentation results. Use when training, evaluating, exporting, quantizing, or running inference for a TAO Mask2Forme
nvidia
tao-train-metric-learning-recognition
Metric-learning recognition (ml-recog) for fine-grained visual recognition. Learns embeddings for retrieval-based matching (e.g., retail product recognition) using triplet / contrastive losses. Use when training, evaluating, exporting, or running inferenc
nvidia
tao-train-nvdinov2
NVDINOv2 for self-supervised visual representation learning. Trains vision transformers via self-distillation (teacher-student) without labels and produces general-purpose visual features. Use when training, distilling, exporting, or running inference for
nvidia
tao-train-nvpanoptix3d
NVPanoptix3D for panoptic 3D scene reconstruction from posed RGB images. Produces 3D panoptic segmentation (semantic, instance, and panoptic masks) with occupancy completion. Built on a VGGT backbone with a Mask2Former-style head and 3D frustum reconstruc
nvidia
tao-train-ocdnet
OCDNet for scene text detection. Detects arbitrary-oriented text regions in natural images using a differentiable binarization approach. Use when training, evaluating, exporting, pruning, quantizing, retraining, or running inference for a TAO OCDNet model
nvidia
tao-train-ocrnet
OCRNet for scene text recognition. Recognizes text content from cropped text-region images and supports CTC and attention-based decoders. Use when training, evaluating, exporting, pruning, quantizing, retraining, or running inference for a TAO OCRNet mode
nvidia
tao-train-oneformer
OneFormer for universal image segmentation. Unifies panoptic, instance, and semantic segmentation with a single architecture using task-conditioned queries. Use when training, evaluating, exporting, quantizing, or running inference for a TAO OneFormer mod
nvidia
tao-train-optical-inspection
Optical Inspection for defect detection using Siamese networks. Compares image pairs to detect manufacturing defects, anomalies, or quality issues. Use when training, evaluating, exporting, or running inference for a TAO Optical Inspection model on AOI /
nvidia
tao-train-pointpillars
PointPillars for 3D object detection from LiDAR point clouds. Encodes point clouds into a pseudo-image via a pillar-based representation, then applies 2D detection — used in autonomous driving and robotics. Use when training, evaluating, exporting, prunin
nvidia
tao-train-pose-classification
Pose classification using ST-GCN (Spatial Temporal Graph Convolutional Network). Classifies skeleton sequences into action categories from pose-keypoint data. Use when training, evaluating, exporting, or running inference for a TAO pose-classification mod
nvidia
tao-train-reid
Person re-identification (ReID). Learns discriminative embeddings to match the same person across different camera views, based on metric learning. Use when training, evaluating, exporting, or running inference for a TAO person re-identification model. Tr
nvidia
tao-train-rtdetr
RT-DETR (Real-Time DEtection TRansformer) for 2D object detection. Designed for real-time inference with competitive accuracy and supports distillation and quantization for deployment optimization. Use when training, evaluating, distilling, quantizing, ex
nvidia
tao-train-segformer
SegFormer for semantic segmentation. Lightweight transformer-based architecture with hierarchical feature extraction, efficient for real-time segmentation tasks. Use when training, evaluating, exporting, quantizing, or running inference for a TAO SegForme
nvidia
tao-train-single-step
Standard single-step train/eval/export workflow for any TAO model. Use when training a TAO model on a dataset without iterative data augmentation, AutoML, or DEFT loops. Trigger phrases include "single train run", "train then evaluate then export", "plain
nvidia
tao-train-sparse4d
Sparse4D for multi-camera temporal 3D object detection and tracking. Uses sparse queries with deformable attention across camera views and time for end-to-end 3D perception, with an instance bank for temporal tracking. Use when training, evaluating, expor
nvidia
tao-train-visual-changenet
Visual ChangeNet for binary image classification and segmentation in AOI defect detection. Use when training, evaluating, exporting, or running inference for PCB defect detection or visual inspection, comparing image pairs for PASS/NO_PASS classification,
nvidia
tao-validate-dataset-format
Run `tao-daft validate` to check NVIDIA TAO DAFT datasets for structure, schema, and cross-reference errors. Do not use for non-DAFT formats. Use when the user asks to validate a DAFT dataset, check DAFT schema, validate a TAO dataset format, or run `tao-
nvidia
tilegym-adding-cutile-kernel
Add a new cuTile GPU kernel operator to TileGym. Covers dispatch registration in ops.py, cuTile backend implementation, __init__.py exports, test creation, and benchmark in tests/benchmark. Use when adding, creating, or implementing a new cuTile operator/
nvidia
tilegym-converting-cutile-to-julia
Converts cuTile Python GPU kernels (@ct.kernel) to cuTile.jl Julia equivalents. Handles kernel syntax translation, 0-indexed to 1-indexed conversion, broadcasting differences, memory layout (row-major to column-major), type system mapping, and launch API
nvidia
tilegym-converting-cutile-to-triton
Converts cuTile GPU kernels (@ct.kernel) to Triton (@triton.jit). Handles standard in-repo conversion, debugging (cudaErrorIllegalAddress, shape mismatch, numerical mismatch), and mapping cuTile idioms (ct.load/ct.store, ct.Constant, ct.launch) to Triton
nvidia
tilegym-cutile-autotuning
Use when adding, modifying, optimizing, or debugging cuTile autotuning code. Trigger signals: `exhaustive_search` / `replace_hints` / `hints_fn` / `cuda.tile.tune` in code, `autotune` in filenames, or correctness/performance issues in autotuned cuTile ker
nvidia
tilegym-cutile-python
Expert cuTile programming assistant. Write high-performance GPU kernels using cuTile's tile-based programming model with proper validation and optimization. Supports deep agent orchestration for complex multi-kernel tasks.
nvidia
tilegym-improve-cutile-kernel-perf
Iteratively optimize cuTile kernel performance through systematic profiling, bottleneck analysis, IR comparison, and targeted tuning. Covers tile sizes, occupancy, autotune configs, TMA, latency hints, persistent scheduling, num_ctas, flush_to_zero, and I
nvidia
tilegym-monkey-patch-kernels-to-transformers
Integrate TileGym kernels into Hugging Face `transformers` models by replacing the library's submodule(s) and certain class(es)' implementations, and patching certain class(es)' init/forward/load weight methods prior to instantiating models. Used when the
nvidia
vss-ask-video
Use this skill to ask the VSS agent's video_understanding tool a fresh visual question about a recorded clip. Not for prior tool output, search hits, or metadata-answerable questions.
nvidia
vss-deploy-dense-captioning
Use this skill when deploying standalone RT-VLM dense captioning or calling its REST API (uploads, captions, streams, chat-completions, Kafka). Not for VSS profile deploy or video-search ingestion.
nvidia
vss-deploy-detection-tracking-2d
Use this skill when the user wants to deploy, run, debug, tear down, or call the REST API of the RTVI-CV 2D detection / tracking microservice. Trigger when the user says things like 'deploy rtvi-cv', 'start warehouse 2d', 'add a stream', 'check rtvi-cv he
nvidia
vss-deploy-detection-tracking-3d
Deploy and operate the RTVI-CV-3D microservice as MV3DT (`MODE=mv3dt`): per-camera DeepStream perception plus BEV Fusion over calibrated cameras. Supports the bundled sample dataset, custom video files, and RTSP streams, and chains to `vss-generate-video-
nvidia
vss-deploy-profile
Use to select, configure, deploy, verify, debug, or tear down a VSS profile (base, search, lvs, warehouse, edge). Not for standalone microservices — use the vss-deploy-* skill.
nvidia
vss-deploy-video-embedding
Use this skill when deploying, operating, or integrating the VSS 3.2 GA RT-Embed Video Embedding microservice. Covers Docker Compose bring-up, GPU and storage prerequisites, the `/v1` REST API (file uploads, text and video embeddings, live RTSP streams, h
nvidia
vss-generate-video-calibration
Use to run AutoMagicCalib on local MP4s, RTSP, or the bundled sample dataset, and to deploy vss-auto-calibration when needed. Do not use for non-AMC calibration or runtime analytics.
nvidia
vss-generate-video-report
Use this skill when producing a VSS analysis report — Mode A per-clip VLM, Mode B incident-range via video-analytics. Not for standalone video summarization, real-time alerts, or ad-hoc Q&A.
nvidia
vss-manage-alerts
Use for VSS alert workflows — real-time monitoring, Alert-Bridge subscriptions, Slack notifications, incident queries, camera onboarding. Not for non-alert analytics.
nvidia
vss-manage-video-io-storage
Use to call the VIOS REST API (sensor list, timelines, clip extraction, snapshots, add/delete sensors and streams). Not for VLM inference or search.
nvidia
vss-query-analytics
Use this skill when reading video-analytics metrics, incidents, alerts, and sensor data via the VA-MCP server (port 9901). Not for live VLM or incident-range narrative reports.
nvidia
vss-search-archive
Use to run top-level VSS fusion search on archived video, or to ingest video files / RTSP streams for search. Do NOT use for ad-hoc visual Q&A (use vss-ask-video), live captioning (use vss-deploy-dense-captioning), or video summarization and reports (use
nvidia
vss-setup-behavior-analytics
Use to deploy the vss-behavior-analytics service standalone (entrypoint, config-source, optional calibration). Not for the full warehouse deploy.
nvidia
vss-setup-video-analytics-api
Use to deploy the vss-video-analytics-api REST service standalone (config-source, data-log bind, Elasticsearch, optional Kafka). Not for full warehouse deploy.
nvidia
vss-summarize-video
Use to summarize a recorded video via the LVS summarization microservice (HITL-gated) with a VLM fallback. Not for report generation or live RTSP captioning.
nvidia
Build a Video Search and Summarization (VSS) Agent
Run the VSS Blueprint on your Spark
nvidia
Build and Deploy a Multi-Agent Chatbot
Deploy a multi-agent chatbot system and chat with agents on your Spark
nvidia
CLI Coding Agent
Build local CLI coding agents with Ollama
nvidia
Comfy UI
Install and use ComfyUI to generate images
nvidia
Connect Multiple DGX Spark through a Switch
Set up a cluster of DGX Spark devices connected through a switch
nvidia
Connect Three DGX Spark in a Ring Topology
Connect and set up three DGX Spark devices in a ring topology
nvidia
Connect Two Sparks
Connect two Spark devices and set them up for inference and fine-tuning
nvidia
CUDA-X Data Science
Install and use NVIDIA cuML and NVIDIA cuDF to accelerate UMAP, HDBSCAN, pandas and more with zero code changes
nvidia
cuTile Kernels
Run cuTile kernel benchmarks, FMHA implementation, and LLM inference on DGX Spark and B300
nvidia
DGX Dashboard
Monitor your DGX system and launch JupyterLab
nvidia
DGX Station AI Skills for Coding Agents
Give your coding agent (Claude Code, Codex, Gemini CLI, Cursor) DGX Station expertise via an AGENTS.md and on-demand Agent Skills
nvidia
Fine-tune with NeMo
Use NVIDIA NeMo to fine-tune models locally
nvidia
Fine-tune with PyTorch
Use PyTorch to fine-tune models locally
nvidia
FLUX.1 Dreambooth LoRA Fine-tuning
Fine-tune FLUX.1-dev 12B model using Dreambooth LoRA for custom image generation
nvidia
How to Build a Multi-GPU AI PC - A Practical Guide
Many people explore local generative AI for privacy and to avoid token limits, but newer models require significant memory and compute—leading some to adopt multi-GPU setups.
nvidia
How to Fine-Tune an LLM on NVIDIA GPUs With Unsloth
Fine-tune popular AI models faster in Unsloth with NVIDIA RTX AI PCs, RTX PRO workstations, and DGX Spark—plus explore the new Nemotron Nano 3 family of open models.
nvidia
How to Get Started With Large Language Models on NVIDIA RTX PCs
Learn about using LLMs locally on PCs and workstations with Ollama, AnythingLLM, and LM Studio.
nvidia
How to Get Started With Visual Generative AI on NVIDIA RTX PCs
Learn how to run advanced image and video generation locally with ComfyUI and LTX-2 on RTX PCs.
nvidia
Image & Video Generation with ComfyUI
Generate images and videos with FLUX, Wan 2.1, HunyuanVideo, and Cosmos on DGX Station
nvidia
Install and Use Isaac Sim and Isaac Lab
Build Isaac Sim and Isaac Lab from source for Spark
nvidia
Isaac GR00T N1.6 Fine-Tuning
Fine-tune and benchmark NVIDIA's GR00T N1.6 robotics foundation model on DGX Station
nvidia
Live VLM WebUI
Real-time Vision Language Model interaction with webcam streaming
nvidia
LLaMA Factory
Install and fine-tune models with LLaMA Factory
nvidia
LLM Inference with SGLang
Serve LLMs with SGLang on DGX Station (Qwen3-8B default; Qwen3.6 MoE optional)—prefix-cached multi-turn, structured output, benchmarks, and inference-server guidance
nvidia
LM Studio on DGX Spark
Deploy LM Studio and serve LLMs on a Spark device; use LM Link to access models remotely.
nvidia
Local Coding Agent
Run local CLI coding agents with Ollama on DGX Station (NVIDIA GB300) using glm-4.7-flash (fast) or unsloth/GLM-4.7-GGUF:Q8_0 (best quality)
nvidia
Local Healthcare Agent on DGX Station
Run healthcare AI agents that analyze patient data and predict protein structures in an OpenShell sandbox on DGX Station
nvidia
MIG on DGX Station
Enable and configure Multi-Instance GPU (MIG) on DGX Station with GB300 Ultra (B300 GPUs)
nvidia
Multi-modal Inference
Set up multi-modal inference with TensorRT
nvidia
Nanochat on Dual-Spark
Set up Nanochat on Dual-Spark
nvidia
Nanochat Training
Train a small ChatGPT-style LLM (nanochat) with tokenizer, pretraining, midtraining, and SFT on DGX Station with GB300 Ultra
nvidia
NCCL for Two Sparks
Install and test NCCL on two Sparks
nvidia
Nemotron-3-Nano with llama.cpp
Run Nemotron-3-Nano-30B model using llama.cpp on DGX Spark
nvidia
NIM on Spark
Deploy a NIM on Spark
nvidia
NVFP4 Pretraining with Megatron Bridge
Pretrain Llama 3.1 8B with NVFP4 mixed precision on DGX Station using Megatron Bridge
nvidia
NVFP4 Quantization
Quantize a model to NVFP4 to run on DGX Station using TensorRT Model Optimizer
nvidia
NVFP4 Quantization
Quantize a model to NVFP4 to run on Spark using TensorRT Model Optimizer
nvidia
NVIDIA Video Generation Guide
Learn how to create high-resolution videos using LTX-2 in ComfyUI, accelerated on RTX, and take control of visual generative AI.
nvidia
Open WebUI with Ollama
Install Open WebUI and use Ollama to chat with models on your Spark
nvidia
OpenClaw 🦞
Run OpenClaw locally on DGX Spark with a vLLM-served local model
nvidia
Optimized JAX
Optimize JAX to run on Spark
nvidia
Portfolio Optimization
GPU-accelerated portfolio optimization using cuOpt and cuML
nvidia
Profiler-Driven Kernel Optimization for Fine-Tuning
Use torch.profiler to find training bottlenecks, then write custom Triton kernels to optimize LLaMA 8B fine-tuning
nvidia
RAG Application in AI Workbench
Install and use AI Workbench to clone and run a reproducible RAG application
nvidia
Register DGX Spark to Brev
Link your DGX Spark to Brev for remote access and shared environments
nvidia
Register DGX Station to Brev
Link your DGX Station to Brev for remote access and sharing
nvidia
Run Hermes Agent with Local Models
Install and run the Hermes self-improving AI agent on DGX Spark.
nvidia
Run models with llama.cpp on DGX Spark
Build llama.cpp with CUDA and serve models via an OpenAI-compatible API
nvidia
Run NemoClaw with a Local LLM
Build your first local AI assistant on DGX Station using NemoClaw in a secure sandbox, with optional Telegram.
nvidia
Run NemoClaw with a Local LLM
Build your first local AI assistant on DGX Spark using NemoClaw and Ollama in a secure sandbox, with optional Telegram.
nvidia
Run OpenClaw For Free On NVIDIA RTX GPUs & DGX Spark
Learn how to set up and host the popular AI agent using local inference apps optimized for RTX.
nvidia
Secure Long Running AI Agents with OpenShell on DGX Station
Run OpenClaw with local models in an NVIDIA OpenShell sandbox on DGX Station
nvidia
Secure Long Running AI Agents with OpenShell on DGX Spark
Run OpenClaw with local models in an NVIDIA OpenShell sandbox on DGX Spark
nvidia
Set Up Local Network Access
NVIDIA Sync helps set up and configure SSH access
nvidia
Set up Tailscale on Your Spark
Use Tailscale to connect to your Spark on your home network no matter where you are
nvidia
SGLang for Inference
Install and use SGLang on DGX Spark
nvidia
Single-cell RNA Sequencing
An end-to-end GPU-powered workflow for scRNA-seq using RAPIDS
nvidia
Spark & Reachy Photo Booth
AI-augmented photo booth using the DGX Spark and Reachy Mini.
nvidia
Speculative Decoding
Learn how to set up speculative decoding for fast inference on Spark
nvidia
Text to Knowledge Graph
Transform unstructured text into interactive knowledge graphs with LLM inference and graph visualization
nvidia
Text to Knowledge Graph on DGX Station
Transform unstructured text into interactive knowledge graphs with LLM inference and graph visualization
nvidia
Topic Modeling
Extract insights from massive text datasets using cuML's GPU-accelerated BERTopic
nvidia
TRT LLM for Inference
Install and use TensorRT-LLM on DGX Spark
nvidia
Unsloth on DGX Spark
Optimized fine-tuning with Unsloth
nvidia
Vibe Coding in VS Code
Use DGX Spark as a local or remote Vibe Coding assistant with Ollama and Continue
nvidia
Vision-Language Model Fine-tuning
Fine-tune Vision-Language Models for image and video understanding tasks using Qwen2.5-VL and InternVL3
nvidia
vLLM for Inference
Install and use vLLM on DGX Spark
nvidia
vLLM for Inference
Install and use vLLM on DGX Station
nvidia
vLLM for Inference
Install and use vLLM on NVIDIA RTX Pro 6000
nvidia
VS Code
Install and use VS Code locally or remotely
nvidia
🦞 Set Up Example NemoClaw Agents 🦞
Ready-to-run application examples for your NemoClaw sandbox — policy, prompt, and personalization for each workflow