NVIDIA catalog
Models, skills and blueprints for GPU jobs.
Browse NVIDIA workloads inside ICPX before creating a compute job.
nvidia
Active Speaker Detection
Detect and track speaker identities across video frames.
deepmind
alphafold2
Predicts the 3D structure of a protein from its amino acid sequence.
deepmind
alphafold2-multimer
Predicts the 3D structure of a protein from its amino acid sequence.
sqwh1lyrveic
AODT 1.2.1
AODT 1.2.1
sqwh1lyrveic
AODT 1.2.2
AODT 1.2.2
nvidia
Background Noise Removal
Removes unwanted noises from audio improving speech intelligibility.
nvidia
bevformer
Advanced transformer for multi-frame bird's-eye-view 3D perception in autonomous driving.
baai
bge-m3
Embedding model for text retrieval tasks, excelling in dense, multi-vector, and sparse retrieval.
mit
Boltz-2
Predict complex structures using Boltz-2.
nvidia
canary-1b-asr
Multi-lingual model supporting speech-to-text recognition and translation.
resembleai
chatterbox-multilingual-tts
Natural and expressive voices in 23 languages. For voice agents and brand ambassadors.
nvidia
conformer-ctc-asr
Automatic speech recognition model that transcribes speech in lower case Spanish with record-setting accuracy and performance
nvidia
cosmos-reason2-8b
Vision language model that excels in understanding the physical world using structured reasoning on videos or images.
nvidia
cosmos-transfer1-7b
Generates physics-aware video world states for physical AI development using text prompts and multiple spatial control inputs derived from real-world data or simulation.
nvidia
cosmos-transfer2.5-2b
Generates physics-aware video world states for physical AI development using text prompts and multiple spatial control inputs derived from real-world data or simulation.
nvidia
cosmos3-nano
Generates physics-aware videos from text prompts or an image prompt for physical AI development.
nvidia
cosmos3-nano-reasoner
Vision language model that excels in understanding the physical world using structured reasoning on videos or images.
nvidia
cuopt
World-record accuracy and performance for complex route optimization.
deepseek-ai
deepseek-v4-flash
DeepSeek V4 Flash is a 284B MoE model with 1M-token context optimized for fast coding and agents.
deepseek-ai
deepseek-v4-pro
DeepSeek V4 scales to 1M-token context windows with efficient MoE architecture for coding tasks.
mit
diffdock
Predicts the 3D structure of how a molecule interacts with a protein.
diffusiongemma-26b-a4b-it
Diffusion-based 26B parameter LLM enabling parallel token generation for real-time text apps
abacusai
dracarys-llama-3.1-70b-instruct
Fine-tuned Llama 3.1 70B model for code generation, summarization, and multi-language tasks.
meta
esm2-650m
Generates embeddings of proteins from their amino acid sequences.
meta
esmfold
Predicts the 3D structure of a protein from its amino acid sequence.
arc
evo2-40b
Evo 2 is a biological foundation model that is able to integrate information over long genomic sequences while retaining sensitivity to single-nucleotide changes.
nvidia
eyecontact
Estimate gaze angles of a person in a video and redirect to make it frontal.
cadence
fidelity
Run computational fluid dynamics (CFD) simulations
ansys
fluent
Run computational fluid dynamics (CFD) simulations
black-forest-labs
FLUX.1-dev
FLUX.1 is a state-of-the-art suite of image generation models
black-forest-labs
FLUX.1-Kontext-dev
FLUX.1 Kontext is a multimodal model that enables in-context image generation and editing.
black-forest-labs
FLUX.1-schnell
FLUX.1-schnell is a distilled image generation model, producing high quality images at fast speeds
black-forest-labs
flux.2-klein-4b
FLUX.2-klein-4B is a distilled image generation and editing model, producing outputs at lightning speed
nvidia
fourcastnet
FourCastNet predicts global atmospheric dynamics of various weather / climate variables.
gemma-2-2b-it
Advanced small language generative AI model for edge applications
gemma-3n-e2b-it
An edge computing AI model which accepts text, audio and image input, ideal for resource-constrained environments
gemma-3n-e4b-it
An edge computing AI model which accepts text, audio and image input, ideal for resource-constrained environments
gemma-4-31b-it
Dense 31B model delivering frontier reasoning for coding, agentic workflows, and fine-tuning.
nvidia
genmol
Fragment-Based Molecular Generation by Discrete Diffusion.
nvidia
gliner-pii
GLiNER PII detects Personally Identifiable Information in text.
z-ai
glm-5.1
GLM-5.1 is a flagship LLM for agentic workflows, coding, and long-horizon reasoning tasks.
openai
gpt-oss-120b
Mixture of Experts (MoE) reasoning LLM (text-only) designed to fit within 80GB GPU.
openai
gpt-oss-20b
Smaller Mixture of Experts (MoE) text-only LLM for efficient AI reasoning and math
nvidia
ising-calibration-1-35b-a3b
Open VLM for quantum computer calibration chart understanding across a range of qubit modalities.
moonshotai
kimi-k2.6
1T multimodal MoE for long-horizon coding, agentic tool use, and image/video understanding.
nvidia
LipSync
Generative lip dubbing that syncs lips in a video to input audio.
meta
llama-3.1-70b-instruct
Powers complex conversations with superior contextual understanding, reasoning and text generation.
meta
llama-3.1-8b-instruct
Advanced state-of-the-art model with language understanding, superior reasoning, and text generation.
nvidia
llama-3.1-nemoguard-8b-content-safety
Leading content safety model for enhancing the safety and moderation capabilities of LLMs
nvidia
llama-3.1-nemoguard-8b-topic-control
Topic control model to keep conversations focused on approved topics, avoiding inappropriate content.
nvidia
llama-3.1-nemotron-nano-8b-v1
Leading reasoning and agentic AI accuracy model for PC and edge.
nvidia
llama-3.1-nemotron-nano-vl-8b-v1
Multi-modal vision-language model that understands text/img and creates informative responses
nvidia
llama-3.1-nemotron-safety-guard-8b-v3
Leading multilingual content safety model for enhancing the safety and moderation capabilities of LLMs
meta
llama-3.2-11b-vision-instruct
Cutting-edge vision-language model excelling in high-quality reasoning from images.
meta
llama-3.2-1b-instruct
Advanced state-of-the-art small language model with language understanding, superior reasoning, and text generation.
meta
llama-3.2-3b-instruct
Advanced state-of-the-art small language model with language understanding, superior reasoning, and text generation.
meta
llama-3.2-90b-vision-instruct
Cutting-edge vision-language model excelling in high-quality reasoning from images.
meta
llama-3.3-70b-instruct
Advanced LLM for reasoning, math, general knowledge, and function calling
nvidia
llama-3.3-nemotron-super-49b-v1
High efficiency model with leading accuracy for reasoning, tool calling, chat, and instruction following.
nvidia
llama-3.3-nemotron-super-49b-v1.5
High efficiency model with leading accuracy for reasoning, tool calling, chat, and instruction following.
meta
llama-4-maverick-17b-128e-instruct
A general purpose multimodal, multilingual 128-expert MoE model with 17B active parameters.
meta
llama-guard-4-12b
Multi-modal model to classify safety for input prompts as well as output responses.
nvidia
llama-nemotron-embed-1b-v2
Multilingual, cross-lingual embedding model for long-document QA retrieval, supporting 26 languages.
nvidia
llama-nemotron-embed-vl-1b-v2
Multimodal question-answer retrieval representing user queries as text and documents as images.
nvidia
llama-nemotron-rerank-1b-v2
GPU-accelerated model optimized for providing a probability score that a given passage contains the information to answer a question.
nvidia
llama-nemotron-rerank-vl-1b-v2
GPU-accelerated model optimized for providing a probability score that a given passage contains the information to answer a question.
nvidia
magpie-tts-multilingual
Natural and expressive voices in multiple languages. For voice agents and brand ambassadors.
nvidia
magpie-tts-zeroshot
Expressive and engaging text-to-speech, generated from a short audio sample.
nvidia
megatron-1b-nmt
Enable smooth global interactions in 36 languages.
minimaxai
minimax-m2.7
MiniMax M2.7 is a 230B-parameter text-to-text AI model excelling in coding, reasoning, and office tasks.
minimaxai
minimax-m3
MiniMax M3 Preview is a multimodal MoE vision-language model with strong reasoning, coding, and tool-calling capabilities.
mistralai
ministral-14b-instruct-2512
A general purpose VLM ideal for chat and instruction based use cases
mistralai
mistral-large-3-675b-instruct-2512
A state-of-the-art general purpose MoE VLM ideal for chat, agentic and instruction based use cases.
mistralai
mistral-medium-3.5-128b
A high performing model for text generation, coding and agentic use cases
mistralai
mistral-nemotron
Built for agentic workflows, this model excels in coding, instruction following, and function calling
mistralai
mistral-small-4-119b-2603
Hybrid MoE model unifying instruct, reasoning, and coding with multimodal input and 256k context
mistralai
mixtral-8x7b-instruct-v0.1
An MoE LLM that follows instructions, completes requests, and generates creative text.
nvidia
molmim
MolMIM performs controlled generation, finding molecules with the right properties.
colabfold
msa-search
Generates a multiple sequence alignment from a query sequence and a protein sequence database search.
nvidia
nemoguard-jailbreak-detect
Industry leading jailbreak classification model for protection from adversarial attempts
nvidia
nemoretriever-ocr
Powerful OCR model for fast, accurate real-world image text extraction, layout, and structure analysis.
nvidia
nemoretriever-page-elements-v2
Model for object detection, fine-tuned to detect charts, tables, and titles in documents.
nvidia
nemoretriever-parse
Cutting-edge vision-language model excelling in retrieving text and metadata from images.
nvidia
nemotron-3-content-safety
Multilingual, multimodal model for detecting unsafe and toxic content.
nvidia
nemotron-3-nano-30b-a3b
Open, efficient MoE model with 1M context, excelling in coding, reasoning, instruction following, tool calling, and more
nvidia
nemotron-3-nano-omni-30b-a3b-reasoning
Nemotron 3 Nano Omni is an omni-modal reasoning model that understands images, video, speech, and text.
nvidia
nemotron-3-super-120b-a12b
Open, efficient hybrid Mamba-Transformer MoE with 1M context, excelling in agentic reasoning, coding, planning, tool calling, and more
nvidia
nemotron-3-ultra-550b-a55b
Open, efficient hybrid Mamba-Transformer MoE with 1M context, excelling in agentic reasoning, coding, planning, tool calling, and more
nvidia
nemotron-3.5-content-safety
Multilingual, multimodal model for detecting unsafe and toxic content.
nvidia
nemotron-asr-streaming
Real-time speech recognition for English
nvidia
nemotron-content-safety-reasoning-4b
A context‑aware safety model that applies reasoning to enforce domain‑specific policies.
nvidia
nemotron-graphic-elements-v1
Model for object detection, fine-tuned to detect charts, tables, and titles in documents.
nvidia
nemotron-mini-4b-instruct
Optimized SLM for on-device inference and fine-tuned for roleplay, RAG and function calling
nvidia
nemotron-nano-12b-v2-vl
Nemotron Nano 12B v2 VL enables multi-image and video understanding, along with visual Q&A and summarization capabilities.
nvidia
nemotron-ocr-v1
Powerful OCR model for fast, accurate real-world image text extraction, layout, and structure analysis.
nvidia
nemotron-page-elements-v3
Model for object detection, fine-tuned to detect charts, tables, and titles in documents.
nvidia
nemotron-parse
Cutting-edge vision-language model excelling in retrieving text and metadata from images.
nvidia
nemotron-table-structure-v1
Model for object detection, fine-tuned to detect charts, tables, and titles in documents.
nvidia
nemotron-voicechat
Nemotron 3 Voicechat
nvidia
nv-embed-v1
Generates high-quality numerical embeddings from text inputs.
nvidia
nv-embedcode-7b-v1
The NV-EmbedCode model is a 7B Mistral-based embedding model optimized for code retrieval, supporting text, code, and hybrid queries.
nvidia
nv-embedqa-e5-v5
English text embedding model for question-answering retrieval.
nvidia
nv-yolox-page-elements-v1
Model for object detection, fine-tuned to detect charts, tables, and titles in documents.
nvidia
nvidia-nemotron-nano-9b-v2
High‑efficiency LLM with hybrid Transformer‑Mamba design, excelling in reasoning and agentic tasks.
openfold
openfold2
Predicts the 3D structure of a protein from its amino acid sequence, multiple sequence alignments, and templates.
openfold
openfold3
OpenFold3 is a third-generation biomolecular foundation model that predicts the three-dimensional structures of molecular complexes (proteins, DNA, RNA, ligands)
baidu
paddleocr
Model for table extraction that receives an image as input, runs OCR on the image, and returns the text within the image and its bounding boxes.
paligemma
Vision language model adept at comprehending text and visual inputs to produce informative responses
nvidia
parakeet-1.1b-rnnt-multilingual-asr
High accuracy and optimized performance for transcription in 25 languages
nvidia
parakeet-ctc-0.6b-asr
State-of-the-art accuracy and speed for English transcriptions.
nvidia
parakeet-ctc-0.6b-es
Accurate and optimized Spanish-English transcriptions with punctuation and word timestamps.
nvidia
parakeet-ctc-0.6b-vi
Accurate and optimized Vietnamese-English transcriptions with punctuation and word timestamps.
nvidia
parakeet-ctc-0.6b-zh-cn
Record-setting accuracy and performance for Mandarin-English transcriptions.
nvidia
parakeet-ctc-0.6b-zh-tw
Record-setting accuracy and performance for Taiwanese Mandarin-English transcriptions.
nvidia
parakeet-ctc-1.1b-asr
Record-setting accuracy and performance for English transcription.
nvidia
parakeet-tdt-0.6b-v2
Accurate and optimized English transcriptions with punctuation and word timestamps
microsoft
phi-4-mini-instruct
Lightweight multilingual LLM powering AI applications in latency bound, memory/compute constrained environments
microsoft
phi-4-multimodal-instruct
Cutting-edge open multimodal model excelling in high-quality reasoning from image and audio inputs.
ipd
proteinmpnn
ProteinMPNN is a deep learning model for predicting amino acid sequences for protein backbones.
qwen
qwen-image
Qwen-Image is a text-to-image foundation model with advanced multilingual text rendering.
qwen
qwen-image-edit
Qwen-Image-Edit is an image editing model with multilingual text editing and strong subject consistency.
qwen
qwen3-next-80b-a3b-instruct
Qwen3-Next Instruct blends hybrid attention, sparse MoE, and stability boosts for ultra-long context AI.
qwen
qwen3.5-122b-a10b
122B MoE LLM (10B active) for coding, reasoning, multimodal chat. Agent-ready.
qwen
qwen3.5-397b-a17b
Next-gen Qwen 3.5 VLM (400B MoE) brings advanced vision, chat, RAG, and agentic capabilities.
nvidia
Relighting
Re-illuminate people in video to match target lighting from a 360 HDRI environment map.
nvidia
rerank-qa-mistral-4b
GPU-accelerated model optimized for providing a probability score that a given passage contains the information to answer a question.
ipd
rfdiffusion
A generative model of protein backbones for protein binder design.
nvidia
riva-translate-1.6b
Enable smooth global interactions in 36 languages.
nvidia
riva-translate-4b-instruct-v1_1
Translation model for 12 languages with support for few-shot example prompts.
sarvamai
sarvam-m
Multilingual, hybrid-reasoning model optimized for Indian-language tasks, programming, and mathematical reasoning.
bytedance
seed-oss-36b-instruct
ByteDance open-source LLM with long-context, reasoning, and agentic intelligence.
siemens
simcenter-star-ccm+
Run computational fluid dynamics (CFD) simulations
upstage
solar-10.7b-instruct
Excels in NLP tasks, particularly in instruction-following, reasoning, and mathematics.
nvidia
sparsedrive
End-to-end autonomous driving stack integrating perception, prediction, and planning with sparse scene representations for efficiency and safety.
cadence
spectre-x
Run large-scale electronics and chip design verification simulations
stabilityai
stable-diffusion-3.5-large
Stable Diffusion 3.5 is a popular text-to-image generation model
stepfun-ai
step-3.5-flash
200B open-source reasoning engine with sparse MoE powering frontier agentic AI.
stepfun-ai
step-3.7-flash
A sparse MoE multimodal reasoning model good for enterprise, agentic and coding tasks.
stockmark
stockmark-2-100b-instruct
Japanese-specialized large-language-model for enterprises to read and understand complex business documents.
nvidia
streampetr
StreamPETR offers efficient 3D object detection for autonomous driving by propagating sparse object queries temporally.
nvidia
Studio Voice
Enhance input speech recorded with low-quality microphones in noisy or reverberant environments, producing studio-quality speech.
nvidia
synthetic-video-detector
NVIDIA Synthetic Video Detector is an AI-powered micro-service for detecting AI‑generated (synthetic) videos.
0615409268808334
test_endpoint_20251218_133732_563_ouy_canary
For publishing test
microsoft
TRELLIS
MSFT TRELLIS is a 3D AI model that generates high-quality 3D assets from text or image inputs.
nvidia
vista-3d
VISTA-3D is a specialized interactive foundation model for segmenting and annotating human anatomies.
openai
whisper-large-v3
Robust Speech Recognition via Large-Scale Weak Supervision.