Blueprint by NVIDIA
Profiler-Driven Kernel Optimization for Fine-Tuning
Use torch.profiler to find training bottlenecks, then write custom Triton kernels to optimize LLaMA 8B fine-tuning.
NVIDIA blueprint, Training, Fine-Tuning, Performance Optimization, Kernel Development, DGX Station, LLaMA