Blueprint by NVIDIA

Profiler-Driven Kernel Optimization for Fine-Tuning

Use torch.profiler to find training bottlenecks, then write custom Triton kernels to optimize LLaMA 8B fine-tuning

NVIDIA blueprint, Training, Fine-Tuning, Performance Optimization, Kernel Development, DGX Station, LLaMA