Data Parallelism and Model Parallelism for Scaling Training Across Multiple GPUs
Date: 13 June 2025 @ 13:30 - 16:30
Timezone: Eastern Time (US & Canada)
Language of instruction: English
DESCRIPTION: Larger Deep Neural Networks (DNNs) are typically more powerful, but training models across multiple GPUs or multiple nodes isn't trivial and requires an understanding of both AI and high-performance computing (HPC). In this workshop we will give an overview of activation checkpointing, gradient accumulation, and various forms of data and model parallelism to overcome the challenges associated with large-model memory footprints, and walk through some examples.
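To give a flavour of the kind of technique covered, below is a minimal sketch of gradient accumulation in PyTorch. It is illustrative only and not workshop material: the model, data, batch sizes, and `accum_steps` value are placeholder assumptions, and in practice the micro-batches would come from a DataLoader.

```python
# Minimal gradient-accumulation sketch (placeholder model and data).
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"

model = nn.Linear(512, 10).to(device)                 # placeholder model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
accum_steps = 4                                       # effective batch = 4 x micro-batch

optimizer.zero_grad()
for step in range(100):
    # Placeholder micro-batch; normally drawn from a DataLoader.
    x = torch.randn(8, 512, device=device)
    y = torch.randint(0, 10, (8,), device=device)

    loss = loss_fn(model(x), y) / accum_steps         # scale so accumulated grads average correctly
    loss.backward()                                   # gradients accumulate in .grad

    if (step + 1) % accum_steps == 0:
        optimizer.step()                              # one update per accumulated batch
        optimizer.zero_grad()
```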
TEACHER: Jonathan Dursi (NVIDIA)
LEVEL: Intermediate/Advanced
FORMAT: Lecture + Demo
CERTIFICATE: Attendance
PREREQUISITES:
- Familiarity with training models in PyTorch on a single GPU will be assumed.
Keywords: TRAINING