Data Parallelism and Model Parallelism for Scaling Training Across Multiple GPUs
Date: 13 June 2025 @ 13:30 - 16:30
Timezone: Eastern Time (US & Canada)
Language of instruction: English
DESCRIPTION: Larger Deep Neural Networks (DNNs) are typically more powerful, but training models across multiple GPUs or multiple nodes isn't trivial and requires an understanding of both AI and high-performance computing (HPC). In this workshop we will give an overview of activation checkpointing, gradient accumulation, and various forms of data and model parallelism to overcome the challenges associated with large-model memory footprints, and walk through some examples.
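To give a flavour of the kind of technique covered, below is a minimal sketch of gradient accumulation in PyTorch. It is illustrative only and not workshop material: the model, data, batch sizes, and `accum_steps` value are placeholder assumptions, and in practice the micro-batches would come from a DataLoader.

```python
# Minimal gradient-accumulation sketch (placeholder model and data).
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"

model = nn.Linear(512, 10).to(device)                 # placeholder model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
accum_steps = 4                                       # effective batch = 4 x micro-batch

optimizer.zero_grad()
for step in range(100):
    # Placeholder micro-batch; normally drawn from a DataLoader.
    x = torch.randn(8, 512, device=device)
    y = torch.randint(0, 10, (8,), device=device)

    loss = loss_fn(model(x), y) / accum_steps         # scale so accumulated grads average correctly
    loss.backward()                                   # gradients accumulate in .grad

    if (step + 1) % accum_steps == 0:
        optimizer.step()                              # one update per accumulated batch
        optimizer.zero_grad()
```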
TEACHER: Jonathan Dursi (NVIDIA)
LEVEL: Intermediate/Advanced
FORMAT: Lecture + Demo
CERTIFICATE: Attendance
PREREQUISITES:
- Familiarity with training models in PyTorch on a single GPU will be assumed.
Keywords: TRAINING