WestDRI: webinar "Distributed datasets with DataLad"
Date: 28 March 2023 @ 18:00 - 19:00
Timezone: UTC
Language of instruction: English
Last month we hosted a workshop, "Data management with DataLad", that demonstrated several HPC workflows with Git-based data organization and analysis. On March 28th I will take a step back and provide a more beginner-oriented tutorial on version control of large data files with DataLad.
I will start with a textbook introduction to DataLad, showing its main features on top of Git and git-annex. Next I will try to demonstrate several simple but useful workflows:
- two users on a shared cluster filesystem working with the same dataset stored in /project,
- one user, one dataset spread over multiple drives, with data redundancy,
- publishing a dataset on GitHub with annexed files in a special private remote,
- publishing a dataset on GitHub with publicly-accessible annexed files on the Alliance's Nextcloud, and
- (if we have time) managing multiple Git repositories under one dataset.
Keywords: Git, Programming