Date: 28 March 2023 @ 18:00 - 19:00

Timezone: UTC

Language of instruction: English

Last month we hosted the workshop "Data management with DataLad", which demonstrated several HPC workflows with Git-based data organization and analysis. On March 28th I will take a step back and give a more beginner-oriented tutorial on version control of large data files with DataLad.

I will start with a textbook introduction to DataLad, showing the main features it adds on top of Git and git-annex. Next, I will try to demonstrate several simple but useful workflows:

  1. two users on a shared cluster filesystem working with the same dataset stored in /project,
  2. one user, one dataset spread over multiple drives, with data redundancy,
  3. publishing a dataset on GitHub with annexed files in a special private remote,
  4. publishing a dataset on GitHub with publicly accessible annexed files on the Alliance's Nextcloud, and
  5. (if we have time) managing multiple Git repositories under one dataset.

Keywords: Git, Programming

