Data Bites - Introduction to Programmatic Data De-identification with R
Note: all times are shown in the timezone in which each event occurs.
Date: 15 July 2026 @ 10:00 - 11:00
Timezone: Pacific Daylight Time
Language of instruction: English
Workshop: Programmatic Data De-identification with R
This practical workshop, delivered by the UBC Library Research Data Management team, introduces programmatic approaches to de-identifying sensitive research data in R. Through hands-on exercises using a realistic survey dataset, participants will apply a structured workflow, from assessing privacy risks to exporting a shareable, de-identified dataset.
Participants will learn how to:
- Identify privacy risks in research data, including direct identifiers, dates, geographic variables, and free-text fields.
- Apply de-identification methods in R using dplyr, including removal, generalization, suppression, anonymization, and pseudonymization.
- Run quality assurance checks to confirm a dataset is sufficiently de-identified before sharing.
- Export a de-identified dataset and a data key file, and understand best practices for securely storing each.
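As a rough illustration of the kind of workflow listed above, here is a minimal sketch in R with dplyr. It is not taken from the workshop materials: the `survey` data frame, its column names, the income cutoff, and the minimum group size of 2 are all hypothetical and chosen only for the example.

```r
library(dplyr)

# Hypothetical survey data, invented for illustration only.
survey <- data.frame(
  name        = c("Ana Diaz", "Ben Lee", "Cy Roy", "Di Wu"),
  email       = c("ana@example.org", "ben@example.org",
                  "cy@example.org", "di@example.org"),
  birth_date  = as.Date(c("1988-03-14", "1991-11-02",
                          "1975-06-30", "2001-01-09")),
  postal_code = c("V6T 1Z1", "V6T 1Z4", "V5K 0A1", "V6T 2B2"),
  income      = c(52000, 61000, 250000, 38000)
)

set.seed(2026)  # only so that this sketch is reproducible

# Pseudonymization: give each participant a random study ID.
survey <- survey %>%
  mutate(participant_id = sprintf("P%03d", sample(n())))

# Data key: the only link between study IDs and direct identifiers.
data_key <- survey %>%
  select(participant_id, name, email)

deidentified <- survey %>%
  # Removal: drop direct identifiers.
  select(-name, -email) %>%
  mutate(
    # Generalization: full date to year, postal code to first 3 characters.
    birth_year = as.integer(format(birth_date, "%Y")),
    region     = substr(postal_code, 1, 3),
    # Suppression: blank out an extreme value (assumed cutoff).
    income     = if_else(income > 200000, NA_real_, income)
  ) %>%
  # Suppression: blank out regions with fewer than 2 participants (assumed).
  add_count(region, name = "n_region") %>%
  mutate(region = if_else(n_region < 2, NA_character_, region)) %>%
  select(participant_id, birth_year, region, income)

# Quality assurance: confirm no identifying columns remain.
identifying <- c("name", "email", "birth_date", "postal_code")
stopifnot(!any(identifying %in% names(deidentified)))

# Export: write the dataset and the key to separate files.
out_dir <- tempdir()
write.csv(deidentified, file.path(out_dir, "survey_deidentified.csv"),
          row.names = FALSE)
write.csv(data_key, file.path(out_dir, "survey_data_key.csv"),
          row.names = FALSE)
```

In this sketch the key file is written next to the dataset only for convenience. In practice the key would be stored separately from the shared data, with restricted access.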
To participate fully, you will need to install the latest versions of R and RStudio on your computer before the workshop:
- Install R from https://cran.rstudio.com/
- Install RStudio from https://rstudio.com/products/rstudio/download/#download
Note: This workshop provides a practical introduction to programmatic data de-identification. Participants are encouraged to consult their institutional privacy, legal, or compliance experts for guidance on specific datasets.
Location: ONLINE
(A Zoom link will be sent to registrants 3 hours before the event starts.)
Contact: https://libcal.library.ubc.ca/profile/32798
Keywords: Data, Digital Scholarship, Research Commons, Research Data Management
Organizer: Eugene Barsky