Date: 27 April 2023 @ 21:30 - 23:00

Timezone: UTC

Language of instruction: English

PyTables is a free and open-source Python library for managing large hierarchical datasets. It is built on top of NumPy and the HDF5 scientific data library, and it is designed for both high performance and interactive analysis of very large datasets.

For large data streams (think multi-dimensional arrays or billions of records), it can outperform general-purpose databases in speed, memory usage, and I/O bandwidth. It is not a replacement for traditional relational databases, however, because PyTables does not support broad relationships between dataset variables.
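As a rough illustration of the record-oriented use case, the sketch below writes a small table and runs an in-kernel query on it. The `Reading` schema, the `readings` table name, and the generated values are made up for this example; a real workload would hold far more rows.

```python
import os
import tempfile

import tables


class Reading(tables.IsDescription):
    """Hypothetical record layout: one row per sensor reading."""
    sensor_id = tables.Int32Col()
    value = tables.Float64Col()


def count_matches(path):
    """Write 1000 synthetic rows, then count those matching a query."""
    with tables.open_file(path, mode="w") as h5:
        table = h5.create_table("/", "readings", Reading, title="demo")
        row = table.row
        for i in range(1000):
            row["sensor_id"] = i % 10
            row["value"] = i * 0.5
            row.append()
        table.flush()  # push buffered rows to disk before querying
        # The condition string is evaluated by PyTables itself, so rows
        # that do not match are never turned into Python objects.
        hits = [r["value"] for r in table.where("(sensor_id == 3) & (value > 400)")]
    return len(hits)


with tempfile.TemporaryDirectory() as tmp:
    n_hits = count_matches(os.path.join(tmp, "readings.h5"))
```

The query is expressed as a string so that filtering happens while the data is read from disk, which is the pattern that scales to tables too large to fit in memory.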

PyTables can even be used to organize a workflow with many (thousands to millions of) small files, because you can create a PyTables database of nodes that behave like regular open files in Python. This lets you store a large number of arbitrary files in a single PyTables database with on-the-fly compression, which makes it very efficient for handling huge amounts of data.
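A minimal sketch of the file-like nodes described above, using the `tables.nodes.filenode` module. The node name `notes_txt`, the payload, and the compression settings are arbitrary choices for illustration.

```python
import os
import tempfile

import tables
from tables.nodes import filenode


def roundtrip(path, payload):
    """Store bytes in a file node, then read them back."""
    # Filters set on the file are inherited by nodes created inside it,
    # so the payload is compressed as it is written.
    filters = tables.Filters(complevel=5, complib="zlib")
    with tables.open_file(path, mode="w", filters=filters) as h5:
        fnode = filenode.new_node(h5, where="/", name="notes_txt")
        fnode.write(payload)  # same interface as a binary file object
        fnode.close()

    with tables.open_file(path, mode="r") as h5:
        fnode = filenode.open_node(h5.root.notes_txt, "r")
        data = fnode.read()
        fnode.close()
    return data


with tempfile.TemporaryDirectory() as tmp:
    restored = roundtrip(os.path.join(tmp, "files.h5"), b"first line\nsecond line\n")
```

Each small file becomes one node in the HDF5 hierarchy, so thousands of them can live in a single file on disk while still being read and written through the usual file methods.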

This workshop will guide you through the basics; no previous PyTables or HDF5 knowledge is required.

Keywords: Python, Programming

