<aside> 🐨

Assignment 02 - January 21, 2026 https://classroom.github.com/a/fgf2ebz9

</aside>

Links & Self-Guided Review

Why Memory Limits Sneak Up On Us

Dataset vs laptop memory

Chart shows estimated in-memory size; raw on-disk sizes are in the table below.

Health datasets outgrow laptop RAM quickly: a handful of CSVs with vitals, labs, and encounters can exceed 16 GB once loaded. Trying to “just read the file” pushes the system into swap, where it thrashes, and eventually Python raises a MemoryError that interrupts the workflow.
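One reason loaded data outgrows its file size is that a text value costing a few bytes on disk becomes a full Python object in memory. The sketch below uses only the standard library to compare the two for a made-up column of patient identifiers. The `MRN…` values are hypothetical, and the exact numbers depend on the Python version, so it prints a ratio rather than promising one. pandas may store text more compactly depending on the dtype, so treat this as an illustration of the effect, not a measurement of pandas itself.

```python
import sys

# Hypothetical column: 100,000 distinct 10-character patient identifiers.
ids = [f"MRN{i:07d}" for i in range(100_000)]

# On disk in a CSV: one byte per ASCII character plus a newline per row.
disk_bytes = sum(len(value) + 1 for value in ids)

# In memory: the list's pointer array plus one Python str object per row,
# each with its own object header on top of the characters.
mem_bytes = sys.getsizeof(ids) + sum(sys.getsizeof(value) for value in ids)

print(f"on disk:   {disk_bytes / 1e6:.1f} MB")
print(f"in memory: {mem_bytes / 1e6:.1f} MB")
print(f"inflation: {mem_bytes / disk_bytes:.1f}x")
```

Numeric columns usually inflate less than text, which is why the ratio between raw and in-memory size varies from dataset to dataset.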

Laptop specs vs dataset footprints

| Dataset | Typical raw size | In-memory pandas size | Fits on 16 GB laptop? |
| --- | --- | --- | --- |
| Intake forms (CSV) | 250 MB | ~1.2 GB (due to dtype inflation) | ✅ |
| Longitudinal vitals (CSV) | 6 GB | ~14 GB | ⚠️ borderline |
| EHR encounter log (CSV) | 18 GB | ~42 GB | ❌ |
| Imaging metadata (Parquet) | 9 GB | ~9 GB | ⚠️ if other apps closed |
| Claims archive (partitioned Parquet) | 120 GB | streamed | ✅ (with streaming) |
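The "fits?" column can be approximated with a small rule of thumb. The sketch below is one way to express it, not a standard formula: the function name `fits_in_ram` and the 50% `headroom` (RAM left for the operating system, other applications, and temporary copies made during analysis) are assumptions chosen for illustration. The sizes come from the in-memory column of the table.

```python
def fits_in_ram(in_memory_gb, ram_gb=16.0, headroom=0.5):
    """Classify a dataset by its estimated in-memory size.

    headroom is an assumed fraction of RAM to keep free; tune it to taste.
    """
    budget_gb = ram_gb * (1 - headroom)
    if in_memory_gb <= budget_gb:
        return "fits"
    if in_memory_gb <= ram_gb:
        return "borderline"
    return "does not fit"


# Estimated in-memory sizes (GB) taken from the table above.
datasets = {
    "Intake forms": 1.2,
    "Longitudinal vitals": 14,
    "EHR encounter log": 42,
    "Imaging metadata": 9,
}

for name, size_gb in datasets.items():
    print(f"{name}: {fits_in_ram(size_gb)}")
```

The claims archive is left out because it is streamed in pieces and never loaded whole, so its total size is not what has to fit in RAM.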

Warning signs you are hitting RAM limits