Adding ignore list to skim load for unused skims while running with sharrow#1036
Adding ignore list to skim load for unused skims while running with sharrow#1036dhensle wants to merge 5 commits into
Conversation
There was a problem hiding this comment.
Pull request overview
This PR addresses issue #1035 where, with sharrow enabled and a clean cache, the load=False shared-memory reload path attempted to reload all OMX matrices (including those previously dropped as unused), causing a KeyError. It does so by extracting the “finalization” phase into a helper and expanding the ignore list passed to reload_from_omx_3d to include dropped matrices, plus adding targeted tests around the reload/finalization path.
Changes:
- Extract final skim dataset alignment/shared-memory/encoding behavior into
_finalize_skim_dataset. - Extend the
reload_from_omx_3d(..., ignore=...)patterns to also ignore OMX matrices that were dropped from the dataset as unused. - Add comprehensive unit tests that build small OMX files and exercise reload, dropped-skim handling, digital encoding ordering, and zone alignment behavior.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
activitysim/core/skim_dataset.py |
Adds _finalize_skim_dataset, defers encoding to the finalization stage, and builds an extended ignore list so dropped OMX matrices aren’t reloaded into shared memory. |
activitysim/core/test/test_skim.py |
Adds new unit tests that construct OMX fixtures and validate shared-memory reload behavior (including dropped skims) and encoding/zone alignment outcomes. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| logger.info( | ||
| "store_skims_in_shm is False, keeping skims in process-local memory" | ||
| ) | ||
| for f in omx_file_handles: | ||
| f.close() | ||
| d = _apply_digital_encoding(d, skim_digital_encoding) | ||
| return d |
There was a problem hiding this comment.
@dhensle this is a legit point, can you consider/address? I think leaving file handles dangling open indefinitely isn't the best, but it might be a better solution than closing them prematurely. Computing the dataset to force-load everything isn't ideal because we might not need everything -- especially if we're not running a full end-to-end model run.
Fix for #1035