
fix(caches): bound _curation_results and add npz mtime to image-label key #263

Open: lstein wants to merge 1 commit into `master` from `lstein/fix/bounded-caches`



@lstein (Owner) commented May 22, 2026

Summary

Two in-memory caches accumulate per-job / per-image entries on long-running servers:

  • `_curation_results` in `routers/curation.py` — one entry per curation job_id, written from the background task and read by the poll endpoint. Fully unbounded; grew forever.
  • `_IMAGE_LABEL_CACHE` in `cluster_labels.py` — was already bounded inline with an OrderedDict + lock + max=1024, BUT the cache key was `(embeddings_path, sorted_index, vocab_mtime)` — no embeddings .npz mtime. Re-indexing an album (which can reshuffle the raw-row → sorted-index mapping) left stale labels in place until the vocab file was touched.

What changes

  1. Adds `BoundedLRU[K, V]` to `util.py` — a thread-safe LRU with `get` / `put` / `clear`, capacity-bounded. Replaces the ad-hoc OrderedDict + lock + popitem pattern.

  2. Migrates `_curation_results` to `BoundedLRU(maxsize=64)` — 64 is well above any realistic in-flight + recently-polled working set (curation jobs complete in seconds; the frontend polls each job_id once or twice). The dedicated `_curation_results_lock` is gone — `BoundedLRU` does its own locking.

  3. Migrates `_IMAGE_LABEL_CACHE` to the new helper, drops the inline get/put helpers, and adds the .npz mtime to the cache key so re-indexing an album naturally invalidates its image labels.

Behavior preserved

  • The curation polling endpoint still returns the result by job_id, until LRU eviction. Previously the entry persisted forever; now a job's result is evicted once roughly 64 newer results have been stored or polled since it was last accessed, and the endpoint then returns 404 (which the frontend already handles as the no-result path).
  • The image-label cache still returns the same labels for the same image until either the embeddings .npz changes (new behavior — fixed stale-after-reindex) or the vocab file changes (previous behavior, unchanged).
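The invalidation described above can be sketched with a hypothetical key builder that folds both file mtimes into the image-label key. The function name, the field order, and the use of `st_mtime_ns` (integer nanoseconds, which avoids float comparisons) are assumptions; the PR only states that the `.npz` mtime was added to `(embeddings_path, sorted_index, vocab_mtime)`.

```python
import os
import tempfile
from pathlib import Path
from typing import Tuple

LabelKey = Tuple[str, int, int, int]


def image_label_cache_key(
    embeddings_path: Path, sorted_index: int, vocab_path: Path
) -> LabelKey:
    """Hypothetical key builder; the real layout in cluster_labels.py may differ."""
    npz_mtime = os.stat(embeddings_path).st_mtime_ns  # bumps when the album is re-indexed
    vocab_mtime = os.stat(vocab_path).st_mtime_ns  # bumps when the vocab file is edited
    return (str(embeddings_path), sorted_index, npz_mtime, vocab_mtime)


with tempfile.TemporaryDirectory() as tmp:
    npz = Path(tmp) / "embeddings.npz"
    vocab = Path(tmp) / "vocab.txt"
    npz.write_bytes(b"old embeddings")
    vocab.write_text("cat\ndog\n")
    os.utime(npz, ns=(2_000_000_000, 2_000_000_000))
    before = image_label_cache_key(npz, 7, vocab)

    # Simulate a re-index: the .npz is rewritten and its mtime moves forward.
    npz.write_bytes(b"new embeddings")
    os.utime(npz, ns=(4_000_000_000, 4_000_000_000))
    after = image_label_cache_key(npz, 7, vocab)

print(before == after)  # False: the pre-reindex label is no longer reachable
```

Nothing has to be deleted on re-index in this design: entries under the old key simply become unreachable and are eventually evicted by the LRU bound, so correctness does not depend on an explicit invalidation hook.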

Test plan

  • `ruff check photomap tests` — clean
  • `pytest tests/backend` — 256 passed
  • `test_compute_image_label_cache_evicts_past_max` was monkey-patching a deleted `_IMAGE_LABEL_CACHE_MAX` constant — updated to swap in a tiny `BoundedLRU(maxsize=3)` via `monkeypatch.setattr` so the cap stays observable in a few iterations
  • Manual: kick off > 64 curation jobs (any small album, K-means with target_count=1 each) and confirm only the most-recent 64 results are pollable
  • Manual: open the metadata drawer on an image to populate the label cache, re-index the album, open the drawer on the same image — confirm the new label reflects the re-indexed embedding (was previously stuck on the pre-reindex label until `vocab.txt` was modified)
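The updated eviction test relies on the cache being a module attribute that the labelling code reads at call time, so replacing the attribute swaps the cache for every caller. A self-contained sketch of that pattern follows. `cluster_labels` here is a stand-in module, `compute_image_label` and `TinyLRU` are hypothetical, and `unittest.mock.patch.object` is used as the standard-library analogue of pytest's `monkeypatch.setattr(cluster_labels, "_IMAGE_LABEL_CACHE", BoundedLRU(maxsize=3))`.

```python
import types
from collections import OrderedDict
from unittest import mock


class TinyLRU:
    """Stand-in for BoundedLRU: same get/put shape, no locking (sketch only)."""

    def __init__(self, maxsize: int) -> None:
        self.maxsize = maxsize
        self.data: OrderedDict = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)
        return self.data[key]

    def put(self, key, value) -> None:
        self.data[key] = value
        self.data.move_to_end(key)
        while len(self.data) > self.maxsize:
            self.data.popitem(last=False)


# Hypothetical module standing in for cluster_labels.py.
cluster_labels = types.ModuleType("cluster_labels")
cluster_labels._IMAGE_LABEL_CACHE = TinyLRU(maxsize=1024)


def compute_image_label(index: int) -> str:
    # Read the cache through the module at call time (not a default argument
    # or a from-import), so a test can swap it out.
    cache = cluster_labels._IMAGE_LABEL_CACHE
    label = cache.get(index)
    if label is None:
        label = f"label-{index}"  # placeholder for the real labelling work
        cache.put(index, label)
    return label


def test_cache_evicts_past_max() -> None:
    small = TinyLRU(maxsize=3)
    # The original cache is restored when the with-block exits.
    with mock.patch.object(cluster_labels, "_IMAGE_LABEL_CACHE", small):
        for i in range(5):
            compute_image_label(i)
        assert list(small.data) == [2, 3, 4]  # cap observable after 5 calls
    assert cluster_labels._IMAGE_LABEL_CACHE is not small


test_cache_evicts_past_max()
```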

Net +44 lines across 4 files. `util.py` picks up ~54 lines for the helper + JSDoc-style docstring; `cluster_labels.py` and `curation.py` both net negative because the inline LRU plumbing is gone.

🤖 Generated with Claude Code


Two in-memory caches accumulate per-job/per-image entries on long-running
servers:

* ``_curation_results`` in ``routers/curation.py`` — one entry per
  curation job_id, written from the background task and read by the
  poll endpoint. Fully unbounded.

* ``_IMAGE_LABEL_CACHE`` in ``cluster_labels.py`` — already had an
  inline OrderedDict + lock + handcrafted LRU eviction with a max of
  1024, BUT the cache key was
  ``(embeddings_path, sorted_index, vocab_mtime)`` — no embeddings
  ``.npz`` mtime. Re-indexing an album (which can reshuffle the raw
  row → sorted-index mapping) left stale labels in place until the
  vocab file was touched.

This change:

1. Adds ``BoundedLRU[K, V]`` to ``util.py`` — a thread-safe LRU with
   ``get`` / ``put`` / ``clear``, capacity-bounded. Replaces the
   ad-hoc OrderedDict pattern.

2. Migrates ``_curation_results`` to ``BoundedLRU(maxsize=64)`` — 64 is
   well above any realistic in-flight + recently-polled working set
   (curation jobs complete in seconds; the frontend polls each job_id
   once or twice). The dedicated ``_curation_results_lock`` is gone —
   BoundedLRU does its own locking.

3. Migrates ``_IMAGE_LABEL_CACHE`` to the new helper, drops the inline
   helpers, and adds the ``.npz`` mtime to the cache key so re-indexing
   an album naturally invalidates its image labels.

Behavior preserved end-to-end — the curation polling endpoint still
returns the result by job_id (until LRU eviction; previously the entry
persisted forever), and the image-label cache still returns the same
labels for the same image until either the embeddings .npz changes (new
behavior — fixed stale-after-reindex) or the vocab file changes
(previous behavior, unchanged).

The matching ``test_compute_image_label_cache_evicts_past_max`` was
patching a now-deleted module-level ``_IMAGE_LABEL_CACHE_MAX`` constant;
updated to swap in a tiny ``BoundedLRU(maxsize=3)`` via
``monkeypatch.setattr`` so the cap is observable in a few iterations.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>