Skip to content

geotiff: reject bool and unresolvable EPSG in writer crs= (#1971)#1978

Open
brendancol wants to merge 1 commit into
mainfrom
issue-1971
Open

geotiff: reject bool and unresolvable EPSG in writer crs= (#1971)#1978
brendancol wants to merge 1 commit into
mainfrom
issue-1971

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Closes #1971.

Summary

  • crs=True / crs=False are now rejected at the writer entry point with a clear ValueError. bool is an int subclass, so the previous isinstance(crs, int) check accepted them and wrote EPSG=1 / EPSG=0 to the file.
  • Integer crs= values are now validated via pyproj.CRS.from_epsg before they reach the GeoTIFF tags. An int that does not resolve as a CRS raises ValueError instead of writing garbage and emitting a GeoTIFFFallbackWarning after the fact.
  • The validator lives in _crs.py as _validate_crs_arg and is called from to_geotiff (eager), _write_vrt_tiled, write_geotiff_gpu, and _resolve_crs_to_wkt (used by write_vrt), so every writer entry point shares one rejection path.
  • Pyproj is optional; the EPSG-resolves check is skipped when pyproj is not installed, matching the rest of the module.

Test plan

  • New xrspatial/geotiff/tests/test_crs_arg_validation_1971.py covers _validate_crs_arg directly and the three writer entry points (eager, GPU import-only-checked, .vrt path).
  • pytest xrspatial/geotiff/tests/ -k "crs or epsg" — 151 passed, no regressions.
  • Full pytest xrspatial/geotiff/tests/ — same 8 pre-existing failures as main (GPU predictor tests + a tile_size unit test), none related to this change.

@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 15, 2026
…1971)

bool is an int subclass, so crs=True / crs=False slipped through
isinstance(crs, int) and wrote EPSG=1 / EPSG=0 to the file. Integer
EPSG codes were also written without a pyproj round-trip, so any int
that did not resolve to a known CRS produced a file with garbage in
ProjectedCSType / GeographicType (and only a GeoTIFFFallbackWarning
to flag it).

Add _validate_crs_arg in _crs.py: rejects bool with a clear message,
validates int via pyproj.CRS.from_epsg when pyproj is available, and
leaves str / None alone. Call it from to_geotiff, write_geotiff_gpu,
_write_vrt_tiled, and _resolve_crs_to_wkt so every writer entry
point shares one validation path.
@brendancol
Copy link
Copy Markdown
Contributor Author

PR Review (self-review)

Blockers

  • data.attrs['crs'] path bypasses the validator. Lines _writers/eager.py:534, _writers/gpu.py:370, and _writers/eager.py:829 (the _write_vrt_tiled branch) do elif crs_attr is not None: epsg = int(crs_attr). Passing attrs={'crs': True} and calling to_geotiff(da, path) with no explicit crs= kwarg still writes EPSG=1. Verified locally: round-trips with attrs['crs'] == 1. The validator needs to run on crs_attr after the isinstance(crs_attr, str) branch, not only on the kwarg.

Suggestions

  • np.int64(4326) / np.integer scalars now raise TypeError from _validate_crs_arg because isinstance(np.int64(...), int) is False. Before this PR they silently fell through to "no EPSG written" — different behaviour, possibly a regression for users with numpy-typed CRS values. Either accept np.integer (and coerce to int) or document the change. Same applies inside _resolve_crs_to_wkt.
  • Reader-side _geotags.py:557 was called out in the original report and is not touched here. Files written by other tools can still produce attrs['crs'] = <bad int> on read. Acceptable to leave for a follow-up, but worth noting in the PR body so reviewers don't expect closure of that gap.

Nits

  • _resolve_crs_to_wkt now runs _validate_crs_arg and then redoes the EPSG-resolves check via CRS.from_epsg(crs).to_wkt(). Two pyproj calls per int; cheap but slightly redundant.
  • _validate_crs_arg raises ValueError for bool and TypeError for other non-int/non-str. Mixed types are intentional (semantic vs. type), but a reader scanning the test file might miss why. Could add a one-line comment to that effect.

What looks good

  • Bool rejection ordering: isinstance(crs, bool) check runs before isinstance(crs, int), which is the correct order since bool is an int subclass.
  • Pyproj is treated as optional — the EPSG-resolves check is skipped when pyproj is missing, matching the rest of the module.
  • XRSPATIAL_GEOTIFF_STRICT=1 re-raise path is preserved through the new validator.
  • Test file covers the validator directly plus integration through to_geotiff and the .vrt dispatch.

Checklist

  • Algorithm matches reference/paper — not applicable (validation logic)
  • All implemented backends produce consistent results — same validator runs in eager / GPU / VRT writers
  • NaN handling — not applicable
  • Edge cases covered by tests — numpy-int case is not covered
  • Dask chunk boundaries — not applicable
  • No premature materialisation
  • No new benchmark needed
  • Docstrings present and accurate
  • data.attrs['crs'] path covered — see blocker above

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

geotiff: to_geotiff(crs=True) writes EPSG=1 silently; bool slips through int check

1 participant