Skip to content

[python] Support time travel by timestamp and watermark in TimeTravelUtil#7899

Merged
JingsongLi merged 1 commit into
apache:masterfrom
JunRuiLee:pypaimon-time-travel-by-timestamp
May 23, 2026
Merged

[python] Support time travel by timestamp and watermark in TimeTravelUtil#7899
JingsongLi merged 1 commit into
apache:masterfrom
JunRuiLee:pypaimon-time-travel-by-timestamp

Conversation

@JunRuiLee
Copy link
Copy Markdown
Contributor

Purpose

Currently pypaimon's TimeTravelUtil only supports time travel by scan.tag-name and scan.snapshot-id. This PR adds support for three additional time travel modes that are available on the Java side:

  • scan.timestamp-millis: Travel to the latest snapshot with commit time <= the given timestamp (milliseconds)
  • scan.timestamp: Same as above but accepts a human-readable timestamp string (e.g. '2023-12-01 12:00:00')
  • scan.watermark: Travel to the first snapshot with watermark >= the given value

@JunRuiLee JunRuiLee force-pushed the pypaimon-time-travel-by-timestamp branch from df56ffc to c149409 Compare May 19, 2026 10:54
@JunRuiLee JunRuiLee marked this pull request as ready for review May 19, 2026 10:55
@JunRuiLee JunRuiLee force-pushed the pypaimon-time-travel-by-timestamp branch from c149409 to 1eac751 Compare May 20, 2026 07:50
Add scan.timestamp-millis, scan.timestamp, and scan.watermark options
to CoreOptions and extend TimeTravelUtil to resolve snapshots using
these parameters, aligning with Java-side time travel capabilities.

- Add SCAN_TIMESTAMP_MILLIS, SCAN_TIMESTAMP, SCAN_WATERMARK config
  options and accessor methods to CoreOptions
- Extend TimeTravelUtil.try_travel_to_snapshot() to handle timestamp
  and watermark based time travel
- Add SnapshotManager.later_or_equal_watermark() for watermark lookup
- Parse scan.timestamp using local timezone (matching Java behavior)
- Unify all point-in-time scan options in TableScan through
  TimeTravelUtil, ensuring mutual-exclusion validation and consistent
  schema/data resolution
- Reject conflicting incremental-between-timestamp and point-in-time
  scan options
@JunRuiLee JunRuiLee force-pushed the pypaimon-time-travel-by-timestamp branch from 1eac751 to a76c0ba Compare May 20, 2026 10:05
Copy link
Copy Markdown
Contributor

@JingsongLi JingsongLi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: [python] Support time travel by timestamp and watermark in TimeTravelUtil

Overall well-structured. The refactoring of table_scan.py to delegate all time-travel logic to TimeTravelUtil is a clean separation of concerns.

Correctness

  1. later_or_equal_watermark binary search relies on monotonically non-decreasing watermarks. If a snapshot has watermark = None, the code does earliest = mid + 1 (treats it as "too small"). This is correct under Paimon's guarantee, but a brief comment noting the invariant assumption would help future readers.

  2. Gap handling degrades binary search to O(n) in worst case. When get_snapshot_by_id(mid) returns None, the linear scan from mid+1 to latest could be expensive if many consecutive snapshot IDs have been expired. Consistent with existing earlier_or_equal_time_mills but worth noting.

  3. Dead-code defensiveness in table_scan.py. The check "if snapshot is None: raise ValueError(...)" after try_travel_to_snapshot is unreachable when has_time_travel is True, because all paths inside either return a valid snapshot or raise. A comment would clarify intent.

Minor / Style

  1. Error message formatting -- "point-in-time scan options: %s" % SCAN_KEYS renders the list with Python repr syntax. Consider ", ".join(SCAN_KEYS) for cleaner output.

  2. No direct unit test for binary search edge cases -- The tests use _StubSnapshotManager with simple linear implementation. Consider a focused test for later_or_equal_watermark with gaps in snapshot IDs and None watermarks.

Design

  1. Good alignment with Java side. The _parse_timestamp_to_millis using naive datetime is consistent with Java's TimeZone.getDefault() semantics.

Nice work overall -- test cases are thorough and code is readable.

@JingsongLi
Copy link
Copy Markdown
Contributor

+1

@JingsongLi JingsongLi merged commit 1c38526 into apache:master May 23, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants