UI automation: multi-window inspect, cross-window search, screenshot compositing #419

Merged · nmetulev merged 23 commits into main from nm/ui-fixes · Apr 15, 2026

@nmetulev nmetulev commented Apr 9, 2026

Summary

Improvements to winapp ui commands based on agent trial feedback (50+ sessions). Focuses on multi-window handling, output clarity, JSON completeness, and smarter element resolution.

Changes

Multi-window inspect output

  • Window headers with HWND, title, type, class name, and owner for each window
  • Deduplicates windows already in the main UIA tree (modal dialogs)
  • Filters internal system windows (PseudoConsoleWindow, IME)
  • Footer with element count + Use -w <HWND> hint
  • Default depth changed from 5 to 4

Interactive mode (--interactive)

  • Ancestor breadcrumbs: non-interactive containers shown as collapsed grey lines (e.g., Window > Pane > MenuBar)
  • Preserves window separators through all filters

Cross-window element search

  • FindSingleElementAsync falls back to popup/owned windows when element not found on main window
  • Works for invoke, click, set-value, get-value, focus, scroll, wait-for
  • WindowHandle on UiElement (named SourceWindowHandle in the earlier commits) for correct HWND routing (exposed in JSON)
  • Per-window COMException handling for resilience

Screenshot compositing

  • Multi-window screenshots compose into a single PNG with label bars
  • Cross-process dialog detection via GetWindow(GW_OWNER)

get-value / wait-for smart fallback

  • Fallback chain: TextPattern → ValuePattern → TogglePattern → SelectionPattern → Name
  • Works for RichEditBox, TextBox, ToggleSwitch, ComboBox, RadioButton, TabView, labels
  • wait-for --value works standalone without --property using same fallback
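
A minimal sketch of the fallback chain listed above, for illustration only. The `Element` dictionary and the reader callables are hypothetical stand-ins for UIA pattern lookups; only the order of the chain comes from this PR, and treating `None` as "pattern not supported" is an assumption.

```python
from typing import Callable, Optional

# Hypothetical stand-in for a UI element: each reader returns a string,
# or None when the control cannot supply a value through that pattern.
Element = dict[str, Callable[[], Optional[str]]]

# Order taken from the PR description.
FALLBACK_ORDER = [
    "TextPattern",
    "ValuePattern",
    "TogglePattern",
    "SelectionPattern",
    "Name",
]


def get_value(element: Element) -> Optional[str]:
    """Return the first value produced by a supported pattern, in chain order."""
    for pattern in FALLBACK_ORDER:
        reader = element.get(pattern)
        if reader is None:
            continue  # pattern not supported by this control
        value = reader()
        if value is not None:
            return value
    return None


# Illustrative controls (names and values are made up for the example).
toggle_switch: Element = {"TogglePattern": lambda: "On", "Name": lambda: "Logging"}
combo_box: Element = {"SelectionPattern": lambda: "Dark", "Name": lambda: "Theme"}
label: Element = {"Name": lambda: "Ready"}

print(get_value(toggle_switch), get_value(combo_box), get_value(label))
```

Because Name sits last, a control that exposes a richer pattern reports its state rather than its label.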

Invokable disambiguation

  • When text search matches multiple elements (e.g., SettingsExpander), automatically picks the only invokable element
  • Applied in both main-window and cross-window search paths
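
The disambiguation rule above could look roughly like the following sketch. The `Match` type and its `invokable` flag are hypothetical; the sketch returns `None` for a still-ambiguous result, whereas the commit messages say the real main-window path raises an ambiguity error.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Match:
    control_type: str
    name: str
    invokable: bool  # hypothetical flag: the element supports an invoke action


def pick_single(matches: list[Match]) -> Optional[Match]:
    """Return the one usable match, or None when the result is still ambiguous."""
    if len(matches) == 1:
        return matches[0]
    invokable = [m for m in matches if m.invokable]
    if len(invokable) == 1:
        return invokable[0]  # the only element that can actually be invoked
    return None


# SettingsExpander case from this PR: three elements share Name='Advanced'.
expander_matches = [
    Match("Group", "Advanced", False),
    Match("Button", "Advanced", True),
    Match("Text", "Advanced", False),
]
chosen = pick_single(expander_matches)
print(chosen.control_type if chosen else "ambiguous")
```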

JSON output completeness

  • All commands now produce pure JSON in --json mode (no stdout pollution)
  • Added JSON results for set-value, focus, scroll-into-view (previously had no output)
  • Added hwnd to invoke/click/scroll/screenshot results
  • WindowInfo enriched with label, size, owner, className, isForeground
  • Separator elements filtered from JSON inspect output

AutomationId promotion safety

  • PromoteUniqueAutomationIds skips cross-window elements (frequency map only covers main window)

Other

  • Removed verbose multi-window warning (silent auto-select)
  • Truncates long names (80 chars) and values (60 chars) for WebView base64 data URIs

nmetulev and others added 6 commits April 8, 2026 23:30
WebView2 controls expose base64 data URIs as element names, which
bloat inspect output with hundreds of characters per element.

Truncate displayed names to 80 chars and values to 60 chars with
'…' suffix. JSON output is unaffected (full data preserved).

Applied to both inspect and search text output.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
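
As a rough illustration of the truncation in the commit above: the 80/60 limits and the '…' suffix come from the commit message, while appending the suffix on top of the limit (rather than inside it) is an assumption of this sketch.

```python
ELLIPSIS = "…"
NAME_LIMIT = 80   # from the commit message
VALUE_LIMIT = 60  # from the commit message


def truncate(text: str, limit: int) -> str:
    """Cut text to `limit` characters and append an ellipsis only if it was cut."""
    if len(text) <= limit:
        return text
    return text[:limit] + ELLIPSIS


# Example: a long data URI used as an element name (made-up payload).
long_name = "data:image/png;base64," + "A" * 500
display_name = truncate(long_name, NAME_LIMIT)
print(len(long_name), len(display_name))
```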
All UI commands now transparently find and interact with elements
across popup windows, flyout menus, and cross-process dialogs.

Part 1 - FindSingleElementAsync fallback:
When element not found on main window, automatically searches all
app-related windows (same PID + cross-process owned via GW_OWNER).
Covers flyout MenuBar items, file picker dialogs, system dialogs.
SourceWindowHandle tracked on UiElement for correct HWND routing.

Part 2 - inspect spans all app windows:
Full tree inspect shows popup/owned window contents with separator:
  --- HWND 1840448: "View" (popup, Xaml_WindowedPopupClass) ---
    mnu-splitview-5211 MenuItem "Split View"

Part 3 - ResolveComElement uses element source HWND:
When an element came from a popup/dialog, action methods
(invoke, click, set-value, etc.) resolve against the correct
window HWND instead of the session's main window.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
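
A simplified sketch of the fallback search and HWND routing described in the commit above. The `trees` dictionary stands in for the per-window UIA trees, and pairing HWND 133306 with popup 1840448 is only an example built from values quoted elsewhere on this page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UiElement:
    name: str
    window_handle: int  # HWND of the window the element was found on


def find_single(name: str, main_hwnd: int, other_hwnds: list[int],
                trees: dict[int, list[str]]) -> Optional[UiElement]:
    """Search the main window first, then each popup/owned window."""
    for hwnd in [main_hwnd, *other_hwnds]:
        if name in trees.get(hwnd, []):
            return UiElement(name, hwnd)
    return None


# Hypothetical session: a main window plus one popup holding a menu item.
trees = {133306: ["File", "View"], 1840448: ["Split View"]}
element = find_single("Split View", 133306, [1840448], trees)
print(element)
```

An action such as invoke would then resolve against `element.window_handle` instead of the session's main window.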
When multiple windows are detected (dialogs, popups), compose all
captures side-by-side into a single PNG image instead of separate
files. Each window gets a label bar showing HWND, type, and title.

Better for agents: one image to analyze instead of multiple files.
Dark background, 8px gap between windows, 28px label bars.

Uses SkiaSharp canvas compositing (already a dependency for PNG
encoding). Single-window behavior unchanged.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
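
The side-by-side layout in the commit above can be sketched as plain arithmetic. The 8px gap and 28px label bar come from the commit message; placing the label bar above each capture and using no outer margin are assumptions.

```python
GAP = 8         # px between windows (from the commit message)
LABEL_BAR = 28  # px label bar height (from the commit message)


def layout(sizes: list[tuple[int, int]]) -> tuple[int, int, list[tuple[int, int]]]:
    """Return canvas width, canvas height, and the top-left offset of each capture."""
    if not sizes:
        return 0, 0, []
    offsets = []
    x = 0
    for width, _height in sizes:
        offsets.append((x, LABEL_BAR))  # capture is drawn below its label bar
        x += width + GAP
    canvas_width = x - GAP  # no trailing gap after the last window
    canvas_height = LABEL_BAR + max(height for _width, height in sizes)
    return canvas_width, canvas_height, offsets


canvas = layout([(800, 600), (300, 200)])
print(canvas)
```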
With cross-window element search, all commands transparently find
elements across popups/dialogs. The verbose multi-window warning
is no longer needed — inspect shows all windows inline, and
action commands resolve elements across windows automatically.

Auto-selection still happens (foreground → largest), just silently.
Logged at debug level for troubleshooting.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
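
The silent auto-selection order (foreground, then largest) might be expressed as in this sketch. The `Window` type is hypothetical, and interpreting "largest" as largest area is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Window:
    hwnd: int
    width: int
    height: int
    is_foreground: bool


def auto_select(windows: list[Window]) -> Window:
    """Prefer the foreground window; otherwise fall back to the largest one."""
    for window in windows:
        if window.is_foreground:
            return window
    return max(windows, key=lambda w: w.width * w.height)


candidates = [
    Window(hwnd=1, width=400, height=300, is_foreground=False),
    Window(hwnd=2, width=1280, height=720, is_foreground=False),
    Window(hwnd=3, width=200, height=100, is_foreground=True),
]
print(auto_select(candidates).hwnd)
```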
- Add header separator for main window when multiple windows exist
- Show owner HWND in popup/dialog separator lines (e.g., owner: HWND 133306)
- Add blank line before footer, show 'Use -w <HWND>' hint for multi-window
- Preserve window separator elements through --interactive/--hide-* filters
- Change default inspect depth from 5 to 4 (--interactive still bumps to 8)
- Deduplicate windows already in main UIA tree (modal dialogs)
- Filter internal system windows (PseudoConsoleWindow, IME, MSCTFIME UI)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
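
A small sketch of the window filtering and deduplication listed above. The class names come from the commit message; the dictionary shape and the idea of passing in the set of HWNDs already present in the main UIA tree are assumptions for illustration.

```python
# Class names filtered as internal system windows (from the commit message).
SYSTEM_CLASSES = {"PseudoConsoleWindow", "IME", "MSCTFIME UI"}


def windows_to_show(windows: list[dict], hwnds_in_main_tree: set[int]) -> list[dict]:
    """Drop system windows and windows already covered by the main UIA tree."""
    shown = []
    for window in windows:
        if window["className"] in SYSTEM_CLASSES:
            continue  # internal system window
        if window["hwnd"] in hwnds_in_main_tree:
            continue  # e.g. a modal dialog already listed under the main window
        shown.append(window)
    return shown


# Made-up window list for the example.
all_windows = [
    {"hwnd": 10, "className": "Xaml_WindowedPopupClass"},
    {"hwnd": 11, "className": "IME"},
    {"hwnd": 12, "className": "#32770"},
]
print([w["hwnd"] for w in windows_to_show(all_windows, {12})])
```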
When --interactive filters to only interactive elements, non-interactive
parent containers (Window, Pane, Group, etc.) are now shown as collapsed
grey breadcrumb lines like '… Window > Pane > MenuBar' to preserve
tree context. Breadcrumbs only appear when the ancestor path changes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
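
The "only when the ancestor path changes" rule from the commit above can be sketched as follows. The input shape (ancestor types paired with a display line) is hypothetical; `last_breadcrumb` mirrors the `lastBreadcrumb` variable visible in the review snippet further down.

```python
def render(elements: list[tuple[list[str], str]]) -> list[str]:
    """Emit a breadcrumb line only when the ancestor path differs from the previous one."""
    lines = []
    last_breadcrumb = ""
    for ancestors, text in elements:
        breadcrumb = " > ".join(ancestors)
        if breadcrumb and breadcrumb != last_breadcrumb:
            lines.append("… " + breadcrumb)
            last_breadcrumb = breadcrumb
        lines.append("  " + text)
    return lines


# Made-up interactive elements with their non-interactive ancestors.
rendered = render([
    (["Window", "Pane", "MenuBar"], 'MenuItem "File"'),
    (["Window", "Pane", "MenuBar"], 'MenuItem "View"'),
    (["Window", "Pane"], 'Button "Save"'),
])
print("\n".join(rendered))
```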
Copilot AI review requested due to automatic review settings April 9, 2026 08:12

Copilot AI left a comment


Pull request overview

This PR enhances the winapp ui command set to better handle multi-window applications by improving inspect output, enabling cross-window element resolution for interactions, and compositing multi-window screenshots into a single image.

Changes:

  • Extend UIA inspect/search to detect and traverse additional popup/owned windows and route interactions via a new per-element SourceWindowHandle.
  • Update ui screenshot multi-window behavior to composite captures side-by-side into one PNG (and adjust JSON output accordingly).
  • Improve CLI output clarity (window separators, interactive breadcrumbs) and adjust defaults (inspect depth, output truncation).

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| src/winapp-CLI/WinApp.Cli/Services/UiSessionService.cs | Removes verbose multi-window console warning; exposes window class name helper for cross-window filtering. |
| src/winapp-CLI/WinApp.Cli/Services/UiAutomationService.cs | Adds multi-window inspect traversal, cross-window fallback element search, and source-HWND-based element resolution. |
| src/winapp-CLI/WinApp.Cli/Models/UiElement.cs | Introduces SourceWindowHandle for correct cross-window interaction routing. |
| src/winapp-CLI/WinApp.Cli/Commands/UiSearchCommand.cs | Truncates long element names/values in search output to reduce noise. |
| src/winapp-CLI/WinApp.Cli/Commands/UiScreenshotCommand.cs | Captures multiple windows and composites into a single PNG; changes JSON output shape/contents. |
| src/winapp-CLI/WinApp.Cli/Commands/UiInspectCommand.cs | Preserves window separators through filters and adds breadcrumb context rendering in --interactive mode; truncates long fields. |
| src/winapp-CLI/WinApp.Cli/Commands/SharedUiOptions.cs | Changes default inspect depth from 5 to 4. |
| docs/fragments/skills/winapp-cli/ui-automation.md | Updates inspect depth example wording. |
| .github/plugin/skills/winapp-cli/ui-automation/SKILL.md | Updates documented default depth to 4 and adjusts inspect example wording. |


ansiConsole.WriteLine();
ansiConsole.MarkupLine($"[grey]--- {EscapeMarkup(el.Name ?? "")} ---[/]");
lastBreadcrumb = "";
Array.Clear(ancestorTypes);

Copilot AI Apr 9, 2026


Array.Clear(ancestorTypes) does not compile (there is no overload that takes only the array). Use Array.Clear(ancestorTypes, 0, ancestorTypes.Length) (or ancestorTypes.AsSpan().Clear()) when resetting breadcrumb state.

Suggested change
- Array.Clear(ancestorTypes);
+ Array.Clear(ancestorTypes, 0, ancestorTypes.Length);

Comment thread src/winapp-CLI/WinApp.Cli/Services/UiSessionService.cs
Comment thread src/winapp-CLI/WinApp.Cli/Commands/UiScreenshotCommand.cs
Comment thread src/winapp-CLI/WinApp.Cli/Commands/UiScreenshotCommand.cs

github-actions Bot commented Apr 9, 2026

Build Metrics Report

Binary Sizes

| Artifact | Baseline | Current | Delta |
| --- | --- | --- | --- |
| CLI (ARM64) | 30.42 MB | 30.49 MB | 📈 +73.0 KB (+0.23%) |
| CLI (x64) | 30.79 MB | 30.87 MB | 📈 +74.0 KB (+0.23%) |
| MSIX (ARM64) | 12.84 MB | 12.86 MB | 📈 +20.6 KB (+0.16%) |
| MSIX (x64) | 13.63 MB | 13.68 MB | 📈 +50.9 KB (+0.36%) |
| NPM Package | 26.70 MB | 26.77 MB | 📈 +64.1 KB (+0.23%) |
| NuGet Package | 26.79 MB | 26.85 MB | 📈 +60.8 KB (+0.22%) |

Test Results

718 passed out of 718 tests in 366.7s (+20.5s vs. baseline)

Test Coverage

20.4% line coverage, 34.7% branch coverage · ⚠️ -0.2% vs. baseline

CLI Startup Time

43ms median (x64, winapp --version) · ✅ +5ms vs. baseline


Updated 2026-04-11 03:15:27 UTC · commit 7244b64 · workflow run

nmetulev and others added 10 commits April 9, 2026 09:14
- Fix Array.Clear to use 3-arg overload for compat
- Update AutoSelectWindow doc to say 'silently' (warning was removed)
- Set composite Width/Height in multi-window screenshot JSON output

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
logger.LogInformation writes to stdout (not stderr) in this CLI,
so all human-readable output in UI commands was polluting --json
responses. Wrap all LogInformation and ansiConsole output in
if (!json) guards so --json mode returns pure JSON only.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
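
The guard pattern in the commit above (human-readable output only when not in JSON mode) can be illustrated with this sketch. The function name, messages, and result fields are hypothetical; the point is that JSON mode writes exactly one parseable document and nothing else.

```python
import io
import json


def run_focus(name: str, hwnd: int, as_json: bool, out) -> None:
    """Write either human-readable lines or a single JSON document, never both."""
    if not as_json:
        out.write(f"Focusing {name}...\n")  # progress line, suppressed in JSON mode
    result = {"name": name, "hwnd": hwnd}
    if as_json:
        out.write(json.dumps(result))
    else:
        out.write(f"Focused {name} (HWND {hwnd})\n")


json_buffer = io.StringIO()
run_focus("SearchBox", 133306, as_json=True, out=json_buffer)
parsed = json.loads(json_buffer.getvalue())  # parses because nothing else was written

text_buffer = io.StringIO()
run_focus("SearchBox", 133306, as_json=False, out=text_buffer)
print(parsed, text_buffer.getvalue(), sep="\n")
```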
- Expose WindowHandle on UiElement (was JsonIgnore)
- Add WindowInfo metadata: label, size, owner, className, isForeground
- Add Hwnd to invoke/click/scroll/screenshot JSON results
- Add JSON result types for set-value, focus, scroll-into-view
- Populate all new fields in command handlers

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Previously, wait-for --value required --property to specify which UIA
property to check. Now --value alone uses the same smart fallback chain
as get-value (TextPattern → ValuePattern → Name), making it work as a
standalone assertion for any control type.

  # Before (required --property):
  wait-for CounterDisplay -a app --property Name --value "5" -t 5000

  # Now (smart fallback, works for TextBlock, TextBox, etc.):
  wait-for CounterDisplay -a app --value "5" -t 5000

  # Still works with --property for specific UIA properties:
  wait-for DarkMode -a app --property ToggleState --value "On" -t 5000

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Two improvements from agent feedback:

1. SelectionPattern fallback in get-value/wait-for:
   The TextPattern → ValuePattern → Name chain didn't cover ComboBoxes,
   RadioButtons, or TabViews which expose their selection via
   SelectionPattern.GetSelection(). Added as step 3 (before Name
   fallback). Now 'get-value CmbTheme' returns 'Dark' and
   'wait-for CmbTheme --value Dark' works naturally.

2. Invokable disambiguation for text search:
   When multiple elements match the same text (e.g., SettingsExpander
   where Group, Button, and Text all share Name='Advanced'), prefer the
   only invokable element instead of throwing an ambiguity error. This
   handles the common case of 'invoke Advanced' targeting the Button.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- get-value docs: add SelectionPattern to fallback chain description,
  add ComboBox example
- wait-for docs: update fallback chain to include SelectionPattern
- Plain text search: document invokable disambiguation behavior
- Updated: ui-automation.md, SKILL.md, skill fragment, npm-usage.md

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1. TogglePattern fallback in get-value/wait-for:
   ToggleSwitches and CheckBoxes expose state via TogglePattern, not
   ValuePattern. Added as step 3 in the fallback chain:
   TextPattern → ValuePattern → TogglePattern → SelectionPattern → Name
   Now 'get-value ToggleLogging' returns 'On'/'Off' and
   'wait-for ToggleLogging --value Off' works.

2. Cross-window invokable disambiguation:
   FindElementOnOtherWindows only accepted exact-1 matches from popup
   windows. When a ComboBox dropdown has ListItem 'Dark' + Text 'Dark',
   the search found 2 matches and returned null. Added the same
   invokable disambiguation logic (prefer only invokable element) to
   the cross-window search path.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1. Filter separator elements from JSON inspect output
   Type='---' synthetic separators were leaking into --json responses.
   JSON consumers now get only real elements (each has windowHandle).

2. wait-for: use FindSingleElementAsync for cross-window support
   Previously used SearchAsync for non-slug selectors, which only
   searched the main window. Now uses FindSingleElementAsync uniformly,
   which has the cross-window popup/dialog fallback.

3. PromoteUniqueAutomationIds: skip cross-window elements
   The frequency map only covers the main window tree. Elements from
   popup windows could be falsely promoted as unique. Now skips
   promotion for elements with WindowHandle != session.WindowHandle.

4. Cross-window search: add per-window COMException handling
   FindElementOnOtherWindows now catches COMException per window,
   so a popup closing mid-search doesn't abort the entire loop.

5. SKTypeface resource leak in screenshot compositing
   SKTypeface.FromFamilyName returns a native handle not released by
   SKPaint.Dispose(). Added using statement.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
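
Item 4 of the commit above (per-window exception handling) can be sketched like this. `WindowGone` is a hypothetical stand-in for the COMException raised when a popup closes mid-search, and `search` is an injected callable rather than a real UIA call.

```python
from typing import Callable, Optional


class WindowGone(Exception):
    """Hypothetical stand-in for a COMException from a window that just closed."""


def find_on_other_windows(hwnds: list[int],
                          search: Callable[[int], list[str]]) -> Optional[tuple[int, str]]:
    """Return (hwnd, match) from the first window with exactly one match."""
    for hwnd in hwnds:
        try:
            matches = search(hwnd)
        except WindowGone:
            continue  # this popup vanished; keep checking the remaining windows
        if len(matches) == 1:
            return hwnd, matches[0]
    return None


def fake_search(hwnd: int) -> list[str]:
    # Window 1 closes mid-search, window 2 has no match, window 3 has one.
    if hwnd == 1:
        raise WindowGone()
    return ["Dark"] if hwnd == 3 else []


found = find_on_other_windows([1, 2, 3], fake_search)
print(found)
```

Catching the error inside the loop, rather than around it, is what keeps one closed popup from aborting the whole search.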
@nmetulev nmetulev force-pushed the nm/ui-fixes branch 2 times, most recently from ebc642f to bcaa342 Compare April 11, 2026 00:18
nmetulev and others added 2 commits April 10, 2026 17:39
Bug 1: search misses elements on popup/owned windows
SearchAsync only searched the main window root. Elements like StatusBar
and FindTextBox that live on separate WinUI 3 popup HWNDs returned 0
results. Now falls back to searching popup/owned windows (via
GetAllAppWindows) when main window returns no matches. Sets
WindowHandle on each result for correct HWND attribution.

Bug 2: wait-for --value is exact match only
Status bars with values like 'Ln 1, Col 1 | 42 words' couldn't be
asserted with --value 'words'. Added --contains flag for substring
matching:
  wait-for StatusBar --value 'words' --contains -a app -t 5000

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
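
A sketch of exact versus substring matching for wait-for, with a simple polling loop. The poll interval and the case-sensitive comparison are assumptions; `read_value` is a hypothetical callable standing in for the element lookup.

```python
import time
from typing import Callable


def value_matches(actual: str, expected: str, contains: bool = False) -> bool:
    """Exact match by default; substring match when `contains` is set."""
    return expected in actual if contains else actual == expected


def wait_for(read_value: Callable[[], str], expected: str, contains: bool = False,
             timeout_ms: int = 5000, poll_ms: int = 50) -> bool:
    """Poll until the value matches or the timeout elapses."""
    deadline = time.monotonic() + timeout_ms / 1000
    while True:
        if value_matches(read_value(), expected, contains):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_ms / 1000)


status = "Ln 1, Col 1 | 42 words"
print(value_matches(status, "words"), value_matches(status, "words", contains=True))
```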
UIA FindAll(TreeScope_Descendants) can stall in the WebView2 Chromium
UIA provider subtree, causing sibling elements after the WebView to be
silently skipped. This caused 'search StatusBar' and 'get-value
StatusBar' to return 0 results even though 'inspect' (which uses
TreeWalker) found the element.

Added ManualTreeSearch fallback that uses TreeWalker
(GetFirstChildElement/GetNextSiblingElement) to walk the tree
sibling-by-sibling. This is the same traversal method inspect uses.
The fallback activates when FindAll returns 0 results.

Applied to both SearchAsync and FindSingleElementAsync.

Root cause verified with repro app: StatusBar Edit at depth 3 was a
sibling after PreviewWebView Pane. FindAll found elements before the
WebView but not after. Manual tree walk found it reliably.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
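
The fallback described in the commit above can be sketched with a toy tree. `Node`, `first_child`, and `next_sibling` are hypothetical stand-ins for UIA elements and the TreeWalker calls; `stalling_find_all` only simulates a bulk search that stops at the WebView and is not how UIA is implemented.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Iterator, Optional


@dataclass(eq=False)
class Node:
    name: str
    children: list[Node] = field(default_factory=list)
    parent: Optional[Node] = field(default=None, repr=False)

    def add(self, child: Node) -> Node:
        child.parent = self
        self.children.append(child)
        return child


def first_child(node: Node) -> Optional[Node]:
    return node.children[0] if node.children else None


def next_sibling(node: Node) -> Optional[Node]:
    if node.parent is None:
        return None
    siblings = node.parent.children
    index = next(i for i, s in enumerate(siblings) if s is node)
    return siblings[index + 1] if index + 1 < len(siblings) else None


def walk(root: Node) -> Iterator[Node]:
    """Depth-first walk using only first-child / next-sibling steps."""
    child = first_child(root)
    while child is not None:
        yield child
        yield from walk(child)
        child = next_sibling(child)


def search(root: Node, predicate: Callable[[Node], bool],
           find_all: Callable[[Node, Callable[[Node], bool]], list[Node]]) -> list[Node]:
    results = find_all(root, predicate)
    if not results:  # fallback activates only when the bulk search finds nothing
        results = [n for n in walk(root) if predicate(n)]
    return results


def stalling_find_all(root: Node, predicate: Callable[[Node], bool]) -> list[Node]:
    """Simulated bulk search that silently stops once it reaches the WebView."""
    found = []
    for node in walk(root):
        if node.name == "PreviewWebView":
            break
        if predicate(node):
            found.append(node)
    return found


window = Node("Window")
window.add(Node("Toolbar"))
window.add(Node("PreviewWebView")).add(Node("Document"))
window.add(Node("StatusBar"))

hits = search(window, lambda n: n.name == "StatusBar", stalling_find_all)
print([n.name for n in hits])
```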
nmetulev and others added 2 commits April 10, 2026 19:29
Fix version back to 0.2.2 — local Debug builds report 1.0.0 (assembly
default) instead of the version.json value. The generate-llm-docs
script reads version from CLI output, so Debug builds produce wrong
version. Schema content is correct, only version was wrong.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The → (U+2192) arrow character in the --value option description was
getting corrupted through PowerShell pipeline serialization (the arrow
came out as garbled characters). Replaced with ASCII '->' to avoid cross-pipeline encoding issues.
Regenerated cli-schema.json from updated CLI binary.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
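
For context, one common way a non-ASCII arrow gets garbled is UTF-8 bytes being decoded with a single-byte code page. The sketch below shows that mechanism with Windows-1252; the PR does not say which encodings were actually involved, so this is an illustration rather than a diagnosis.

```python
arrow = "\u2192"  # the → character

# UTF-8 encodes the arrow as three bytes; decoding them with a single-byte
# code page yields three unrelated characters instead of one arrow.
utf8_bytes = arrow.encode("utf-8")
garbled = utf8_bytes.decode("cp1252")

# The ASCII replacement survives any ASCII-compatible encoding unchanged.
ascii_chain = "TextPattern -> ValuePattern"
round_tripped = ascii_chain.encode("utf-8").decode("cp1252")

print(len(utf8_bytes), len(garbled), round_tripped == ascii_chain)
```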

@chiaramooney chiaramooney left a comment


Looks good! Out of curiosity why the tree depth default change?

@nmetulev nmetulev merged commit d43f42f into main Apr 15, 2026
12 checks passed
@nmetulev nmetulev deleted the nm/ui-fixes branch April 15, 2026 16:59
@nmetulev

> Looks good! Out of curiosity why the tree depth default change?

I changed it because I added another layer on top of the tree for the window, so it was changed to maintain essentially the same depth.
