UI automation: multi-window inspect, cross-window search, screenshot compositing #419
WebView2 controls expose base64 data URIs as element names, which bloat inspect output with hundreds of characters per element. Truncate displayed names to 80 chars and values to 60 chars with '…' suffix. JSON output is unaffected (full data preserved). Applied to both inspect and search text output. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
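The truncation rule described in this commit could be sketched roughly as follows. The class and method names are hypothetical and not taken from the PR, and the sketch assumes the limit counts characters before the '…' suffix, which the commit message does not spell out.

```csharp
#nullable enable
using System;

// Hypothetical helper, not the PR's actual code.
public static class DisplayText
{
    // Limits quoted in the commit message.
    public const int MaxNameLength = 80;
    public const int MaxValueLength = 60;

    // Returns the text unchanged when it fits; otherwise keeps the first
    // maxLength characters and appends '…' (assumption: the suffix is not
    // counted against the limit).
    public static string Truncate(string? text, int maxLength)
    {
        if (string.IsNullOrEmpty(text))
        {
            return string.Empty;
        }
        if (text.Length <= maxLength)
        {
            return text;
        }
        return text.Substring(0, maxLength) + "…";
    }
}
```

Only the text renderers would call such a helper; the JSON path keeps the full string, as the commit message states.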
All UI commands now transparently find and interact with elements
across popup windows, flyout menus, and cross-process dialogs.
Part 1 - FindSingleElementAsync fallback:
When element not found on main window, automatically searches all
app-related windows (same PID + cross-process owned via GW_OWNER).
Covers flyout MenuBar items, file picker dialogs, system dialogs.
SourceWindowHandle tracked on UiElement for correct HWND routing.
Part 2 - inspect spans all app windows:
Full tree inspect shows popup/owned window contents with separator:
--- HWND 1840448: "View" (popup, Xaml_WindowedPopupClass) ---
mnu-splitview-5211 MenuItem "Split View"
Part 3 - ResolveComElement uses element source HWND:
When an element came from a popup/dialog, action methods
(invoke, click, set-value, etc.) resolve against the correct
window HWND instead of the session's main window.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When multiple windows are detected (dialogs, popups), compose all captures side-by-side into a single PNG image instead of separate files. Each window gets a label bar showing HWND, type, and title. Better for agents: one image to analyze instead of multiple files. Dark background, 8px gap between windows, 28px label bars. Uses SkiaSharp canvas compositing (already a dependency for PNG encoding). Single-window behavior unchanged. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
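As a rough illustration of the side-by-side layout, the canvas size and per-window offsets can be computed before any drawing happens. This is a sketch under stated assumptions: the names are hypothetical, windows are assumed to be top-aligned with the label bar above each capture, and no outer margin is added. Only the 8px gap and 28px label bar come from the commit message, and the SkiaSharp drawing itself is omitted.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical types, not the PR's actual code.
public readonly record struct Placement(int X, int LabelY, int ImageY);

public static class CompositeLayout
{
    public const int Gap = 8;        // px between windows (from the commit message)
    public const int LabelBar = 28;  // px label bar height (from the commit message)

    // Assumed layout: left to right, top-aligned, label bar above each capture.
    public static (int Width, int Height, List<Placement> Placements) Compute(
        IReadOnlyList<(int Width, int Height)> captures)
    {
        var placements = new List<Placement>();
        int x = 0;
        int maxHeight = 0;
        for (int i = 0; i < captures.Count; i++)
        {
            if (i > 0)
            {
                x += Gap; // gap only between neighbours, not at the edges
            }
            placements.Add(new Placement(x, 0, LabelBar));
            x += captures[i].Width;
            maxHeight = Math.Max(maxHeight, captures[i].Height);
        }
        int height = captures.Count == 0 ? 0 : LabelBar + maxHeight;
        return (x, height, placements);
    }
}
```

The returned width and height would then be the natural values to report in the JSON output for the composite image.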
With cross-window element search, all commands transparently find elements across popups/dialogs. The verbose multi-window warning is no longer needed — inspect shows all windows inline, and action commands resolve elements across windows automatically. Auto-selection still happens (foreground → largest), just silently. Logged at debug level for troubleshooting. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add header separator for main window when multiple windows exist
- Show owner HWND in popup/dialog separator lines (e.g., owner: HWND 133306)
- Add blank line before footer, show 'Use -w <HWND>' hint for multi-window
- Preserve window separator elements through --interactive/--hide-* filters
- Change default inspect depth from 5 to 4 (--interactive still bumps to 8)
- Deduplicate windows already in main UIA tree (modal dialogs)
- Filter internal system windows (PseudoConsoleWindow, IME, MSCTFIME UI)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When --interactive filters to only interactive elements, non-interactive parent containers (Window, Pane, Group, etc.) are now shown as collapsed grey breadcrumb lines like '… Window > Pane > MenuBar' to preserve tree context. Breadcrumbs only appear when the ancestor path changes. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
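The "only when the ancestor path changes" rule amounts to remembering the last breadcrumb that was printed. A minimal sketch, with a hypothetical class name and with the assumption that an empty ancestor path prints nothing:

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical tracker, not the PR's actual code.
public sealed class BreadcrumbTracker
{
    private string _last = "";

    // Returns the breadcrumb line to print, or null when the ancestor path is
    // empty or identical to the one printed most recently.
    public string? Next(IReadOnlyList<string> ancestorTypes)
    {
        if (ancestorTypes.Count == 0)
        {
            return null;
        }
        string crumb = "… " + string.Join(" > ", ancestorTypes);
        if (crumb == _last)
        {
            return null;
        }
        _last = crumb;
        return crumb;
    }

    // Call when a new window separator is emitted so the next element
    // always gets a fresh breadcrumb.
    public void Reset() => _last = "";
}
```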
Pull request overview
This PR enhances the winapp ui command set to better handle multi-window applications by improving inspect output, enabling cross-window element resolution for interactions, and compositing multi-window screenshots into a single image.
Changes:
- Extend UIA inspect/search to detect and traverse additional popup/owned windows and route interactions via a new per-element SourceWindowHandle.
- Update ui screenshot multi-window behavior to composite captures side-by-side into one PNG (and adjust JSON output accordingly).
- Improve CLI output clarity (window separators, interactive breadcrumbs) and adjust defaults (inspect depth, output truncation).
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/winapp-CLI/WinApp.Cli/Services/UiSessionService.cs | Removes verbose multi-window console warning; exposes window class name helper for cross-window filtering. |
| src/winapp-CLI/WinApp.Cli/Services/UiAutomationService.cs | Adds multi-window inspect traversal, cross-window fallback element search, and source-HWND-based element resolution. |
| src/winapp-CLI/WinApp.Cli/Models/UiElement.cs | Introduces SourceWindowHandle for correct cross-window interaction routing. |
| src/winapp-CLI/WinApp.Cli/Commands/UiSearchCommand.cs | Truncates long element names/values in search output to reduce noise. |
| src/winapp-CLI/WinApp.Cli/Commands/UiScreenshotCommand.cs | Captures multiple windows and composites into a single PNG; changes JSON output shape/contents. |
| src/winapp-CLI/WinApp.Cli/Commands/UiInspectCommand.cs | Preserves window separators through filters and adds breadcrumb context rendering in --interactive mode; truncates long fields. |
| src/winapp-CLI/WinApp.Cli/Commands/SharedUiOptions.cs | Changes default inspect depth from 5 to 4. |
| docs/fragments/skills/winapp-cli/ui-automation.md | Updates inspect depth example wording. |
| .github/plugin/skills/winapp-cli/ui-automation/SKILL.md | Updates documented default depth to 4 and adjusts inspect example wording. |
ansiConsole.WriteLine();
ansiConsole.MarkupLine($"[grey]--- {EscapeMarkup(el.Name ?? "")} ---[/]");
lastBreadcrumb = "";
Array.Clear(ancestorTypes);
Array.Clear(ancestorTypes) uses the single-argument overload, which exists only on .NET 6 and later, so it does not compile against older target frameworks. Use Array.Clear(ancestorTypes, 0, ancestorTypes.Length) (or ancestorTypes.AsSpan().Clear()) when resetting breadcrumb state.
- Array.Clear(ancestorTypes);
+ Array.Clear(ancestorTypes, 0, ancestorTypes.Length);
Build Metrics Report
Binary Sizes
Test Results: ✅ 718 passed out of 718 tests in 366.7s (+20.5s vs. baseline)
Test Coverage: ❌ 20.4% line coverage, 34.7% branch coverage
CLI Startup Time: 43ms median (x64,
Updated 2026-04-11 03:15:27 UTC · commit
- Fix Array.Clear to use 3-arg overload for compat
- Update AutoSelectWindow doc to say 'silently' (warning was removed)
- Set composite Width/Height in multi-window screenshot JSON output
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
logger.LogInformation writes to stdout (not stderr) in this CLI, so all human-readable output in UI commands was polluting --json responses. Wrap all LogInformation and ansiConsole output in if (!json) guards so --json mode returns pure JSON only. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Expose WindowHandle on UiElement (was JsonIgnore)
- Add WindowInfo metadata: label, size, owner, className, isForeground
- Add Hwnd to invoke/click/scroll/screenshot JSON results
- Add JSON result types for set-value, focus, scroll-into-view
- Populate all new fields in command handlers
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Previously, wait-for --value required --property to specify which UIA property to check. Now --value alone uses the same smart fallback chain as get-value (TextPattern → ValuePattern → Name), making it work as a standalone assertion for any control type.
# Before (required --property):
wait-for CounterDisplay -a app --property Name --value "5" -t 5000
# Now (smart fallback, works for TextBlock, TextBox, etc.):
wait-for CounterDisplay -a app --value "5" -t 5000
# Still works with --property for specific UIA properties:
wait-for DarkMode -a app --property ToggleState --value "On" -t 5000
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Two improvements from agent feedback:
1. SelectionPattern fallback in get-value/wait-for: The TextPattern → ValuePattern → Name chain didn't cover ComboBoxes, RadioButtons, or TabViews, which expose their selection via SelectionPattern.GetSelection(). Added as step 3 (before Name fallback). Now 'get-value CmbTheme' returns 'Dark' and 'wait-for CmbTheme --value Dark' works naturally.
2. Invokable disambiguation for text search: When multiple elements match the same text (e.g., SettingsExpander where Group, Button, and Text all share Name='Advanced'), prefer the only invokable element instead of throwing an ambiguity error. This handles the common case of 'invoke Advanced' targeting the Button.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
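The disambiguation rule from item 2 can be sketched independently of UIA. The Candidate record and PickSingle method are hypothetical stand-ins, not the PR's types; the sketch only shows the decision logic: one match wins outright, several matches are narrowed to the single invokable one, and anything else remains an ambiguity error.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a matched UI element.
public sealed record Candidate(string Type, string Name, bool IsInvokable);

public static class Disambiguation
{
    public static Candidate? PickSingle(IReadOnlyList<Candidate> matches)
    {
        if (matches.Count == 0)
        {
            return null;
        }
        if (matches.Count == 1)
        {
            return matches[0];
        }
        // Several elements share the text: prefer the only invokable one.
        var invokable = matches.Where(m => m.IsInvokable).ToList();
        if (invokable.Count == 1)
        {
            return invokable[0];
        }
        throw new InvalidOperationException(
            $"Ambiguous selector: {matches.Count} elements match.");
    }
}
```

For the SettingsExpander case above, a Group, a Button, and a Text all named 'Advanced' would resolve to the Button, assuming it is the only one of the three that is invokable.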
- get-value docs: add SelectionPattern to fallback chain description, add ComboBox example
- wait-for docs: update fallback chain to include SelectionPattern
- Plain text search: document invokable disambiguation behavior
- Updated: ui-automation.md, SKILL.md, skill fragment, npm-usage.md
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1. TogglePattern fallback in get-value/wait-for: ToggleSwitches and CheckBoxes expose state via TogglePattern, not ValuePattern. Added as step 3 in the fallback chain:
   TextPattern → ValuePattern → TogglePattern → SelectionPattern → Name
   Now 'get-value ToggleLogging' returns 'On'/'Off' and 'wait-for ToggleLogging --value Off' works.
2. Cross-window invokable disambiguation: FindElementOnOtherWindows only accepted exact-1 matches from popup windows. When a ComboBox dropdown has ListItem 'Dark' + Text 'Dark', the search found 2 matches and returned null. Added the same invokable disambiguation logic (prefer only invokable element) to the cross-window search path.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
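The fallback chain reads naturally as an ordered list of readers where the first one that yields a value wins. A minimal sketch, assuming each UIA pattern is wrapped in a delegate that returns null when the pattern is unsupported; the names are hypothetical, and treating an empty string as "no value" is also an assumption.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical resolver, not the PR's actual code.
public static class ValueFallback
{
    // Tries each reader in order and returns the first non-empty value
    // together with the name of the source that produced it.
    public static (string Source, string Value)? Resolve(
        IEnumerable<(string Source, Func<string?> Read)> readers)
    {
        foreach (var (source, read) in readers)
        {
            string? value = read();
            if (!string.IsNullOrEmpty(value))
            {
                return (source, value);
            }
        }
        return null;
    }
}
```

Passing readers in the order TextPattern, ValuePattern, TogglePattern, SelectionPattern, Name reproduces the chain described above; a ToggleSwitch that supports only TogglePattern and Name would resolve through TogglePattern.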
1. Filter separator elements from JSON inspect output: Type='---' synthetic separators were leaking into --json responses. JSON consumers now get only real elements (each has windowHandle).
2. wait-for: use FindSingleElementAsync for cross-window support. Previously used SearchAsync for non-slug selectors, which only searched the main window. Now uses FindSingleElementAsync uniformly, which has the cross-window popup/dialog fallback.
3. PromoteUniqueAutomationIds: skip cross-window elements. The frequency map only covers the main window tree. Elements from popup windows could be falsely promoted as unique. Now skips promotion for elements with WindowHandle != session.WindowHandle.
4. Cross-window search: add per-window COMException handling. FindElementOnOtherWindows now catches COMException per window, so a popup closing mid-search doesn't abort the entire loop.
5. SKTypeface resource leak in screenshot compositing: SKTypeface.FromFamilyName returns a native handle not released by SKPaint.Dispose(). Added using statement.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
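Item 4 describes a common pattern: put the try/catch inside the loop so that one failing window is skipped and the remaining windows are still searched. A sketch with a hypothetical generic signature; the search callback stands in for whatever per-window UIA lookup the service performs.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

// Hypothetical helper, not the PR's actual code.
public static class CrossWindowSearch
{
    public static T? FindOnOtherWindows<T>(
        IEnumerable<IntPtr> windowHandles,
        Func<IntPtr, T?> searchWindow) where T : class
    {
        foreach (IntPtr hwnd in windowHandles)
        {
            try
            {
                T? found = searchWindow(hwnd);
                if (found is not null)
                {
                    return found;
                }
            }
            catch (COMException)
            {
                // The window may have closed mid-search (e.g. a flyout that
                // dismissed itself); skip it and keep looking.
            }
        }
        return null;
    }
}
```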
ebc642f to
bcaa342
Bug 1: search misses elements on popup/owned windows
SearchAsync only searched the main window root. Elements like StatusBar and FindTextBox that live on separate WinUI 3 popup HWNDs returned 0 results. Now falls back to searching popup/owned windows (via GetAllAppWindows) when main window returns no matches. Sets WindowHandle on each result for correct HWND attribution.
Bug 2: wait-for --value is exact match only
Status bars with values like 'Ln 1, Col 1 | 42 words' couldn't be asserted with --value 'words'. Added --contains flag for substring matching:
wait-for StatusBar --value 'words' --contains -a app -t 5000
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
UIA FindAll(TreeScope_Descendants) can stall in the WebView2 Chromium UIA provider subtree, causing sibling elements after the WebView to be silently skipped. This caused 'search StatusBar' and 'get-value StatusBar' to return 0 results even though 'inspect' (which uses TreeWalker) found the element. Added ManualTreeSearch fallback that uses TreeWalker (GetFirstChildElement/GetNextSiblingElement) to walk the tree sibling-by-sibling. This is the same traversal method inspect uses. The fallback activates when FindAll returns 0 results. Applied to both SearchAsync and FindSingleElementAsync. Root cause verified with repro app: StatusBar Edit at depth 3 was a sibling after PreviewWebView Pane. FindAll found elements before the WebView but not after. Manual tree walk found it reliably. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
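The sibling-by-sibling walk can be sketched generically: given "first child" and "next sibling" accessors (the role GetFirstChildElement and GetNextSiblingElement play on a UIA TreeWalker), visit every descendant and collect those matching a predicate. The signature below is hypothetical, and the depth limit is an added assumption rather than something the commit message states.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical generic walker, not the PR's actual ManualTreeSearch.
public static class ManualTreeSearch
{
    // Collects descendants of root (root itself excluded, mirroring a
    // descendants-scoped search) that satisfy the predicate.
    public static List<T> FindAll<T>(
        T root,
        Func<T, T?> firstChild,
        Func<T, T?> nextSibling,
        Func<T, bool> predicate,
        int maxDepth) where T : class
    {
        var results = new List<T>();
        Walk(root, 1);
        return results;

        void Walk(T parent, int depth)
        {
            if (depth > maxDepth)
            {
                return;
            }
            // Step through the children one sibling at a time, so a subtree
            // that yields nothing does not hide the siblings after it.
            for (T? child = firstChild(parent); child is not null; child = nextSibling(child))
            {
                if (predicate(child))
                {
                    results.Add(child);
                }
                Walk(child, depth + 1);
            }
        }
    }
}
```

In the repro described above, the StatusBar Edit is a sibling after the PreviewWebView Pane, which is exactly the position this kind of walk reaches regardless of what the WebView subtree returns.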
Fix version back to 0.2.2 — local Debug builds report 1.0.0 (assembly default) instead of the version.json value. The generate-llm-docs script reads version from CLI output, so Debug builds produce wrong version. Schema content is correct, only version was wrong. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The → (U+2192) arrow character in the --value option description was getting corrupted through PowerShell pipeline serialization (→ became ΓåÆ). Replaced with ASCII '->' to avoid cross-pipeline encoding issues. Regenerated cli-schema.json from updated CLI binary. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
chiaramooney
left a comment
Looks good! Out of curiosity why the tree depth default change?
I changed it because I added another layer on top of the tree for the window, so it was changed to maintain the same depth essentially.
Summary
Improvements to winapp ui commands based on agent trial feedback (50+ sessions). Focuses on multi-window handling, output clarity, JSON completeness, and smarter element resolution.
Changes
Multi-window inspect output
- … 'Use -w <HWND>' hint
Interactive mode (--interactive)
- … (Window > Pane > MenuBar)
Cross-window element search
- FindSingleElementAsync falls back to popup/owned windows when element not found on main window
- SourceWindowHandle → WindowHandle on UiElement for correct HWND routing (exposed in JSON)
Screenshot compositing
- … GetWindow(GW_OWNER)
get-value / wait-for smart fallback
- wait-for --value works standalone without --property using same fallback
Invokable disambiguation
JSON output completeness
- … --json mode (no stdout pollution)
- … hwnd to invoke/click/scroll/screenshot results
- WindowInfo enriched with label, size, owner, className, isForeground
AutomationId promotion safety
- PromoteUniqueAutomationIds skips cross-window elements (frequency map only covers main window)
Other