feat(collection-browse): tabular collection browse view behind feature flag #1780

Merged: orhanrauf merged 7 commits into main from poc/collection-browse on May 13, 2026

@orhanrauf (Member) commented May 11, 2026

Summary

  • Adds an unranked, paginated tabular browse view of a collection alongside the existing search experience, gated on a new COLLECTION_BROWSE org feature flag.
  • Backend reuses VespaVectorDB.filter_search() + count() and forces chunk_index = 0 so each source entity shows up as exactly one row. New BrowseService is exposed through the DI container and wired into /collections/{readable_id}/search/browse.
  • Frontend ships a BrowseTable component (rows, sync/entity-type filters, debounced substring name search, offset/limit pagination, row drawer, CSV/JSON export) wired into CollectionDetailView as a Tabs sibling to Search.
  • Name search uses a dedicated regex path on the vector-db protocol (name matches "(?i).*<escaped>.*") rather than contains, because Vespa's contains operator on attribute fields is whole-token match — typing "Qui" wasn't matching "Quick process to debug" via the existing FilterCondition translator.
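
The backend bullet above can be illustrated with a minimal sketch. It assumes an in-memory stand-in for the vector DB (`InMemoryVectorDB`, `BrowsePage`, and the `browse` function are hypothetical names, not the PR's code) and shows only the idea: fix `chunk_index = 0` so each entity yields one row, then fetch the page and the total concurrently.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class BrowsePage:
    total: int
    rows: list[dict]


class InMemoryVectorDB:
    """Hypothetical stand-in for the vector DB; not the real VespaVectorDB."""

    def __init__(self, chunks: list[dict]) -> None:
        self._chunks = chunks

    def _matching(self, filters: dict) -> list[dict]:
        # Keep chunks whose fields equal every filter value.
        return [c for c in self._chunks if all(c.get(k) == v for k, v in filters.items())]

    async def filter_search(self, filters: dict, offset: int, limit: int) -> list[dict]:
        return self._matching(filters)[offset : offset + limit]

    async def count(self, filters: dict) -> int:
        return len(self._matching(filters))


async def browse(db: InMemoryVectorDB, offset: int = 0, limit: int = 50) -> BrowsePage:
    # Anchor on the first chunk so a multi-chunk entity appears as a single row.
    filters = {"chunk_index": 0}
    # The page and the total are independent queries, so run them concurrently.
    rows, total = await asyncio.gather(
        db.filter_search(filters, offset, limit),
        db.count(filters),
    )
    return BrowsePage(total=total, rows=rows)


chunks = [
    {"entity_id": "a", "chunk_index": 0},
    {"entity_id": "a", "chunk_index": 1},
    {"entity_id": "b", "chunk_index": 0},
]
page = asyncio.run(browse(InMemoryVectorDB(chunks)))
```

Entity `a` has two chunks here, but only its `chunk_index = 0` chunk is listed, which is what keeps the row count equal to the entity count.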

CI status (verified locally on changed lines)

  • ruff check + ruff-format: 100% clean (diff-quality)
  • mypy: 100% clean on changed lines (diff-quality) — full-repo baseline is unchanged
  • import-linter: 0 broken
  • backend unit tests: 4201 passed (added FakeBrowseService + threaded it through the test_container fixture so existing tests don't break on the new required Container.browse_service field)
  • ESLint: clean
  • Frontend production build: clean

Test plan

  • Enable the COLLECTION_BROWSE feature flag on an org and open a collection — verify the Browse tab appears next to Search.
  • Verify pagination, sync filter, entity-type filter all work independently and in combination.
  • Type into the name search (e.g. a partial token like "Qui") and confirm that results update reactively, that special characters (".", "+", "(") don't break the query, and that in-flight requests are aborted when typing fast.
  • Export current page → CSV and JSON, then export "all matching" → confirm capped at 1000 rows.
  • Click a row, confirm the drawer opens and entity_id copy-to-clipboard works.
  • Disable the flag → Browse tab disappears and /search/browse returns 404 (feature disabled).

🤖 Generated with Claude Code


Summary by cubic

Adds a feature-flagged tabular browse view for collections with a new POST /collections/{readable_id}/search/browse endpoint and a Browse tab in the UI. It lists entities unranked with pagination, filters, case-insensitive substring name search (properly escaped), and CSV/JSON export.

  • New Features

    • Backend: POST /collections/{readable_id}/search/browse gated by FeatureFlag.COLLECTION_BROWSE; uses VectorDBProtocol.filter_search() + count() in parallel with chunk_index = 0; supports name_query via a regex matches clause in VespaVectorDB (case-insensitive substring with escaping); adds BrowseService, DI wiring, and BrowseRequest/BrowseResponse.
    • Frontend: BrowseTable with source filter, debounced name search, offset/limit pagination, row drawer, open-in-source link, and CSV/JSON export (cap 1000 for “all matching”); added Tabs in CollectionDetailView; new FeatureFlags.COLLECTION_BROWSE.
  • Bug Fixes

    • Moved route to /search/browse for consistency with other tiers.
    • Escaped double quotes in the YQL name-substring regex to prevent parse errors and injection; added tests for the clause, endpoint flag gate (404 when disabled), and service behavior.
    • Input and request bounds: sync_ids/entity_types capped at 100; name_query requires ≥2 chars (frontend hints and only sends when long enough); export fetches in 200-row chunks to respect the backend page limit.
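
As a rough illustration of the request bounds listed above, the sketch below validates the same limits with a plain dataclass. The class name `BrowseRequestSketch`, the constants, and the `offset`/`limit` defaults are assumptions made for this example; the real `BrowseRequest` schema may be structured differently.

```python
from dataclasses import dataclass, field
from typing import Optional

MAX_FILTER_VALUES = 100  # cap on sync_ids / entity_types, per the summary above
MIN_NAME_QUERY_LEN = 2   # shorter queries are rejected


@dataclass
class BrowseRequestSketch:
    """Hypothetical request model; field defaults are illustrative only."""

    sync_ids: list[str] = field(default_factory=list)
    entity_types: list[str] = field(default_factory=list)
    name_query: Optional[str] = None
    offset: int = 0
    limit: int = 50

    def __post_init__(self) -> None:
        if len(self.sync_ids) > MAX_FILTER_VALUES:
            raise ValueError("sync_ids accepts at most 100 values")
        if len(self.entity_types) > MAX_FILTER_VALUES:
            raise ValueError("entity_types accepts at most 100 values")
        if self.name_query is not None and len(self.name_query) < MIN_NAME_QUERY_LEN:
            raise ValueError("name_query must be at least 2 characters")


def is_valid(**kwargs) -> bool:
    """Return True when the request passes validation."""
    try:
        BrowseRequestSketch(**kwargs)
    except ValueError:
        return False
    return True
```

For example, `is_valid(name_query="Q")` is rejected while `is_valid(name_query="Qu")` passes, mirroring the frontend rule of only sending the query once it is long enough.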

Written for commit bcbf327. Summary will update on new commits.

orhanrauf and others added 4 commits May 7, 2026 09:56
…owse flag

Adds an unranked, paginated listing of a collection alongside the existing
search experience. Backend reuses VespaVectorDB.filter_search() + count(),
forces chunk_index = 0 so each source entity shows up as one row, and gates
the endpoint on the new COLLECTION_BROWSE org feature flag. Frontend adds a
BrowseTable component (rows + source filter + offset/limit pagination + row
drawer) wired into CollectionDetailView as a Tabs sibling to Search,
gated on the same flag via the existing organization store hasFeature helper.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Debounced (250 ms) name-contains search bar with abort-on-change so a fast
  typist doesn't backlog requests. Sends a `name contains <q>` FilterGroup
  to the existing /browse endpoint; no backend change required.
- Export menu: CSV or JSON, scoped to current page or all matching (capped
  at 1000 rows to stay within the offset/limit window). All-matching mode
  re-issues a single /browse call with the active filters.
- Visual polish: tighter toolbar layout with right-aligned counts + export,
  bg-card on the table, sticky drawer header, copy-to-clipboard on entity_id,
  tabular-nums for counts, fixed column widths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bug: typing "Qui" returned 0 results when "Quick process to debug" was in the
collection. Root cause: Vespa's `contains` operator on attribute fields is a
token (whole-word) match, not a substring match — and `name` is indexed as
`attribute | summary` only. So `name contains 'Qui'` wanted the literal
token "Qui", not a substring.

Fix: add a dedicated `name_substring` path on the vector-db protocol that
emits `name matches "(?i).*<escaped>.*"` instead of going through the
shared FilterCondition translator. Special regex chars are escaped via
`re.escape` so a query like "1.5+" doesn't blow up the engine. The browse
request schema gets a corresponding `name_query` field, and the frontend
sends `{ name_query }` instead of building a (broken) `filter` group.

`contains` semantics elsewhere in the system (search filters, agentic
navigate tools) are unchanged — only the browse path uses the new clause.
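
The difference between the two matching modes can be sketched in plain Python. This approximates the behaviour described in the commit message using Python's `re` module; it does not call Vespa, and `token_contains` / `substring_matches` are hypothetical helpers written for this illustration.

```python
import re


def token_contains(field_value: str, term: str) -> bool:
    """Approximate whole-token matching: the term must equal an entire word."""
    tokens = re.findall(r"\w+", field_value.lower())
    return term.lower() in tokens


def substring_matches(field_value: str, term: str) -> bool:
    """Case-insensitive substring match via an escaped regex, as in the fix."""
    pattern = "(?i).*" + re.escape(term) + ".*"
    return re.search(pattern, field_value) is not None


name = "Quick process to debug"
token_hit = token_contains(name, "Qui")        # no token equals "qui"
substring_hit = substring_matches(name, "Qui")  # "Qui" occurs inside "Quick"
```

Because the term goes through `re.escape`, a query such as "1.5+" is treated as literal text rather than as regex syntax.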

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ixture

The new BrowseService field on Container caused all tests using the
test_container fixture to error with "missing 1 required positional
argument: 'browse_service'". Adds a FakeBrowseService matching the
existing instant/classic/agentic fake pattern and threads it through
conftest.py.
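
A minimal sketch of the failure and the fix, assuming the container is a dataclass with a required `browse_service` field. The protocol, the fake's recorded fields, and `make_test_container` are hypothetical names for this example and are not taken from conftest.py.

```python
import asyncio
from dataclasses import dataclass
from typing import Protocol


class BrowseServiceProtocol(Protocol):
    async def browse(self, readable_id: str, offset: int, limit: int) -> dict: ...


class FakeBrowseService:
    """Records calls and returns a canned response instead of hitting a DB."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, int, int]] = []
        self.response: dict = {"total": 0, "results": []}

    async def browse(self, readable_id: str, offset: int, limit: int) -> dict:
        self.calls.append((readable_id, offset, limit))
        return self.response


@dataclass(frozen=True)
class Container:
    # A required field: every construction site, including fixtures, must supply it.
    browse_service: BrowseServiceProtocol


def make_test_container() -> Container:
    """Stand-in for a test fixture that threads the fake through."""
    return Container(browse_service=FakeBrowseService())


# Omitting the new field reproduces the kind of error described above.
try:
    Container()  # type: ignore[call-arg]
    missing_field_error = None
except TypeError as exc:
    missing_field_error = str(exc)

container = make_test_container()
result = asyncio.run(container.browse_service.browse("my-collection", 0, 50))
```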

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@cubic-dev-ai (Bot, Contributor) left a comment

2 issues found across 16 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="backend/airweave/domains/search/adapters/vector_db/vespa_client.py">

<violation number="1" location="backend/airweave/domains/search/adapters/vector_db/vespa_client.py:316">
P1: Escape double quotes in `name_substring` before interpolating into the double-quoted YQL regex literal.</violation>
</file>

<file name="backend/airweave/api/v1/endpoints/search.py">

<violation number="1" location="backend/airweave/api/v1/endpoints/search.py:86">
P1: The browse endpoint path is missing the `/search` segment, so it won’t be reachable at the documented `/.../search/browse` URL.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Comment thread backend/airweave/domains/search/adapters/vector_db/vespa_client.py Outdated
Comment thread backend/airweave/api/v1/endpoints/search.py Outdated
…tring YQL

CI's diff-cover (80% threshold) failed at 52% on this branch. Adds unit
tests for the three new code paths:

- BrowseService: happy path, 404 on missing collection, chunk_index=0
  anchor, sync_ids/entity_types convenience filters, user-filter
  combination semantics, name_query trim-to-None behavior.
- browse_collection endpoint: feature-flag gate (returns 404, skips
  usage check + service), happy path through usage check + service.
- VespaVectorDB._build_name_substring_clause: regex/YQL char escaping,
  plus presence/absence of the `matches` clause in count() and
  filter_search() YQL.

Brings diff coverage to 98% (only the prod-only `BrowseService(...)`
constructor call in factory.py is still uncovered, which would require
running `create_container` end-to-end).
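
The feature-flag gate test described above might look roughly like the following. This is a framework-free sketch: `HTTPError`, `Recorder`, and the `browse_collection` signature are assumptions made for illustration, and the real endpoint depends on the web framework and the usage checker.

```python
import asyncio


class HTTPError(Exception):
    """Hypothetical exception type carrying an HTTP status code."""

    def __init__(self, status_code: int, detail: str) -> None:
        super().__init__(detail)
        self.status_code = status_code


class Recorder:
    """Counts calls so a test can assert a dependency was (or was not) reached."""

    def __init__(self) -> None:
        self.calls = 0

    async def __call__(self, *args: object) -> dict:
        self.calls += 1
        return {"total": 0, "results": []}


async def browse_collection(readable_id, enabled_features, check_usage, browse):
    # Gate first: a disabled flag answers 404 before any other work happens.
    if "COLLECTION_BROWSE" not in enabled_features:
        raise HTTPError(404, "Not found")
    await check_usage(readable_id)
    return await browse(readable_id)


async def run_case(enabled_features: set) -> tuple:
    """Return (status or payload, usage call count, service call count)."""
    usage, service = Recorder(), Recorder()
    try:
        outcome = await browse_collection("my-collection", enabled_features, usage, service)
    except HTTPError as exc:
        outcome = exc.status_code
    return outcome, usage.calls, service.calls


disabled_result = asyncio.run(run_case(set()))
enabled_result = asyncio.run(run_case({"COLLECTION_BROWSE"}))
```

The disabled case checks both halves of the claim: the status is 404 and neither the usage check nor the service was invoked.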

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…e prefix

Two issues identified by cubic on PR #1780:

1. Move endpoint from `/{readable_id}/browse` to `/{readable_id}/search/browse`
   for consistency with sibling tiers (instant/classic/agentic, plus the
   admin `as-user` variants — all live under `/search/<tier>`). Frontend
   call site updated to match.

2. Escape double quotes in `_build_name_substring_clause`. The YQL string
   literal is double-quoted, and `re.escape` does not touch `"` (it's not a
   regex metacharacter), so a name_query like `Quick "test"` produced a
   broken YQL parse (and a potential YQL-injection vector). Switched the
   second-pass escape from single quote to double quote and added a test
   that the resulting clause has exactly two unescaped `"` (the literal
   delimiters).
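
The two-pass escape in item 2 can be sketched as follows. The function name mirrors the one in the commit message but the body is an assumption, not the PR's code, and how the YQL parser treats the backslashes that `re.escape` emits is not modelled here.

```python
import re


def build_name_substring_clause(name_query: str) -> str:
    """Sketch of a case-insensitive substring clause for a double-quoted literal."""
    escaped = re.escape(name_query)        # pass 1: neutralise regex metacharacters
    escaped = escaped.replace('"', '\\"')  # pass 2: protect the string delimiter
    return f'name matches "(?i).*{escaped}.*"'


def count_unescaped_quotes(clause: str) -> int:
    """Count double quotes that are not preceded by a backslash."""
    return len(re.findall(r'(?<!\\)"', clause))


clause = build_name_substring_clause('Quick "test"')
unescaped = count_unescaped_quotes(clause)
```

With the input `Quick "test"`, the only unescaped double quotes left in the clause are the two delimiters around the regex literal, which is the property the added test asserts.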

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread frontend/src/components/collection/BrowseTable.tsx
Comment thread backend/airweave/domains/search/browse/service.py Outdated
Comment thread backend/airweave/domains/search/fakes/browse.py Outdated
Comment thread backend/airweave/schemas/search_v2.py
Comment thread backend/airweave/schemas/search_v2.py
…vel imports, export chunking

- BrowseRequest: cap sync_ids/entity_types at 100, require name_query min length 2 to avoid full-scan triggers
- Move BrowseResponse import to module level in browse service and fake
- Frontend: gate name search on >=2 chars with hint, loop export in 200-row chunks to respect backend limit
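
The chunked export loop can be sketched like this. The actual change lives in the TypeScript frontend; this Python version only illustrates the paging logic, and `export_all` and `fetch_page` are hypothetical names.

```python
from typing import Callable


def export_all(
    fetch_page: Callable[[int, int], list[dict]],
    chunk_size: int = 200,  # backend page limit mentioned in the commit
    cap: int = 1000,        # "all matching" export cap
) -> list[dict]:
    """Collect rows page by page until the cap is hit or the data runs out."""
    rows: list[dict] = []
    offset = 0
    while len(rows) < cap:
        limit = min(chunk_size, cap - len(rows))
        page = fetch_page(offset, limit)
        rows.extend(page)
        if len(page) < limit:
            break  # short page: nothing left to fetch
        offset += len(page)
    return rows


def make_fetcher(total: int, log: list[tuple[int, int]]):
    """Build a fake paged source of `total` rows that logs each request."""
    data = [{"entity_id": str(i)} for i in range(total)]

    def fetch_page(offset: int, limit: int) -> list[dict]:
        log.append((offset, limit))
        return data[offset : offset + limit]

    return fetch_page


requests_large: list[tuple[int, int]] = []
exported_large = export_all(make_fetcher(1234, requests_large))

requests_small: list[tuple[int, int]] = []
exported_small = export_all(make_fetcher(450, requests_small))
```

No single request asks for more than 200 rows, and a collection with more than 1000 matches stops after five requests.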

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@orhanrauf merged commit 6b55261 into main on May 13, 2026
14 of 17 checks passed

2 participants