Skip to content

feat(code-index): add Semble as a local on-the-fly embedding provider#399

Open
navedmerchant wants to merge 12 commits into
mainfrom
semble
Open

feat(code-index): add Semble as a local on-the-fly embedding provider#399
navedmerchant wants to merge 12 commits into
mainfrom
semble

Conversation

@navedmerchant
Copy link
Copy Markdown
Contributor

@navedmerchant navedmerchant commented May 29, 2026

Related GitHub Issue

Closes: #200

Description

Semble is a local code search provider that requires no API keys or external services. When a user selects "Semble - Local" as their code index provider, the extension handles everything automatically:

  1. Downloads the correct platform-specific archive from sembleexec v0.3.1 into VS Code's global storage
  2. Extracts the fast-start one-dir build (tar.gz on unix, zip on Windows) for ~20x faster startup compared to single-file binaries
  3. Verifies the binary is functional, then marks the provider as ready

Searches are performed on-the-fly — semble indexes as it searches, so there's no separate indexing step or background process.

Supported platforms: linux-x64, linux-arm64, darwin-arm64, win32-x64. The semble option only appears in the UI on these platforms.

Architecture:

  • semble-downloader.ts — Downloads archive from GitHub releases, extracts via system tar/powershell, caches in globalStorageUri/semble/
  • semble-cli.ts — Thin wrapper that spawns the binary with array args (no shell) for safe argument passing
  • provider.ts — Orchestrates download → verify → search. Implements the same searchIndex interface as other code index providers

The binary is downloaded once and reused across sessions. No user configuration needed.

Test Procedure

  • 56 unit tests across 3 test files:
    • semble-downloader.spec.ts — Platform detection, download/skip logic, redirect handling, extraction, cleanup on failure
    • semble-cli.spec.ts — CLI spawn behavior, argument construction, JSON output parsing
    • provider.spec.ts — Download flow, platform check, error states, search result conversion
  • Run: pnpm --filter zoo-code exec vitest run services/code-index/semble/
  • Manual: Select "Semble - Local" in code index settings on a supported platform → verify binary downloads and search returns results

Pre-Submission Checklist

  • Issue Linked: This PR is linked to an approved GitHub Issue.
  • Scope: Changes are focused on the semble integration.
  • Self-Review: Done.
  • Testing: 56 unit tests covering downloader, CLI, and provider.
  • Documentation Impact: N/A — semble is self-explanatory from the UI.
  • Contribution Guidelines: Read and agreed.

Screenshots / Videos

Screenshot 2026-05-30 at 12 21 46 PM

Documentation Updates

  • No documentation updates are required.

Additional Notes

  • Intel Mac (darwin-x64) is not supported as the tree sitter dependency does not support it.
  • The binary persists in global storage and is not deleted on disable for fast re-enablement.

Get in Touch

navedmerchant

Summary by CodeRabbit

  • New Features

    • Added Semble as a local code-indexing provider; OpenRouter added to embedder options.
  • Improvements

    • On-device code search with lazy setup; UI shows Semble only on supported platforms and hides remote vector-store fields when selected.
    • Robust local installer: download, checksum, extraction, and platform-aware behavior.
    • Provider options for result count/content; Semble considered configured when enabled.
    • Added localized Semble labels across many languages.
  • Tests

    • Expanded tests for Semble provider, CLI, downloader, and config handling.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 29, 2026

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Integrates Semble as a local CLI-backed code index provider: adds types/schemas, SembleCLI wrapper, downloader with checksum and extraction, SembleProvider implementation, CodeIndexManager/service-factory routing, UI/platform detection and i18n, embedding model profile, and extensive Vitest tests.

Changes

Semble local indexing support

Layer / File(s) Summary
Type contracts and configuration schemas
packages/types/src/embedding.ts, src/services/code-index/interfaces/manager.ts, packages/types/src/codebase-index.ts, packages/types/src/vscode-extension-host.ts
Adds semble to embedder provider unions, extends ExtensionState with platform/arch, and adds semble models entry to codebase index models schema.
Semble type contracts and interfaces
src/services/code-index/semble/types.ts
Adds SembleContentType, SembleChunk, SembleSearchResult, SembleCheckResult, SembleConfig, ISembleProvider, and SEMBLE_DEFAULTS.
SembleCLI wrapper for command execution
src/services/code-index/semble/semble-cli.ts
Implements SembleCLI: checkInstalled, search, findRelated, _spawn (spawn with shell:false), and _parseOutput handling v0.3+ nested chunks and legacy flat arrays.
Semble binary download and extraction
src/services/code-index/semble/semble-downloader.ts
Adds platform mappings, per-platform SHA-256 checks, verifyChecksum, downloadSemble with redirect/timeout, extraction helpers (tar/PowerShell/unzip), and getSembleBinaryPath.
SembleProvider lifecycle and search
src/services/code-index/semble/provider.ts, src/services/code-index/semble/index.ts
Adds SembleProvider: initialization (download+validate), lazy startIndexing, searchIndex delegating to CLI and converting results, clearIndexData, dispose, and result normalization (semble-N ids, path normalization, line defaults).
Configuration manager, manager routing, and service guards
src/services/code-index/config-manager.ts, src/services/code-index/manager.ts, src/services/code-index/service-factory.ts, src/shared/embeddingModels.ts
Config manager maps and treats semble as configured; CodeIndexManager delegates lifecycle/search/clear to _sembleProvider when active and conditionally creates it; CodeIndexServiceFactory throws for createEmbedder/createVectorStore when provider is semble; embedding models add potion-code-16M and default mapping.
Webview state, UI, and localization
src/core/webview/ClineProvider.ts, webview-ui/src/components/chat/CodeIndexPopover.tsx, webview-ui/src/i18n/locales/*/settings.json
Adds platform/arch to webview state; UI detects supported platforms and conditionally shows semble option and hides Qdrant fields; adds codeIndex.sembleProvider localization keys across locales.
Tests: config, factory, CLI, downloader, provider
src/services/code-index/__tests__/*, src/services/code-index/semble/__tests__/*
Extensive Vitest coverage: config-manager, service-factory guards, SembleCLI parsing/errors, SembleDownloader platform/version/download/extract, and SembleProvider lifecycle/search/result conversion tests.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested reviewers

  • taltas
  • hannesrudolph
  • edelauna
  • JamesRobert20

Poem

🐰 I dug a burrow in the code,
Semble seeds to lighten the load.
CLI hums and checksums sing,
Local search hops in to bring
fast, neat results — a joyful node.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main feature: adding Semble as a local embedding provider with clear scope and primary change.
Linked Issues check ✅ Passed The PR successfully implements all primary objectives from #200: Semble integration as a built-in option, automatic binary download per platform, no API keys/external services required, on-the-fly indexing, and UI checkbox for easy enablement.
Out of Scope Changes check ✅ Passed All file changes directly support Semble integration: types/schemas extended with Semble provider, CLI wrapper and downloader modules added, provider implementation added, UI updated with Semble option and localization, existing service factory updated to reject Semble, and 56 unit tests added—no unrelated changes detected.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Description check ✅ Passed PR description includes all required template sections: issue linked (#200), implementation details with architecture overview, comprehensive test procedure, and completed pre-submission checklist.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch semble

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@navedmerchant
Copy link
Copy Markdown
Contributor Author

Not ready for review yet, the code is being reviewed and evolved

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (4)
src/services/code-index/interfaces/manager.ts (1)

78-87: ⚡ Quick win

Consider consolidating the duplicated EmbedderProvider type definition.

EmbedderProvider is defined both here and in packages/types/src/embedding.ts. Different parts of the codebase import from different locations (e.g., config-manager.ts imports from this file, while service-factory.ts imports from @roo-code/types). While currently in sync, maintaining two sources of truth creates risk of drift and inconsistency.

♻️ Suggested consolidation approach

Consider removing the local type definition and importing from the shared package instead:

+import { EmbedderProvider } from "`@roo-code/types`"
 import { VectorStoreSearchResult } from "./vector-store"
 import * as vscode from "vscode"
 
 /**
  * Interface for the code index manager
  */
 export interface ICodeIndexManager {
   // ... rest of interface
 
-export type IndexingState = "Standby" | "Indexing" | "Indexed" | "Error" | "Stopping"
-export type EmbedderProvider =
-	| "openai"
-	| "ollama"
-	| "openai-compatible"
-	| "gemini"
-	| "mistral"
-	| "vercel-ai-gateway"
-	| "bedrock"
-	| "openrouter"
-	| "semble"

This ensures a single source of truth and prevents accidental divergence.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/services/code-index/interfaces/manager.ts` around lines 78 - 87,
Duplicate EmbedderProvider type exists locally and in the shared package; remove
the local type definition (EmbedderProvider) from this file and replace
usages/imports (e.g., in config-manager.ts) to import the type from the
canonical package (`@roo-code/types` or packages/types/src/embedding.ts), update
any export or re-export as needed so service-factory.ts and other consumers use
the single source of truth, run TypeScript build/typecheck to ensure no
unresolved imports remain, and delete the now-unused local declaration to avoid
drift.
webview-ui/src/components/chat/CodeIndexPopover.tsx (1)

774-776: ⚡ Quick win

Consider adding test coverage for semble provider UI.

Per coding guidelines for webview-ui/src/**/*.{ts,tsx}, prefer local webview-ui tests for React/webview behavior. Consider adding tests under webview-ui/src/**/__tests__ to cover:

  • Selecting semble from the provider dropdown
  • Qdrant URL/API key fields hidden when semble is selected
  • Validation passes with only codebaseIndexEnabled for semble
  • Semble path input field renders and updates state correctly
  • Installation instructions panel renders

As per coding guidelines: "Prefer local webview-ui tests for React/webview behavior such as component rendering, local state, hooks, form dirty-state, validation, or prop wiring."

Also applies to: 1446-1482, 1484-1537

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@webview-ui/src/components/chat/CodeIndexPopover.tsx` around lines 774 - 776,
Add local webview-ui tests for the CodeIndexPopover component to cover the
semble provider UI: create tests under webview-ui/src/**/__tests__ that render
CodeIndexPopover, select the SelectItem with value "semble", assert that Qdrant
URL/API key fields are hidden, confirm validation passes when only
codebaseIndexEnabled is set for semble, verify the semble path input field
appears and updates component state on change, and assert the installation
instructions panel is rendered; use the existing test utilities (render,
fireEvent/userEvent, assertions) consistent with other webview-ui tests to
locate and interact with the provider dropdown and semble-specific inputs.
webview-ui/src/i18n/locales/en/settings.json (1)

204-204: 💤 Low value

Consider adding a Windows path example.

The semblePathDescription provides a helpful Unix path example ("/usr/local/bin/semble"), but users on Windows might benefit from seeing a Windows path example as well, such as "C:\\Python310\\Scripts\\semble.exe".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@webview-ui/src/i18n/locales/en/settings.json` at line 204, The translation
entry semblePathDescription currently shows only a Unix example; update its text
to include a Windows path example (e.g., "C:\\Python310\\Scripts\\semble.exe")
so Windows users have guidance—locate the semblePathDescription key in the
locales/en/settings.json and add a short Windows example alongside the existing
Unix example, keeping escaping consistent for backslashes and quotes.
src/services/code-index/semble/semble-cli.ts (1)

226-256: ⚡ Quick win

Consider adding runtime validation for CLI output structure.

Lines 242 and 247 use type assertions (as SembleSearchResult[]) without runtime validation. If the semble CLI output format changes or contains unexpected data, this could cause runtime errors downstream when code expects the SembleSearchResult shape.

Consider adding a simple validation helper or at least checking for required fields before casting.

🛡️ Example validation approach
 		// Handle successful response: {query, results: [{chunk, score}]}
 		if (parsed.results && Array.isArray(parsed.results)) {
-			return parsed.results as SembleSearchResult[]
+			// Basic validation: check that results have expected shape
+			return parsed.results.filter((r: any) => 
+				r?.chunk?.file_path && 
+				r?.chunk?.content && 
+				typeof r?.score === 'number'
+			) as SembleSearchResult[]
 		}
 
 		// Fallback: if it's a flat array (older format)
 		if (Array.isArray(parsed)) {
-			return parsed as SembleSearchResult[]
+			return parsed.filter((r: any) => 
+				r?.chunk?.file_path && 
+				r?.chunk?.content && 
+				typeof r?.score === 'number'
+			) as SembleSearchResult[]
 		}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/services/code-index/semble/semble-cli.ts` around lines 226 - 256, The
_parseOutput method casts parsed.results and parsed arrays to
SembleSearchResult[] without runtime checks; add a lightweight validator (e.g.,
validateSembleResult) and use it to filter/transform parsed.results and parsed
before returning to ensure each item has required properties (e.g., chunk is
string, score is number, and any other SembleSearchResult fields); replace the
direct casts in _parseOutput with Array.isArray checks plus
items.filter(validateSembleResult) (or map-to-safe-shape) so only well-formed
SembleSearchResult objects are returned and malformed entries are dropped or
logged.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/services/code-index/semble/semble-cli.ts`:
- Around line 150-157: The spawn call in _spawn uses an unsupported maxBuffer
option and silences the TS error with an as any cast; remove the maxBuffer
property from the options passed to spawn(this.semblePath, args, { shell: false,
timeout: options.timeout, stdio: [...] }) and drop the as any cast, and if you
need to limit output implement manual buffering and size checks inside
child.stdout?.on("data", ...) and child.stderr?.on("data", ...) (e.g.,
accumulate into stdout/stderr strings, enforce a MAX_BUFFER threshold, kill the
child and reject if exceeded) while keeping the rest of _spawn (child handling,
resolve/reject) intact.

---

Nitpick comments:
In `@src/services/code-index/interfaces/manager.ts`:
- Around line 78-87: Duplicate EmbedderProvider type exists locally and in the
shared package; remove the local type definition (EmbedderProvider) from this
file and replace usages/imports (e.g., in config-manager.ts) to import the type
from the canonical package (`@roo-code/types` or packages/types/src/embedding.ts),
update any export or re-export as needed so service-factory.ts and other
consumers use the single source of truth, run TypeScript build/typecheck to
ensure no unresolved imports remain, and delete the now-unused local declaration
to avoid drift.

In `@src/services/code-index/semble/semble-cli.ts`:
- Around line 226-256: The _parseOutput method casts parsed.results and parsed
arrays to SembleSearchResult[] without runtime checks; add a lightweight
validator (e.g., validateSembleResult) and use it to filter/transform
parsed.results and parsed before returning to ensure each item has required
properties (e.g., chunk is string, score is number, and any other
SembleSearchResult fields); replace the direct casts in _parseOutput with
Array.isArray checks plus items.filter(validateSembleResult) (or
map-to-safe-shape) so only well-formed SembleSearchResult objects are returned
and malformed entries are dropped or logged.

In `@webview-ui/src/components/chat/CodeIndexPopover.tsx`:
- Around line 774-776: Add local webview-ui tests for the CodeIndexPopover
component to cover the semble provider UI: create tests under
webview-ui/src/**/__tests__ that render CodeIndexPopover, select the SelectItem
with value "semble", assert that Qdrant URL/API key fields are hidden, confirm
validation passes when only codebaseIndexEnabled is set for semble, verify the
semble path input field appears and updates component state on change, and
assert the installation instructions panel is rendered; use the existing test
utilities (render, fireEvent/userEvent, assertions) consistent with other
webview-ui tests to locate and interact with the provider dropdown and
semble-specific inputs.

In `@webview-ui/src/i18n/locales/en/settings.json`:
- Line 204: The translation entry semblePathDescription currently shows only a
Unix example; update its text to include a Windows path example (e.g.,
"C:\\Python310\\Scripts\\semble.exe") so Windows users have guidance—locate the
semblePathDescription key in the locales/en/settings.json and add a short
Windows example alongside the existing Unix example, keeping escaping consistent
for backslashes and quotes.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 332e2e21-894e-4a2d-9e0f-464ce90a3d05

📥 Commits

Reviewing files that changed from the base of the PR and between 45088a7 and 147f1ea.

📒 Files selected for processing (19)
  • packages/types/src/codebase-index.ts
  • packages/types/src/embedding.ts
  • packages/types/src/vscode-extension-host.ts
  • src/core/webview/ClineProvider.ts
  • src/core/webview/webviewMessageHandler.ts
  • src/services/code-index/config-manager.ts
  • src/services/code-index/interfaces/config.ts
  • src/services/code-index/interfaces/manager.ts
  • src/services/code-index/manager.ts
  • src/services/code-index/semble/__tests__/provider.spec.ts
  • src/services/code-index/semble/__tests__/semble-cli.spec.ts
  • src/services/code-index/semble/index.ts
  • src/services/code-index/semble/provider.ts
  • src/services/code-index/semble/semble-cli.ts
  • src/services/code-index/semble/types.ts
  • src/services/code-index/service-factory.ts
  • src/shared/embeddingModels.ts
  • webview-ui/src/components/chat/CodeIndexPopover.tsx
  • webview-ui/src/i18n/locales/en/settings.json

Comment thread src/services/code-index/semble/semble-cli.ts Outdated
@codecov
Copy link
Copy Markdown

codecov Bot commented May 29, 2026

@thany
Copy link
Copy Markdown

thany commented May 29, 2026

So users have to install python, hope to their diety that they did it right and pip works, and then install semble like that. So that still "a lot" of local fiddling. It's still a fragile solution.

The only thing this removes, is the requirement for a text embedding model, and a Qdrant instance.

It doesn't remove fragility.

It's not specifically in #200, but iyam it would be in the spirit of the requested feature to make code indexing a simple checkbox. Just tick it, and done. No local installs of anything. Ticking the checkbox does all the black magic that needs to happen.

@navedmerchant
Copy link
Copy Markdown
Contributor Author

navedmerchant commented May 29, 2026

So users have to install python, hope to their diety that they did it right and pip works, and then install semble like that. So that still "a lot" of local fiddling. It's still a fragile solution.

The only thing this removes, is the requirement for a text embedding model, and a Qdrant instance.

It doesn't remove fragility.

It's not specifically in #200, but iyam it would be in the spirit of the requested feature to make code indexing a simple checkbox. Just tick it, and done. No local installs of anything. Ticking the checkbox does all the black magic that needs to happen.

I had the same vision when trying to implement this (and even had a version implemented) but decided against it for two reasons

  1. It added too much code and complexity to zoo code
  2. It would basically create another semble implementation inside zoo, which is okay but it would prevent it from getting future improvements from development in semble.

In my testing this is a pretty simple implementation which makes it easier for users to enable codebase indexing. It works quite well and adds value without adding too much complexity inside zoo.

Setting up semble with pip is a lot simpler than setting up qdrant and gives pretty good indexing performance

But you have a valid point, we should make codebase indexing as seamless as possible, and it should be “built in” we will revisit this in the future.

@navedmerchant navedmerchant marked this pull request as draft May 30, 2026 14:46
@navedmerchant navedmerchant self-assigned this May 30, 2026
@taltas
Copy link
Copy Markdown
Contributor

taltas commented May 30, 2026

It would be cool if we could add a companion PR to the read me in the semble repo instructions to install zoo code https://github.com/MinishLab/semble

@navedmerchant
Copy link
Copy Markdown
Contributor Author

@thany I took your advice and now Zoo Code will automatically download a prebuilt semble from https://github.com/navedmerchant/sembleexec for the current platform, making it seamless! all you need to do is tick the box and it should "just work"

@taltas that would not be necessary anymore, it should work out of the box for all users.

@navedmerchant navedmerchant marked this pull request as ready for review May 31, 2026 04:33
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/services/code-index/semble/provider.ts (1)

137-145: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Search should initialize Semble on first use.

searchIndex() currently returns [] until initialize()/startIndexing() has already run. That contradicts the on-the-fly contract and can make the first codebase search silently fail. Initialize here and only bail if startup leaves the provider in "Error".

♻️ Minimal fix
 	async searchIndex(query: string, directoryPrefix?: string): Promise<VectorStoreSearchResult[]> {
 		if (!this._isInitialized) {
-			console.warn("[SembleProvider] searchIndex called before initialization")
-			return []
+			await this.initialize()
 		}
 
-		if (this._state === "Error") {
+		if (!this._isInitialized || this._state === "Error") {
 			return []
 		}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/services/code-index/semble/provider.ts` around lines 137 - 145,
searchIndex currently bails out when !_isInitialized causing first-use searches
to silently fail; modify searchIndex in SembleProvider to call and await
this.initialize() (or this.startIndexing() if that is the intended boot path)
when !_isInitialized, then proceed with the search flow and only return [] if
after initialization this._state === "Error". Locate the searchIndex method and
replace the early return on !_isInitialized with an awaited initialization
attempt, preserve the existing check for this._state === "Error" after
initialization, and ensure any initialization errors are handled the same way
the rest of the provider handles startup failures.
🧹 Nitpick comments (2)
webview-ui/src/components/chat/CodeIndexPopover.tsx (2)

179-183: ⚡ Quick win

Add local webview-ui tests for Semble-specific behavior.

This change adds provider-gated rendering and provider-specific validation rules; please add/update Vitest coverage in webview-ui/src/**/__tests__ for: supported vs unsupported platform option visibility, Semble validation path, and Qdrant fields hidden for Semble.

As per coding guidelines, "webview-ui/src/**/*.{ts,tsx}: Prefer local webview-ui tests for React/webview behavior ... Add or update Vitest coverage under webview-ui/src/**/__tests__ instead of reaching for apps/vscode-e2e."

Also applies to: 206-209, 774-778, 1449-1450

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@webview-ui/src/components/chat/CodeIndexPopover.tsx` around lines 179 - 183,
The PR added Semble-specific UI and validation in the CodeIndexPopover component
but lacks local Vitest coverage; add/update tests under
webview-ui/src/**/__tests__ that mount CodeIndexPopover (or the relevant
exported component) and assert: (1) provider-gated rendering shows supported
platform options and hides unsupported ones, (2) when provider is "semble" the
form uses the Semble validation path (e.g., emits/validates only
codebaseIndexEnabled) and rejects missing required fields for other providers,
and (3) Qdrant-related fields/inputs are not rendered when provider === "semble"
(cover the other provider case where Qdrant fields are visible). Ensure tests
use React testing library utilities already used in the repo and target the
component names/functions from CodeIndexPopover to keep tests local to
webview-ui.

206-209: ⚡ Quick win

Avoid duplicating Semble platform support matrices in UI.

SEMBLE_SUPPORTED_PLATFORMS is hardcoded here, which can drift from the backend support source and hide/show the provider incorrectly over time. Prefer consuming a single source of truth from extension state (or a shared constant) instead of duplicating the list in the component.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@webview-ui/src/components/chat/CodeIndexPopover.tsx` around lines 206 - 209,
The hardcoded SEMBLE_SUPPORTED_PLATFORMS array in CodeIndexPopover.tsx (and the
derived isSembleSupported check) duplicates backend/extension state and may
drift; replace the literal array with a single source of truth by importing the
supported-platforms constant or reading the Semble support flag from the
extension state/feature-flags exposed to the UI (e.g., use the shared exported
constant or a selector from the extension state used elsewhere), then compute
isSembleSupported using that imported value along with platform and arch so the
UI always reflects the canonical support matrix.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/services/code-index/semble/__tests__/semble-downloader.spec.ts`:
- Around line 27-41: The test currently reuses shared EventEmitter instances
(mockRequest and mockResponse) across test cases which can leak listeners;
update the https.get mock implementation used in the test (the vi.mock block) so
it creates new EventEmitter instances inside the get callback each invocation
(i.e., instantiate mockRequest and mockResponse within the vi.fn for "get") or
alternatively add a beforeEach that calls mockRequest.removeAllListeners() and
mockResponse.removeAllListeners() and reinitializes any spy functions; target
the mock definitions referenced as mockRequest, mockResponse and the mocked
"https".get to ensure fresh emitters per test.

In `@src/services/code-index/semble/semble-downloader.ts`:
- Around line 226-234: The Windows branch that spawns PowerShell embeds
archivePath and destDir inside single-quoted literals for Expand-Archive (in the
spawn call where process.platform === "win32"), which breaks when paths contain
apostrophes; to fix, escape any single quotes in archivePath and destDir
(replace ' with '' per PowerShell rules) before building the command string or
avoid single-quoted literals by using & { } with properly escaped arguments, and
update the spawn invocation that calls "powershell" with the corrected,
safely-escaped path variables so Expand-Archive receives valid paths.

---

Outside diff comments:
In `@src/services/code-index/semble/provider.ts`:
- Around line 137-145: searchIndex currently bails out when !_isInitialized
causing first-use searches to silently fail; modify searchIndex in
SembleProvider to call and await this.initialize() (or this.startIndexing() if
that is the intended boot path) when !_isInitialized, then proceed with the
search flow and only return [] if after initialization this._state === "Error".
Locate the searchIndex method and replace the early return on !_isInitialized
with an awaited initialization attempt, preserve the existing check for
this._state === "Error" after initialization, and ensure any initialization
errors are handled the same way the rest of the provider handles startup
failures.

---

Nitpick comments:
In `@webview-ui/src/components/chat/CodeIndexPopover.tsx`:
- Around line 179-183: The PR added Semble-specific UI and validation in the
CodeIndexPopover component but lacks local Vitest coverage; add/update tests
under webview-ui/src/**/__tests__ that mount CodeIndexPopover (or the relevant
exported component) and assert: (1) provider-gated rendering shows supported
platform options and hides unsupported ones, (2) when provider is "semble" the
form uses the Semble validation path (e.g., emits/validates only
codebaseIndexEnabled) and rejects missing required fields for other providers,
and (3) Qdrant-related fields/inputs are not rendered when provider === "semble"
(cover the other provider case where Qdrant fields are visible). Ensure tests
use React testing library utilities already used in the repo and target the
component names/functions from CodeIndexPopover to keep tests local to
webview-ui.
- Around line 206-209: The hardcoded SEMBLE_SUPPORTED_PLATFORMS array in
CodeIndexPopover.tsx (and the derived isSembleSupported check) duplicates
backend/extension state and may drift; replace the literal array with a single
source of truth by importing the supported-platforms constant or reading the
Semble support flag from the extension state/feature-flags exposed to the UI
(e.g., use the shared exported constant or a selector from the extension state
used elsewhere), then compute isSembleSupported using that imported value along
with platform and arch so the UI always reflects the canonical support matrix.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 22c81bb0-3664-464b-a867-3c12e1e8caf7

📥 Commits

Reviewing files that changed from the base of the PR and between 54317b8 and d580bf3.

📒 Files selected for processing (34)
  • packages/types/src/codebase-index.ts
  • packages/types/src/vscode-extension-host.ts
  • src/core/webview/ClineProvider.ts
  • src/services/code-index/__tests__/config-manager.spec.ts
  • src/services/code-index/__tests__/service-factory.spec.ts
  • src/services/code-index/config-manager.ts
  • src/services/code-index/manager.ts
  • src/services/code-index/semble/__tests__/provider.spec.ts
  • src/services/code-index/semble/__tests__/semble-cli.spec.ts
  • src/services/code-index/semble/__tests__/semble-downloader.spec.ts
  • src/services/code-index/semble/index.ts
  • src/services/code-index/semble/provider.ts
  • src/services/code-index/semble/semble-cli.ts
  • src/services/code-index/semble/semble-downloader.ts
  • src/services/code-index/semble/types.ts
  • webview-ui/src/components/chat/CodeIndexPopover.tsx
  • webview-ui/src/i18n/locales/ca/settings.json
  • webview-ui/src/i18n/locales/de/settings.json
  • webview-ui/src/i18n/locales/en/settings.json
  • webview-ui/src/i18n/locales/es/settings.json
  • webview-ui/src/i18n/locales/fr/settings.json
  • webview-ui/src/i18n/locales/hi/settings.json
  • webview-ui/src/i18n/locales/id/settings.json
  • webview-ui/src/i18n/locales/it/settings.json
  • webview-ui/src/i18n/locales/ja/settings.json
  • webview-ui/src/i18n/locales/ko/settings.json
  • webview-ui/src/i18n/locales/nl/settings.json
  • webview-ui/src/i18n/locales/pl/settings.json
  • webview-ui/src/i18n/locales/pt-BR/settings.json
  • webview-ui/src/i18n/locales/ru/settings.json
  • webview-ui/src/i18n/locales/tr/settings.json
  • webview-ui/src/i18n/locales/vi/settings.json
  • webview-ui/src/i18n/locales/zh-CN/settings.json
  • webview-ui/src/i18n/locales/zh-TW/settings.json
💤 Files with no reviewable changes (2)
  • packages/types/src/codebase-index.ts
  • src/services/code-index/config-manager.ts
✅ Files skipped from review due to trivial changes (9)
  • webview-ui/src/i18n/locales/pl/settings.json
  • webview-ui/src/i18n/locales/en/settings.json
  • webview-ui/src/i18n/locales/nl/settings.json
  • webview-ui/src/i18n/locales/ja/settings.json
  • webview-ui/src/i18n/locales/it/settings.json
  • webview-ui/src/i18n/locales/de/settings.json
  • webview-ui/src/i18n/locales/tr/settings.json
  • webview-ui/src/i18n/locales/zh-TW/settings.json
  • webview-ui/src/i18n/locales/id/settings.json
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/services/code-index/semble/index.ts
  • src/services/code-index/manager.ts

Comment thread src/services/code-index/semble/__tests__/semble-downloader.spec.ts Outdated
Comment thread src/services/code-index/semble/semble-downloader.ts
Comment thread src/services/code-index/semble/semble-downloader.ts
@navedmerchant
Copy link
Copy Markdown
Contributor Author

navedmerchant commented May 31, 2026

Verified this PR work on MacOS arm, windows x64 and linux x64. In my testing it makes zoo find relevant context and answer questions about 20-30percent faster

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/services/code-index/semble/semble-downloader.ts`:
- Around line 167-171: The code currently skips checksum verification when
SEMBLE_SHA256 lacks an entry for the current platform, allowing extraction to
proceed without integrity checks; update the block that computes
platformKey/expectedChecksum so that if expectedChecksum is missing (falsy) you
immediately abort with a hard error (throw an Error or call process.exit(1)) and
include identifying info (platformKey and SEMBLE_VERSION) in the message; keep
the subsequent await verifyChecksum(archivePath, expectedChecksum) unchanged so
verified paths still run as before.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 549c7ede-f2e7-4f8b-ab0e-e9d0ebdb1efd

📥 Commits

Reviewing files that changed from the base of the PR and between d580bf3 and 24d607a.

📒 Files selected for processing (2)
  • src/services/code-index/semble/__tests__/semble-downloader.spec.ts
  • src/services/code-index/semble/semble-downloader.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/services/code-index/semble/tests/semble-downloader.spec.ts

Comment on lines +167 to +171
const platformKey = `${process.platform}-${process.arch}`
const expectedChecksum = SEMBLE_SHA256[platformKey]
if (expectedChecksum) {
await verifyChecksum(archivePath, expectedChecksum)
}
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot May 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Abort when no checksum is configured.

This currently fails open: if a future SEMBLE_VERSION or supported platform is added without updating SEMBLE_SHA256, the archive is still extracted and later executed without integrity verification. Missing checksum data should be a hard error here.

🔒 Minimal fix
 		// Verify archive integrity before extraction
 		const platformKey = `${process.platform}-${process.arch}`
 		const expectedChecksum = SEMBLE_SHA256[platformKey]
-		if (expectedChecksum) {
-			await verifyChecksum(archivePath, expectedChecksum)
-		}
+		if (!expectedChecksum) {
+			throw new Error(`No checksum configured for ${platformKey} at ${SEMBLE_VERSION}`)
+		}
+		await verifyChecksum(archivePath, expectedChecksum)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const platformKey = `${process.platform}-${process.arch}`
const expectedChecksum = SEMBLE_SHA256[platformKey]
if (expectedChecksum) {
await verifyChecksum(archivePath, expectedChecksum)
}
const platformKey = `${process.platform}-${process.arch}`
const expectedChecksum = SEMBLE_SHA256[platformKey]
if (!expectedChecksum) {
throw new Error(`No checksum configured for ${platformKey} at ${SEMBLE_VERSION}`)
}
await verifyChecksum(archivePath, expectedChecksum)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/services/code-index/semble/semble-downloader.ts` around lines 167 - 171,
The code currently skips checksum verification when SEMBLE_SHA256 lacks an entry
for the current platform, allowing extraction to proceed without integrity
checks; update the block that computes platformKey/expectedChecksum so that if
expectedChecksum is missing (falsy) you immediately abort with a hard error
(throw an Error or call process.exit(1)) and include identifying info
(platformKey and SEMBLE_VERSION) in the message; keep the subsequent await
verifyChecksum(archivePath, expectedChecksum) unchanged so verified paths still
run as before.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should not be problem as we will not show semble on an unsupported platform

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@navedmerchant — That's fair for the current state: all four entries in SEMBLE_ARCHIVES are also present in SEMBLE_SHA256, so the UI gate already ensures the checksum lookup never misses today.

The risk is purely forward-looking: the two maps are maintained independently, so a future contributor who adds a new platform to SEMBLE_ARCHIVES (making it UI-visible and reachable) might forget to add its entry to SEMBLE_SHA256, causing silent skip of verification. A cheap way to close that gap without changing runtime behavior is a compile-time constraint that forces the two maps to stay in sync:

// Ensures every SEMBLE_ARCHIVES key has a corresponding checksum entry.
// TypeScript will error at compile time if a platform is added to one map but not the other.
type _SembleChecksumParity = {
  [K in keyof typeof SEMBLE_ARCHIVES]: (typeof SEMBLE_SHA256)[K]
}

Alternatively, the hard-error approach from the original suggestion makes the mismatch a visible runtime failure rather than a silent bypass, which is a reasonable safety net. Either way is fine — happy to defer to your preference.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @coderabbitai on this - lets just hard-error, rather then conditionally verify.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, @edelauna. The hard-error path is the safer default — a silent skip is easy to miss during review and leaves the door open for an unverified binary to be executed.

@navedmerchant — given the reviewer alignment here, the minimal change would be:

-		if (expectedChecksum) {
-			await verifyChecksum(archivePath, expectedChecksum)
-		}
+		if (!expectedChecksum) {
+			throw new Error(`No checksum configured for platform ${platformKey} at ${SEMBLE_VERSION}`)
+		}
+		await verifyChecksum(archivePath, expectedChecksum)

This keeps the happy path identical (all four current platforms have checksums) while making any future omission an immediate, visible failure rather than a silent bypass.

@edelauna
Copy link
Copy Markdown
Contributor

edelauna commented May 31, 2026

This PR will set the precedent for allowing the extension to download, configure, install, and run third party executables. I know we haven’t set any governance principles but my preference would be for users to input a PATH location for where to find the CLI if they want to configure semble.

@edelauna
Copy link
Copy Markdown
Contributor

This PR will set the precedent for allowing the extension to download, configure, install, and run third party executables. I know we haven’t set any governance principles but my preference would be for users to input a PATH location for where to find the CLI if they want to configure semble.

I guess this isn’t automatic, the user still needs to interact with zoo code to get it installed. I’ll review a bit more.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/services/code-index/semble/semble-downloader.ts (1)

149-157: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Upgrade deletes old binary before verifying new one.

If the download or extraction fails after deleting the old installation, the user loses their working Semble binary. Consider extracting to a temporary location first, then atomically replacing the old installation only after the new one is verified.

🔧 Suggested approach
 	// Version mismatch — remove old installation before downloading new one
 	if (installedVersion && installedVersion !== SEMBLE_VERSION) {
 		console.log(`[SembleDownloader] Version changed from ${installedVersion} to ${SEMBLE_VERSION}, updating...`)
-		try {
-			await fs.rm(extractDir, { recursive: true, force: true })
-		} catch {
-			// ignore cleanup errors
-		}
+		// Keep old installation until new one is verified - see extractDir usage below
 	}

Then modify the extraction flow to use a staging directory (e.g., semble-new) and only rename to semble after successful verification, removing the old directory at that point.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/services/code-index/semble/semble-downloader.ts` around lines 149 - 157,
The current upgrade flow deletes the existing installation (using
installedVersion, SEMBLE_VERSION and fs.rm on extractDir) before
downloading/extracting the new binary; change this to extract into a staging
directory (e.g., extractDir + "-new" or "semble-new"), run verification there,
and only after successful validation atomically replace the old install
(rename/move staging to extractDir) and then remove the old directory; update
the logic around the version check in the code that uses installedVersion,
SEMBLE_VERSION and extractDir so that fs.rm is only used to remove the old
installation after the new staged install is verified (or to cleanup failed
staging), and ensure error paths leave the original extractDir intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@src/services/code-index/semble/semble-downloader.ts`:
- Around line 149-157: The current upgrade flow deletes the existing
installation (using installedVersion, SEMBLE_VERSION and fs.rm on extractDir)
before downloading/extracting the new binary; change this to extract into a
staging directory (e.g., extractDir + "-new" or "semble-new"), run verification
there, and only after successful validation atomically replace the old install
(rename/move staging to extractDir) and then remove the old directory; update
the logic around the version check in the code that uses installedVersion,
SEMBLE_VERSION and extractDir so that fs.rm is only used to remove the old
installation after the new staged install is verified (or to cleanup failed
staging), and ensure error paths leave the original extractDir intact.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 1a4fad23-6a88-432c-a30e-65a760f90d61

📥 Commits

Reviewing files that changed from the base of the PR and between 24d607a and 62845a1.

📒 Files selected for processing (1)
  • src/services/code-index/semble/semble-downloader.ts

Copy link
Copy Markdown
Contributor

@edelauna edelauna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good - has some suggestions, and requesting changes to switch from conditional checksum verification to hard fail on unsupported architectures.

Comment on lines +48 to +50
if (provider === "semble") {
throw new Error(
"Semble provider handles its own embedding. Do not call createEmbedder() for semble — use SembleProvider instead.",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is semble really an EmbedderProvider? It doesn't generate embedding vectors — it's a self-contained search binary. Forcing it into this type means the factory has to throw at runtime to prevent misuse, which feels like the type system telling us the abstraction is wrong. Would a discriminated union at the manager level (branching before the factory is involved) be cleaner here?

* Initializes the provider: downloads semble, then validates it works.
*/
async initialize(): Promise<void> {
if (this._isInitialized) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if initialize() is called concurrently — e.g., manager reconfigures while a prior init is still downloading? Both callers pass this guard simultaneously, and two concurrent downloads race to the same archive path. An _initPromise that subsequent callers await (or a CancellationTokenSource to cancel the in-flight init) would be the idiomatic VS Code pattern here.

Comment on lines +110 to +114
let stdout = ""
let stderr = ""

child.stdout?.on("data", (data: Buffer) => {
stdout += data.toString()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This accumulates stdout without any limit. If the binary outputs large verbose output (or is the wrong version), this could exhaust the extension host heap well before the 120s timeout fires — and since the extension host is shared across all extensions, it'd take everything down. Worth capping at some max bytes?

Comment on lines +168 to +176
if (directoryPrefix) {
const normalizedPrefix = path.join(this.workspacePath, directoryPrefix).replace(/\\/g, "/")
converted = converted.filter((r) => {
const filePath = (r.payload?.filePath ?? "").replace(/\\/g, "/")
return filePath.startsWith(normalizedPrefix + "/") || filePath === normalizedPrefix
})
console.log(
`[SembleProvider] Filtered to "${directoryPrefix}": ${converted.length} of ${results.length} results`,
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For every other provider, directoryPrefix scopes the DB query before topK results are fetched, so topK counts within the prefix. Here it's a post-hoc filter on results already fetched from the full workspace — so on a large monorepo, a scoped query could return 0 results because semble spent all topK slots on other directories. Is this a known tradeoff, or should we bump the initial topK and then trim?

response.headers.location
) {
response.destroy()
downloadFile(response.headers.location, destPath, maxRedirects - 1)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we validate that the redirect target stays on a trusted domain (github.com / objects.githubusercontent.com) before following it? A compromised CDN redirecting to a different HTTPS host would succeed silently, and the if (expectedChecksum) guard at line 169 means a future platform without a SHA256 entry would skip integrity verification entirely.

*/
function extractTarGz(archivePath: string, destDir: string): Promise<void> {
return new Promise((resolve, reject) => {
const child = spawn("tar", ["-xzf", archivePath, "-C", destDir], {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had we considered adding --no-absolute-filenames (or the equivalent) to the tar invocation? Without it, a crafted archive could write files outside destDir via absolute path entries. The checksum verification is the primary guard, but defense-in-depth on the extraction step seems worthwhile given the binary gets executed.

// Semble creates a separate cache directory per path (SHA-256 of the
// resolved absolute path), so passing subdirectories would create
// redundant indexes and waste disk space.
console.log(`[SembleProvider] Searching for "${query}" in ${this.workspacePath}`)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logs the raw user query on every search call. In enterprise deployments, extension host logs can be forwarded to telemetry — is there a debug-gated OutputChannel we should be using here instead of console.log, consistent with how the other providers log?

"win32-x64": { archive: "semble-windows-x64-fast.zip", binary: "semble.exe" },
}

const SEMBLE_VERSION = "v0.3.1"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re: keeping SEMBLE_VERSION and SEMBLE_SHA256 in sync going forward, we could wire up Renovate to handle this automatically.

Renovate already has custom regex managers in this repo, so the version bump part is straightforward. The checksums are trickier since there are 4 platform-specific artifacts, but we can handle it with a postUpgradeTasks script that recomputes them after the version bump.

Rough shape:

  1. Add a helper script (scripts/update-semble-checksums.sh) that reads SEMBLE_VERSION from semble-downloader.ts, downloads each archive, computes shasum -a 256, and patches the file in-place.

  2. Add a custom manager to renovate.json:json

{
  "customType": "regex",
  "description": "Update semble binary version from GitHub releases.",
  "managerFilePatterns": ["src/services/code-index/semble/semble-downloader.ts"],
  "matchStrings": ["const SEMBLE_VERSION = \"(?<currentValue>v[^\"]+)\""],
  "depNameTemplate": "Zoo-Code-Org/sembleexec",
  "datasourceTemplate": "github-releases",
  "postUpgradeTasks": {
    "commands": ["scripts/update-semble-checksums.sh"],
    "fileFilters": ["src/services/code-index/semble/semble-downloader.ts"]
  }
}

This way when a new semble release drops, Renovate opens a PR with both the version bump and updated checksums — no manual step, no risk of the two maps drifting out of sync.

Doesn't have to be in this PR — just flagging it as a follow-up so we don't end up doing manual bumps forever.

Comment on lines +167 to +171
const platformKey = `${process.platform}-${process.arch}`
const expectedChecksum = SEMBLE_SHA256[platformKey]
if (expectedChecksum) {
await verifyChecksum(archivePath, expectedChecksum)
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @coderabbitai on this - lets just hard-error, rather then conditionally verify.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add in built code indexing based on semble

4 participants