feat: add query/passage prefix env vars to GenericOpenAi embedder #5589
Adds two optional env vars to `EMBEDDING_ENGINE=generic-openai`:

- `EMBEDDING_QUERY_PREFIX` — prepended only to inputs of `embedTextInput` (the query path used by every vector DB provider's similarity search).
- `EMBEDDING_PASSAGE_PREFIX` — prepended only to inputs of `embedChunks` (the ingest path).
Both default to empty strings, so the wire payload is bitwise-identical to
prior behavior unless a user opts in.
Required by asymmetric embedding models (Qwen3-Embedding family, BGE, E5-instruct, multilingual-e5-*, etc.). Per the Qwen3-Embedding model card, queries should be wrapped as `Instruct: <task>\nQuery:<query>` and passages sent raw. Skipping the wrap costs ~1–5% on the asymmetric MTEB tasks the model was trained for. There was previously no way to apply this in AnythingLLM.
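For a Qwen3-style model, opting in might look like this in the server `.env` (illustrative values, using the example prefix from the table below; per the `.env.example` hint noted under Developer Validations, the value is double-quoted so dotenv can turn `\n` into a real newline):

```env
EMBEDDING_ENGINE=generic-openai
EMBEDDING_QUERY_PREFIX="Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery:"
EMBEDDING_PASSAGE_PREFIX=""
```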
Connects Mintplex-Labs#5403 (the prefix half; chunking-strategy half left for a follow-up).
End-to-end verification on a live deployment: pulled this branch onto a live host, added a temporary debug log, and captured log output from a real UI query. The temporary debug log was removed after the capture; the on-host patch is now byte-identical to the source in this PR.

Vector store impact: none. Pre-existing passage vectors were already embedded raw (the correct Qwen3 form), and the patch doesn't change the passage path when `EMBEDDING_PASSAGE_PREFIX` is unset.
Verified all four checklist items in a fresh clone of the PR branch (independent of my main working tree). All four boxes on the PR template are now checked.
Pull Request Type
Relevant Issues
connects #5403
Description
The `generic-openai` embedder currently sends every input through unwrapped — both queries (via `embedTextInput`) and passages (via `embedChunks`) hit the same `input: chunk` POST body. That's correct for symmetric models like `text-embedding-ada-002`, but it leaves a measurable accuracy gap for asymmetric embedders (Qwen3-Embedding family, BGE, E5-instruct, multilingual-e5-*, etc.) that were trained to expect a prefix on queries.

Per the Qwen3-Embedding model card, queries should be wrapped as `Instruct: <task>\nQuery:<query>`, with passages sent raw. The model card cites ~1–5% MTEB regression when the wrap is omitted. There's currently no way to apply this in AnythingLLM. Verification of the unwrapped behavior is in #5403 (comment) — env grep, source snippet, and a wire-capture from the spied OpenAI client.
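Concretely, with only a query prefix set, the request bodies would differ roughly like this (illustrative shapes based on the smoke-test output quoted under Developer Validations, not literal captures):

```
# Query path (embedTextInput):
POST /v1/embeddings
{"model": "<model>", "input": ["Instruct: <task>\nQuery:<user query>"]}

# Ingest path (embedChunks), unchanged while EMBEDDING_PASSAGE_PREFIX is empty:
POST /v1/embeddings
{"model": "<model>", "input": ["<chunk 1>", "<chunk 2>"]}
```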
This PR adds two optional env vars on `EMBEDDING_ENGINE=generic-openai`:

| Env var | Applies to | Default | Example (Qwen3) |
| --- | --- | --- | --- |
| `EMBEDDING_QUERY_PREFIX` | `embedTextInput` (queries) | `""` | `"Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery:"` |
| `EMBEDDING_PASSAGE_PREFIX` | `embedChunks` (ingest) | `""` | `""` (Qwen3 leaves passages raw) |

Both are empty by default, so the wire payload is bitwise-identical to the current behavior unless a user opts in. The architecture made this easy: every vector-DB provider in `server/utils/vectorDbProviders/*` already calls `embedTextInput` only for queries and `embedChunks` only for ingest, so the change is local to one file.
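To make the mechanics concrete, here is a minimal sketch of the shape this implies, using the names from this PR's description (the `isPassage` option is explained just below). It is an illustration, not the actual diff; `_requestEmbeddings` is a hypothetical helper standing in for the existing OpenAI-compatible POST:

```js
class GenericOpenAiEmbedder {
  constructor() {
    // Empty defaults keep the wire payload identical to pre-PR behavior.
    this.queryPrefix = process.env.EMBEDDING_QUERY_PREFIX ?? "";
    this.passagePrefix = process.env.EMBEDDING_PASSAGE_PREFIX ?? "";
  }

  async embedChunks(textChunks = [], { isPassage = true } = {}) {
    const prefix = isPassage ? this.passagePrefix : this.queryPrefix;
    const input = textChunks.map((chunk) => `${prefix}${chunk}`);
    // Same POST as before; only the input strings change.
    return await this._requestEmbeddings(input); // hypothetical helper
  }

  async embedTextInput(textInput) {
    // Queries reuse embedChunks with isPassage: false, so each input gets the
    // query prefix exactly once and is never re-prefixed as a passage.
    const inputs = Array.isArray(textInput) ? textInput : [textInput];
    const result = await this.embedChunks(inputs, { isPassage: false });
    return result?.[0] || [];
  }
}
```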
A new `embedChunks(textChunks, { isPassage = true })` second arg lets `embedTextInput` request the query path internally without re-prefixing (the routing sketched above). Existing call sites (lance, chroma, qdrant, pinecone, milvus, weaviate, astra, pgvector) all pass a single arg and continue to get the passage default — no upstream churn.

Visuals (if applicable)
n/a — env-only change.
Additional Information
Question for @timothycarambat on scoping — I deliberately kept this PR minimal so it's easy to review, but I floated three scope options before writing it. Happy to expand if you'd prefer one of the larger options:
1. `genericOpenAi` only, env-driven. Smallest, most reviewable; closes the prefix half of [FEAT]: Embedder Config: add chunking strategy and embedder query/passage prefix #5403; the chunking-strategy half stays open for a follow-up.
2. `genericOpenAi`, `openai`, `lmstudio`, `localai`, `ollama`, `voyageai`, `litellm`, `lemonade`. Bigger PR but consistent behavior wherever someone might plug in a Qwen3-style asymmetric model. Could be a follow-up that builds on the helper this PR establishes.
3. The `EmbeddingPreference` model. Largest PR; requires frontend work and likely a small Prisma migration; matches what @chrisjunlee originally asked for.

If you'd rather I roll (2) or (3) into this PR before merge, let me know and I'll push more commits to this branch. If (1) lands as-is, I'm happy to follow up with (2) or (3) in a separate PR — your call which direction.
The chunking-strategy half of #5403 is intentionally out of scope here — it's a separable change with its own design questions (which strategies to expose, where to apply them in the splitter, etc.) and I don't want to couple them.
Developer Validations
- [x] `yarn lint` from the root of the repo & committed changes

Notes on validation:

- `server/__tests__/utils/EmbeddingEngines/genericOpenAi/index.test.js` (5 tests, all passing) covers: defaults preserve old behavior; query prefix applied only to the query path; passage prefix applied only to the ingest path; query prefix doesn't leak to the passage path when both are set; array inputs to `embedTextInput` get each element prefixed.
- `yarn lint:check` from `server/` passes.
- Wire capture shows `{"input":["Instruct: ...\nQuery:<user query>"]}` on the query path and raw chunks on the ingest path, matching the model card.
- `server/.env.example` updated with both new env vars and a Qwen3-Embedding example, including a hint on dotenv `\n` quoting.
- `docker build --platform=linux/amd64 -f docker/Dockerfile .` from a fresh clone of this branch built successfully through the `production-build` stage (exit 0).
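For reference, one of those five cases might look roughly like this, continuing the sketch from the Description (the export name and the `_requestEmbeddings` stub seam are assumptions, not taken from the PR's actual test file):

```js
// Hypothetical jest case mirroring "query prefix applied only to query path".
const {
  GenericOpenAiEmbedder,
} = require("../../../../utils/EmbeddingEngines/genericOpenAi");

test("query prefix is applied only on the query path", async () => {
  process.env.EMBEDDING_QUERY_PREFIX = "Instruct: find docs\nQuery:";
  process.env.EMBEDDING_PASSAGE_PREFIX = "";

  const embedder = new GenericOpenAiEmbedder();
  const seen = [];
  // Stub the request seam so no network call is made.
  embedder._requestEmbeddings = async (input) => {
    seen.push(...input);
    return input.map(() => [0, 0, 0]);
  };

  await embedder.embedTextInput("hello"); // query path
  await embedder.embedChunks(["a passage"]); // ingest path (single-arg default)

  expect(seen[0]).toBe("Instruct: find docs\nQuery:hello");
  expect(seen[1]).toBe("a passage"); // empty passage prefix keeps chunks raw
});
```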