
feat: add query/passage prefix env vars to GenericOpenAi embedder #5589

Open
jhsmith409 wants to merge 1 commit into Mintplex-Labs:master from jhsmith409:feat/embedder-query-passage-prefix

Conversation

@jhsmith409

@jhsmith409 jhsmith409 commented May 7, 2026

Pull Request Type

  • ✨ feat (New feature)

Relevant Issues

connects #5403

Description

The generic-openai embedder currently sends every input unwrapped: both queries (via embedTextInput) and passages (via embedChunks) hit the same `input: chunk` POST body. That's correct for symmetric models like text-embedding-ada-002, but it leaves a measurable accuracy gap for asymmetric embedders (the Qwen3-Embedding family, BGE, E5-instruct, multilingual-e5-*, etc.) that were trained to expect a prefix on queries.

Per the Qwen3-Embedding model card, queries should be wrapped as:

Instruct: Given a web search query, retrieve relevant passages that answer the query
Query:{query}

…with passages sent raw. The model card cites ~1–5% MTEB regression when the wrap is omitted. There's currently no way to apply this in AnythingLLM. Verification of the unwrapped behavior is in #5403 (comment) — env grep, source snippet, and a wire-capture from the spied OpenAI client.
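The model-card wrap boils down to a trivial template. A sketch (illustrative only; the task string is the web-search default quoted above):

```javascript
// Build a Qwen3-style instructed query per the wrap quoted above.
// TASK is the model card's web-search default; it varies per retrieval task.
const TASK =
  "Given a web search query, retrieve relevant passages that answer the query";

function wrapQuery(query) {
  return `Instruct: ${TASK}\nQuery:${query}`;
}

// Passages are sent raw (no wrap), per the model card.
console.log(wrapQuery("how do I reset my password?"));
```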

This PR adds two optional env vars on EMBEDDING_ENGINE=generic-openai:

| Env var | Applied to | Default | Typical Qwen3 value |
| --- | --- | --- | --- |
| `EMBEDDING_QUERY_PREFIX` | `embedTextInput` (queries) | `""` | `"Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery:"` |
| `EMBEDDING_PASSAGE_PREFIX` | `embedChunks` (ingest) | `""` | `""` (Qwen3 leaves passages raw) |

Both empty by default, so the wire payload is bitwise-identical to the current behavior unless a user opts in. The architecture made this easy: every vector-DB provider in server/utils/vectorDbProviders/* already calls embedTextInput only for queries and embedChunks only for ingest, so the change is local to one file.

A new embedChunks(textChunks, { isPassage = true }) second arg lets embedTextInput request the query path internally without re-prefixing. Existing call sites (lance, chroma, qdrant, pinecone, milvus, weaviate, astra, pgvector) all pass a single arg and continue getting the passage default — no upstream churn.
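Given that description, the patched paths might look roughly like this. A minimal sketch, not the committed diff: `_requestEmbeddings` is a hypothetical stand-in for the real OpenAI-client call, and only the prefix plumbing is shown.

```javascript
// Sketch of the prefix plumbing described above (illustrative, not the
// exact committed code). Both prefixes fall back to "" so the defaults
// leave the wire payload unchanged.
class GenericOpenAiEmbedder {
  constructor() {
    this.queryPrefix = process.env.EMBEDDING_QUERY_PREFIX ?? "";
    this.passagePrefix = process.env.EMBEDDING_PASSAGE_PREFIX ?? "";
  }

  async embedTextInput(textInput) {
    // Queries route through embedChunks with isPassage=false so the
    // query prefix is applied exactly once, never re-prefixed.
    const inputs = Array.isArray(textInput) ? textInput : [textInput];
    const result = await this.embedChunks(inputs, { isPassage: false });
    return Array.isArray(textInput) ? result : result?.[0];
  }

  async embedChunks(textChunks = [], { isPassage = true } = {}) {
    // External call sites pass one arg and get the passage default.
    const prefix = isPassage ? this.passagePrefix : this.queryPrefix;
    const inputs = textChunks.map((chunk) => `${prefix}${chunk}`);
    return this._requestEmbeddings(inputs); // POST {"input": inputs} upstream
  }

  async _requestEmbeddings(inputs) {
    // Placeholder for the real OpenAI-client embeddings call.
    return inputs.map(() => []);
  }
}
```

The key property is that every existing `embedChunks(chunks)` call site keeps the passage default untouched, while the query path opts out via the new second argument.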

Visuals (if applicable)

n/a — env-only change.

Additional Information

Question for @timothycarambat on scoping — I deliberately kept this PR minimal so it's easy to review, but I floated three scope options before writing it. Happy to expand if you'd prefer one of the larger options:

  1. (this PR) genericOpenAi only, env-driven. Smallest, most reviewable; closes the prefix half of [FEAT]: Embedder Config: add chunking strategy and embedder query/passage prefix #5403; chunking-strategy half stays open for a follow-up.
  2. All OpenAI-compatible embedders — apply the same env vars across genericOpenAi, openai, lmstudio, localai, ollama, voyageai, litellm, lemonade. Bigger PR but consistent behavior wherever someone might plug in a Qwen3-style asymmetric model. Could be a follow-up that builds on the helper this PR establishes.
  3. Full UI + env (matches [FEAT]: Embedder Config: add chunking strategy and embedder query/passage prefix #5403 exactly) — UI fields in the Embedder Config screen plus env vars plus per-embedder plumbing, with persistence on the EmbeddingPreference model. Largest PR; requires frontend work and likely a small Prisma migration; matches what @chrisjunlee originally asked for.

If you'd rather I roll (2) or (3) into this PR before merge, let me know and I'll push more commits to this branch. If (1) lands as-is, I'm happy to follow up with (2) or (3) in a separate PR — your call which direction.

The chunking-strategy half of #5403 is intentionally out of scope here — it's a separable change with its own design questions (which strategies to expose, where to apply them in the splitter, etc.) and I don't want to couple them.

Developer Validations

  • I ran yarn lint from the root of the repo & committed changes
  • Relevant documentation has been updated (if applicable)
  • I have tested my code functionality
  • Docker build succeeds locally

Notes on validation:

  • New Jest suite at server/__tests__/utils/EmbeddingEngines/genericOpenAi/index.test.js (5 tests, all passing) covers: defaults preserve old behavior; query prefix applied only to query path; passage prefix applied only to ingest path; query prefix doesn't leak to passage path when both are set; array inputs to embedTextInput get each element prefixed.
  • yarn lint:check from server/ passes.
  • Verified end-to-end against a live deployment serving Qwen3-Embedding-0.6B through a LiteLLM gateway: with the env vars set, the wire-capture inside the container shows {"input":["Instruct: ...\nQuery:<user query>"]} on the query path and raw chunks on the ingest path, matching the model card.
  • server/.env.example updated with both new env vars and a Qwen3-Embedding example, including a hint on dotenv \n quoting.
  • Docker build verified locally: docker build --platform=linux/amd64 -f docker/Dockerfile . from a fresh clone of this branch built successfully through the production-build stage (exit 0).
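For orientation, the `.env.example` addition might take roughly this shape (a hypothetical rendering, not the committed text; dotenv only expands `\n` to a real newline inside double-quoted values):

```shell
# Hypothetical shape of the server/.env.example addition (illustrative).
# NOTE: dotenv expands \n to a real newline only inside double quotes.
EMBEDDING_QUERY_PREFIX="Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery:"
EMBEDDING_PASSAGE_PREFIX=""
```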

Adds two optional env vars to `EMBEDDING_ENGINE=generic-openai`:

- EMBEDDING_QUERY_PREFIX  — prepended only to inputs of `embedTextInput`
                            (the query path used by every vector DB provider's
                            similarity search).
- EMBEDDING_PASSAGE_PREFIX — prepended only to inputs of `embedChunks`
                            (the ingest path).

Both default to empty strings, so the wire payload is bitwise-identical to
prior behavior unless a user opts in.

Required by asymmetric embedding models (Qwen3-Embedding family, BGE, E5-instruct,
multilingual-e5-*, etc.). Per the Qwen3-Embedding model card, queries should be
wrapped as:

    Instruct: <task>\nQuery:<query>

…and passages sent raw. Skipping the wrap costs ~1-5% on the asymmetric MTEB
tasks the model was trained for. There was previously no way to apply this in
AnythingLLM.

Connects Mintplex-Labs#5403 (the prefix half; chunking-strategy half left for a follow-up).
@jhsmith409 jhsmith409 force-pushed the feat/embedder-query-passage-prefix branch from 5d9801a to 771ffd3 on May 7, 2026 at 11:53
@jhsmith409
Author

End-to-end verification on a live deployment

Pulled this branch onto a host running mintplexlabs/anythingllm:latest against a Qwen3-Embedding-0.6B endpoint and confirmed the patched code path works through the real chat UI.

Setup

  1. Copied the patched genericOpenAi/index.js from this PR onto the host as a read-only bind mount over the container's /app/server/utils/EmbeddingEngines/genericOpenAi/index.js (so the patch survives container recreations without rebuilding the image).
  2. Set the env var in docker-compose.yml, YAML double-quoted so \n is preserved as a real newline:
    - "EMBEDDING_QUERY_PREFIX=Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery:"
    Left EMBEDDING_PASSAGE_PREFIX unset (Qwen3 wants passages raw).
  3. docker compose down && docker compose up -d to recreate the container with the new mount + env.

Verification method

Added a temporary this.log(...) call inside embedChunks that prints isPassage and the first 160 chars of the first input. Bounced the container, sent a chat message in a workspace through the AnythingLLM UI, then watched docker logs anythingllm.

Captured log output from a real UI query

[GenericOpenAiEmbedder] [DEBUG] isPassage=false count=1
  sample="Instruct: Given a web search query, retrieve relevant passages
  that answer the query\nQuery:<user's query string>"

Confirms three things in one shot:

  • The query path went through embedTextInput (isPassage=false reaches embedChunks only via the new internal call from embedTextInput, never from external call sites).
  • EMBEDDING_QUERY_PREFIX was applied to the query.
  • The literal \n in the env value survives YAML/dotenv loading and reaches process.env as a real newline, matching the Qwen3-Embedding model card's expected wrap.
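That third point can be sanity-checked from the Node side with a throwaway helper (hypothetical, not part of the PR):

```javascript
// Returns true only when the value holds a real newline character rather
// than the two literal characters "\" and "n" (a common quoting mistake
// when the env value passes through YAML or dotenv unquoted).
function hasRealNewline(value) {
  return value.includes("\n") && !value.includes("\\n");
}

const prefix = process.env.EMBEDDING_QUERY_PREFIX ?? "";
console.log(
  hasRealNewline(prefix)
    ? "prefix carries a real newline"
    : "no real newline reached process.env (check YAML/dotenv quoting)"
);
```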

The temporary debug log was removed after the capture; the on-host patch is now byte-identical to the source in this PR (diff -q clean).

Vector store impact

None. Pre-existing passage vectors were already embedded raw (the correct Qwen3 form), and the patch doesn't change the passage path when EMBEDDING_PASSAGE_PREFIX is unset. Only newly issued queries are now wrapped — which is exactly the asymmetric retrieval behavior Qwen3 was trained for. No re-embedding needed.

@jhsmith409
Author

Verified all four checklist items in a fresh clone of the PR branch (independent of my main working tree):

  • yarn lint from repo root: clean, no diff produced; existing commit is already lint-conformant.
  • Relevant documentation updated: server/.env.example documents both EMBEDDING_QUERY_PREFIX and EMBEDDING_PASSAGE_PREFIX with the Qwen3-Embedding example and a hint on dotenv \n quoting.
  • Code functionality tested: npx jest __tests__/utils/EmbeddingEngines/genericOpenAi/index.test.js → 5/5 passing (defaults preserve old behavior, query prefix only on embedTextInput, passage prefix only on embedChunks, query prefix doesn't leak into the passage path, array inputs prefixed correctly).
  • Docker build succeeds locally: docker build --platform=linux/amd64 -f docker/Dockerfile . against this branch built successfully (multi-stage build through production-build, image size ~4.87 GB, exit 0).

All four boxes on the PR template are now checked.

