LCORE-1853: Add relevance cutoff score to inline BYOK RAG by syedriko · Pull Request #1720 · lightspeed-core/lightspeed-stack

syedriko · 2026-05-11T19:12:24Z

Description

In the Lightspeed Stack configuration, byok_rag section, added a "relevance_cutoff_score" configuraiton variable, a positive float.
When a RAG chunk is retrieved from a BYOK database and its relevance score is less than the cutoff, the chunk is dropped from further consideration. Specifically, this cutoff filtering happens before score multipliers are applied.
Only added handling of relevance_cutoff_score to the inline BYOK RAG for now as adding it to the tool BYOK RAG is calling for changes in Llama Stack.

Type of change

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

Assisted-by: Cursor
Generated by: (e.g., tool name and version; N/A if not used)

Related Tickets & Documents

Related Issue #
Closes # https://redhat.atlassian.net/browse/LCORE-1853

Checklist before requesting a review

I have performed a self-review of my code.
PR has passed all pre-merge test jobs.
If it is a core feature, I have added thorough tests.

Testing

Please provide detailed steps to perform tests related to this code change.
How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

New Features
- Per-knowledge-source relevance_cutoff_score for BYOK Inline RAG to filter results by minimum similarity (default 0.3); Tool RAG/file_search unaffected.
Documentation
- BYOK guide updated with usage, tuning guidance, Inline vs Tool RAG notes, and a reminder to refresh sources and monitor performance.
API
- Public schema exposes relevance_cutoff_score with default and validation.
Tests
- Unit tests expanded to cover configuration, validation, propagation, and filtering behavior.

coderabbitai · 2026-05-11T19:13:14Z

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: a791573e-b3ab-45c7-ba42-5465d7a60429

📥 Commits

Reviewing files that changed from the base of the PR and between fee7578 and f6deca8.

📒 Files selected for processing (11)

docs/byok_guide.md
docs/openapi.json
pyproject.toml
src/configuration.py
src/constants.py
src/models/config.py
src/utils/vector_search.py
tests/unit/models/config/test_byok_rag.py
tests/unit/models/config/test_dump_configuration.py
tests/unit/test_configuration.py
tests/unit/utils/test_vector_search.py

💤 Files with no reviewable changes (3)

tests/unit/test_configuration.py
tests/unit/models/config/test_dump_configuration.py
tests/unit/utils/test_vector_search.py

📜 Recent review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)

GitHub Check: build-pr
GitHub Check: E2E: server mode / ci / group 2
GitHub Check: E2E: library mode / ci / group 3
GitHub Check: E2E: server mode / ci / group 3
GitHub Check: E2E: library mode / ci / group 1
GitHub Check: E2E: library mode / ci / group 2
GitHub Check: E2E: server mode / ci / group 1
GitHub Check: E2E Tests for Lightspeed Evaluation job
GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request

🧰 Additional context used

📓 Path-based instructions (5)

src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.py: Use absolute imports for internal modules: from authentication import get_auth_dependency
Llama Stack imports: Use from llama_stack_client import AsyncLlamaStackClient
Check constants.py for shared constants before defining new ones
All modules must start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
All functions must have complete type annotations for parameters and return types, use modern syntax (str | int), and include descriptive docstrings
Use snake_case with descriptive, action-oriented names for functions (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying function parameters
Use async def for I/O operations and external API calls
Use standard log levels with clear purposes: debug() for diagnostic info, info() for program execution, warning() for unexpected events, error() for serious problems
All classes must have descriptive docstrings explaining purpose and use PascalCase with standard suffixes: Configuration, Error/Exception, Resolver, Interface
Abstract classes must use ABC with @abstractmethod decorators
Follow Google Python docstring conventions with required sections: Parameters, Returns, Raises, and Attributes for classes

Files:

src/constants.py
src/configuration.py
src/models/config.py
src/utils/vector_search.py

src/constants.py

📄 CodeRabbit inference engine (AGENTS.md)

Use constants.py for shared constants with descriptive comments and type hints using Final[type]

Files:

src/constants.py

src/**/configuration.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/configuration.py: All config models must extend ConfigurationBase with extra="forbid" to reject unknown fields
Use @field_validator and @model_validator for custom validation in Pydantic models

Files:

src/configuration.py

src/models/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Pydantic models must use @model_validator and @field_validator for validation and complete type annotations for all attributes, avoiding Any type

Files:

src/models/config.py

tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Use pytest.mark.asyncio marker for async tests

Files:

tests/unit/models/config/test_byok_rag.py

🧠 Learnings (2)

📚 Learning: 2026-01-12T10:58:40.230Z

Learnt from: blublinsky
Repo: lightspeed-core/lightspeed-stack PR: 972
File: src/models/config.py:459-513
Timestamp: 2026-01-12T10:58:40.230Z
Learning: In lightspeed-core/lightspeed-stack, for Python files under src/models, when a user claims a fix is done but the issue persists, verify the current code state before accepting the fix. Steps: review the diff, fetch the latest changes, run relevant tests, reproduce the issue, search the codebase for lingering references to the original problem, confirm the fix is applied and not undone by subsequent commits, and validate with local checks to ensure the issue is resolved.

Applied to files:

src/models/config.py

📚 Learning: 2026-02-25T07:46:33.545Z

Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:33.545Z
Learning: In the Python codebase, requests.py should use OpenAIResponseInputTool as Tool while responses.py uses OpenAIResponseTool as Tool. This difference is intentional due to differing schemas for input vs output tools in llama-stack-api. Apply this distinction consistently to other models under src/models (e.g., ensure request-related tools use the InputTool variant and response-related tools use the ResponseTool variant). If adding new tools, choose the corresponding InputTool or Tool class based on whether the tool represents input or output, and document the rationale in code comments.

Applied to files:

src/models/config.py

🪛 GitHub Actions: OpenAPI (Spectral) / 0_spectral.txt

docs/openapi.json

[error] 1-1: OpenAPI schema is out of date. docs/openapi.json differs from the newly generated schema. Regenerate with: uv run scripts/generate_openapi_schema.py docs/openapi.json

🪛 GitHub Actions: OpenAPI (Spectral) / spectral

docs/openapi.json

[error] 1-1: OpenAPI schema is out of date. Pipeline diff failed between 'docs/openapi.json' and generated '/tmp/openapi-generated.json'. Regenerate with: 'uv run scripts/generate_openapi_schema.py docs/openapi.json'

🪛 markdownlint-cli2 (0.22.1)

docs/byok_guide.md

[warning] 92-92: Blank line inside blockquote

(MD028, no-blanks-blockquote)

[warning] 93-93: Blank line inside blockquote

(MD028, no-blanks-blockquote)

🛑 Comments failed to post (4)

pyproject.toml (1)

229-231: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

python - <<'PY'
from pathlib import Path
import ast

def counts(node):
    args = node.args
    positional = len(args.posonlyargs) + len(args.args)
    total = positional + len(args.kwonlyargs)
    return total, positional

for path in sorted(Path("src").rglob("*.py")):
    try:
        tree = ast.parse(path.read_text())
    except Exception as exc:
        print(f"PARSE_ERROR {path}: {exc}")
        continue

    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            total, positional = counts(node)
            if total > 5 or positional > 5:
                kind = "async def" if isinstance(node, ast.AsyncFunctionDef) else "def"
                print(f"{path}:{node.lineno}: {kind} {node.name} total_args={total} positional_args={positional}")
PY

Repository: lightspeed-core/lightspeed-stack

Length of output: 1585

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# 1) Show the exact pylint design config in pyproject.toml around the flagged lines
echo "=== pyproject.toml: tool.pylint.design context ==="
python - <<'PY'
from pathlib import Path
p = Path("pyproject.toml")
lines = p.read_text().splitlines()
start, end = 200, 260
for i in range(start, min(end, len(lines))):
    # line numbers are 1-based
    print(f"{i+1}:{lines[i]}")
PY

# 2) Find BYOK RAG / relevance-cutoff related identifiers in src/
echo
echo "=== Search: byok / rag / relevance cutoff identifiers ==="
rg -n --hidden --glob '!.venv/**' --glob '!.git/**' \
  -S '(byok|byo?k|rag|relevance|cutoff|threshold)' src || true

# 3) Count args in the function(s) likely touched by relevance-cutoff work (best-effort via grep of nearby matches)
echo
echo "=== Best-effort: show function defs around matches ==="
rg -n --hidden --glob '!.venv/**' --glob '!.git/**' \
  -S 'def |async def ' src/app src/utils src/models | head -n 200 || true

Repository: lightspeed-core/lightspeed-stack

Length of output: 50388

🏁 Script executed:

#!/bin/bash
set -euo pipefail
python - <<'PY'
from pathlib import Path
import ast

def arg_counts(fn: ast.AST):
    # pylint max-args / max-positional-arguments behavior is close to counting these.
    args = fn.args
    posonly = len(getattr(args, "posonlyargs", []))
    positional = posonly + len(args.args)
    kwonly = len(args.kwonlyargs)
    # vararg / kwarg exist in ast.arguments
    vararg = 1 if args.vararg is not None else 0
    kwarg = 1 if args.kwarg is not None else 0
    total = positional + kwonly + vararg + kwarg
    return total, positional

def scan(path: Path):
    txt = path.read_text(encoding="utf-8")
    tree = ast.parse(txt)
    out = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            total, positional = arg_counts(node)
            out.append((str(path), node.lineno, node.name, isinstance(node, ast.AsyncFunctionDef), total, positional))
    return out

roots = [Path("src")]
rows = []
parse_errors = []
for root in roots:
    for path in sorted(root.rglob("*.py")):
        try:
            rows.extend(scan(path))
        except SyntaxError as e:
            parse_errors.append((str(path), e.lineno, e.msg))
        except Exception as e:
            parse_errors.append((str(path), None, repr(e)))

# Focus: what exceeds 5 (pylint default commonly) and what exceeds 8 (your new limit).
exceed_5_total = [r for r in rows if r[4] > 5]
exceed_8_total = [r for r in rows if r[4] > 8]
exceed_5_pos = [r for r in rows if r[5] > 5]
exceed_8_pos = [r for r in rows if r[5] > 8]

print(f"functions total_params>5: {len(exceed_5_total)}")
print(f"functions total_params>8: {len(exceed_8_total)}")
print(f"functions positional_params>5: {len(exceed_5_pos)}")
print(f"functions positional_params>8: {len(exceed_8_pos)}")

# BYOK relevance-cutoff likely lives in vector_search.py
vs_rows = [r for r in rows if r[0].endswith("src/utils/vector_search.py")]
print(f"\nvector_search.py function count: {len(vs_rows)}")
print("vector_search.py with total_params>5 (show):")
for r in sorted([r for r in vs_rows if r[4] > 5], key=lambda x:(x[0],x[1])):
    print(f"{r[0]}:{r[1]} {('async ' if r[3] else '')}{r[2]} total={r[4]} positional={r[5]}")

print("\nTop 20 by total_params (to gauge breadth):")
for r in sorted(rows, key=lambda x:x[4], reverse=True)[:20]:
    print(f"{r[0]}:{r[1]} {('async ' if r[3] else '')}{r[2]} total={r[4]} positional={r[5]}")

print("\nParse errors (first 20):")
for pe in parse_errors[:20]:
    print(pe)
print(f"Parse errors: {len(parse_errors)}")
PY

Repository: lightspeed-core/lightspeed-stack

Length of output: 2206

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Show how pyproject.toml:229-231 changed in this PR/branch
git diff -U3 -- pyproject.toml | sed -n '1,200p'

Repository: lightspeed-core/lightspeed-stack

Length of output: 58

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Check whether we have parent commits to diff against
git rev-parse HEAD~1 >/dev/null 2>&1 || { echo "NO_PARENT_COMMIT"; exit 0; }

echo "=== Current pyproject.toml: tool.pylint.design ==="
rg -n --context 2 '^\\[tool\\.pylint\\.design\\]|^max-args\\s*=|^max-positional-arguments\\s*=' pyproject.toml

echo
echo "=== Previous commit (HEAD~1): tool.pylint.design ==="
git show HEAD~1:pyproject.toml | rg -n --context 2 '^\\[tool\\.pylint\\.design\\]|^max-args\\s*=|^max-positional-arguments\\s*=' || true

echo
echo "=== Previous commit: snippet around tool.pylint.design section (if present) ==="
git show HEAD~1:pyproject.toml | python - <<'PY'
import sys, re
txt=sys.stdin.read().splitlines()
for i,line in enumerate(txt):
    if line.strip()=="[tool.pylint.design]":
        for j in range(max(0,i-5), min(len(txt), i+12)):
            print(f"{j+1}:{txt[j]}")
        break
else:
    print("No [tool.pylint.design] section in HEAD~1")
PY

Repository: lightspeed-core/lightspeed-stack

Length of output: 93

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== local pylint disables for argument-count ==="
rg -n 'pylint:\s*disable=.*too-many-(arguments|positional-arguments)' src || true

echo
echo "=== _query_store_for_byok_rag snippet ==="
python - <<'PY'
from pathlib import Path
p=Path("src/utils/vector_search.py")
lines=p.read_text().splitlines()
for i,l in enumerate(lines):
    if "async def _query_store_for_byok_rag" in l:
        for j in range(max(0,i-20), min(len(lines), i+90)):
            print(f"{j+1}:{lines[j]}")
        break
PY

echo
echo "=== functions with total args >5 and <=8 (approx mapping to pylint design limits) ==="
python - <<'PY'
from pathlib import Path
import ast

def arg_counts(fn: ast.AST):
    args=fn.args
    posonly=len(getattr(args,'posonlyargs',[]))
    positional=posonly+len(args.args)
    kwonly=len(args.kwonlyargs)
    vararg=1 if args.vararg else 0
    kwarg=1 if args.kwarg else 0
    total=positional+kwonly+vararg+kwarg
    return total,positional

rows=[]
for path in Path("src").rglob("*.py"):
    try:
        tree=ast.parse(path.read_text(encoding="utf-8"))
    except SyntaxError:
        continue
    for node in ast.walk(tree):
        if isinstance(node,(ast.FunctionDef,ast.AsyncFunctionDef)):
            total,positional=arg_counts(node)
            if total>5 and total<=8:
                kind="async def" if isinstance(node,ast.AsyncFunctionDef) else "def"
                rows.append((str(path),node.lineno,kind,node.name,total,positional))

rows=sorted(rows, key=lambda r:(r[0],r[1]))
print(f"affected_count={len(rows)}")
for r in rows:
    path,ln,kind,name,total,pos=r
    print(f"{path}:{ln} {kind} {name} total={total} positional={pos}")
PY

Repository: lightspeed-core/lightspeed-stack

Length of output: 6930

Avoid repo-wide pylint max-args relaxation for BYOK RAG
pyproject.toml sets [tool.pylint.design] max-args = 8 and max-positional-arguments = 8 (lines 229-231), which relaxes too-many-arguments/too-many-positional-arguments across the whole repo. The scan shows 9 functions in src/ with total args in the 6–8 range; only src/utils/vector_search.py:248 async _query_store_for_byok_rag(...) is BYOK RAG-related, while others include response_generator, _check_shield_moderation, prepare_tools, build_turn_summary, and create_transcript_metadata. Since other outliers are already handled via local # pylint: disable=too-many-arguments suppressions, keep the stricter global limits and apply a targeted suppression/refactor specifically for _query_store_for_byok_rag.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pyproject.toml` around lines 229 - 231, Revert the repo-wide pylint
relaxation in pyproject.toml by removing or restoring the [tool.pylint.design]
max-args and max-positional-arguments overrides so the global
"too-many-arguments" rules remain strict, and instead add a targeted suppression
or refactor for the BYOK RAG helper: open src/utils/vector_search.py and either
refactor async function _query_store_for_byok_rag(...) to reduce its parameters
or add a local "# pylint: disable=too-many-arguments" (or "# pylint:
disable=too-many-positional-arguments") immediately above that function
declaration to allow the exception only there. Ensure no other global changes
are made so existing local suppressions (e.g., in response_generator,
_check_shield_moderation, prepare_tools, build_turn_summary,
create_transcript_metadata) remain intact.

src/models/config.py (2)

2018-2023: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Use the shared reranker-model constant here.

This default duplicates constants.DEFAULT_CROSS_ENCODER_MODEL, so the schema default and runtime default can drift on the next model change. Reference the shared constant instead of hardcoding the string.
♻️ Proposed fix
     model: str = Field(
-        default="cross-encoder/ms-marco-MiniLM-L6-v2",
+        default=constants.DEFAULT_CROSS_ENCODER_MODEL,
         title="Reranker model",
         description="Cross-encoder model name for reranking RAG chunks. "
-        "Defaults to 'cross-encoder/ms-marco-MiniLM-L6-v2' from sentence-transformers.",
+        f"Defaults to '{constants.DEFAULT_CROSS_ENCODER_MODEL}' from sentence-transformers.",
     )
As per coding guidelines, "Check constants.py for shared constants before defining new ones".
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
    model: str = Field(
        default=constants.DEFAULT_CROSS_ENCODER_MODEL,
        title="Reranker model",
        description="Cross-encoder model name for reranking RAG chunks. "
        f"Defaults to '{constants.DEFAULT_CROSS_ENCODER_MODEL}' from sentence-transformers.",
    )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/models/config.py` around lines 2018 - 2023, The Field default for the
"model" attribute in the config schema is hardcoded to
"cross-encoder/ms-marco-MiniLM-L6-v2" which duplicates the shared constant;
update the default to reference the shared constant DEFAULT_CROSS_ENCODER_MODEL
from the constants module (import it at the top if necessary) so the schema
default and runtime default stay in sync, i.e., replace the literal string used
in the Field default with constants.DEFAULT_CROSS_ENCODER_MODEL and ensure the
import for DEFAULT_CROSS_ENCODER_MODEL is present where the Field is defined.
2351-2387: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

validate_reranker_auto_enable only honors inline OKP.

The validator’s docstring and comments say OKP in inline or tool strategies should trigger auto-enable, but Line 2368 only checks self.rag.inline. A config with BYOK plus okp only in rag.tool will skip this path. Either include self.rag.tool in the predicate or narrow the documented contract to inline-only behavior.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/models/config.py` around lines 2351 - 2387, The validator currently only
checks self.rag.inline for OKP (constants.OKP_RAG_ID) but the docstring/comments
say inline OR tool should trigger auto-enable; update the predicate by changing
the has_okp computation in validate_reranker_auto_enable to consider both
self.rag.inline and self.rag.tool (e.g., check constants.OKP_RAG_ID in either
collection) and adjust the log message/comment to say "inline or tool" as
needed; keep the existing protected-access and pylint disables around
reranker._explicitly_configured unchanged.

src/utils/vector_search.py (1)

107-119: 🧹 Nitpick | 🔵 Trivial | 💤 Low value

Duplicate/malformed Returns: section in docstring.

This docstring has two Returns: blocks: the first (Lines 113-116) actually documents params keys rather than the return value, and the second (Lines 118-119) is the real return description. Collapse into a single, correct Returns: section to follow Google docstring conventions.
📝 Proposed docstring fix
     Args:
         solr: Optional structured Solr request (mode and filters from the API).
         k: Optional number of results to return. If not provided, uses default.

     Returns:
-            - mode: Solr search mode (semantic, hybrid, lexical)
-            - filters: Solr filter payload, may contain structured metadata filters
-
-    Returns:
-        Parameter dictionary for ``vector_io.query`` with extracted filters at top level.
+        Parameter dictionary for ``vector_io.query`` with extracted filters at top
+        level (keys include ``mode`` and, when present, ``filters``).
     """
As per coding guidelines: "Follow Google Python docstring conventions with required sections: Parameters, Returns, Raises".
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
    """Build query parameters for Solr vector_io search.

    Args:
        solr: Optional structured Solr request (mode and filters from the API).
        k: Optional number of results to return. If not provided, uses default.

    Returns:
        Parameter dictionary for ``vector_io.query`` with extracted filters at top
        level (keys include ``mode`` and, when present, ``filters``).
    """
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/utils/vector_search.py` around lines 107 - 119, The docstring in the Solr
vector_io query builder in src/utils/vector_search.py contains two Returns
sections; collapse them into a single Google-style Returns section that
describes the returned dict and its keys (mode, filters, and any other params)
and remove the duplicated/malformed Returns block so the docstring follows
Google conventions for the function that builds query parameters for
vector_io.query.

🔇 Additional comments (8)

src/utils/vector_search.py (2)

270-279: LGTM!

501-523: ⚡ Quick win

Ensure relevance_cutoff_mapping uses the same keyspace as the vector store lookup ID.
src/utils/vector_search.py looks up relevance_cutoff_mapping.get(vector_store_id, DEFAULT_BYOK_RAG_RELEVANCE_CUTOFF_SCORE) where vector_store_id comes from vector_store_ids_to_query; confirm relevance_cutoff_mapping is keyed by the same vector_db_id namespace. If it’s keyed by user-facing rag_id instead, the configured per-store cutoff won’t be applied and will always fall back to the default.

tests/unit/models/config/test_byok_rag.py (1)

39-39: LGTM!

Also applies to: 59-68, 194-226

src/constants.py (1)

3-3: LGTM!

Also applies to: 161-163, 184-205, 295-300

src/models/config.py (1)

231-240: LGTM!

Also applies to: 470-527, 617-625, 704-727, 1817-1823, 2176-2180, 2237-2247

src/configuration.py (1)

17-17: LGTM!

Also applies to: 31-31, 463-489, 534-550

docs/byok_guide.md (2)

92-93: MD028 blockquote spacing issue still present.

This is the same markdownlint issue already reported earlier: blank lines inside adjacent blockquote notes.

82-82: LGTM!

Also applies to: 89-91, 94-98, 302-303, 309-311, 338-341, 591-591

Walkthrough

This PR adds per-knowledge-source relevance cutoff score configuration for BYOK RAG, enabling Inline RAG queries to filter vector search results by minimum similarity threshold. The constant, config model, AppConfig mapping, vector search integration, OpenAPI schema, docs, and comprehensive tests are introduced.

Changes

BYOK RAG Relevance Cutoff Configuration

Layer / File(s)	Summary
Default Constant `src/constants.py`	`DEFAULT_BYOK_RAG_RELEVANCE_CUTOFF_SCORE` constant set to `0.3` defines the module-level default minimum similarity threshold.
Configuration Model `src/models/config.py`	`ByokRag` model adds `relevance_cutoff_score: float` field with positive value constraint (`gt=0`) and default from the constant.
AppConfig mapping `src/configuration.py`	`AppConfig.relevance_cutoff_mapping` property returns a dict mapping `vector_db_id` → `relevance_cutoff_score` and raises `LogicError` if config is unset.
Vector Search Integration `src/utils/vector_search.py`	`_query_store_for_byok_rag()` signature extended with `score_threshold`, docstring updated, `params["score_threshold"]` forwarded to `client.vector_io.query`, and `_fetch_byok_rag()` obtains per-store cutoff and passes it into queries.
API Schema `docs/openapi.json`	`relevance_cutoff_score` property added with `type: number`, `exclusiveMinimum: 0.0`, and `default: 0.3`.
User Guide `docs/byok_guide.md`	Documentation of `relevance_cutoff_score`, mapping to vector store `score_threshold`, Inline RAG-only note, updated YAML example, and Inline vs Tool RAG comparison table.
Pylint Config `pyproject.toml`	Add `tool.pylint.design` with `max-args = 8` and `max-positional-arguments = 8`.
Unit Tests `tests/unit/models/config/*`, `tests/unit/test_configuration.py`, `tests/unit/utils/test_vector_search.py`	Tests cover default and custom cutoff values, validation failures, configuration serialization, AppConfig mapping behavior, forwarding of `score_threshold` into `vector_io.query`, and end-to-end filtering of sub-threshold hits.

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs:

lightspeed-core/lightspeed-stack#1566: Related changes in BYOK retrieval/reranking and integration with build_rag_context.

Suggested reviewers:

are-ces
tisnik

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and concisely describes the main change: adding a relevance cutoff score feature to inline BYOK RAG, which is the core objective of this pull request.
Docstring Coverage	✅ Passed	Docstring coverage is 93.33% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

✨ Simplify code

Create PR with simplified code

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/openapi.json`:
- Around line 11809-11815: The OpenAPI schema file docs/openapi.json is out of
sync and should be regenerated rather than edited; run the provided generator
script (uv run scripts/generate_openapi_schema.py docs/openapi.json) to recreate
the schema, then verify the relevance_cutoff_score field in the regenerated JSON
has the correct default value that matches
DEFAULT_BYOK_RAG_RELEVANCE_CUTOFF_SCORE from src/constants.py and commit the
regenerated file.

In `@src/utils/vector_search.py`:
- Around line 29-34: Update the docstring for _relevance_cutoff_for_vector_store
to follow Google Python conventions: add a brief description line, then a
Parameters section documenting vector_store_id (str) and what it represents, and
a Returns section describing the returned float (either the matched
brag.relevance_cutoff_score or
constants.DEFAULT_BYOK_RAG_RELEVANCE_CUTOFF_SCORE). If the function can raise
any exceptions, add a Raises section; otherwise omit Raises. Ensure the
docstring text references the function name and types only (no code) and matches
the existing one-line summary.

In `@tests/unit/models/config/test_dump_configuration.py`:
- Line 11: Replace the top-level import "import constants" with explicit
from-imports for the names actually used in this test (e.g., "from constants
import FOO, BAR" — substitute the real constant names referenced in this file)
and update usages that refer to constants.<NAME> (including the reference around
line 996) to use the direct names (FOO, BAR, etc.) so the import style matches
the existing "from X import Y" pattern used elsewhere in the file.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: f0c9945a-ead4-4245-9d4b-99ea45582ed7

📥 Commits

Reviewing files that changed from the base of the PR and between 70cb0db and 6af93fb.

📒 Files selected for processing (8)

docs/byok_guide.md
docs/openapi.json
src/constants.py
src/models/config.py
src/utils/vector_search.py
tests/unit/models/config/test_byok_rag.py
tests/unit/models/config/test_dump_configuration.py
tests/unit/utils/test_vector_search.py

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)

GitHub Check: black
GitHub Check: unit_tests (3.13)
GitHub Check: Pylinter
GitHub Check: radon
GitHub Check: build-pr
GitHub Check: E2E: server mode / ci / group 2
GitHub Check: E2E: server mode / ci / group 1
GitHub Check: E2E: library mode / ci / group 3
GitHub Check: E2E: library mode / ci / group 2
GitHub Check: E2E: library mode / ci / group 1
GitHub Check: E2E: server mode / ci / group 3
GitHub Check: E2E Tests for Lightspeed Evaluation job

🧰 Additional context used

📓 Path-based instructions (4)

src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.py: Use absolute imports for internal modules: from authentication import get_auth_dependency
Llama Stack imports: Use from llama_stack_client import AsyncLlamaStackClient
Check constants.py for shared constants before defining new ones
All modules must start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
All functions must have complete type annotations for parameters and return types, use modern syntax (str | int), and include descriptive docstrings
Use snake_case with descriptive, action-oriented names for functions (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying function parameters
Use async def for I/O operations and external API calls
Use standard log levels with clear purposes: debug() for diagnostic info, info() for program execution, warning() for unexpected events, error() for serious problems
All classes must have descriptive docstrings explaining purpose and use PascalCase with standard suffixes: Configuration, Error/Exception, Resolver, Interface
Abstract classes must use ABC with @abstractmethod decorators
Follow Google Python docstring conventions with required sections: Parameters, Returns, Raises, and Attributes for classes

Files:

src/models/config.py
src/constants.py
src/utils/vector_search.py

src/models/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Pydantic models must use @model_validator and @field_validator for validation and complete type annotations for all attributes, avoiding Any type

Files:

src/models/config.py

src/constants.py

📄 CodeRabbit inference engine (AGENTS.md)

Use constants.py for shared constants with descriptive comments and type hints using Final[type]

Files:

src/constants.py

tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Use pytest.mark.asyncio marker for async tests

Files:

tests/unit/models/config/test_dump_configuration.py
tests/unit/utils/test_vector_search.py
tests/unit/models/config/test_byok_rag.py

🧠 Learnings (2)

📚 Learning: 2026-01-12T10:58:40.230Z

Learnt from: blublinsky
Repo: lightspeed-core/lightspeed-stack PR: 972
File: src/models/config.py:459-513
Timestamp: 2026-01-12T10:58:40.230Z
Learning: In lightspeed-core/lightspeed-stack, for Python files under src/models, when a user claims a fix is done but the issue persists, verify the current code state before accepting the fix. Steps: review the diff, fetch the latest changes, run relevant tests, reproduce the issue, search the codebase for lingering references to the original problem, confirm the fix is applied and not undone by subsequent commits, and validate with local checks to ensure the issue is resolved.

Applied to files:

src/models/config.py

📚 Learning: 2026-02-25T07:46:33.545Z

Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:33.545Z
Learning: In the Python codebase, requests.py should use OpenAIResponseInputTool as Tool while responses.py uses OpenAIResponseTool as Tool. This difference is intentional due to differing schemas for input vs output tools in llama-stack-api. Apply this distinction consistently to other models under src/models (e.g., ensure request-related tools use the InputTool variant and response-related tools use the ResponseTool variant). If adding new tools, choose the corresponding InputTool or Tool class based on whether the tool represents input or output, and document the rationale in code comments.

Applied to files:

src/models/config.py

🪛 GitHub Actions: OpenAPI (Spectral) / 0_spectral.txt

docs/openapi.json

[error] 1-1: OpenAPI schema out of date. diff detected changes between docs/openapi.json and /tmp/openapi-generated.json. Regenerate with: 'uv run scripts/generate_openapi_schema.py docs/openapi.json'.

🪛 GitHub Actions: OpenAPI (Spectral) / spectral

docs/openapi.json

[error] 1-1: OpenAPI schema is out of date. 'diff -u docs/openapi.json /tmp/openapi-generated.json' failed; regenerate with: 'uv run scripts/generate_openapi_schema.py docs/openapi.json'.

🪛 markdownlint-cli2 (0.22.1)

docs/byok_guide.md

[warning] 89-89: Blank line inside blockquote

(MD028, no-blanks-blockquote)

🔇 Additional comments (15)

src/constants.py (1)

196-198: LGTM! Well-defined constant for BYOK RAG relevance filtering.

The constant follows the established pattern with proper type hints and descriptive comment. The default value of 0.3 is reasonable for similarity-based filtering.

src/models/config.py (1)

1636-1643: LGTM! Proper Pydantic field definition for relevance cutoff.

The field is correctly configured with:

Appropriate type annotation and default value from constants

gt=0 validation ensuring positive cutoff values

Clear description specifying "raw similarity score" (before multipliers)

The validation constraint requiring values > 0 prevents disabling the cutoff entirely, which aligns with the PR requirement for a "positive float."

tests/unit/models/config/test_dump_configuration.py (1)

996-996: LGTM! Test correctly validates default value serialization.

The test properly verifies that the new relevance_cutoff_score field is included in the dumped configuration with the correct default value from constants.

tests/unit/models/config/test_byok_rag.py (3)

7-7: LGTM!

The import of DEFAULT_BYOK_RAG_RELEVANCE_CUTOFF_SCORE is appropriate for testing the default value behavior.

39-39: LGTM!

The test assertions properly validate both the default and custom relevance_cutoff_score values. The choice of 0.72 as a test value provides good coverage.

Also applies to: 59-59, 68-68

208-220: LGTM!

The validation test correctly verifies the gt=0 constraint on relevance_cutoff_score. Testing the boundary case with 0.0 provides good coverage.

src/utils/vector_search.py (2)

186-213: LGTM!

The function correctly accepts and forwards the score_threshold parameter to client.vector_io.query. The docstring is properly updated to document the new parameter.

418-427: LGTM!

The implementation correctly retrieves the per-vector-store relevance_cutoff_score and passes it to the query function. Each store in the parallel queries will use its own configured cutoff threshold.

docs/byok_guide.md (4)

82-82: LGTM!

The documentation clearly explains the relevance_cutoff_score feature, including its purpose, default value, and how it's applied during Inline RAG retrieval.

297-298: LGTM!

The YAML example clearly demonstrates how to configure relevance_cutoff_score, including helpful comments about the default value and scope.

304-306: LGTM!

The clarification about score space and per-store tuning is valuable guidance for users configuring this feature.

333-336: LGTM!

The comparison table clearly shows that relevance_cutoff_score applies only to Inline RAG (BYOK), providing a helpful reference for users choosing between modes.

tests/unit/utils/test_vector_search.py (3)

430-432: LGTM!

The mock configurations are consistently updated to include relevance_cutoff_score, using the default constant value appropriately.

Also applies to: 472-474, 516-518, 522-524, 752-754

500-504: LGTM!

The assertion correctly verifies that score_threshold is passed to vector_io.query along with the other expected parameters.

561-599: LGTM!

The new test explicitly verifies that a custom relevance_cutoff_score configuration is correctly passed to vector_io.query as score_threshold. This provides comprehensive coverage for the feature.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/byok_guide.md`:
- Around line 84-94: Remove the empty line between the two adjacent blockquote
note blocks so they are back-to-back (i.e., merge the `> [!NOTE]` block about
OKP/BYOK `score_multiplier` and chunk constants with the following `> [!NOTE]`
block about `relevance_cutoff_score`) to satisfy MD028; locate the two `>
[!NOTE]` paragraphs in the byok guide and delete the blank line separating them
so markdownlint no longer reports the blockquote-blank rule.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: dc0b1607-f348-4610-b49a-bace5ce7b17c

📥 Commits

Reviewing files that changed from the base of the PR and between 6af93fb and 8e966b0.

📒 Files selected for processing (8)

docs/byok_guide.md
docs/openapi.json
src/constants.py
src/models/config.py
src/utils/vector_search.py
tests/unit/models/config/test_byok_rag.py
tests/unit/models/config/test_dump_configuration.py
tests/unit/utils/test_vector_search.py

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)

GitHub Check: spectral
GitHub Check: Pylinter
GitHub Check: build-pr
GitHub Check: E2E: library mode / ci / group 1
GitHub Check: E2E: server mode / ci / group 1
GitHub Check: E2E: server mode / ci / group 2
GitHub Check: E2E: server mode / ci / group 3
GitHub Check: E2E: library mode / ci / group 2
GitHub Check: E2E: library mode / ci / group 3
GitHub Check: E2E Tests for Lightspeed Evaluation job

🧰 Additional context used

📓 Path-based instructions (4)

tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Use pytest.mark.asyncio marker for async tests

Files:

tests/unit/models/config/test_dump_configuration.py
tests/unit/models/config/test_byok_rag.py
tests/unit/utils/test_vector_search.py

src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.py: Use absolute imports for internal modules: from authentication import get_auth_dependency
Llama Stack imports: Use from llama_stack_client import AsyncLlamaStackClient
Check constants.py for shared constants before defining new ones
All modules must start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
All functions must have complete type annotations for parameters and return types, use modern syntax (str | int), and include descriptive docstrings
Use snake_case with descriptive, action-oriented names for functions (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying function parameters
Use async def for I/O operations and external API calls
Use standard log levels with clear purposes: debug() for diagnostic info, info() for program execution, warning() for unexpected events, error() for serious problems
All classes must have descriptive docstrings explaining purpose and use PascalCase with standard suffixes: Configuration, Error/Exception, Resolver, Interface
Abstract classes must use ABC with @abstractmethod decorators
Follow Google Python docstring conventions with required sections: Parameters, Returns, Raises, and Attributes for classes

Files:

src/constants.py
src/models/config.py
src/utils/vector_search.py

src/constants.py

📄 CodeRabbit inference engine (AGENTS.md)

Use constants.py for shared constants with descriptive comments and type hints using Final[type]

Files:

src/constants.py

src/models/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Pydantic models must use @model_validator and @field_validator for validation and complete type annotations for all attributes, avoiding Any type

Files:

src/models/config.py

🧠 Learnings (2)

📚 Learning: 2026-01-12T10:58:40.230Z

Learnt from: blublinsky
Repo: lightspeed-core/lightspeed-stack PR: 972
File: src/models/config.py:459-513
Timestamp: 2026-01-12T10:58:40.230Z
Learning: In lightspeed-core/lightspeed-stack, for Python files under src/models, when a user claims a fix is done but the issue persists, verify the current code state before accepting the fix. Steps: review the diff, fetch the latest changes, run relevant tests, reproduce the issue, search the codebase for lingering references to the original problem, confirm the fix is applied and not undone by subsequent commits, and validate with local checks to ensure the issue is resolved.

Applied to files:

src/models/config.py

📚 Learning: 2026-02-25T07:46:33.545Z

Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:33.545Z
Learning: In the Python codebase, requests.py should use OpenAIResponseInputTool as Tool while responses.py uses OpenAIResponseTool as Tool. This difference is intentional due to differing schemas for input vs output tools in llama-stack-api. Apply this distinction consistently to other models under src/models (e.g., ensure request-related tools use the InputTool variant and response-related tools use the ResponseTool variant). If adding new tools, choose the corresponding InputTool or Tool class based on whether the tool represents input or output, and document the rationale in code comments.

Applied to files:

src/models/config.py

🪛 markdownlint-cli2 (0.22.1)

docs/byok_guide.md

[warning] 89-89: Blank line inside blockquote

(MD028, no-blanks-blockquote)

🔇 Additional comments (4)

src/constants.py (1)

196-198: Good addition of shared BYOK relevance cutoff constant.
Typed with Final[float], clearly named, and placed in the correct constants section.

tests/unit/models/config/test_dump_configuration.py (2)

11-11: Import update looks correct and consistent with this test module.

1028-1028: Expected dump payload correctly includes relevance_cutoff_score.
This strengthens coverage for the new BYOK RAG config field serialization path.

docs/openapi.json (1)

11810-11815: Default value is correct.

The default value of 0.3 matches the DEFAULT_BYOK_RAG_RELEVANCE_CUTOFF_SCORE constant from src/constants.py. The schema definition with exclusiveMinimum: 0.0 correctly enforces positive values and the description accurately reflects the filtering behavior.

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (1)

docs/byok_guide.md (1)
89-89: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Remove the blank blockquote line to satisfy markdownlint MD028.

The empty blockquote line between the two adjacent > [!NOTE] blocks still triggers the MD028 (no-blanks-blockquote) rule violation that was previously flagged.
📝 Proposed fix
 > context, set the `BYOK_RAG_MAX_CHUNKS` and `OKP_RAG_MAX_CHUNKS` constants in `src/constants.py`
 > (defaults: 10 and 5 respectively). For Tool RAG, use `TOOL_RAG_MAX_CHUNKS` (default: 10).
->
 > [!NOTE]
 > `relevance_cutoff_score` applies to Inline RAG only. When the model uses Tool RAG (`file_search`),
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/byok_guide.md` at line 89, Remove the empty blockquote line between the
two adjacent blockquote markers so MD028 is satisfied: find the consecutive
lines starting with "> [!NOTE]" and delete the blank line (the standalone
">"/empty blockquote) separating them so the two note blocks are directly
adjacent.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/openapi.json`:
- Around line 11809-11815: Regenerate the out-of-sync OpenAPI schema file
docs/openapi.json by running the project script
scripts/generate_openapi_schema.py (via the project's runner e.g., `uv run`)
instead of editing the JSON manually; after regeneration verify the
relevance_cutoff_score default in docs/openapi.json matches the
DEFAULT_BYOK_RAG_RELEVANCE_CUTOFF_SCORE constant defined in src/constants.py,
then commit the regenerated docs/openapi.json.

---

Duplicate comments:
In `@docs/byok_guide.md`:
- Line 89: Remove the empty blockquote line between the two adjacent blockquote
markers so MD028 is satisfied: find the consecutive lines starting with ">
[!NOTE]" and delete the blank line (the standalone ">"/empty blockquote)
separating them so the two note blocks are directly adjacent.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: d4d39bf3-18f3-4c52-9c75-cdb4a6f534b2

📥 Commits

Reviewing files that changed from the base of the PR and between 8e966b0 and fee7578.

📒 Files selected for processing (8)

docs/byok_guide.md
docs/openapi.json
src/constants.py
src/models/config.py
src/utils/vector_search.py
tests/unit/models/config/test_byok_rag.py
tests/unit/models/config/test_dump_configuration.py
tests/unit/utils/test_vector_search.py

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)

GitHub Check: build-pr
GitHub Check: integration_tests (3.12)
GitHub Check: Pylinter
GitHub Check: unit_tests (3.12)
GitHub Check: E2E: server mode / ci / group 1
GitHub Check: E2E: server mode / ci / group 3
GitHub Check: E2E: library mode / ci / group 3
GitHub Check: E2E: library mode / ci / group 1
GitHub Check: E2E: library mode / ci / group 2
GitHub Check: E2E: server mode / ci / group 2
GitHub Check: E2E Tests for Lightspeed Evaluation job

🧰 Additional context used

📓 Path-based instructions (4)

src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.py: Use absolute imports for internal modules: from authentication import get_auth_dependency
Llama Stack imports: Use from llama_stack_client import AsyncLlamaStackClient
Check constants.py for shared constants before defining new ones
All modules must start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
All functions must have complete type annotations for parameters and return types, use modern syntax (str | int), and include descriptive docstrings
Use snake_case with descriptive, action-oriented names for functions (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying function parameters
Use async def for I/O operations and external API calls
Use standard log levels with clear purposes: debug() for diagnostic info, info() for program execution, warning() for unexpected events, error() for serious problems
All classes must have descriptive docstrings explaining purpose and use PascalCase with standard suffixes: Configuration, Error/Exception, Resolver, Interface
Abstract classes must use ABC with @abstractmethod decorators
Follow Google Python docstring conventions with required sections: Parameters, Returns, Raises, and Attributes for classes

Files:

src/constants.py
src/models/config.py
src/utils/vector_search.py

src/constants.py

📄 CodeRabbit inference engine (AGENTS.md)

Use constants.py for shared constants with descriptive comments and type hints using Final[type]

Files:

src/constants.py

src/models/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Pydantic models must use @model_validator and @field_validator for validation and complete type annotations for all attributes, avoiding Any type

Files:

src/models/config.py

tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Use pytest.mark.asyncio marker for async tests

Files:

tests/unit/models/config/test_byok_rag.py
tests/unit/models/config/test_dump_configuration.py
tests/unit/utils/test_vector_search.py

🧠 Learnings (2)

📚 Learning: 2026-01-12T10:58:40.230Z

Learnt from: blublinsky
Repo: lightspeed-core/lightspeed-stack PR: 972
File: src/models/config.py:459-513
Timestamp: 2026-01-12T10:58:40.230Z
Learning: In lightspeed-core/lightspeed-stack, for Python files under src/models, when a user claims a fix is done but the issue persists, verify the current code state before accepting the fix. Steps: review the diff, fetch the latest changes, run relevant tests, reproduce the issue, search the codebase for lingering references to the original problem, confirm the fix is applied and not undone by subsequent commits, and validate with local checks to ensure the issue is resolved.

Applied to files:

src/models/config.py

📚 Learning: 2026-02-25T07:46:33.545Z

Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:33.545Z
Learning: In the Python codebase, requests.py should use OpenAIResponseInputTool as Tool while responses.py uses OpenAIResponseTool as Tool. This difference is intentional due to differing schemas for input vs output tools in llama-stack-api. Apply this distinction consistently to other models under src/models (e.g., ensure request-related tools use the InputTool variant and response-related tools use the ResponseTool variant). If adding new tools, choose the corresponding InputTool or Tool class based on whether the tool represents input or output, and document the rationale in code comments.

Applied to files:

src/models/config.py

🪛 GitHub Actions: OpenAPI (Spectral) / 0_spectral.txt

docs/openapi.json

[error] 1-1: docs/openapi.json is out of date. Step checks openapi schema via uv run python scripts/generate_openapi_schema.py /tmp/openapi-generated.json and diff -u docs/openapi.json /tmp/openapi-generated.json; diff failed. Regenerate with: uv run scripts/generate_openapi_schema.py docs/openapi.json.

🪛 GitHub Actions: OpenAPI (Spectral) / spectral

docs/openapi.json

[error] 1-1: docs/openapi.json is out of date. Diff detected between docs/openapi.json and generated /tmp/openapi-generated.json. Regenerate with: uv run scripts/generate_openapi_schema.py docs/openapi.json

🔇 Additional comments (11)

src/constants.py (1)

196-197: LGTM!

src/models/config.py (1)

1721-1727: LGTM!

tests/unit/models/config/test_dump_configuration.py (1)

11-11: LGTM!

Also applies to: 1028-1028

tests/unit/models/config/test_byok_rag.py (1)

7-7: LGTM!

Also applies to: 39-39, 59-59, 68-68, 208-220

src/utils/vector_search.py (2)

29-44: LGTM!

196-223: LGTM!

Also applies to: 434-434

docs/byok_guide.md (1)

82-82: LGTM!

Also applies to: 297-298, 303-306, 333-336, 586-586

tests/unit/utils/test_vector_search.py (4)

3-5: LGTM!

Also applies to: 26-26, 31-54

460-462: LGTM!

Also applies to: 502-504, 530-535, 546-548, 552-554, 882-884

591-629: LGTM!

631-674: LGTM!

Also applies to: 676-729

tisnik

LGTM

syedriko · 2026-05-13T01:53:35Z

/ok-to-test

are-ces · 2026-05-13T10:22:03Z

Related to #1525

are-ces

LGTM overall TYVM!

I have a couple of nits which you can ignore but I would like to see some integration test for this functionality

are-ces · 2026-05-13T10:34:45Z

                    vector_store_id,
                    query,
                    score_multiplier_mapping.get(vector_store_id, 1.0),
+                    _relevance_cutoff_for_vector_store(vector_store_id),


Tiny nit from SWE perspective. Just wondering: since score multiplier and relevance score have the same data structure, would it make sense to retrieve cutoff score the same way?

are-ces · 2026-05-13T10:36:31Z

        )
+
+
+def test_byok_rag_configuration_relevance_cutoff_must_be_positive() -> None:


Nit, do we also want to test behavior if it is negative?

syedriko · 2026-05-22T16:52:34Z

@are-ces PTAL

coderabbitai

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

src/utils/vector_search.py (2)
568-573: 🧹 Nitpick | 🔵 Trivial | 💤 Low value

Stale max_chunks entry in _fetch_solr_rag docstring.

The Args section documents a max_chunks parameter (Lines 572-573) that does not exist in the signature (client, query, solr). Remove it to keep the docstring consistent with the function.
📝 Proposed docstring fix
         solr: Structured Solr inline RAG request from the API (optional).
-        max_chunks: Maximum number of chunks to return. If None, uses
-            constants.OKP_RAG_MAX_CHUNKS.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/utils/vector_search.py` around lines 568 - 573, The docstring for
_fetch_solr_rag incorrectly documents a non-existent parameter max_chunks;
update the Args section of the _fetch_solr_rag docstring to remove the
max_chunks entry so it matches the function signature (client, query, solr) and
keep the existing descriptions for client, query, and solr unchanged.
248-282: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix BYOK RAG docstring parameter order in _query_store_for_byok_rag

The docstring lists max_chunks before score_threshold, but the signature is (..., score_threshold: float, max_chunks: int = ...). Swap the Args: entries to match.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/utils/vector_search.py` around lines 248 - 282, The docstring for
_query_store_for_byok_rag lists the parameters in the wrong order (shows
max_chunks before score_threshold) which doesn't match the function signature;
update the Args: section of the _query_store_for_byok_rag docstring so
score_threshold appears before max_chunks and adjust their descriptions to match
the signature and default (constants.BYOK_RAG_MAX_CHUNKS).

♻️ Duplicate comments (1)

docs/openapi.json (1)
1-19424: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Regenerate docs/openapi.json from source-of-truth instead of editing this artifact manually.

This file is out of sync with the generated schema, and CI is currently failing on the spectral gate. Please regenerate and commit the generated output, otherwise this PR cannot merge.
🔧 Required fix
-# Manually edited OpenAPI artifact (drifted)
+# Regenerated OpenAPI artifact from code
uv run scripts/generate_openapi_schema.py docs/openapi.json
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/openapi.json` around lines 1 - 19424, The openapi.json file is out of
sync with the generated schema and manual edits to it have caused CI failures.
To fix this, regenerate the openapi.json file from the source-of-truth schema
using the provided generation script. Specifically, run the command `uv run
scripts/generate_openapi_schema.py docs/openapi.json` to produce an up-to-date
and consistent openapi.json artifact, then commit the regenerated file to the
PR.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pyproject.toml`:
- Around line 229-231: Revert the repo-wide pylint relaxation in pyproject.toml
by removing or restoring the [tool.pylint.design] max-args and
max-positional-arguments overrides so the global "too-many-arguments" rules
remain strict, and instead add a targeted suppression or refactor for the BYOK
RAG helper: open src/utils/vector_search.py and either refactor async function
_query_store_for_byok_rag(...) to reduce its parameters or add a local "#
pylint: disable=too-many-arguments" (or "# pylint:
disable=too-many-positional-arguments") immediately above that function
declaration to allow the exception only there. Ensure no other global changes
are made so existing local suppressions (e.g., in response_generator,
_check_shield_moderation, prepare_tools, build_turn_summary,
create_transcript_metadata) remain intact.

In `@src/models/config.py`:
- Around line 2351-2387: The validator currently only checks self.rag.inline for
OKP (constants.OKP_RAG_ID) but the docstring/comments say inline OR tool should
trigger auto-enable; update the predicate by changing the has_okp computation in
validate_reranker_auto_enable to consider both self.rag.inline and self.rag.tool
(e.g., check constants.OKP_RAG_ID in either collection) and adjust the log
message/comment to say "inline or tool" as needed; keep the existing
protected-access and pylint disables around reranker._explicitly_configured
unchanged.
- Around line 2018-2023: The Field default for the "model" attribute in the
config schema is hardcoded to "cross-encoder/ms-marco-MiniLM-L6-v2" which
duplicates the shared constant; update the default to reference the shared
constant DEFAULT_CROSS_ENCODER_MODEL from the constants module (import it at the
top if necessary) so the schema default and runtime default stay in sync, i.e.,
replace the literal string used in the Field default with
constants.DEFAULT_CROSS_ENCODER_MODEL and ensure the import for
DEFAULT_CROSS_ENCODER_MODEL is present where the Field is defined.

In `@src/utils/vector_search.py`:
- Around line 107-119: The docstring in the Solr vector_io query builder in
src/utils/vector_search.py contains two Returns sections; collapse them into a
single Google-style Returns section that describes the returned dict and its
keys (mode, filters, and any other params) and remove the duplicated/malformed
Returns block so the docstring follows Google conventions for the function that
builds query parameters for vector_io.query.

---

Outside diff comments:
In `@src/utils/vector_search.py`:
- Around line 568-573: The docstring for _fetch_solr_rag incorrectly documents a
non-existent parameter max_chunks; update the Args section of the
_fetch_solr_rag docstring to remove the max_chunks entry so it matches the
function signature (client, query, solr) and keep the existing descriptions for
client, query, and solr unchanged.
- Around line 248-282: The docstring for _query_store_for_byok_rag lists the
parameters in the wrong order (shows max_chunks before score_threshold) which
doesn't match the function signature; update the Args: section of the
_query_store_for_byok_rag docstring so score_threshold appears before max_chunks
and adjust their descriptions to match the signature and default
(constants.BYOK_RAG_MAX_CHUNKS).

---

Duplicate comments:
In `@docs/openapi.json`:
- Around line 1-19424: The openapi.json file is out of sync with the generated
schema and manual edits to it have caused CI failures. To fix this, regenerate
the openapi.json file from the source-of-truth schema using the provided
generation script. Specifically, run the command `uv run
scripts/generate_openapi_schema.py docs/openapi.json` to produce an up-to-date
and consistent openapi.json artifact, then commit the regenerated file to the
PR.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: a791573e-b3ab-45c7-ba42-5465d7a60429

📥 Commits

Reviewing files that changed from the base of the PR and between fee7578 and f6deca8.

📒 Files selected for processing (11)

docs/byok_guide.md
docs/openapi.json
pyproject.toml
src/configuration.py
src/constants.py
src/models/config.py
src/utils/vector_search.py
tests/unit/models/config/test_byok_rag.py
tests/unit/models/config/test_dump_configuration.py
tests/unit/test_configuration.py
tests/unit/utils/test_vector_search.py

💤 Files with no reviewable changes (3)

tests/unit/test_configuration.py
tests/unit/models/config/test_dump_configuration.py
tests/unit/utils/test_vector_search.py

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)

GitHub Check: build-pr
GitHub Check: E2E: server mode / ci / group 2
GitHub Check: E2E: library mode / ci / group 3
GitHub Check: E2E: server mode / ci / group 3
GitHub Check: E2E: library mode / ci / group 1
GitHub Check: E2E: library mode / ci / group 2
GitHub Check: E2E: server mode / ci / group 1
GitHub Check: E2E Tests for Lightspeed Evaluation job
GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request

🧰 Additional context used

📓 Path-based instructions (5)

src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.py: Use absolute imports for internal modules: from authentication import get_auth_dependency
Llama Stack imports: Use from llama_stack_client import AsyncLlamaStackClient
Check constants.py for shared constants before defining new ones
All modules must start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
All functions must have complete type annotations for parameters and return types, use modern syntax (str | int), and include descriptive docstrings
Use snake_case with descriptive, action-oriented names for functions (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying function parameters
Use async def for I/O operations and external API calls
Use standard log levels with clear purposes: debug() for diagnostic info, info() for program execution, warning() for unexpected events, error() for serious problems
All classes must have descriptive docstrings explaining purpose and use PascalCase with standard suffixes: Configuration, Error/Exception, Resolver, Interface
Abstract classes must use ABC with @abstractmethod decorators
Follow Google Python docstring conventions with required sections: Parameters, Returns, Raises, and Attributes for classes

Files:

src/constants.py
src/configuration.py
src/models/config.py
src/utils/vector_search.py

src/constants.py

📄 CodeRabbit inference engine (AGENTS.md)

Use constants.py for shared constants with descriptive comments and type hints using Final[type]

Files:

src/constants.py

src/**/configuration.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/configuration.py: All config models must extend ConfigurationBase with extra="forbid" to reject unknown fields
Use @field_validator and @model_validator for custom validation in Pydantic models

Files:

src/configuration.py

src/models/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Pydantic models must use @model_validator and @field_validator for validation and complete type annotations for all attributes, avoiding Any type

Files:

src/models/config.py

tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Use pytest.mark.asyncio marker for async tests

Files:

tests/unit/models/config/test_byok_rag.py

🧠 Learnings (2)

📚 Learning: 2026-01-12T10:58:40.230Z

Learnt from: blublinsky
Repo: lightspeed-core/lightspeed-stack PR: 972
File: src/models/config.py:459-513
Timestamp: 2026-01-12T10:58:40.230Z
Learning: In lightspeed-core/lightspeed-stack, for Python files under src/models, when a user claims a fix is done but the issue persists, verify the current code state before accepting the fix. Steps: review the diff, fetch the latest changes, run relevant tests, reproduce the issue, search the codebase for lingering references to the original problem, confirm the fix is applied and not undone by subsequent commits, and validate with local checks to ensure the issue is resolved.

Applied to files:

src/models/config.py

📚 Learning: 2026-02-25T07:46:33.545Z

Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:33.545Z
Learning: In the Python codebase, requests.py should use OpenAIResponseInputTool as Tool while responses.py uses OpenAIResponseTool as Tool. This difference is intentional due to differing schemas for input vs output tools in llama-stack-api. Apply this distinction consistently to other models under src/models (e.g., ensure request-related tools use the InputTool variant and response-related tools use the ResponseTool variant). If adding new tools, choose the corresponding InputTool or Tool class based on whether the tool represents input or output, and document the rationale in code comments.

Applied to files:

src/models/config.py

🪛 GitHub Actions: OpenAPI (Spectral) / 0_spectral.txt

docs/openapi.json

[error] 1-1: OpenAPI schema is out of date. docs/openapi.json differs from the newly generated schema. Regenerate with: uv run scripts/generate_openapi_schema.py docs/openapi.json

🪛 GitHub Actions: OpenAPI (Spectral) / spectral

docs/openapi.json

[error] 1-1: OpenAPI schema is out of date. Pipeline diff failed between 'docs/openapi.json' and generated '/tmp/openapi-generated.json'. Regenerate with: 'uv run scripts/generate_openapi_schema.py docs/openapi.json'

🪛 markdownlint-cli2 (0.22.1)

docs/byok_guide.md

[warning] 92-92: Blank line inside blockquote

(MD028, no-blanks-blockquote)

[warning] 93-93: Blank line inside blockquote

(MD028, no-blanks-blockquote)

🔇 Additional comments (8)

src/utils/vector_search.py (2)

270-279: LGTM!

501-523: ⚡ Quick win

Ensure relevance_cutoff_mapping uses the same keyspace as the vector store lookup ID.
src/utils/vector_search.py looks up relevance_cutoff_mapping.get(vector_store_id, DEFAULT_BYOK_RAG_RELEVANCE_CUTOFF_SCORE) where vector_store_id comes from vector_store_ids_to_query; confirm relevance_cutoff_mapping is keyed by the same vector_db_id namespace. If it’s keyed by user-facing rag_id instead, the configured per-store cutoff won’t be applied and will always fall back to the default.

tests/unit/models/config/test_byok_rag.py (1)

39-39: LGTM!

Also applies to: 59-68, 194-226

src/constants.py (1)

3-3: LGTM!

Also applies to: 161-163, 184-205, 295-300

src/models/config.py (1)

231-240: LGTM!

Also applies to: 470-527, 617-625, 704-727, 1817-1823, 2176-2180, 2237-2247

src/configuration.py (1)

17-17: LGTM!

Also applies to: 31-31, 463-489, 534-550

docs/byok_guide.md (2)

92-93: MD028 blockquote spacing issue still present.

This is the same markdownlint issue already reported earlier: blank lines inside adjacent blockquote notes.

82-82: LGTM!

Also applies to: 89-91, 94-98, 302-303, 309-311, 338-341, 591-591

coderabbitai

Caution

Inline review comments failed to post. This is likely due to GitHub's internal server error or limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

src/utils/vector_search.py (2)
568-573: 🧹 Nitpick | 🔵 Trivial | 💤 Low value

Stale max_chunks entry in _fetch_solr_rag docstring.

The Args section documents a max_chunks parameter (Lines 572-573) that does not exist in the signature (client, query, solr). Remove it to keep the docstring consistent with the function.
📝 Proposed docstring fix
         solr: Structured Solr inline RAG request from the API (optional).
-        max_chunks: Maximum number of chunks to return. If None, uses
-            constants.OKP_RAG_MAX_CHUNKS.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/utils/vector_search.py` around lines 568 - 573, The docstring for
_fetch_solr_rag incorrectly documents a non-existent parameter max_chunks;
update the Args section of the _fetch_solr_rag docstring to remove the
max_chunks entry so it matches the function signature (client, query, solr) and
keep the existing descriptions for client, query, and solr unchanged.
248-282: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix BYOK RAG docstring parameter order in _query_store_for_byok_rag

The docstring lists max_chunks before score_threshold, but the signature is (..., score_threshold: float, max_chunks: int = ...). Swap the Args: entries to match.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/utils/vector_search.py` around lines 248 - 282, The docstring for
_query_store_for_byok_rag lists the parameters in the wrong order (shows
max_chunks before score_threshold) which doesn't match the function signature;
update the Args: section of the _query_store_for_byok_rag docstring so
score_threshold appears before max_chunks and adjust their descriptions to match
the signature and default (constants.BYOK_RAG_MAX_CHUNKS).

♻️ Duplicate comments (1)

docs/openapi.json (1)
1-19424: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Regenerate docs/openapi.json from source-of-truth instead of editing this artifact manually.

This file is out of sync with the generated schema, and CI is currently failing on the spectral gate. Please regenerate and commit the generated output, otherwise this PR cannot merge.
🔧 Required fix
-# Manually edited OpenAPI artifact (drifted)
+# Regenerated OpenAPI artifact from code
uv run scripts/generate_openapi_schema.py docs/openapi.json
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/openapi.json` around lines 1 - 19424, The openapi.json file is out of
sync with the generated schema and manual edits to it have caused CI failures.
To fix this, regenerate the openapi.json file from the source-of-truth schema
using the provided generation script. Specifically, run the command `uv run
scripts/generate_openapi_schema.py docs/openapi.json` to produce an up-to-date
and consistent openapi.json artifact, then commit the regenerated file to the
PR.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pyproject.toml`:
- Around line 229-231: Revert the repo-wide pylint relaxation in pyproject.toml
by removing or restoring the [tool.pylint.design] max-args and
max-positional-arguments overrides so the global "too-many-arguments" rules
remain strict, and instead add a targeted suppression or refactor for the BYOK
RAG helper: open src/utils/vector_search.py and either refactor async function
_query_store_for_byok_rag(...) to reduce its parameters or add a local "#
pylint: disable=too-many-arguments" (or "# pylint:
disable=too-many-positional-arguments") immediately above that function
declaration to allow the exception only there. Ensure no other global changes
are made so existing local suppressions (e.g., in response_generator,
_check_shield_moderation, prepare_tools, build_turn_summary,
create_transcript_metadata) remain intact.

In `@src/models/config.py`:
- Around line 2351-2387: The validator currently only checks self.rag.inline for
OKP (constants.OKP_RAG_ID) but the docstring/comments say inline OR tool should
trigger auto-enable; update the predicate by changing the has_okp computation in
validate_reranker_auto_enable to consider both self.rag.inline and self.rag.tool
(e.g., check constants.OKP_RAG_ID in either collection) and adjust the log
message/comment to say "inline or tool" as needed; keep the existing
protected-access and pylint disables around reranker._explicitly_configured
unchanged.
- Around line 2018-2023: The Field default for the "model" attribute in the
config schema is hardcoded to "cross-encoder/ms-marco-MiniLM-L6-v2" which
duplicates the shared constant; update the default to reference the shared
constant DEFAULT_CROSS_ENCODER_MODEL from the constants module (import it at the
top if necessary) so the schema default and runtime default stay in sync, i.e.,
replace the literal string used in the Field default with
constants.DEFAULT_CROSS_ENCODER_MODEL and ensure the import for
DEFAULT_CROSS_ENCODER_MODEL is present where the Field is defined.

In `@src/utils/vector_search.py`:
- Around line 107-119: The docstring in the Solr vector_io query builder in
src/utils/vector_search.py contains two Returns sections; collapse them into a
single Google-style Returns section that describes the returned dict and its
keys (mode, filters, and any other params) and remove the duplicated/malformed
Returns block so the docstring follows Google conventions for the function that
builds query parameters for vector_io.query.

---

Outside diff comments:
In `@src/utils/vector_search.py`:
- Around line 568-573: The docstring for _fetch_solr_rag incorrectly documents a
non-existent parameter max_chunks; update the Args section of the
_fetch_solr_rag docstring to remove the max_chunks entry so it matches the
function signature (client, query, solr) and keep the existing descriptions for
client, query, and solr unchanged.
- Around line 248-282: The docstring for _query_store_for_byok_rag lists the
parameters in the wrong order (shows max_chunks before score_threshold) which
doesn't match the function signature; update the Args: section of the
_query_store_for_byok_rag docstring so score_threshold appears before max_chunks
and adjust their descriptions to match the signature and default
(constants.BYOK_RAG_MAX_CHUNKS).

---

Duplicate comments:
In `@docs/openapi.json`:
- Around line 1-19424: The openapi.json file is out of sync with the generated
schema and manual edits to it have caused CI failures. To fix this, regenerate
the openapi.json file from the source-of-truth schema using the provided
generation script. Specifically, run the command `uv run
scripts/generate_openapi_schema.py docs/openapi.json` to produce an up-to-date
and consistent openapi.json artifact, then commit the regenerated file to the
PR.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: a791573e-b3ab-45c7-ba42-5465d7a60429

📥 Commits

Reviewing files that changed from the base of the PR and between fee7578 and f6deca8.

📒 Files selected for processing (11)

docs/byok_guide.md
docs/openapi.json
pyproject.toml
src/configuration.py
src/constants.py
src/models/config.py
src/utils/vector_search.py
tests/unit/models/config/test_byok_rag.py
tests/unit/models/config/test_dump_configuration.py
tests/unit/test_configuration.py
tests/unit/utils/test_vector_search.py

💤 Files with no reviewable changes (3)

tests/unit/test_configuration.py
tests/unit/models/config/test_dump_configuration.py
tests/unit/utils/test_vector_search.py

📜 Review details

🔇 Additional comments (8)

src/utils/vector_search.py (2)

270-279: LGTM!

501-523: ⚡ Quick win

Ensure relevance_cutoff_mapping uses the same keyspace as the vector store lookup ID.
src/utils/vector_search.py looks up relevance_cutoff_mapping.get(vector_store_id, DEFAULT_BYOK_RAG_RELEVANCE_CUTOFF_SCORE) where vector_store_id comes from vector_store_ids_to_query; confirm relevance_cutoff_mapping is keyed by the same vector_db_id namespace. If it’s keyed by user-facing rag_id instead, the configured per-store cutoff won’t be applied and will always fall back to the default.

tests/unit/models/config/test_byok_rag.py (1)

39-39: LGTM!

Also applies to: 59-68, 194-226

src/constants.py (1)

3-3: LGTM!

Also applies to: 161-163, 184-205, 295-300

src/models/config.py (1)

231-240: LGTM!

Also applies to: 470-527, 617-625, 704-727, 1817-1823, 2176-2180, 2237-2247

src/configuration.py (1)

17-17: LGTM!

Also applies to: 31-31, 463-489, 534-550

docs/byok_guide.md (2)

92-93: MD028 blockquote spacing issue still present.

This is the same markdownlint issue already reported earlier: blank lines inside adjacent blockquote notes.

82-82: LGTM!

Also applies to: 89-91, 94-98, 302-303, 309-311, 338-341, 591-591

🛑 Comments failed to post (4)

pyproject.toml (1)

229-231: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

python - <<'PY'
from pathlib import Path
import ast

def counts(node):
    args = node.args
    positional = len(args.posonlyargs) + len(args.args)
    total = positional + len(args.kwonlyargs)
    return total, positional

for path in sorted(Path("src").rglob("*.py")):
    try:
        tree = ast.parse(path.read_text())
    except Exception as exc:
        print(f"PARSE_ERROR {path}: {exc}")
        continue

    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            total, positional = counts(node)
            if total > 5 or positional > 5:
                kind = "async def" if isinstance(node, ast.AsyncFunctionDef) else "def"
                print(f"{path}:{node.lineno}: {kind} {node.name} total_args={total} positional_args={positional}")
PY

Repository: lightspeed-core/lightspeed-stack

Length of output: 1585

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# 1) Show the exact pylint design config in pyproject.toml around the flagged lines
echo "=== pyproject.toml: tool.pylint.design context ==="
python - <<'PY'
from pathlib import Path
p = Path("pyproject.toml")
lines = p.read_text().splitlines()
start, end = 200, 260
for i in range(start, min(end, len(lines))):
    # line numbers are 1-based
    print(f"{i+1}:{lines[i]}")
PY

# 2) Find BYOK RAG / relevance-cutoff related identifiers in src/
echo
echo "=== Search: byok / rag / relevance cutoff identifiers ==="
rg -n --hidden --glob '!.venv/**' --glob '!.git/**' \
  -S '(byok|byo?k|rag|relevance|cutoff|threshold)' src || true

# 3) Count args in the function(s) likely touched by relevance-cutoff work (best-effort via grep of nearby matches)
echo
echo "=== Best-effort: show function defs around matches ==="
rg -n --hidden --glob '!.venv/**' --glob '!.git/**' \
  -S 'def |async def ' src/app src/utils src/models | head -n 200 || true

Repository: lightspeed-core/lightspeed-stack

Length of output: 50388

🏁 Script executed:

#!/bin/bash
set -euo pipefail
python - <<'PY'
from pathlib import Path
import ast

def arg_counts(fn: ast.AST):
    # pylint max-args / max-positional-arguments behavior is close to counting these.
    args = fn.args
    posonly = len(getattr(args, "posonlyargs", []))
    positional = posonly + len(args.args)
    kwonly = len(args.kwonlyargs)
    # vararg / kwarg exist in ast.arguments
    vararg = 1 if args.vararg is not None else 0
    kwarg = 1 if args.kwarg is not None else 0
    total = positional + kwonly + vararg + kwarg
    return total, positional

def scan(path: Path):
    txt = path.read_text(encoding="utf-8")
    tree = ast.parse(txt)
    out = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            total, positional = arg_counts(node)
            out.append((str(path), node.lineno, node.name, isinstance(node, ast.AsyncFunctionDef), total, positional))
    return out

roots = [Path("src")]
rows = []
parse_errors = []
for root in roots:
    for path in sorted(root.rglob("*.py")):
        try:
            rows.extend(scan(path))
        except SyntaxError as e:
            parse_errors.append((str(path), e.lineno, e.msg))
        except Exception as e:
            parse_errors.append((str(path), None, repr(e)))

# Focus: what exceeds 5 (pylint default commonly) and what exceeds 8 (your new limit).
exceed_5_total = [r for r in rows if r[4] > 5]
exceed_8_total = [r for r in rows if r[4] > 8]
exceed_5_pos = [r for r in rows if r[5] > 5]
exceed_8_pos = [r for r in rows if r[5] > 8]

print(f"functions total_params>5: {len(exceed_5_total)}")
print(f"functions total_params>8: {len(exceed_8_total)}")
print(f"functions positional_params>5: {len(exceed_5_pos)}")
print(f"functions positional_params>8: {len(exceed_8_pos)}")

# BYOK relevance-cutoff likely lives in vector_search.py
vs_rows = [r for r in rows if r[0].endswith("src/utils/vector_search.py")]
print(f"\nvector_search.py function count: {len(vs_rows)}")
print("vector_search.py with total_params>5 (show):")
for r in sorted([r for r in vs_rows if r[4] > 5], key=lambda x:(x[0],x[1])):
    print(f"{r[0]}:{r[1]} {('async ' if r[3] else '')}{r[2]} total={r[4]} positional={r[5]}")

print("\nTop 20 by total_params (to gauge breadth):")
for r in sorted(rows, key=lambda x:x[4], reverse=True)[:20]:
    print(f"{r[0]}:{r[1]} {('async ' if r[3] else '')}{r[2]} total={r[4]} positional={r[5]}")

print("\nParse errors (first 20):")
for pe in parse_errors[:20]:
    print(pe)
print(f"Parse errors: {len(parse_errors)}")
PY

Repository: lightspeed-core/lightspeed-stack

Length of output: 2206

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Show how pyproject.toml:229-231 changed in this PR/branch
git diff -U3 -- pyproject.toml | sed -n '1,200p'

Repository: lightspeed-core/lightspeed-stack

Length of output: 58

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Check whether we have parent commits to diff against
git rev-parse HEAD~1 >/dev/null 2>&1 || { echo "NO_PARENT_COMMIT"; exit 0; }

echo "=== Current pyproject.toml: tool.pylint.design ==="
rg -n --context 2 '^\\[tool\\.pylint\\.design\\]|^max-args\\s*=|^max-positional-arguments\\s*=' pyproject.toml

echo
echo "=== Previous commit (HEAD~1): tool.pylint.design ==="
git show HEAD~1:pyproject.toml | rg -n --context 2 '^\\[tool\\.pylint\\.design\\]|^max-args\\s*=|^max-positional-arguments\\s*=' || true

echo
echo "=== Previous commit: snippet around tool.pylint.design section (if present) ==="
git show HEAD~1:pyproject.toml | python - <<'PY'
import sys, re
txt=sys.stdin.read().splitlines()
for i,line in enumerate(txt):
    if line.strip()=="[tool.pylint.design]":
        for j in range(max(0,i-5), min(len(txt), i+12)):
            print(f"{j+1}:{txt[j]}")
        break
else:
    print("No [tool.pylint.design] section in HEAD~1")
PY

Repository: lightspeed-core/lightspeed-stack

Length of output: 93

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== local pylint disables for argument-count ==="
rg -n 'pylint:\s*disable=.*too-many-(arguments|positional-arguments)' src || true

echo
echo "=== _query_store_for_byok_rag snippet ==="
python - <<'PY'
from pathlib import Path
p=Path("src/utils/vector_search.py")
lines=p.read_text().splitlines()
for i,l in enumerate(lines):
    if "async def _query_store_for_byok_rag" in l:
        for j in range(max(0,i-20), min(len(lines), i+90)):
            print(f"{j+1}:{lines[j]}")
        break
PY

echo
echo "=== functions with total args >5 and <=8 (approx mapping to pylint design limits) ==="
python - <<'PY'
from pathlib import Path
import ast

def arg_counts(fn: ast.AST):
    args=fn.args
    posonly=len(getattr(args,'posonlyargs',[]))
    positional=posonly+len(args.args)
    kwonly=len(args.kwonlyargs)
    vararg=1 if args.vararg else 0
    kwarg=1 if args.kwarg else 0
    total=positional+kwonly+vararg+kwarg
    return total,positional

rows=[]
for path in Path("src").rglob("*.py"):
    try:
        tree=ast.parse(path.read_text(encoding="utf-8"))
    except SyntaxError:
        continue
    for node in ast.walk(tree):
        if isinstance(node,(ast.FunctionDef,ast.AsyncFunctionDef)):
            total,positional=arg_counts(node)
            if total>5 and total<=8:
                kind="async def" if isinstance(node,ast.AsyncFunctionDef) else "def"
                rows.append((str(path),node.lineno,kind,node.name,total,positional))

rows=sorted(rows, key=lambda r:(r[0],r[1]))
print(f"affected_count={len(rows)}")
for r in rows:
    path,ln,kind,name,total,pos=r
    print(f"{path}:{ln} {kind} {name} total={total} positional={pos}")
PY

Repository: lightspeed-core/lightspeed-stack

Length of output: 6930

Avoid repo-wide pylint max-args relaxation for BYOK RAG
pyproject.toml sets [tool.pylint.design] max-args = 8 and max-positional-arguments = 8 (lines 229-231), which relaxes too-many-arguments/too-many-positional-arguments across the whole repo. The scan shows 9 functions in src/ with total args in the 6–8 range; only src/utils/vector_search.py:248 async _query_store_for_byok_rag(...) is BYOK RAG-related, while others include response_generator, _check_shield_moderation, prepare_tools, build_turn_summary, and create_transcript_metadata. Since other outliers are already handled via local # pylint: disable=too-many-arguments suppressions, keep the stricter global limits and apply a targeted suppression/refactor specifically for _query_store_for_byok_rag.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pyproject.toml` around lines 229 - 231, Revert the repo-wide pylint
relaxation in pyproject.toml by removing or restoring the [tool.pylint.design]
max-args and max-positional-arguments overrides so the global
"too-many-arguments" rules remain strict, and instead add a targeted suppression
or refactor for the BYOK RAG helper: open src/utils/vector_search.py and either
refactor async function _query_store_for_byok_rag(...) to reduce its parameters
or add a local "# pylint: disable=too-many-arguments" (or "# pylint:
disable=too-many-positional-arguments") immediately above that function
declaration to allow the exception only there. Ensure no other global changes
are made so existing local suppressions (e.g., in response_generator,
_check_shield_moderation, prepare_tools, build_turn_summary,
create_transcript_metadata) remain intact.

src/models/config.py (2)

2018-2023: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Use the shared reranker-model constant here.

This default duplicates constants.DEFAULT_CROSS_ENCODER_MODEL, so the schema default and runtime default can drift on the next model change. Reference the shared constant instead of hardcoding the string.
♻️ Proposed fix
     model: str = Field(
-        default="cross-encoder/ms-marco-MiniLM-L6-v2",
+        default=constants.DEFAULT_CROSS_ENCODER_MODEL,
         title="Reranker model",
         description="Cross-encoder model name for reranking RAG chunks. "
-        "Defaults to 'cross-encoder/ms-marco-MiniLM-L6-v2' from sentence-transformers.",
+        f"Defaults to '{constants.DEFAULT_CROSS_ENCODER_MODEL}' from sentence-transformers.",
     )
As per coding guidelines, "Check constants.py for shared constants before defining new ones".
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
    model: str = Field(
        default=constants.DEFAULT_CROSS_ENCODER_MODEL,
        title="Reranker model",
        description="Cross-encoder model name for reranking RAG chunks. "
        f"Defaults to '{constants.DEFAULT_CROSS_ENCODER_MODEL}' from sentence-transformers.",
    )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/models/config.py` around lines 2018 - 2023, The Field default for the
"model" attribute in the config schema is hardcoded to
"cross-encoder/ms-marco-MiniLM-L6-v2" which duplicates the shared constant;
update the default to reference the shared constant DEFAULT_CROSS_ENCODER_MODEL
from the constants module (import it at the top if necessary) so the schema
default and runtime default stay in sync, i.e., replace the literal string used
in the Field default with constants.DEFAULT_CROSS_ENCODER_MODEL and ensure the
import for DEFAULT_CROSS_ENCODER_MODEL is present where the Field is defined.
2351-2387: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

validate_reranker_auto_enable only honors inline OKP.

The validator’s docstring and comments say OKP in inline or tool strategies should trigger auto-enable, but Line 2368 only checks self.rag.inline. A config with BYOK plus okp only in rag.tool will skip this path. Either include self.rag.tool in the predicate or narrow the documented contract to inline-only behavior.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/models/config.py` around lines 2351 - 2387, The validator currently only
checks self.rag.inline for OKP (constants.OKP_RAG_ID) but the docstring/comments
say inline OR tool should trigger auto-enable; update the predicate by changing
the has_okp computation in validate_reranker_auto_enable to consider both
self.rag.inline and self.rag.tool (e.g., check constants.OKP_RAG_ID in either
collection) and adjust the log message/comment to say "inline or tool" as
needed; keep the existing protected-access and pylint disables around
reranker._explicitly_configured unchanged.

src/utils/vector_search.py (1)

107-119: 🧹 Nitpick | 🔵 Trivial | 💤 Low value

Duplicate/malformed Returns: section in docstring.

This docstring has two Returns: blocks: the first (Lines 113-116) actually documents params keys rather than the return value, and the second (Lines 118-119) is the real return description. Collapse into a single, correct Returns: section to follow Google docstring conventions.
📝 Proposed docstring fix
     Args:
         solr: Optional structured Solr request (mode and filters from the API).
         k: Optional number of results to return. If not provided, uses default.

     Returns:
-            - mode: Solr search mode (semantic, hybrid, lexical)
-            - filters: Solr filter payload, may contain structured metadata filters
-
-    Returns:
-        Parameter dictionary for ``vector_io.query`` with extracted filters at top level.
+        Parameter dictionary for ``vector_io.query`` with extracted filters at top
+        level (keys include ``mode`` and, when present, ``filters``).
     """
As per coding guidelines: "Follow Google Python docstring conventions with required sections: Parameters, Returns, Raises".
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
    """Build query parameters for Solr vector_io search.

    Args:
        solr: Optional structured Solr request (mode and filters from the API).
        k: Optional number of results to return. If not provided, uses default.

    Returns:
        Parameter dictionary for ``vector_io.query`` with extracted filters at top
        level (keys include ``mode`` and, when present, ``filters``).
    """
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/utils/vector_search.py` around lines 107 - 119, The docstring in the Solr
vector_io query builder in src/utils/vector_search.py contains two Returns
sections; collapse them into a single Google-style Returns section that
describes the returned dict and its keys (mode, filters, and any other params)
and remove the duplicated/malformed Returns block so the docstring follows
Google conventions for the function that builds query parameters for
vector_io.query.

coderabbitai Bot reviewed May 11, 2026

View reviewed changes

Comment thread src/utils/vector_search.py Outdated

Comment thread tests/unit/models/config/test_dump_configuration.py

syedriko force-pushed the syedriko-lcore-1853-2 branch 3 times, most recently from 7187d0b to 8e966b0 Compare May 11, 2026 21:20

coderabbitai Bot reviewed May 11, 2026

View reviewed changes

Comment thread docs/byok_guide.md

syedriko force-pushed the syedriko-lcore-1853-2 branch from 8e966b0 to fee7578 Compare May 11, 2026 23:52

coderabbitai Bot reviewed May 11, 2026

View reviewed changes

Comment thread docs/openapi.json

tisnik approved these changes May 12, 2026

View reviewed changes

tisnik requested a review from are-ces May 12, 2026 12:08

openshift-ci Bot added the ok-to-test label May 13, 2026

are-ces requested changes May 13, 2026

View reviewed changes

syedriko force-pushed the syedriko-lcore-1853-2 branch from fee7578 to 51e6d31 Compare May 22, 2026 16:50

syedriko force-pushed the syedriko-lcore-1853-2 branch 2 times, most recently from 83d6d09 to 87c28b5 Compare May 27, 2026 02:44

LCORE-1853: Add relevance cutoff score to inline BYOK RAG

f6deca8

syedriko force-pushed the syedriko-lcore-1853-2 branch from 87c28b5 to f6deca8 Compare June 2, 2026 14:18

coderabbitai Bot reviewed Jun 2, 2026

View reviewed changes

		)


		def test_byok_rag_configuration_relevance_cutoff_must_be_positive() -> None:

Conversation

syedriko commented May 11, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Type of change

Tools used to create PR

Related Tickets & Documents

Checklist before requesting a review

Testing

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

tisnik left a comment

Choose a reason for hiding this comment

Uh oh!

syedriko commented May 13, 2026

Uh oh!

are-ces commented May 13, 2026

Uh oh!

are-ces left a comment

Choose a reason for hiding this comment

Uh oh!

are-ces May 13, 2026

Choose a reason for hiding this comment

Uh oh!

are-ces May 13, 2026

Choose a reason for hiding this comment

Uh oh!

syedriko commented May 22, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

syedriko commented May 11, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 11, 2026 •

edited

Loading