LCORE-1853: Add relevance cutoff score to inline BYOK RAG#1720
Conversation
ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (11)
💤 Files with no reviewable changes (3)
📜 Recent review details⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
🧰 Additional context used📓 Path-based instructions (5)src/**/*.py📄 CodeRabbit inference engine (AGENTS.md)
Files:
src/constants.py📄 CodeRabbit inference engine (AGENTS.md)
Files:
src/**/configuration.py📄 CodeRabbit inference engine (AGENTS.md)
Files:
src/models/**/*.py📄 CodeRabbit inference engine (AGENTS.md)
Files:
tests/**/*.py📄 CodeRabbit inference engine (AGENTS.md)
Files:
🧠 Learnings (2)📚 Learning: 2026-01-12T10:58:40.230ZApplied to files:
📚 Learning: 2026-02-25T07:46:33.545ZApplied to files:
🪛 GitHub Actions: OpenAPI (Spectral) / 0_spectral.txtdocs/openapi.json[error] 1-1: OpenAPI schema is out of date. docs/openapi.json differs from the newly generated schema. Regenerate with: uv run scripts/generate_openapi_schema.py docs/openapi.json 🪛 GitHub Actions: OpenAPI (Spectral) / spectraldocs/openapi.json[error] 1-1: OpenAPI schema is out of date. Pipeline diff failed between 'docs/openapi.json' and generated '/tmp/openapi-generated.json'. Regenerate with: 'uv run scripts/generate_openapi_schema.py docs/openapi.json' 🪛 markdownlint-cli2 (0.22.1)docs/byok_guide.md[warning] 92-92: Blank line inside blockquote (MD028, no-blanks-blockquote) [warning] 93-93: Blank line inside blockquote (MD028, no-blanks-blockquote) 🛑 Comments failed to post (4)
🔇 Additional comments (8)
WalkthroughThis PR adds per-knowledge-source relevance cutoff score configuration for BYOK RAG, enabling Inline RAG queries to filter vector search results by minimum similarity threshold. The constant, config model, AppConfig mapping, vector search integration, OpenAPI schema, docs, and comprehensive tests are introduced. ChangesBYOK RAG Relevance Cutoff Configuration
🎯 2 (Simple) | ⏱️ ~12 minutes Possibly related PRs:
Suggested reviewers:
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
✨ Simplify code
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/openapi.json`:
- Around line 11809-11815: The OpenAPI schema file docs/openapi.json is out of
sync and should be regenerated rather than edited; run the provided generator
script (uv run scripts/generate_openapi_schema.py docs/openapi.json) to recreate
the schema, then verify the relevance_cutoff_score field in the regenerated JSON
has the correct default value that matches
DEFAULT_BYOK_RAG_RELEVANCE_CUTOFF_SCORE from src/constants.py and commit the
regenerated file.
In `@src/utils/vector_search.py`:
- Around line 29-34: Update the docstring for _relevance_cutoff_for_vector_store
to follow Google Python conventions: add a brief description line, then a
Parameters section documenting vector_store_id (str) and what it represents, and
a Returns section describing the returned float (either the matched
brag.relevance_cutoff_score or
constants.DEFAULT_BYOK_RAG_RELEVANCE_CUTOFF_SCORE). If the function can raise
any exceptions, add a Raises section; otherwise omit Raises. Ensure the
docstring text references the function name and types only (no code) and matches
the existing one-line summary.
In `@tests/unit/models/config/test_dump_configuration.py`:
- Line 11: Replace the top-level import "import constants" with explicit
from-imports for the names actually used in this test (e.g., "from constants
import FOO, BAR" — substitute the real constant names referenced in this file)
and update usages that refer to constants.<NAME> (including the reference around
line 996) to use the direct names (FOO, BAR, etc.) so the import style matches
the existing "from X import Y" pattern used elsewhere in the file.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: f0c9945a-ead4-4245-9d4b-99ea45582ed7
📒 Files selected for processing (8)
docs/byok_guide.mddocs/openapi.jsonsrc/constants.pysrc/models/config.pysrc/utils/vector_search.pytests/unit/models/config/test_byok_rag.pytests/unit/models/config/test_dump_configuration.pytests/unit/utils/test_vector_search.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: black
- GitHub Check: unit_tests (3.13)
- GitHub Check: Pylinter
- GitHub Check: radon
- GitHub Check: build-pr
- GitHub Check: E2E: server mode / ci / group 2
- GitHub Check: E2E: server mode / ci / group 1
- GitHub Check: E2E: library mode / ci / group 3
- GitHub Check: E2E: library mode / ci / group 2
- GitHub Check: E2E: library mode / ci / group 1
- GitHub Check: E2E: server mode / ci / group 3
- GitHub Check: E2E Tests for Lightspeed Evaluation job
🧰 Additional context used
📓 Path-based instructions (4)
src/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
src/**/*.py: Use absolute imports for internal modules:from authentication import get_auth_dependency
Llama Stack imports: Usefrom llama_stack_client import AsyncLlamaStackClient
Checkconstants.pyfor shared constants before defining new ones
All modules must start with descriptive docstrings explaining purpose
Uselogger = get_logger(__name__)fromlog.pyfor module logging
All functions must have complete type annotations for parameters and return types, use modern syntax (str | int), and include descriptive docstrings
Use snake_case with descriptive, action-oriented names for functions (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying function parameters
Useasync deffor I/O operations and external API calls
Use standard log levels with clear purposes:debug()for diagnostic info,info()for program execution,warning()for unexpected events,error()for serious problems
All classes must have descriptive docstrings explaining purpose and use PascalCase with standard suffixes:Configuration,Error/Exception,Resolver,Interface
Abstract classes must use ABC with@abstractmethoddecorators
Follow Google Python docstring conventions with required sections: Parameters, Returns, Raises, and Attributes for classes
Files:
src/models/config.pysrc/constants.pysrc/utils/vector_search.py
src/models/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Pydantic models must use
@model_validatorand@field_validatorfor validation and complete type annotations for all attributes, avoidingAnytype
Files:
src/models/config.py
src/constants.py
📄 CodeRabbit inference engine (AGENTS.md)
Use
constants.pyfor shared constants with descriptive comments and type hints usingFinal[type]
Files:
src/constants.py
tests/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Usepytest.mark.asynciomarker for async tests
Files:
tests/unit/models/config/test_dump_configuration.pytests/unit/utils/test_vector_search.pytests/unit/models/config/test_byok_rag.py
🧠 Learnings (2)
📚 Learning: 2026-01-12T10:58:40.230Z
Learnt from: blublinsky
Repo: lightspeed-core/lightspeed-stack PR: 972
File: src/models/config.py:459-513
Timestamp: 2026-01-12T10:58:40.230Z
Learning: In lightspeed-core/lightspeed-stack, for Python files under src/models, when a user claims a fix is done but the issue persists, verify the current code state before accepting the fix. Steps: review the diff, fetch the latest changes, run relevant tests, reproduce the issue, search the codebase for lingering references to the original problem, confirm the fix is applied and not undone by subsequent commits, and validate with local checks to ensure the issue is resolved.
Applied to files:
src/models/config.py
📚 Learning: 2026-02-25T07:46:33.545Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:33.545Z
Learning: In the Python codebase, requests.py should use OpenAIResponseInputTool as Tool while responses.py uses OpenAIResponseTool as Tool. This difference is intentional due to differing schemas for input vs output tools in llama-stack-api. Apply this distinction consistently to other models under src/models (e.g., ensure request-related tools use the InputTool variant and response-related tools use the ResponseTool variant). If adding new tools, choose the corresponding InputTool or Tool class based on whether the tool represents input or output, and document the rationale in code comments.
Applied to files:
src/models/config.py
🪛 GitHub Actions: OpenAPI (Spectral) / 0_spectral.txt
docs/openapi.json
[error] 1-1: OpenAPI schema out of date. diff detected changes between docs/openapi.json and /tmp/openapi-generated.json. Regenerate with: 'uv run scripts/generate_openapi_schema.py docs/openapi.json'.
🪛 GitHub Actions: OpenAPI (Spectral) / spectral
docs/openapi.json
[error] 1-1: OpenAPI schema is out of date. 'diff -u docs/openapi.json /tmp/openapi-generated.json' failed; regenerate with: 'uv run scripts/generate_openapi_schema.py docs/openapi.json'.
🪛 markdownlint-cli2 (0.22.1)
docs/byok_guide.md
[warning] 89-89: Blank line inside blockquote
(MD028, no-blanks-blockquote)
🔇 Additional comments (15)
src/constants.py (1)
196-198: LGTM! Well-defined constant for BYOK RAG relevance filtering.The constant follows the established pattern with proper type hints and descriptive comment. The default value of 0.3 is reasonable for similarity-based filtering.
src/models/config.py (1)
1636-1643: LGTM! Proper Pydantic field definition for relevance cutoff.The field is correctly configured with:
- Appropriate type annotation and default value from constants
gt=0validation ensuring positive cutoff values- Clear description specifying "raw similarity score" (before multipliers)
The validation constraint requiring values > 0 prevents disabling the cutoff entirely, which aligns with the PR requirement for a "positive float."
tests/unit/models/config/test_dump_configuration.py (1)
996-996: LGTM! Test correctly validates default value serialization.The test properly verifies that the new
relevance_cutoff_scorefield is included in the dumped configuration with the correct default value from constants.tests/unit/models/config/test_byok_rag.py (3)
7-7: LGTM!The import of
DEFAULT_BYOK_RAG_RELEVANCE_CUTOFF_SCOREis appropriate for testing the default value behavior.
39-39: LGTM!The test assertions properly validate both the default and custom
relevance_cutoff_scorevalues. The choice of 0.72 as a test value provides good coverage.Also applies to: 59-59, 68-68
208-220: LGTM!The validation test correctly verifies the
gt=0constraint onrelevance_cutoff_score. Testing the boundary case with0.0provides good coverage.src/utils/vector_search.py (2)
186-213: LGTM!The function correctly accepts and forwards the
score_thresholdparameter toclient.vector_io.query. The docstring is properly updated to document the new parameter.
418-427: LGTM!The implementation correctly retrieves the per-vector-store
relevance_cutoff_scoreand passes it to the query function. Each store in the parallel queries will use its own configured cutoff threshold.docs/byok_guide.md (4)
82-82: LGTM!The documentation clearly explains the
relevance_cutoff_scorefeature, including its purpose, default value, and how it's applied during Inline RAG retrieval.
297-298: LGTM!The YAML example clearly demonstrates how to configure
relevance_cutoff_score, including helpful comments about the default value and scope.
304-306: LGTM!The clarification about score space and per-store tuning is valuable guidance for users configuring this feature.
333-336: LGTM!The comparison table clearly shows that
relevance_cutoff_scoreapplies only to Inline RAG (BYOK), providing a helpful reference for users choosing between modes.tests/unit/utils/test_vector_search.py (3)
430-432: LGTM!The mock configurations are consistently updated to include
relevance_cutoff_score, using the default constant value appropriately.Also applies to: 472-474, 516-518, 522-524, 752-754
500-504: LGTM!The assertion correctly verifies that
score_thresholdis passed tovector_io.queryalong with the other expected parameters.
561-599: LGTM!The new test explicitly verifies that a custom
relevance_cutoff_scoreconfiguration is correctly passed tovector_io.queryasscore_threshold. This provides comprehensive coverage for the feature.
7187d0b to
8e966b0
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/byok_guide.md`:
- Around line 84-94: Remove the empty line between the two adjacent blockquote
note blocks so they are back-to-back (i.e., merge the `> [!NOTE]` block about
OKP/BYOK `score_multiplier` and chunk constants with the following `> [!NOTE]`
block about `relevance_cutoff_score`) to satisfy MD028; locate the two `>
[!NOTE]` paragraphs in the byok guide and delete the blank line separating them
so markdownlint no longer reports the blockquote-blank rule.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: dc0b1607-f348-4610-b49a-bace5ce7b17c
📒 Files selected for processing (8)
docs/byok_guide.mddocs/openapi.jsonsrc/constants.pysrc/models/config.pysrc/utils/vector_search.pytests/unit/models/config/test_byok_rag.pytests/unit/models/config/test_dump_configuration.pytests/unit/utils/test_vector_search.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: spectral
- GitHub Check: Pylinter
- GitHub Check: build-pr
- GitHub Check: E2E: library mode / ci / group 1
- GitHub Check: E2E: server mode / ci / group 1
- GitHub Check: E2E: server mode / ci / group 2
- GitHub Check: E2E: server mode / ci / group 3
- GitHub Check: E2E: library mode / ci / group 2
- GitHub Check: E2E: library mode / ci / group 3
- GitHub Check: E2E Tests for Lightspeed Evaluation job
🧰 Additional context used
📓 Path-based instructions (4)
tests/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Usepytest.mark.asynciomarker for async tests
Files:
tests/unit/models/config/test_dump_configuration.pytests/unit/models/config/test_byok_rag.pytests/unit/utils/test_vector_search.py
src/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
src/**/*.py: Use absolute imports for internal modules:from authentication import get_auth_dependency
Llama Stack imports: Usefrom llama_stack_client import AsyncLlamaStackClient
Checkconstants.pyfor shared constants before defining new ones
All modules must start with descriptive docstrings explaining purpose
Uselogger = get_logger(__name__)fromlog.pyfor module logging
All functions must have complete type annotations for parameters and return types, use modern syntax (str | int), and include descriptive docstrings
Use snake_case with descriptive, action-oriented names for functions (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying function parameters
Useasync deffor I/O operations and external API calls
Use standard log levels with clear purposes:debug()for diagnostic info,info()for program execution,warning()for unexpected events,error()for serious problems
All classes must have descriptive docstrings explaining purpose and use PascalCase with standard suffixes:Configuration,Error/Exception,Resolver,Interface
Abstract classes must use ABC with@abstractmethoddecorators
Follow Google Python docstring conventions with required sections: Parameters, Returns, Raises, and Attributes for classes
Files:
src/constants.pysrc/models/config.pysrc/utils/vector_search.py
src/constants.py
📄 CodeRabbit inference engine (AGENTS.md)
Use
constants.pyfor shared constants with descriptive comments and type hints usingFinal[type]
Files:
src/constants.py
src/models/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Pydantic models must use
@model_validatorand@field_validatorfor validation and complete type annotations for all attributes, avoidingAnytype
Files:
src/models/config.py
🧠 Learnings (2)
📚 Learning: 2026-01-12T10:58:40.230Z
Learnt from: blublinsky
Repo: lightspeed-core/lightspeed-stack PR: 972
File: src/models/config.py:459-513
Timestamp: 2026-01-12T10:58:40.230Z
Learning: In lightspeed-core/lightspeed-stack, for Python files under src/models, when a user claims a fix is done but the issue persists, verify the current code state before accepting the fix. Steps: review the diff, fetch the latest changes, run relevant tests, reproduce the issue, search the codebase for lingering references to the original problem, confirm the fix is applied and not undone by subsequent commits, and validate with local checks to ensure the issue is resolved.
Applied to files:
src/models/config.py
📚 Learning: 2026-02-25T07:46:33.545Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:33.545Z
Learning: In the Python codebase, requests.py should use OpenAIResponseInputTool as Tool while responses.py uses OpenAIResponseTool as Tool. This difference is intentional due to differing schemas for input vs output tools in llama-stack-api. Apply this distinction consistently to other models under src/models (e.g., ensure request-related tools use the InputTool variant and response-related tools use the ResponseTool variant). If adding new tools, choose the corresponding InputTool or Tool class based on whether the tool represents input or output, and document the rationale in code comments.
Applied to files:
src/models/config.py
🪛 markdownlint-cli2 (0.22.1)
docs/byok_guide.md
[warning] 89-89: Blank line inside blockquote
(MD028, no-blanks-blockquote)
🔇 Additional comments (4)
src/constants.py (1)
196-198: Good addition of shared BYOK relevance cutoff constant.
Typed withFinal[float], clearly named, and placed in the correct constants section.tests/unit/models/config/test_dump_configuration.py (2)
11-11: Import update looks correct and consistent with this test module.
1028-1028: Expected dump payload correctly includesrelevance_cutoff_score.
This strengthens coverage for the new BYOK RAG config field serialization path.docs/openapi.json (1)
11810-11815: Default value is correct.The default value of
0.3matches theDEFAULT_BYOK_RAG_RELEVANCE_CUTOFF_SCOREconstant fromsrc/constants.py. The schema definition withexclusiveMinimum: 0.0correctly enforces positive values and the description accurately reflects the filtering behavior.
8e966b0 to
fee7578
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
docs/byok_guide.md (1)
89-89:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winRemove the blank blockquote line to satisfy markdownlint MD028.
The empty blockquote line between the two adjacent
> [!NOTE]blocks still triggers theMD028 (no-blanks-blockquote)rule violation that was previously flagged.📝 Proposed fix
> context, set the `BYOK_RAG_MAX_CHUNKS` and `OKP_RAG_MAX_CHUNKS` constants in `src/constants.py` > (defaults: 10 and 5 respectively). For Tool RAG, use `TOOL_RAG_MAX_CHUNKS` (default: 10). -> > [!NOTE] > `relevance_cutoff_score` applies to Inline RAG only. When the model uses Tool RAG (`file_search`),🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/byok_guide.md` at line 89, Remove the empty blockquote line between the two adjacent blockquote markers so MD028 is satisfied: find the consecutive lines starting with "> [!NOTE]" and delete the blank line (the standalone ">"/empty blockquote) separating them so the two note blocks are directly adjacent.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/openapi.json`:
- Around line 11809-11815: Regenerate the out-of-sync OpenAPI schema file
docs/openapi.json by running the project script
scripts/generate_openapi_schema.py (via the project's runner e.g., `uv run`)
instead of editing the JSON manually; after regeneration verify the
relevance_cutoff_score default in docs/openapi.json matches the
DEFAULT_BYOK_RAG_RELEVANCE_CUTOFF_SCORE constant defined in src/constants.py,
then commit the regenerated docs/openapi.json.
---
Duplicate comments:
In `@docs/byok_guide.md`:
- Line 89: Remove the empty blockquote line between the two adjacent blockquote
markers so MD028 is satisfied: find the consecutive lines starting with ">
[!NOTE]" and delete the blank line (the standalone ">"/empty blockquote)
separating them so the two note blocks are directly adjacent.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: d4d39bf3-18f3-4c52-9c75-cdb4a6f534b2
📒 Files selected for processing (8)
docs/byok_guide.mddocs/openapi.jsonsrc/constants.pysrc/models/config.pysrc/utils/vector_search.pytests/unit/models/config/test_byok_rag.pytests/unit/models/config/test_dump_configuration.pytests/unit/utils/test_vector_search.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: build-pr
- GitHub Check: integration_tests (3.12)
- GitHub Check: Pylinter
- GitHub Check: unit_tests (3.12)
- GitHub Check: E2E: server mode / ci / group 1
- GitHub Check: E2E: server mode / ci / group 3
- GitHub Check: E2E: library mode / ci / group 3
- GitHub Check: E2E: library mode / ci / group 1
- GitHub Check: E2E: library mode / ci / group 2
- GitHub Check: E2E: server mode / ci / group 2
- GitHub Check: E2E Tests for Lightspeed Evaluation job
🧰 Additional context used
📓 Path-based instructions (4)
src/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
src/**/*.py: Use absolute imports for internal modules:from authentication import get_auth_dependency
Llama Stack imports: Usefrom llama_stack_client import AsyncLlamaStackClient
Checkconstants.pyfor shared constants before defining new ones
All modules must start with descriptive docstrings explaining purpose
Uselogger = get_logger(__name__)fromlog.pyfor module logging
All functions must have complete type annotations for parameters and return types, use modern syntax (str | int), and include descriptive docstrings
Use snake_case with descriptive, action-oriented names for functions (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying function parameters
Useasync deffor I/O operations and external API calls
Use standard log levels with clear purposes:debug()for diagnostic info,info()for program execution,warning()for unexpected events,error()for serious problems
All classes must have descriptive docstrings explaining purpose and use PascalCase with standard suffixes:Configuration,Error/Exception,Resolver,Interface
Abstract classes must use ABC with@abstractmethoddecorators
Follow Google Python docstring conventions with required sections: Parameters, Returns, Raises, and Attributes for classes
Files:
src/constants.pysrc/models/config.pysrc/utils/vector_search.py
src/constants.py
📄 CodeRabbit inference engine (AGENTS.md)
Use
constants.pyfor shared constants with descriptive comments and type hints usingFinal[type]
Files:
src/constants.py
src/models/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Pydantic models must use
@model_validatorand@field_validatorfor validation and complete type annotations for all attributes, avoidingAnytype
Files:
src/models/config.py
tests/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Usepytest.mark.asynciomarker for async tests
Files:
tests/unit/models/config/test_byok_rag.pytests/unit/models/config/test_dump_configuration.pytests/unit/utils/test_vector_search.py
🧠 Learnings (2)
📚 Learning: 2026-01-12T10:58:40.230Z
Learnt from: blublinsky
Repo: lightspeed-core/lightspeed-stack PR: 972
File: src/models/config.py:459-513
Timestamp: 2026-01-12T10:58:40.230Z
Learning: In lightspeed-core/lightspeed-stack, for Python files under src/models, when a user claims a fix is done but the issue persists, verify the current code state before accepting the fix. Steps: review the diff, fetch the latest changes, run relevant tests, reproduce the issue, search the codebase for lingering references to the original problem, confirm the fix is applied and not undone by subsequent commits, and validate with local checks to ensure the issue is resolved.
Applied to files:
src/models/config.py
📚 Learning: 2026-02-25T07:46:33.545Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:33.545Z
Learning: In the Python codebase, requests.py should use OpenAIResponseInputTool as Tool while responses.py uses OpenAIResponseTool as Tool. This difference is intentional due to differing schemas for input vs output tools in llama-stack-api. Apply this distinction consistently to other models under src/models (e.g., ensure request-related tools use the InputTool variant and response-related tools use the ResponseTool variant). If adding new tools, choose the corresponding InputTool or Tool class based on whether the tool represents input or output, and document the rationale in code comments.
Applied to files:
src/models/config.py
🪛 GitHub Actions: OpenAPI (Spectral) / 0_spectral.txt
docs/openapi.json
[error] 1-1: docs/openapi.json is out of date. Step checks openapi schema via uv run python scripts/generate_openapi_schema.py /tmp/openapi-generated.json and diff -u docs/openapi.json /tmp/openapi-generated.json; diff failed. Regenerate with: uv run scripts/generate_openapi_schema.py docs/openapi.json.
🪛 GitHub Actions: OpenAPI (Spectral) / spectral
docs/openapi.json
[error] 1-1: docs/openapi.json is out of date. Diff detected between docs/openapi.json and generated /tmp/openapi-generated.json. Regenerate with: uv run scripts/generate_openapi_schema.py docs/openapi.json
🔇 Additional comments (11)
src/constants.py (1)
196-197: LGTM!src/models/config.py (1)
1721-1727: LGTM!tests/unit/models/config/test_dump_configuration.py (1)
11-11: LGTM!Also applies to: 1028-1028
tests/unit/models/config/test_byok_rag.py (1)
7-7: LGTM!Also applies to: 39-39, 59-59, 68-68, 208-220
src/utils/vector_search.py (2)
29-44: LGTM!
196-223: LGTM!Also applies to: 434-434
docs/byok_guide.md (1)
82-82: LGTM!Also applies to: 297-298, 303-306, 333-336, 586-586
tests/unit/utils/test_vector_search.py (4)
3-5: LGTM!Also applies to: 26-26, 31-54
460-462: LGTM!Also applies to: 502-504, 530-535, 546-548, 552-554, 882-884
591-629: LGTM!
631-674: LGTM!Also applies to: 676-729
|
/ok-to-test |
|
Related to #1525 |
are-ces
left a comment
There was a problem hiding this comment.
LGTM overall TYVM!
I have a couple of nits which you can ignore but I would like to see some integration test for this functionality
| vector_store_id, | ||
| query, | ||
| score_multiplier_mapping.get(vector_store_id, 1.0), | ||
| _relevance_cutoff_for_vector_store(vector_store_id), |
There was a problem hiding this comment.
Tiny nit from SWE perspective. Just wondering: since score multiplier and relevance score have the same data structure, would it make sense to retrieve cutoff score the same way?
| ) | ||
|
|
||
|
|
||
| def test_byok_rag_configuration_relevance_cutoff_must_be_positive() -> None: |
There was a problem hiding this comment.
Nit, do we also want to test behavior if it is negative?
fee7578 to
51e6d31
Compare
|
@are-ces PTAL |
83d6d09 to
87c28b5
Compare
87c28b5 to
f6deca8
Compare
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/utils/vector_search.py (2)
568-573: 🧹 Nitpick | 🔵 Trivial | 💤 Low valueStale
max_chunksentry in_fetch_solr_ragdocstring.The Args section documents a
max_chunksparameter (Lines 572-573) that does not exist in the signature(client, query, solr). Remove it to keep the docstring consistent with the function.📝 Proposed docstring fix
solr: Structured Solr inline RAG request from the API (optional). - max_chunks: Maximum number of chunks to return. If None, uses - constants.OKP_RAG_MAX_CHUNKS.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/utils/vector_search.py` around lines 568 - 573, The docstring for _fetch_solr_rag incorrectly documents a non-existent parameter max_chunks; update the Args section of the _fetch_solr_rag docstring to remove the max_chunks entry so it matches the function signature (client, query, solr) and keep the existing descriptions for client, query, and solr unchanged.
248-282:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winFix BYOK RAG docstring parameter order in
_query_store_for_byok_rag
- The docstring lists
max_chunksbeforescore_threshold, but the signature is(..., score_threshold: float, max_chunks: int = ...). Swap theArgs:entries to match.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/utils/vector_search.py` around lines 248 - 282, The docstring for _query_store_for_byok_rag lists the parameters in the wrong order (shows max_chunks before score_threshold) which doesn't match the function signature; update the Args: section of the _query_store_for_byok_rag docstring so score_threshold appears before max_chunks and adjust their descriptions to match the signature and default (constants.BYOK_RAG_MAX_CHUNKS).
♻️ Duplicate comments (1)
docs/openapi.json (1)
1-19424:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winRegenerate
docs/openapi.jsonfrom source-of-truth instead of editing this artifact manually.This file is out of sync with the generated schema, and CI is currently failing on the spectral gate. Please regenerate and commit the generated output, otherwise this PR cannot merge.
🔧 Required fix
-# Manually edited OpenAPI artifact (drifted) +# Regenerated OpenAPI artifact from codeuv run scripts/generate_openapi_schema.py docs/openapi.json🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/openapi.json` around lines 1 - 19424, The openapi.json file is out of sync with the generated schema and manual edits to it have caused CI failures. To fix this, regenerate the openapi.json file from the source-of-truth schema using the provided generation script. Specifically, run the command `uv run scripts/generate_openapi_schema.py docs/openapi.json` to produce an up-to-date and consistent openapi.json artifact, then commit the regenerated file to the PR.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pyproject.toml`:
- Around line 229-231: Revert the repo-wide pylint relaxation in pyproject.toml
by removing or restoring the [tool.pylint.design] max-args and
max-positional-arguments overrides so the global "too-many-arguments" rules
remain strict, and instead add a targeted suppression or refactor for the BYOK
RAG helper: open src/utils/vector_search.py and either refactor async function
_query_store_for_byok_rag(...) to reduce its parameters or add a local "#
pylint: disable=too-many-arguments" (or "# pylint:
disable=too-many-positional-arguments") immediately above that function
declaration to allow the exception only there. Ensure no other global changes
are made so existing local suppressions (e.g., in response_generator,
_check_shield_moderation, prepare_tools, build_turn_summary,
create_transcript_metadata) remain intact.
In `@src/models/config.py`:
- Around line 2351-2387: The validator currently only checks self.rag.inline for
OKP (constants.OKP_RAG_ID) but the docstring/comments say inline OR tool should
trigger auto-enable; update the predicate by changing the has_okp computation in
validate_reranker_auto_enable to consider both self.rag.inline and self.rag.tool
(e.g., check constants.OKP_RAG_ID in either collection) and adjust the log
message/comment to say "inline or tool" as needed; keep the existing
protected-access and pylint disables around reranker._explicitly_configured
unchanged.
- Around line 2018-2023: The Field default for the "model" attribute in the
config schema is hardcoded to "cross-encoder/ms-marco-MiniLM-L6-v2" which
duplicates the shared constant; update the default to reference the shared
constant DEFAULT_CROSS_ENCODER_MODEL from the constants module (import it at the
top if necessary) so the schema default and runtime default stay in sync, i.e.,
replace the literal string used in the Field default with
constants.DEFAULT_CROSS_ENCODER_MODEL and ensure the import for
DEFAULT_CROSS_ENCODER_MODEL is present where the Field is defined.
In `@src/utils/vector_search.py`:
- Around line 107-119: The docstring in the Solr vector_io query builder in
src/utils/vector_search.py contains two Returns sections; collapse them into a
single Google-style Returns section that describes the returned dict and its
keys (mode, filters, and any other params) and remove the duplicated/malformed
Returns block so the docstring follows Google conventions for the function that
builds query parameters for vector_io.query.
---
Outside diff comments:
In `@src/utils/vector_search.py`:
- Around line 568-573: The docstring for _fetch_solr_rag incorrectly documents a
non-existent parameter max_chunks; update the Args section of the
_fetch_solr_rag docstring to remove the max_chunks entry so it matches the
function signature (client, query, solr) and keep the existing descriptions for
client, query, and solr unchanged.
- Around line 248-282: The docstring for _query_store_for_byok_rag lists the
parameters in the wrong order (shows max_chunks before score_threshold) which
doesn't match the function signature; update the Args: section of the
_query_store_for_byok_rag docstring so score_threshold appears before max_chunks
and adjust their descriptions to match the signature and default
(constants.BYOK_RAG_MAX_CHUNKS).
---
Duplicate comments:
In `@docs/openapi.json`:
- Around line 1-19424: The openapi.json file is out of sync with the generated
schema and manual edits to it have caused CI failures. To fix this, regenerate
the openapi.json file from the source-of-truth schema using the provided
generation script. Specifically, run the command `uv run
scripts/generate_openapi_schema.py docs/openapi.json` to produce an up-to-date
and consistent openapi.json artifact, then commit the regenerated file to the
PR.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: a791573e-b3ab-45c7-ba42-5465d7a60429
📒 Files selected for processing (11)
docs/byok_guide.mddocs/openapi.jsonpyproject.tomlsrc/configuration.pysrc/constants.pysrc/models/config.pysrc/utils/vector_search.pytests/unit/models/config/test_byok_rag.pytests/unit/models/config/test_dump_configuration.pytests/unit/test_configuration.pytests/unit/utils/test_vector_search.py
💤 Files with no reviewable changes (3)
- tests/unit/test_configuration.py
- tests/unit/models/config/test_dump_configuration.py
- tests/unit/utils/test_vector_search.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: build-pr
- GitHub Check: E2E: server mode / ci / group 2
- GitHub Check: E2E: library mode / ci / group 3
- GitHub Check: E2E: server mode / ci / group 3
- GitHub Check: E2E: library mode / ci / group 1
- GitHub Check: E2E: library mode / ci / group 2
- GitHub Check: E2E: server mode / ci / group 1
- GitHub Check: E2E Tests for Lightspeed Evaluation job
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
🧰 Additional context used
📓 Path-based instructions (5)
src/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
src/**/*.py: Use absolute imports for internal modules:from authentication import get_auth_dependency
Llama Stack imports: Usefrom llama_stack_client import AsyncLlamaStackClient
Checkconstants.pyfor shared constants before defining new ones
All modules must start with descriptive docstrings explaining purpose
Uselogger = get_logger(__name__)fromlog.pyfor module logging
All functions must have complete type annotations for parameters and return types, use modern syntax (str | int), and include descriptive docstrings
Use snake_case with descriptive, action-oriented names for functions (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying function parameters
Useasync deffor I/O operations and external API calls
Use standard log levels with clear purposes:debug()for diagnostic info,info()for program execution,warning()for unexpected events,error()for serious problems
All classes must have descriptive docstrings explaining purpose and use PascalCase with standard suffixes:Configuration,Error/Exception,Resolver,Interface
Abstract classes must use ABC with@abstractmethoddecorators
Follow Google Python docstring conventions with required sections: Parameters, Returns, Raises, and Attributes for classes
Files:
src/constants.pysrc/configuration.pysrc/models/config.pysrc/utils/vector_search.py
src/constants.py
📄 CodeRabbit inference engine (AGENTS.md)
Use
constants.pyfor shared constants with descriptive comments and type hints usingFinal[type]
Files:
src/constants.py
src/**/configuration.py
📄 CodeRabbit inference engine (AGENTS.md)
src/**/configuration.py: All config models must extendConfigurationBasewithextra="forbid"to reject unknown fields
Use@field_validatorand@model_validatorfor custom validation in Pydantic models
Files:
src/configuration.py
src/models/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Pydantic models must use
@model_validatorand@field_validatorfor validation and complete type annotations for all attributes, avoidingAnytype
Files:
src/models/config.py
tests/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Usepytest.mark.asynciomarker for async tests
Files:
tests/unit/models/config/test_byok_rag.py
🧠 Learnings (2)
📚 Learning: 2026-01-12T10:58:40.230Z
Learnt from: blublinsky
Repo: lightspeed-core/lightspeed-stack PR: 972
File: src/models/config.py:459-513
Timestamp: 2026-01-12T10:58:40.230Z
Learning: In lightspeed-core/lightspeed-stack, for Python files under src/models, when a user claims a fix is done but the issue persists, verify the current code state before accepting the fix. Steps: review the diff, fetch the latest changes, run relevant tests, reproduce the issue, search the codebase for lingering references to the original problem, confirm the fix is applied and not undone by subsequent commits, and validate with local checks to ensure the issue is resolved.
Applied to files:
src/models/config.py
📚 Learning: 2026-02-25T07:46:33.545Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:33.545Z
Learning: In the Python codebase, requests.py should use OpenAIResponseInputTool as Tool while responses.py uses OpenAIResponseTool as Tool. This difference is intentional due to differing schemas for input vs output tools in llama-stack-api. Apply this distinction consistently to other models under src/models (e.g., ensure request-related tools use the InputTool variant and response-related tools use the ResponseTool variant). If adding new tools, choose the corresponding InputTool or Tool class based on whether the tool represents input or output, and document the rationale in code comments.
Applied to files:
src/models/config.py
🪛 GitHub Actions: OpenAPI (Spectral) / 0_spectral.txt
docs/openapi.json
[error] 1-1: OpenAPI schema is out of date. docs/openapi.json differs from the newly generated schema. Regenerate with: uv run scripts/generate_openapi_schema.py docs/openapi.json
🪛 GitHub Actions: OpenAPI (Spectral) / spectral
docs/openapi.json
[error] 1-1: OpenAPI schema is out of date. Pipeline diff failed between 'docs/openapi.json' and generated '/tmp/openapi-generated.json'. Regenerate with: 'uv run scripts/generate_openapi_schema.py docs/openapi.json'
🪛 markdownlint-cli2 (0.22.1)
docs/byok_guide.md
[warning] 92-92: Blank line inside blockquote
(MD028, no-blanks-blockquote)
[warning] 93-93: Blank line inside blockquote
(MD028, no-blanks-blockquote)
🔇 Additional comments (8)
src/utils/vector_search.py (2)
270-279: LGTM!
501-523: ⚡ Quick winEnsure
relevance_cutoff_mappinguses the same keyspace as the vector store lookup ID.
src/utils/vector_search.pylooks uprelevance_cutoff_mapping.get(vector_store_id, DEFAULT_BYOK_RAG_RELEVANCE_CUTOFF_SCORE)wherevector_store_idcomes fromvector_store_ids_to_query; confirmrelevance_cutoff_mappingis keyed by the samevector_db_idnamespace. If it’s keyed by user-facingrag_idinstead, the configured per-store cutoff won’t be applied and will always fall back to the default.tests/unit/models/config/test_byok_rag.py (1)
39-39: LGTM!Also applies to: 59-68, 194-226
src/constants.py (1)
3-3: LGTM!Also applies to: 161-163, 184-205, 295-300
src/models/config.py (1)
231-240: LGTM!Also applies to: 470-527, 617-625, 704-727, 1817-1823, 2176-2180, 2237-2247
src/configuration.py (1)
17-17: LGTM!Also applies to: 31-31, 463-489, 534-550
docs/byok_guide.md (2)
92-93: MD028 blockquote spacing issue still present.This is the same markdownlint issue already reported earlier: blank lines inside adjacent blockquote notes.
82-82: LGTM!Also applies to: 89-91, 94-98, 302-303, 309-311, 338-341, 591-591
There was a problem hiding this comment.
Caution
Inline review comments failed to post. This is likely due to GitHub's internal server error or limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/utils/vector_search.py (2)
568-573: 🧹 Nitpick | 🔵 Trivial | 💤 Low valueStale
max_chunksentry in_fetch_solr_ragdocstring.The Args section documents a
max_chunksparameter (Lines 572-573) that does not exist in the signature(client, query, solr). Remove it to keep the docstring consistent with the function.📝 Proposed docstring fix
solr: Structured Solr inline RAG request from the API (optional). - max_chunks: Maximum number of chunks to return. If None, uses - constants.OKP_RAG_MAX_CHUNKS.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/utils/vector_search.py` around lines 568 - 573, The docstring for _fetch_solr_rag incorrectly documents a non-existent parameter max_chunks; update the Args section of the _fetch_solr_rag docstring to remove the max_chunks entry so it matches the function signature (client, query, solr) and keep the existing descriptions for client, query, and solr unchanged.
248-282:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winFix BYOK RAG docstring parameter order in
_query_store_for_byok_rag
- The docstring lists
max_chunksbeforescore_threshold, but the signature is(..., score_threshold: float, max_chunks: int = ...). Swap theArgs:entries to match.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/utils/vector_search.py` around lines 248 - 282, The docstring for _query_store_for_byok_rag lists the parameters in the wrong order (shows max_chunks before score_threshold) which doesn't match the function signature; update the Args: section of the _query_store_for_byok_rag docstring so score_threshold appears before max_chunks and adjust their descriptions to match the signature and default (constants.BYOK_RAG_MAX_CHUNKS).
♻️ Duplicate comments (1)
docs/openapi.json (1)
1-19424:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winRegenerate
docs/openapi.jsonfrom source-of-truth instead of editing this artifact manually.This file is out of sync with the generated schema, and CI is currently failing on the spectral gate. Please regenerate and commit the generated output, otherwise this PR cannot merge.
🔧 Required fix
-# Manually edited OpenAPI artifact (drifted) +# Regenerated OpenAPI artifact from codeuv run scripts/generate_openapi_schema.py docs/openapi.json🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/openapi.json` around lines 1 - 19424, The openapi.json file is out of sync with the generated schema and manual edits to it have caused CI failures. To fix this, regenerate the openapi.json file from the source-of-truth schema using the provided generation script. Specifically, run the command `uv run scripts/generate_openapi_schema.py docs/openapi.json` to produce an up-to-date and consistent openapi.json artifact, then commit the regenerated file to the PR.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pyproject.toml`:
- Around line 229-231: Revert the repo-wide pylint relaxation in pyproject.toml
by removing or restoring the [tool.pylint.design] max-args and
max-positional-arguments overrides so the global "too-many-arguments" rules
remain strict, and instead add a targeted suppression or refactor for the BYOK
RAG helper: open src/utils/vector_search.py and either refactor async function
_query_store_for_byok_rag(...) to reduce its parameters or add a local "#
pylint: disable=too-many-arguments" (or "# pylint:
disable=too-many-positional-arguments") immediately above that function
declaration to allow the exception only there. Ensure no other global changes
are made so existing local suppressions (e.g., in response_generator,
_check_shield_moderation, prepare_tools, build_turn_summary,
create_transcript_metadata) remain intact.
In `@src/models/config.py`:
- Around line 2351-2387: The validator currently only checks self.rag.inline for
OKP (constants.OKP_RAG_ID) but the docstring/comments say inline OR tool should
trigger auto-enable; update the predicate by changing the has_okp computation in
validate_reranker_auto_enable to consider both self.rag.inline and self.rag.tool
(e.g., check constants.OKP_RAG_ID in either collection) and adjust the log
message/comment to say "inline or tool" as needed; keep the existing
protected-access and pylint disables around reranker._explicitly_configured
unchanged.
- Around line 2018-2023: The Field default for the "model" attribute in the
config schema is hardcoded to "cross-encoder/ms-marco-MiniLM-L6-v2" which
duplicates the shared constant; update the default to reference the shared
constant DEFAULT_CROSS_ENCODER_MODEL from the constants module (import it at the
top if necessary) so the schema default and runtime default stay in sync, i.e.,
replace the literal string used in the Field default with
constants.DEFAULT_CROSS_ENCODER_MODEL and ensure the import for
DEFAULT_CROSS_ENCODER_MODEL is present where the Field is defined.
In `@src/utils/vector_search.py`:
- Around line 107-119: The docstring in the Solr vector_io query builder in
src/utils/vector_search.py contains two Returns sections; collapse them into a
single Google-style Returns section that describes the returned dict and its
keys (mode, filters, and any other params) and remove the duplicated/malformed
Returns block so the docstring follows Google conventions for the function that
builds query parameters for vector_io.query.
---
Outside diff comments:
In `@src/utils/vector_search.py`:
- Around line 568-573: The docstring for _fetch_solr_rag incorrectly documents a
non-existent parameter max_chunks; update the Args section of the
_fetch_solr_rag docstring to remove the max_chunks entry so it matches the
function signature (client, query, solr) and keep the existing descriptions for
client, query, and solr unchanged.
- Around line 248-282: The docstring for _query_store_for_byok_rag lists the
parameters in the wrong order (shows max_chunks before score_threshold) which
doesn't match the function signature; update the Args: section of the
_query_store_for_byok_rag docstring so score_threshold appears before max_chunks
and adjust their descriptions to match the signature and default
(constants.BYOK_RAG_MAX_CHUNKS).
---
Duplicate comments:
In `@docs/openapi.json`:
- Around line 1-19424: The openapi.json file is out of sync with the generated
schema and manual edits to it have caused CI failures. To fix this, regenerate
the openapi.json file from the source-of-truth schema using the provided
generation script. Specifically, run the command `uv run
scripts/generate_openapi_schema.py docs/openapi.json` to produce an up-to-date
and consistent openapi.json artifact, then commit the regenerated file to the
PR.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: a791573e-b3ab-45c7-ba42-5465d7a60429
📒 Files selected for processing (11)
docs/byok_guide.mddocs/openapi.jsonpyproject.tomlsrc/configuration.pysrc/constants.pysrc/models/config.pysrc/utils/vector_search.pytests/unit/models/config/test_byok_rag.pytests/unit/models/config/test_dump_configuration.pytests/unit/test_configuration.pytests/unit/utils/test_vector_search.py
💤 Files with no reviewable changes (3)
- tests/unit/test_configuration.py
- tests/unit/models/config/test_dump_configuration.py
- tests/unit/utils/test_vector_search.py
📜 Review details
🔇 Additional comments (8)
src/utils/vector_search.py (2)
270-279: LGTM!
501-523: ⚡ Quick winEnsure
relevance_cutoff_mappinguses the same keyspace as the vector store lookup ID.
src/utils/vector_search.pylooks uprelevance_cutoff_mapping.get(vector_store_id, DEFAULT_BYOK_RAG_RELEVANCE_CUTOFF_SCORE)wherevector_store_idcomes fromvector_store_ids_to_query; confirmrelevance_cutoff_mappingis keyed by the samevector_db_idnamespace. If it’s keyed by user-facingrag_idinstead, the configured per-store cutoff won’t be applied and will always fall back to the default.tests/unit/models/config/test_byok_rag.py (1)
39-39: LGTM!Also applies to: 59-68, 194-226
src/constants.py (1)
3-3: LGTM!Also applies to: 161-163, 184-205, 295-300
src/models/config.py (1)
231-240: LGTM!Also applies to: 470-527, 617-625, 704-727, 1817-1823, 2176-2180, 2237-2247
src/configuration.py (1)
17-17: LGTM!Also applies to: 31-31, 463-489, 534-550
docs/byok_guide.md (2)
92-93: MD028 blockquote spacing issue still present.This is the same markdownlint issue already reported earlier: blank lines inside adjacent blockquote notes.
82-82: LGTM!Also applies to: 89-91, 94-98, 302-303, 309-311, 338-341, 591-591
🛑 Comments failed to post (4)
pyproject.toml (1)
229-231:
⚠️ Potential issue | 🟠 Major | ⚡ Quick win🧩 Analysis chain
🏁 Script executed:
#!/bin/bash set -euo pipefail python - <<'PY' from pathlib import Path import ast def counts(node): args = node.args positional = len(args.posonlyargs) + len(args.args) total = positional + len(args.kwonlyargs) return total, positional for path in sorted(Path("src").rglob("*.py")): try: tree = ast.parse(path.read_text()) except Exception as exc: print(f"PARSE_ERROR {path}: {exc}") continue for node in ast.walk(tree): if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)): total, positional = counts(node) if total > 5 or positional > 5: kind = "async def" if isinstance(node, ast.AsyncFunctionDef) else "def" print(f"{path}:{node.lineno}: {kind} {node.name} total_args={total} positional_args={positional}") PYRepository: lightspeed-core/lightspeed-stack
Length of output: 1585
🏁 Script executed:
#!/bin/bash set -euo pipefail # 1) Show the exact pylint design config in pyproject.toml around the flagged lines echo "=== pyproject.toml: tool.pylint.design context ===" python - <<'PY' from pathlib import Path p = Path("pyproject.toml") lines = p.read_text().splitlines() start, end = 200, 260 for i in range(start, min(end, len(lines))): # line numbers are 1-based print(f"{i+1}:{lines[i]}") PY # 2) Find BYOK RAG / relevance-cutoff related identifiers in src/ echo echo "=== Search: byok / rag / relevance cutoff identifiers ===" rg -n --hidden --glob '!.venv/**' --glob '!.git/**' \ -S '(byok|byo?k|rag|relevance|cutoff|threshold)' src || true # 3) Count args in the function(s) likely touched by relevance-cutoff work (best-effort via grep of nearby matches) echo echo "=== Best-effort: show function defs around matches ===" rg -n --hidden --glob '!.venv/**' --glob '!.git/**' \ -S 'def |async def ' src/app src/utils src/models | head -n 200 || trueRepository: lightspeed-core/lightspeed-stack
Length of output: 50388
🏁 Script executed:
#!/bin/bash set -euo pipefail python - <<'PY' from pathlib import Path import ast def arg_counts(fn: ast.AST): # pylint max-args / max-positional-arguments behavior is close to counting these. args = fn.args posonly = len(getattr(args, "posonlyargs", [])) positional = posonly + len(args.args) kwonly = len(args.kwonlyargs) # vararg / kwarg exist in ast.arguments vararg = 1 if args.vararg is not None else 0 kwarg = 1 if args.kwarg is not None else 0 total = positional + kwonly + vararg + kwarg return total, positional def scan(path: Path): txt = path.read_text(encoding="utf-8") tree = ast.parse(txt) out = [] for node in ast.walk(tree): if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)): total, positional = arg_counts(node) out.append((str(path), node.lineno, node.name, isinstance(node, ast.AsyncFunctionDef), total, positional)) return out roots = [Path("src")] rows = [] parse_errors = [] for root in roots: for path in sorted(root.rglob("*.py")): try: rows.extend(scan(path)) except SyntaxError as e: parse_errors.append((str(path), e.lineno, e.msg)) except Exception as e: parse_errors.append((str(path), None, repr(e))) # Focus: what exceeds 5 (pylint default commonly) and what exceeds 8 (your new limit). exceed_5_total = [r for r in rows if r[4] > 5] exceed_8_total = [r for r in rows if r[4] > 8] exceed_5_pos = [r for r in rows if r[5] > 5] exceed_8_pos = [r for r in rows if r[5] > 8] print(f"functions total_params>5: {len(exceed_5_total)}") print(f"functions total_params>8: {len(exceed_8_total)}") print(f"functions positional_params>5: {len(exceed_5_pos)}") print(f"functions positional_params>8: {len(exceed_8_pos)}") # BYOK relevance-cutoff likely lives in vector_search.py vs_rows = [r for r in rows if r[0].endswith("src/utils/vector_search.py")] print(f"\nvector_search.py function count: {len(vs_rows)}") print("vector_search.py with total_params>5 (show):") for r in sorted([r for r in vs_rows if r[4] > 5], key=lambda x:(x[0],x[1])): print(f"{r[0]}:{r[1]} {('async ' if r[3] else '')}{r[2]} total={r[4]} positional={r[5]}") print("\nTop 20 by total_params (to gauge breadth):") for r in sorted(rows, key=lambda x:x[4], reverse=True)[:20]: print(f"{r[0]}:{r[1]} {('async ' if r[3] else '')}{r[2]} total={r[4]} positional={r[5]}") print("\nParse errors (first 20):") for pe in parse_errors[:20]: print(pe) print(f"Parse errors: {len(parse_errors)}") PYRepository: lightspeed-core/lightspeed-stack
Length of output: 2206
🏁 Script executed:
#!/bin/bash set -euo pipefail # Show how pyproject.toml:229-231 changed in this PR/branch git diff -U3 -- pyproject.toml | sed -n '1,200p'Repository: lightspeed-core/lightspeed-stack
Length of output: 58
🏁 Script executed:
#!/bin/bash set -euo pipefail # Check whether we have parent commits to diff against git rev-parse HEAD~1 >/dev/null 2>&1 || { echo "NO_PARENT_COMMIT"; exit 0; } echo "=== Current pyproject.toml: tool.pylint.design ===" rg -n --context 2 '^\\[tool\\.pylint\\.design\\]|^max-args\\s*=|^max-positional-arguments\\s*=' pyproject.toml echo echo "=== Previous commit (HEAD~1): tool.pylint.design ===" git show HEAD~1:pyproject.toml | rg -n --context 2 '^\\[tool\\.pylint\\.design\\]|^max-args\\s*=|^max-positional-arguments\\s*=' || true echo echo "=== Previous commit: snippet around tool.pylint.design section (if present) ===" git show HEAD~1:pyproject.toml | python - <<'PY' import sys, re txt=sys.stdin.read().splitlines() for i,line in enumerate(txt): if line.strip()=="[tool.pylint.design]": for j in range(max(0,i-5), min(len(txt), i+12)): print(f"{j+1}:{txt[j]}") break else: print("No [tool.pylint.design] section in HEAD~1") PYRepository: lightspeed-core/lightspeed-stack
Length of output: 93
🏁 Script executed:
#!/bin/bash set -euo pipefail echo "=== local pylint disables for argument-count ===" rg -n 'pylint:\s*disable=.*too-many-(arguments|positional-arguments)' src || true echo echo "=== _query_store_for_byok_rag snippet ===" python - <<'PY' from pathlib import Path p=Path("src/utils/vector_search.py") lines=p.read_text().splitlines() for i,l in enumerate(lines): if "async def _query_store_for_byok_rag" in l: for j in range(max(0,i-20), min(len(lines), i+90)): print(f"{j+1}:{lines[j]}") break PY echo echo "=== functions with total args >5 and <=8 (approx mapping to pylint design limits) ===" python - <<'PY' from pathlib import Path import ast def arg_counts(fn: ast.AST): args=fn.args posonly=len(getattr(args,'posonlyargs',[])) positional=posonly+len(args.args) kwonly=len(args.kwonlyargs) vararg=1 if args.vararg else 0 kwarg=1 if args.kwarg else 0 total=positional+kwonly+vararg+kwarg return total,positional rows=[] for path in Path("src").rglob("*.py"): try: tree=ast.parse(path.read_text(encoding="utf-8")) except SyntaxError: continue for node in ast.walk(tree): if isinstance(node,(ast.FunctionDef,ast.AsyncFunctionDef)): total,positional=arg_counts(node) if total>5 and total<=8: kind="async def" if isinstance(node,ast.AsyncFunctionDef) else "def" rows.append((str(path),node.lineno,kind,node.name,total,positional)) rows=sorted(rows, key=lambda r:(r[0],r[1])) print(f"affected_count={len(rows)}") for r in rows: path,ln,kind,name,total,pos=r print(f"{path}:{ln} {kind} {name} total={total} positional={pos}") PYRepository: lightspeed-core/lightspeed-stack
Length of output: 6930
Avoid repo-wide pylint
max-argsrelaxation for BYOK RAG
pyproject.tomlsets[tool.pylint.design] max-args = 8andmax-positional-arguments = 8(lines 229-231), which relaxestoo-many-arguments/too-many-positional-argumentsacross the whole repo. The scan shows 9 functions insrc/with total args in the 6–8 range; onlysrc/utils/vector_search.py:248 async _query_store_for_byok_rag(...)is BYOK RAG-related, while others includeresponse_generator,_check_shield_moderation,prepare_tools,build_turn_summary, andcreate_transcript_metadata. Since other outliers are already handled via local# pylint: disable=too-many-argumentssuppressions, keep the stricter global limits and apply a targeted suppression/refactor specifically for_query_store_for_byok_rag.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pyproject.toml` around lines 229 - 231, Revert the repo-wide pylint relaxation in pyproject.toml by removing or restoring the [tool.pylint.design] max-args and max-positional-arguments overrides so the global "too-many-arguments" rules remain strict, and instead add a targeted suppression or refactor for the BYOK RAG helper: open src/utils/vector_search.py and either refactor async function _query_store_for_byok_rag(...) to reduce its parameters or add a local "# pylint: disable=too-many-arguments" (or "# pylint: disable=too-many-positional-arguments") immediately above that function declaration to allow the exception only there. Ensure no other global changes are made so existing local suppressions (e.g., in response_generator, _check_shield_moderation, prepare_tools, build_turn_summary, create_transcript_metadata) remain intact.src/models/config.py (2)
2018-2023: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win
Use the shared reranker-model constant here.
This default duplicates
constants.DEFAULT_CROSS_ENCODER_MODEL, so the schema default and runtime default can drift on the next model change. Reference the shared constant instead of hardcoding the string.♻️ Proposed fix
model: str = Field( - default="cross-encoder/ms-marco-MiniLM-L6-v2", + default=constants.DEFAULT_CROSS_ENCODER_MODEL, title="Reranker model", description="Cross-encoder model name for reranking RAG chunks. " - "Defaults to 'cross-encoder/ms-marco-MiniLM-L6-v2' from sentence-transformers.", + f"Defaults to '{constants.DEFAULT_CROSS_ENCODER_MODEL}' from sentence-transformers.", )As per coding guidelines, "Check
constants.pyfor shared constants before defining new ones".📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.model: str = Field( default=constants.DEFAULT_CROSS_ENCODER_MODEL, title="Reranker model", description="Cross-encoder model name for reranking RAG chunks. " f"Defaults to '{constants.DEFAULT_CROSS_ENCODER_MODEL}' from sentence-transformers.", )🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/models/config.py` around lines 2018 - 2023, The Field default for the "model" attribute in the config schema is hardcoded to "cross-encoder/ms-marco-MiniLM-L6-v2" which duplicates the shared constant; update the default to reference the shared constant DEFAULT_CROSS_ENCODER_MODEL from the constants module (import it at the top if necessary) so the schema default and runtime default stay in sync, i.e., replace the literal string used in the Field default with constants.DEFAULT_CROSS_ENCODER_MODEL and ensure the import for DEFAULT_CROSS_ENCODER_MODEL is present where the Field is defined.
2351-2387:
⚠️ Potential issue | 🟠 Major | ⚡ Quick win
validate_reranker_auto_enableonly honors inline OKP.The validator’s docstring and comments say OKP in inline or tool strategies should trigger auto-enable, but Line 2368 only checks
self.rag.inline. A config with BYOK plusokponly inrag.toolwill skip this path. Either includeself.rag.toolin the predicate or narrow the documented contract to inline-only behavior.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/models/config.py` around lines 2351 - 2387, The validator currently only checks self.rag.inline for OKP (constants.OKP_RAG_ID) but the docstring/comments say inline OR tool should trigger auto-enable; update the predicate by changing the has_okp computation in validate_reranker_auto_enable to consider both self.rag.inline and self.rag.tool (e.g., check constants.OKP_RAG_ID in either collection) and adjust the log message/comment to say "inline or tool" as needed; keep the existing protected-access and pylint disables around reranker._explicitly_configured unchanged.src/utils/vector_search.py (1)
107-119: 🧹 Nitpick | 🔵 Trivial | 💤 Low value
Duplicate/malformed
Returns:section in docstring.This docstring has two
Returns:blocks: the first (Lines 113-116) actually documentsparamskeys rather than the return value, and the second (Lines 118-119) is the real return description. Collapse into a single, correctReturns:section to follow Google docstring conventions.As per coding guidelines: "Follow Google Python docstring conventions with required sections: Parameters, Returns, Raises".📝 Proposed docstring fix
Args: solr: Optional structured Solr request (mode and filters from the API). k: Optional number of results to return. If not provided, uses default. Returns: - - mode: Solr search mode (semantic, hybrid, lexical) - - filters: Solr filter payload, may contain structured metadata filters - - Returns: - Parameter dictionary for ``vector_io.query`` with extracted filters at top level. + Parameter dictionary for ``vector_io.query`` with extracted filters at top + level (keys include ``mode`` and, when present, ``filters``). """📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements."""Build query parameters for Solr vector_io search. Args: solr: Optional structured Solr request (mode and filters from the API). k: Optional number of results to return. If not provided, uses default. Returns: Parameter dictionary for ``vector_io.query`` with extracted filters at top level (keys include ``mode`` and, when present, ``filters``). """🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/utils/vector_search.py` around lines 107 - 119, The docstring in the Solr vector_io query builder in src/utils/vector_search.py contains two Returns sections; collapse them into a single Google-style Returns section that describes the returned dict and its keys (mode, filters, and any other params) and remove the duplicated/malformed Returns block so the docstring follows Google conventions for the function that builds query parameters for vector_io.query.
Description
In the Lightspeed Stack configuration, byok_rag section, added a "relevance_cutoff_score" configuraiton variable, a positive float.
When a RAG chunk is retrieved from a BYOK database and its relevance score is less than the cutoff, the chunk is dropped from further consideration. Specifically, this cutoff filtering happens before score multipliers are applied.
Only added handling of relevance_cutoff_score to the inline BYOK RAG for now as adding it to the tool BYOK RAG is calling for changes in Llama Stack.
Type of change
Tools used to create PR
Identify any AI code assistants used in this PR (for transparency and review context)
Related Tickets & Documents
Checklist before requesting a review
Testing
Summary by CodeRabbit
New Features
Documentation
API
Tests