fix: include grounding metadata in rubric judge prompt #5834

Open
he-yufeng wants to merge 1 commit into google:main from he-yufeng:fix/google-search-rubric-evidence

@he-yufeng
Summary

This updates the rubric-based final response quality evaluator so model-supplied grounding metadata is available to the LLM-as-judge prompt.

The issue is easiest to hit with model-internal tools such as google_search: the evaluator currently tells the judge to trust only function tool_response values, but the raw search results from such tools may not appear as normal function tool responses. ADK events can still carry grounding metadata, so this patch preserves that metadata in eval invocation events and serializes it into the judge prompt as trusted evidence.

Final answer text is still not treated as evidence.
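For reviewers who want the intent in miniature, here is a rough sketch of the evidence rule described above. It is not the code in this PR: `SketchEvent` and `serialize_trusted_evidence` are hypothetical names, the event is a plain dataclass rather than an ADK type, and the grounding payload shape is made up for illustration.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class SketchEvent:
  """Hypothetical stand-in for an eval invocation event (not an ADK class)."""

  tool_responses: list[dict[str, Any]] = field(default_factory=list)
  grounding_metadata: Optional[dict[str, Any]] = None
  final_text: Optional[str] = None


def serialize_trusted_evidence(events: list[SketchEvent]) -> str:
  """Returns a JSON string of the evidence the judge is allowed to trust."""
  evidence = []
  for step, event in enumerate(events):
    # Function tool responses were already trusted before this change.
    for response in event.tool_responses:
      evidence.append({"step": step, "kind": "tool_response", "data": response})
    # Grounding metadata is the newly trusted source for model-internal tools.
    if event.grounding_metadata:
      evidence.append({
          "step": step,
          "kind": "grounding_metadata",
          "data": event.grounding_metadata,
      })
    # event.final_text is deliberately skipped: it is a claim, not evidence.
  return json.dumps(evidence, indent=2, sort_keys=True)


if __name__ == "__main__":
  demo_events = [
      # Illustrative payload only; real grounding metadata may look different.
      SketchEvent(grounding_metadata={"queries": ["adk rubric evaluation"]}),
      SketchEvent(final_text="Answer written by the agent."),
  ]
  print(serialize_trusted_evidence(demo_events))
```

The point of the sketch is the asymmetry: tool responses and grounding metadata reach the judge prompt, while the agent's own final text never does.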

Fixes #5831.

To verify

  • python -m py_compile src/google/adk/evaluation/eval_case.py src/google/adk/evaluation/evaluation_generator.py src/google/adk/evaluation/llm_as_judge_utils.py src/google/adk/evaluation/rubric_based_final_response_quality_v1.py tests/unittests/evaluation/test_evaluation_generator.py tests/unittests/evaluation/test_llm_as_judge_utils.py tests/unittests/evaluation/test_rubric_based_final_response_quality_v1.py
  • .venv\Scripts\python.exe -m pyink --check src\google\adk\evaluation\eval_case.py src\google\adk\evaluation\evaluation_generator.py src\google\adk\evaluation\llm_as_judge_utils.py src\google\adk\evaluation\rubric_based_final_response_quality_v1.py tests\unittests\evaluation\test_evaluation_generator.py tests\unittests\evaluation\test_llm_as_judge_utils.py tests\unittests\evaluation\test_rubric_based_final_response_quality_v1.py
  • .venv\Scripts\python.exe -m isort --check-only src\google\adk\evaluation\eval_case.py src\google\adk\evaluation\evaluation_generator.py src\google\adk\evaluation\llm_as_judge_utils.py src\google\adk\evaluation\rubric_based_final_response_quality_v1.py tests\unittests\evaluation\test_evaluation_generator.py tests\unittests\evaluation\test_llm_as_judge_utils.py tests\unittests\evaluation\test_rubric_based_final_response_quality_v1.py
  • .venv\Scripts\python.exe -m pytest tests\unittests\evaluation\test_eval_case.py tests\unittests\evaluation\test_llm_as_judge_utils.py tests\unittests\evaluation\test_rubric_based_final_response_quality_v1.py tests\unittests\evaluation\test_evaluation_generator.py -q
  • git diff --check

I also ran targeted pylint on the touched files. It still reports pre-existing module-wide style warnings in these evaluation modules and tests, but no unused-import or grounding-metadata-specific issues remain.

adk-bot added the eval label ([Component] This issue is related to evaluation) on May 24, 2026

Labels

eval [Component] This issue is related to evaluation

2 participants