Refactor: Rename CodeEvaluator to SystemEvaluator #110
Open
gigistark-google wants to merge 3 commits into GoogleCloudPlatform:main
Rename CodeEvaluator to SystemEvaluator to align with its focus on system-level metrics. A CodeEvaluator alias is kept in evaluators.py for backward compatibility.
a8fc1a0 to d079d31
ed76ad1 to bd557c6
23391df to d48e249
…Evaluator, but don't add new metrics to the MultiTrialPerformance Evaluator yet
- Purged obsolete criteria-list LLMAsJudge implementations, replacing them with PerformanceEvaluator for the folded Tone, Faithfulness, Correctness, and Efficiency evaluations.
- Decoupled the system and performance modules, so that system_evaluator.py contains only SystemEvaluator.
- Overrode the backward-compatible LLMAsJudge subclass in evaluators.py with the required static factories for correctness, hallucination, and sentiment.
- Purged criteria-list BQML execution code from client.py, and deleted legacy _criteria and _JudgeCriterion list validations throughout the test suites.
- Fixed Jupyter event-loop context constraints via asyncio running-event-loop setters inside Client._evaluate_performance.
- Refactored strip_markdown_fences in utils.py to drop trailing prose after the closing backticks of a fenced markdown block.
- Verified that all 1,997 collected unit tests pass.
TAG=agy CONV=bf5607ce-a7fc-4a29-a7fb-c6074580e613
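For reviewers, the fence-stripping behaviour described in the commit message could look roughly like the sketch below. This is not the code in utils.py: only the function name comes from the commit message, while the regular expression, the `FENCE` constant, and the fallback for unfenced input are assumptions made for illustration.

```python
import re

# Build the fence marker programmatically so this snippet can itself sit
# inside a fenced block without confusing markdown renderers.
FENCE = "`" * 3

# Assumed pattern: an opening fence with an optional language tag, the body,
# then the first closing fence. Anything after the closing fence is ignored.
_FENCE_RE = re.compile(
    re.escape(FENCE) + r"[^\n]*\n(.*?)" + re.escape(FENCE),
    re.DOTALL,
)


def strip_markdown_fences(text: str) -> str:
    """Return the body of the first fenced block, dropping trailing prose.

    Hypothetical re-implementation; the real helper may differ.
    """
    match = _FENCE_RE.search(text)
    if match is None:
        # No fence found: return the input with surrounding whitespace removed.
        return text.strip()
    return match.group(1).strip()
```

Under these assumptions, a model response that wraps JSON in a fence and then appends a sentence of commentary would be reduced to the JSON body alone.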
d48e249 to d80d5f8
Rename CodeEvaluator to SystemEvaluator to align with its focus on system-level metrics.
Some of the files import and export both CodeEvaluator and SystemEvaluator to maintain strict backward compatibility.
Once the team has successfully migrated all internal pipelines to SystemEvaluator, we can safely delete the CodeEvaluator alias and double-imports in a future major version release.
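As a rough illustration of the alias approach described above, the sketch below shows how a module-level alias keeps old imports working. The class body and its `name` attribute are placeholders invented for this example, not the actual contents of system_evaluator.py or evaluators.py.

```python
class SystemEvaluator:
    """Placeholder for the renamed evaluator (hypothetical body)."""

    def __init__(self, name: str = "system") -> None:
        # `name` is an illustrative attribute, not part of the real class.
        self.name = name


# Backward-compatible alias: both names refer to the same class object,
# so existing code that uses CodeEvaluator keeps working unchanged.
CodeEvaluator = SystemEvaluator

legacy = CodeEvaluator(name="latency")
```

Because the alias is a plain name binding rather than a subclass, `isinstance` checks against either name behave identically, and removing the alias later only requires deleting that one assignment and the extra imports and exports.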