Refactor: Rename CodeEvaluator to SystemEvaluator #110

Open
gigistark-google wants to merge 3 commits into GoogleCloudPlatform:main from gigistark-google:gigistark-google
@gigistark-google
Rename CodeEvaluator to SystemEvaluator to align with its focus on system-level metrics.

Some of the files import and export both CodeEvaluator and SystemEvaluator to maintain strict backward compatibility.

  • The Alias (evaluators.py): We defined CodeEvaluator = SystemEvaluator inside evaluators.py.
  • The Imports (__init__.py, client.py, cli.py, grader_pipeline.py): By importing and re-exporting both CodeEvaluator and SystemEvaluator, we ensure:
    • Existing user scripts and pipelines (e.g., notebooks or custom scripts written by other team members) that currently import CodeEvaluator from the SDK will continue working seamlessly without any import errors.
    • New code and updated documentation can immediately transition to importing and instantiating SystemEvaluator.

Once the team has successfully migrated all internal pipelines to SystemEvaluator, we can safely delete the CodeEvaluator alias and double-imports in a future major version release.
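
The alias approach described above can be sketched as follows. This is a minimal illustration, not the SDK's actual code: the `SystemEvaluator` body and its `name` parameter are placeholders assumed for the example, and only the `CodeEvaluator = SystemEvaluator` assignment comes from the description.

```python
class SystemEvaluator:
    """Placeholder for the SDK class; the real constructor is not shown in this PR."""

    def __init__(self, name: str = "system") -> None:
        # `name` is a hypothetical parameter used only for this illustration.
        self.name = name


# Backward-compatible alias, as the description says evaluators.py defines it.
CodeEvaluator = SystemEvaluator

# Existing scripts that still use the old name keep working...
old_style = CodeEvaluator(name="latency")
# ...while new code uses the new name.
new_style = SystemEvaluator(name="latency")
```

Because the alias is a plain assignment, both names refer to the same class object: `isinstance` checks work in either direction, and `type(old_style).__name__` reports `SystemEvaluator` even when the instance was created through the old name.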

Rename CodeEvaluator to SystemEvaluator to align with its focus on system-level metrics. A CodeEvaluator alias is kept in evaluators.py for backward-compatibility.
@gigistark-google force-pushed the gigistark-google branch 22 times, most recently from a8fc1a0 to d079d31 (May 3, 2026 17:46)
@gigistark-google force-pushed the gigistark-google branch 6 times, most recently from ed76ad1 to bd557c6 (May 3, 2026 18:39)
@gigistark-google force-pushed the gigistark-google branch 16 times, most recently from 23391df to d48e249 (May 4, 2026 02:06)
…Evaluator, but don't add new metrics to the MultiTrialPerformance Evaluator yet

Removed the obsolete criteria-list LLMAsJudge implementations and replaced them with PerformanceEvaluator for the folded Tone, Faithfulness, Correctness, and Efficiency evaluations.

- Decoupled the system and performance modules so that system_evaluator.py contains only SystemEvaluator.
- Overrode the backward-compatible LLMAsJudge subclass in evaluators.py with the required static factories for correctness, hallucination, and sentiment.
- Removed the criteria-list BQML execution code from client.py and deleted the legacy _criteria and _JudgeCriterion list validations throughout the test suites.
- Fixed Jupyter event-loop constraints by adding asyncio running-event-loop setters inside Client._evaluate_performance.
- Refactored strip_markdown_fences in utils.py to drop any trailing prose after the closing backticks of a fenced block.
- Verified that all 1,997 collected unit tests pass.
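
To illustrate the strip_markdown_fences behavior mentioned in the list above, here is a rough sketch of one way to drop trailing prose after a closing fence. It is a hypothetical re-implementation written for this example, not the code in utils.py; the regular expression and the fallback for unfenced input are assumptions.

```python
import re

# Matches the first fenced block: opening backticks, an optional language tag,
# a newline, then the shortest body that ends at the next run of three backticks.
_FENCE_RE = re.compile(r"```[^\n]*\n(.*?)```", re.DOTALL)


def strip_markdown_fences(text: str) -> str:
    """Return the body of the first fenced block, ignoring text around it.

    Assumption for this sketch: input without a complete fence is returned
    as-is (whitespace-trimmed) rather than raising an error.
    """
    match = _FENCE_RE.search(text)
    if match is None:
        return text.strip()
    return match.group(1).strip()


# Example: model output with a language tag and trailing prose after the fence.
reply = '```json\n{"score": 1}\n```\nHope this helps!'
cleaned = strip_markdown_fences(reply)
```

Here `cleaned` holds only the fenced body, so the trailing "Hope this helps!" does not reach a downstream JSON parser. The sketch deliberately handles only the first fenced block; how the real function treats multiple or single-line fences is not stated in this PR.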

TAG=agy
CONV=bf5607ce-a7fc-4a29-a7fb-c6074580e613