Commit d80d5f8
Unify One-Sided & Side-by-Side Performance Metrics in the PerformanceEvaluator, but don't add new metrics to the MultiTrialPerformance Evaluator yet
Purged obsolete criteria-list LLMAsJudge implementations and replaced them with PerformanceEvaluator, which now handles the folded Tone, Faithfulness, Correctness, and Efficiency evaluations.
- Decoupled the system and performance modules so that system_evaluator.py contains only SystemEvaluator.
- Overrode the backwards-compatible LLMAsJudge subclass in evaluators.py with required static factories for correctness, hallucination, and sentiment.
- Purged criteria-list BQML execution code from client.py and deleted legacy _criteria and _JudgeCriterion list validations throughout the test suites.
- Fixed Jupyter event-loop conflicts by setting the running asyncio event loop inside Client._evaluate_performance.
- Refactored strip_markdown_fences in utils.py to drop any trailing prose that follows the closing backticks of a fenced block.
- Verified that all 1,997 collected unit tests pass.
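The Jupyter fix mentioned above addresses a common problem: a notebook already has an event loop running, so calling asyncio.run from synchronous code raises RuntimeError. The sketch below illustrates one widely used workaround, running the coroutine on a fresh loop in a worker thread. This is an assumption for illustration only; the commit describes "event-loop setters" inside Client._evaluate_performance, which may work differently, and the helper name run_coroutine_sync is hypothetical.

```python
import asyncio
import threading
from typing import Any, Coroutine, TypeVar

T = TypeVar("T")


def run_coroutine_sync(coro: Coroutine[Any, Any, T]) -> T:
    """Run a coroutine to completion, with or without a running event loop.

    Hypothetical helper; not taken from the repository.
    """
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running (plain script or test): asyncio.run is safe.
        return asyncio.run(coro)

    # A loop is already running (e.g. inside Jupyter), so asyncio.run would
    # raise here. Run the coroutine on its own loop in a worker thread.
    outcome: dict = {}

    def _worker() -> None:
        try:
            outcome["value"] = asyncio.run(coro)
        except BaseException as exc:  # re-raised in the calling thread
            outcome["error"] = exc

    thread = threading.Thread(target=_worker)
    thread.start()
    thread.join()
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]
```

Note that the calling thread blocks until the worker finishes, so this trades concurrency for compatibility with synchronous call sites.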
TAG=agy
CONV=bf5607ce-a7fc-4a29-a7fb-c6074580e613
1 parent 3468e9c commit d80d5f8
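The strip_markdown_fences change in the message above can be pictured with a minimal sketch. It assumes the function receives raw LLM output and should return only the body of the first fenced block; the regular expression and the fallback behavior are assumptions, and the actual implementation in utils.py may differ.

```python
import re

# Opening fence with an optional language tag, the block body, then the
# closing fence. Anything after the closing fence is never captured.
_FENCE_RE = re.compile(r"```[^\n]*\n(.*?)```", re.DOTALL)


def strip_markdown_fences(text: str) -> str:
    """Return the body of the first fenced block, or the stripped text if none.

    Illustrative re-implementation, not the code from this commit.
    """
    match = _FENCE_RE.search(text)
    if match is None:
        # No fenced block found: return the input with whitespace trimmed.
        return text.strip()
    return match.group(1).strip()
```

With this sketch, an input such as a fenced JSON object followed by "Hope this helps!" yields only the JSON, which is the behavior the commit message describes.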
29 files changed
Lines changed: 2417 additions & 6067 deletions
File tree
- docs
- examples
- src/bigquery_agent_analytics
- tests