Commit 329ff4b
committed
Unify One-Sided & Side-by-Side Performance Metrics in the PerformanceEvaluator, but don't add new metrics to the MultiTrialPerformance Evaluator yet
Purged obsolete criteria-list LLMAsJudge implementations, replacing them natively with PerformanceEvaluator for folded Tone, Faithfulness, Correctness, and Efficiency evaluations.
- Decoupled system and performance modules cleanly, making system_evaluator.py pure to SystemEvaluator.
- Overrode the backwards-compatible LLMAsJudge subclass in evaluators.py with required static factories for correctness, hallucination, and sentiment.
- PURGED criteria-list BQML execution code from client.py, and deleted legacy _criteria and _JudgeCriterion list validations throughout test suites.
- Fixed Jupyter event-loop context constraints via robust asyncio running event-loop setters inside Client._evaluate_performance.
- Refactored strip_markdown_fences in utils.py to drop trailing prose after fenced markdown closing backticks cleanly.
- Verified 1,997 collected unit tests PASSING 100% green successfully.
TAG=agy
CONV=bf5607ce-a7fc-4a29-a7fb-c6074580e6131 parent 7982750 commit 329ff4b
12 files changed
Lines changed: 497 additions & 4177 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
65 | 65 | | |
66 | 66 | | |
67 | 67 | | |
68 | | - | |
69 | | - | |
| 68 | + | |
| 69 | + | |
70 | 70 | | |
71 | | - | |
72 | | - | |
| 71 | + | |
| 72 | + | |
73 | 73 | | |
74 | 74 | | |
75 | 75 | | |
| |||
120 | 120 | | |
121 | 121 | | |
122 | 122 | | |
123 | | - | |
124 | | - | |
125 | | - | |
126 | | - | |
| 123 | + | |
| 124 | + | |
| 125 | + | |
| 126 | + | |
| 127 | + | |
127 | 128 | | |
128 | 129 | | |
129 | 130 | | |
130 | 131 | | |
| 132 | + | |
131 | 133 | | |
132 | 134 | | |
133 | 135 | | |
| |||
190 | 192 | | |
191 | 193 | | |
192 | 194 | | |
193 | | - | |
194 | | - | |
195 | | - | |
| 195 | + | |
| 196 | + | |
| 197 | + | |
| 198 | + | |
196 | 199 | | |
197 | 200 | | |
198 | 201 | | |
199 | 202 | | |
| 203 | + | |
200 | 204 | | |
201 | 205 | | |
202 | 206 | | |
| |||
210 | 214 | | |
211 | 215 | | |
212 | 216 | | |
213 | | - | |
214 | | - | |
215 | | - | |
216 | | - | |
217 | | - | |
218 | | - | |
219 | | - | |
| 217 | + | |
| 218 | + | |
| 219 | + | |
| 220 | + | |
| 221 | + | |
| 222 | + | |
| 223 | + | |
| 224 | + | |
220 | 225 | | |
221 | 226 | | |
222 | 227 | | |
223 | 228 | | |
224 | 229 | | |
| 230 | + | |
225 | 231 | | |
226 | 232 | | |
227 | 233 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
20 | 20 | | |
21 | 21 | | |
22 | 22 | | |
23 | | - | |
24 | | - | |
25 | | - | |
| 23 | + | |
| 24 | + | |
26 | 25 | | |
27 | 26 | | |
28 | | - | |
| 27 | + | |
29 | 28 | | |
30 | 29 | | |
31 | 30 | | |
| |||
247 | 246 | | |
248 | 247 | | |
249 | 248 | | |
250 | | - | |
| 249 | + | |
251 | 250 | | |
252 | 251 | | |
253 | 252 | | |
| |||
256 | 255 | | |
257 | 256 | | |
258 | 257 | | |
259 | | - | |
| 258 | + | |
260 | 259 | | |
261 | 260 | | |
262 | 261 | | |
| |||
268 | 267 | | |
269 | 268 | | |
270 | 269 | | |
271 | | - | |
| 270 | + | |
272 | 271 | | |
273 | 272 | | |
274 | 273 | | |
| |||
280 | 279 | | |
281 | 280 | | |
282 | 281 | | |
283 | | - | |
| 282 | + | |
284 | 283 | | |
285 | 284 | | |
286 | 285 | | |
| |||
304 | 303 | | |
305 | 304 | | |
306 | 305 | | |
307 | | - | |
| 306 | + | |
308 | 307 | | |
309 | 308 | | |
310 | 309 | | |
| |||
329 | 328 | | |
330 | 329 | | |
331 | 330 | | |
332 | | - | |
| 331 | + | |
333 | 332 | | |
334 | 333 | | |
335 | 334 | | |
| |||
428 | 427 | | |
429 | 428 | | |
430 | 429 | | |
| 430 | + | |
| 431 | + | |
| 432 | + | |
| 433 | + | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
71 | 71 | | |
72 | 72 | | |
73 | 73 | | |
74 | | - | |
75 | | - | |
76 | | - | |
77 | | - | |
78 | | - | |
79 | | - | |
80 | | - | |
81 | | - | |
82 | | - | |
83 | | - | |
84 | | - | |
| 74 | + | |
| 75 | + | |
| 76 | + | |
| 77 | + | |
| 78 | + | |
| 79 | + | |
| 80 | + | |
| 81 | + | |
| 82 | + | |
| 83 | + | |
| 84 | + | |
| 85 | + | |
85 | 86 | | |
86 | 87 | | |
87 | 88 | | |
| |||
907 | 908 | | |
908 | 909 | | |
909 | 910 | | |
910 | | - | |
911 | | - | |
| 911 | + | |
| 912 | + | |
912 | 913 | | |
913 | 914 | | |
914 | 915 | | |
915 | 916 | | |
916 | | - | |
917 | 917 | | |
918 | | - | |
919 | | - | |
920 | | - | |
921 | 918 | | |
922 | 919 | | |
923 | 920 | | |
| |||
954 | 951 | | |
955 | 952 | | |
956 | 953 | | |
| 954 | + | |
| 955 | + | |
| 956 | + | |
| 957 | + | |
| 958 | + | |
| 959 | + | |
| 960 | + | |
| 961 | + | |
| 962 | + | |
| 963 | + | |
| 964 | + | |
| 965 | + | |
| 966 | + | |
| 967 | + | |
| 968 | + | |
| 969 | + | |
| 970 | + | |
| 971 | + | |
| 972 | + | |
| 973 | + | |
| 974 | + | |
| 975 | + | |
| 976 | + | |
| 977 | + | |
| 978 | + | |
| 979 | + | |
| 980 | + | |
| 981 | + | |
| 982 | + | |
| 983 | + | |
| 984 | + | |
| 985 | + | |
| 986 | + | |
| 987 | + | |
| 988 | + | |
| 989 | + | |
| 990 | + | |
| 991 | + | |
| 992 | + | |
| 993 | + | |
| 994 | + | |
| 995 | + | |
| 996 | + | |
| 997 | + | |
| 998 | + | |
| 999 | + | |
| 1000 | + | |
| 1001 | + | |
| 1002 | + | |
| 1003 | + | |
| 1004 | + | |
| 1005 | + | |
| 1006 | + | |
| 1007 | + | |
| 1008 | + | |
| 1009 | + | |
| 1010 | + | |
| 1011 | + | |
| 1012 | + | |
| 1013 | + | |
| 1014 | + | |
| 1015 | + | |
| 1016 | + | |
| 1017 | + | |
| 1018 | + | |
| 1019 | + | |
| 1020 | + | |
| 1021 | + | |
| 1022 | + | |
| 1023 | + | |
957 | 1024 | | |
958 | 1025 | | |
959 | 1026 | | |
| |||
0 commit comments