
Commit 3bbeac0

Clean leaderboard paths
1 parent bc809f9 commit 3bbeac0

4 files changed

Lines changed: 46 additions & 46 deletions
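This commit replaces machine-specific absolute paths with paths relative to the repository root. For the JSON manifest, that rewrite can be sketched as below. This is a minimal sketch, assuming Python 3 and a hypothetical checkout at `/home/user/trust_evaluator`. The helpers `to_repo_relative` and `rewrite_paths` are illustrative names and are not part of the repository.

```python
from pathlib import PurePosixPath
from typing import Any


def to_repo_relative(path: str, repo_root: str) -> str:
    """Return `path` relative to `repo_root` if it is an absolute path inside it.

    Relative paths and absolute paths outside `repo_root` are returned
    unchanged, so the rewrite is safe to run more than once.
    """
    candidate = PurePosixPath(path)
    if not candidate.is_absolute():
        return path
    try:
        return str(candidate.relative_to(PurePosixPath(repo_root)))
    except ValueError:
        # Not under repo_root: leave it for a human to inspect.
        return path


def rewrite_paths(obj: Any, repo_root: str) -> Any:
    """Recursively apply to_repo_relative to every string in a JSON-like value."""
    if isinstance(obj, dict):
        return {key: rewrite_paths(value, repo_root) for key, value in obj.items()}
    if isinstance(obj, list):
        return [rewrite_paths(item, repo_root) for item in obj]
    if isinstance(obj, str):
        return to_repo_relative(obj, repo_root)
    return obj  # numbers, booleans and None pass through unchanged
```

You could load the manifest with `json.load`, pass it through `rewrite_paths` with the original checkout root, and write it back with `json.dump`. That would yield entries shaped like the `+` lines in the `leaderboard_data.json` diff. The Markdown files would still need a plain text substitution.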


leaderboard/OFFICIAL_SUBMISSION_CONTRACT.md

Lines changed: 5 additions & 5 deletions
@@ -234,20 +234,20 @@ Answer + cited URL submission:

 Validation CLI now exists at:

-- `/Users/kristinx351/Documents/UCSD/Courses_material/Q5/AdsInGenAI/Code/trust_evaluator/src/evaluation/validate_official_submission.py`
+- `src/evaluation/validate_official_submission.py`

 Minimal intake backend now exists at:

-- `/Users/kristinx351/Documents/UCSD/Courses_material/Q5/AdsInGenAI/Code/trust_evaluator/src/evaluation/official_submission_backend.py`
+- `src/evaluation/official_submission_backend.py`

 Minimal official runner now exists at:

-- `/Users/kristinx351/Documents/UCSD/Courses_material/Q5/AdsInGenAI/Code/trust_evaluator/src/evaluation/official_run.py`
+- `src/evaluation/official_run.py`

 Example submission templates now exist at:

-- `/Users/kristinx351/Documents/UCSD/Courses_material/Q5/AdsInGenAI/Code/trust_evaluator/leaderboard/examples/endpoint_submission.example.json`
-- `/Users/kristinx351/Documents/UCSD/Courses_material/Q5/AdsInGenAI/Code/trust_evaluator/leaderboard/examples/answer_url_bundle.example.json`
+- `leaderboard/examples/endpoint_submission.example.json`
+- `leaderboard/examples/answer_url_bundle.example.json`

 Example:

leaderboard/QUERY_SPLIT_POLICY.md

Lines changed: 1 addition & 1 deletion
@@ -53,7 +53,7 @@ Not publicly disclosed:

 Public split:

-- `/Users/kristinx351/Documents/UCSD/Courses_material/Q5/AdsInGenAI/Code/trust_evaluator/data/queries/sourcebench_public_queries_v1.csv`
+- `data/queries/sourcebench_public_queries_v1.csv`

 Holdout split:

leaderboard/README.md

Lines changed: 9 additions & 9 deletions
@@ -130,31 +130,31 @@ Not sufficient for official ranking:
 ### Option B: serve the folder locally

 ```bash
-cd /Users/kristinx351/Documents/UCSD/Courses_material/Q5/AdsInGenAI/Code/trust_evaluator/leaderboard
+cd leaderboard
 python3 -m http.server 8000
 ```

 Then open:

-- [http://localhost:8000](http://localhost:8000)
+- the local URL printed by `python3 -m http.server`

 The page will try to auto-load `leaderboard_data.json` from the same folder.

 ## Backend scripts currently used

 - source collection:
-- `/Users/kristinx351/Documents/UCSD/Courses_material/Q5/AdsInGenAI/Code/trust_evaluator/src/source-collection/get_urls.py`
-- `/Users/kristinx351/Documents/UCSD/Courses_material/Q5/AdsInGenAI/Code/trust_evaluator/src/source-collection/collect_sources_from_urls.py`
+- `src/source-collection/get_urls.py`
+- `src/source-collection/collect_sources_from_urls.py`
 - judging:
-- `/Users/kristinx351/Documents/UCSD/Courses_material/Q5/AdsInGenAI/Code/trust_evaluator/src/content-scoring/scripts/scoring.py`
+- `src/content-scoring/scripts/scoring.py`
 - metrics:
-- `/Users/kristinx351/Documents/UCSD/Courses_material/Q5/AdsInGenAI/Code/trust_evaluator/src/evaluation/compute_metrics.py`
+- `src/evaluation/compute_metrics.py`
 - official submission validation:
-- `/Users/kristinx351/Documents/UCSD/Courses_material/Q5/AdsInGenAI/Code/trust_evaluator/src/evaluation/validate_official_submission.py`
+- `src/evaluation/validate_official_submission.py`
 - official submission intake backend:
-- `/Users/kristinx351/Documents/UCSD/Courses_material/Q5/AdsInGenAI/Code/trust_evaluator/src/evaluation/official_submission_backend.py`
+- `src/evaluation/official_submission_backend.py`
 - official evaluation runner:
-- `/Users/kristinx351/Documents/UCSD/Courses_material/Q5/AdsInGenAI/Code/trust_evaluator/src/evaluation/official_run.py`
+- `src/evaluation/official_run.py`

 ## `compute_metrics.py`

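The README now points at whatever URL `python3 -m http.server` prints rather than hard-coding `http://localhost:8000`. On Python 3.7+ you can skip the `cd` with `python3 -m http.server 8000 --directory leaderboard`. The same thing can also be done from Python, as in the minimal sketch below. `start_static_server` is a hypothetical helper. Binding to `127.0.0.1` on port 0, so the OS picks a free port, is an assumption made to avoid port collisions; the README does not require it.

```python
import functools
import threading
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def start_static_server(directory: str, host: str = "127.0.0.1", port: int = 0):
    """Serve `directory` over HTTP from a background thread.

    port=0 lets the OS choose a free port. The returned URL plays the role
    of the address that `python3 -m http.server` prints on startup.
    """
    # Serve files from `directory` without changing the working directory.
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    server = ThreadingHTTPServer((host, port), handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    bound_host, bound_port = server.server_address[:2]
    return server, f"http://{bound_host}:{bound_port}/"
```

For example, `server, url = start_static_server("leaderboard")` followed by `print(url)` gives the address to open. The page can then fetch `leaderboard_data.json` from the same folder. Call `server.shutdown()` to stop it.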
leaderboard/leaderboard_data.json

Lines changed: 31 additions & 31 deletions
@@ -27,83 +27,83 @@
 "runs": [
 {
 "model_name": "Gemini-2.5-Flash-Preview",
-"score_file": "/Users/kristinx351/Documents/UCSD/Courses_material/Q5/AdsInGenAI/Code/trust_evaluator/data/content-scores/gemini-2.5-flash/qwen_scoring_for_gemini_2.5_flash_few_shot.json",
+"score_file": "data/content-scores/gemini-2.5-flash/qwen_scoring_for_gemini_2.5_flash_few_shot.json",
 "rank_file": null
 },
 {
 "model_name": "Gemini-3-Flash-Preview",
-"score_file": "/Users/kristinx351/Documents/UCSD/Courses_material/Q5/AdsInGenAI/Code/trust_evaluator/data/content-scores/gemini-3-flash-preview/qwen_scoring_for_gemini-3-flash-preview_fewshot.json",
-"rank_file": "/Users/kristinx351/Documents/UCSD/Courses_material/Q5/AdsInGenAI/Code/trust_evaluator/data/rank-scores/gemini-3-flash-preview/geo_scores.json"
+"score_file": "data/content-scores/gemini-3-flash-preview/qwen_scoring_for_gemini-3-flash-preview_fewshot.json",
+"rank_file": "data/rank-scores/gemini-3-flash-preview/geo_scores.json"
 },
 {
 "model_name": "Gemini-3-Pro-Preview",
-"score_file": "/Users/kristinx351/Documents/UCSD/Courses_material/Q5/AdsInGenAI/Code/trust_evaluator/data/content-scores/gemini-3-pro-preview/qwen_scoring_for_gemini-3-pro-preview_fewshot.json",
-"rank_file": "/Users/kristinx351/Documents/UCSD/Courses_material/Q5/AdsInGenAI/Code/trust_evaluator/data/rank-scores/gemini-3-pro-preview/geo_scores.json"
+"score_file": "data/content-scores/gemini-3-pro-preview/qwen_scoring_for_gemini-3-pro-preview_fewshot.json",
+"rank_file": "data/rank-scores/gemini-3-pro-preview/geo_scores.json"
 },
 {
 "model_name": "Perplexity-Sonar-Pro",
-"score_file": "/Users/kristinx351/Documents/UCSD/Courses_material/Q5/AdsInGenAI/Code/trust_evaluator/data/content-scores/perplexity/qwen_scoring_for_perplexity_few_shot.json",
-"rank_file": "/Users/kristinx351/Documents/UCSD/Courses_material/Q5/AdsInGenAI/Code/trust_evaluator/data/rank-scores/perplexity/geo_scores.json"
+"score_file": "data/content-scores/perplexity/qwen_scoring_for_perplexity_few_shot.json",
+"rank_file": "data/rank-scores/perplexity/geo_scores.json"
 },
 {
 "model_name": "claude",
-"score_file": "/Users/kristinx351/Documents/UCSD/Courses_material/Q5/AdsInGenAI/Code/trust_evaluator/data/content-scores/claude/qwen_scoring_for_claude-sonnet-4.5_fewshot.json",
-"rank_file": "/Users/kristinx351/Documents/UCSD/Courses_material/Q5/AdsInGenAI/Code/trust_evaluator/data/rank-scores/claude/geo_scores.json"
+"score_file": "data/content-scores/claude/qwen_scoring_for_claude-sonnet-4.5_fewshot.json",
+"rank_file": "data/rank-scores/claude/geo_scores.json"
 },
 {
 "model_name": "deepseek-chat-gensee",
-"score_file": "/Users/kristinx351/Documents/UCSD/Courses_material/Q5/AdsInGenAI/Code/trust_evaluator/data/content-scores/deepseek-chat-gensee/qwen_scoring_deepseek_chat_gensee.json",
-"rank_file": "/Users/kristinx351/Documents/UCSD/Courses_material/Q5/AdsInGenAI/Code/trust_evaluator/data/rank-scores/deepseek-chat-gensee/geo_scores.json"
+"score_file": "data/content-scores/deepseek-chat-gensee/qwen_scoring_deepseek_chat_gensee.json",
+"rank_file": "data/rank-scores/deepseek-chat-gensee/geo_scores.json"
 },
 {
 "model_name": "deepseek-chat-tavily",
-"score_file": "/Users/kristinx351/Documents/UCSD/Courses_material/Q5/AdsInGenAI/Code/trust_evaluator/data/content-scores/deepseek-chat-tavily/qwen_scoring_deepseek_chat_tavily.json",
-"rank_file": "/Users/kristinx351/Documents/UCSD/Courses_material/Q5/AdsInGenAI/Code/trust_evaluator/data/rank-scores/deepseek-chat-tavily/geo_scores.json"
+"score_file": "data/content-scores/deepseek-chat-tavily/qwen_scoring_deepseek_chat_tavily.json",
+"rank_file": "data/rank-scores/deepseek-chat-tavily/geo_scores.json"
 },
 {
 "model_name": "deepseek-reasoning-gensee",
-"score_file": "/Users/kristinx351/Documents/UCSD/Courses_material/Q5/AdsInGenAI/Code/trust_evaluator/data/content-scores/deepseek-reasoning-gensee/qwen_scoring_deepseek_reasoner_gensee.json",
-"rank_file": "/Users/kristinx351/Documents/UCSD/Courses_material/Q5/AdsInGenAI/Code/trust_evaluator/data/rank-scores/deepseek-reasoning-gensee/geo_scores.json"
+"score_file": "data/content-scores/deepseek-reasoning-gensee/qwen_scoring_deepseek_reasoner_gensee.json",
+"rank_file": "data/rank-scores/deepseek-reasoning-gensee/geo_scores.json"
 },
 {
 "model_name": "deepseek-reasoning-tavily",
-"score_file": "/Users/kristinx351/Documents/UCSD/Courses_material/Q5/AdsInGenAI/Code/trust_evaluator/data/content-scores/deepseek-reasoning-tavily/qwen_scoring_deepseek_reasoner_tavily.json",
-"rank_file": "/Users/kristinx351/Documents/UCSD/Courses_material/Q5/AdsInGenAI/Code/trust_evaluator/data/rank-scores/deepseek-reasoning-tavily/geo_scores.json"
+"score_file": "data/content-scores/deepseek-reasoning-tavily/qwen_scoring_deepseek_reasoner_tavily.json",
+"rank_file": "data/rank-scores/deepseek-reasoning-tavily/geo_scores.json"
 },
 {
 "model_name": "exa",
-"score_file": "/Users/kristinx351/Documents/UCSD/Courses_material/Q5/AdsInGenAI/Code/trust_evaluator/data/content-scores/exa/qwen_scoring_for_exa_few_shot.json",
-"rank_file": "/Users/kristinx351/Documents/UCSD/Courses_material/Q5/AdsInGenAI/Code/trust_evaluator/data/rank-scores/exa/geo_scores.json"
+"score_file": "data/content-scores/exa/qwen_scoring_for_exa_few_shot.json",
+"rank_file": "data/rank-scores/exa/geo_scores.json"
 },
 {
 "model_name": "gensee",
-"score_file": "/Users/kristinx351/Documents/UCSD/Courses_material/Q5/AdsInGenAI/Code/trust_evaluator/data/content-scores/gensee/qwen_scoring_for_gensee_few_shot.json",
-"rank_file": "/Users/kristinx351/Documents/UCSD/Courses_material/Q5/AdsInGenAI/Code/trust_evaluator/data/rank-scores/gensee/geo_scores.json"
+"score_file": "data/content-scores/gensee/qwen_scoring_for_gensee_few_shot.json",
+"rank_file": "data/rank-scores/gensee/geo_scores.json"
 },
 {
 "model_name": "google-search",
-"score_file": "/Users/kristinx351/Documents/UCSD/Courses_material/Q5/AdsInGenAI/Code/trust_evaluator/data/content-scores/google-search/qwen_scoring_for_search_engine.json",
+"score_file": "data/content-scores/google-search/qwen_scoring_for_search_engine.json",
 "rank_file": null
 },
 {
 "model_name": "gpt-4o",
-"score_file": "/Users/kristinx351/Documents/UCSD/Courses_material/Q5/AdsInGenAI/Code/trust_evaluator/data/content-scores/gpt-4o/qwen_scoring_for_gpt4o_few_shot.json",
-"rank_file": "/Users/kristinx351/Documents/UCSD/Courses_material/Q5/AdsInGenAI/Code/trust_evaluator/data/rank-scores/gpt-4o/geo_scores.json"
+"score_file": "data/content-scores/gpt-4o/qwen_scoring_for_gpt4o_few_shot.json",
+"rank_file": "data/rank-scores/gpt-4o/geo_scores.json"
 },
 {
 "model_name": "gpt-5",
-"score_file": "/Users/kristinx351/Documents/UCSD/Courses_material/Q5/AdsInGenAI/Code/trust_evaluator/data/content-scores/gpt-5/qwen_scoring_for_gpt-5_fewshot.json",
-"rank_file": "/Users/kristinx351/Documents/UCSD/Courses_material/Q5/AdsInGenAI/Code/trust_evaluator/data/rank-scores/gpt-5/geo_scores.json"
+"score_file": "data/content-scores/gpt-5/qwen_scoring_for_gpt-5_fewshot.json",
+"rank_file": "data/rank-scores/gpt-5/geo_scores.json"
 },
 {
 "model_name": "grok-4.1-fast-non-reasoning",
-"score_file": "/Users/kristinx351/Documents/UCSD/Courses_material/Q5/AdsInGenAI/Code/trust_evaluator/data/content-scores/grok/qwen_scoring_for_grok_4.1_fast_non_reasoning_few_shot.json",
-"rank_file": "/Users/kristinx351/Documents/UCSD/Courses_material/Q5/AdsInGenAI/Code/trust_evaluator/data/rank-scores/grok/geo_scores.json"
+"score_file": "data/content-scores/grok/qwen_scoring_for_grok_4.1_fast_non_reasoning_few_shot.json",
+"rank_file": "data/rank-scores/grok/geo_scores.json"
 },
 {
 "model_name": "tavily",
-"score_file": "/Users/kristinx351/Documents/UCSD/Courses_material/Q5/AdsInGenAI/Code/trust_evaluator/data/content-scores/tavily/qwen_scoring_for_tavily_few_shot.json",
-"rank_file": "/Users/kristinx351/Documents/UCSD/Courses_material/Q5/AdsInGenAI/Code/trust_evaluator/data/rank-scores/tavily/geo_scores.json"
+"score_file": "data/content-scores/tavily/qwen_scoring_for_tavily_few_shot.json",
+"rank_file": "data/rank-scores/tavily/geo_scores.json"
 }
 ]
 },
@@ -28727,4 +28727,4 @@
 "percentage_ge_sources_in_se_sources": 100.0
 }
 ]
-}
+}
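Because the manifest now stores repo-relative paths, a loader has to resolve them against the repository root rather than the process's working directory. The minimal sketch below makes that assumption explicit. It takes each entry in `runs` to have the `model_name`, `score_file` and `rank_file` keys shown in the diff, and it takes `repo_root` to point at the checkout. `missing_run_files` is a hypothetical helper, not something this commit adds. How the `runs` list is pulled out of the full JSON document is left open, because only part of the file's structure is visible here.

```python
from pathlib import Path
from typing import Iterable, List, Optional


def missing_run_files(runs: Iterable[dict], repo_root: Path) -> List[str]:
    """List every score_file/rank_file that does not exist under repo_root.

    Paths are resolved against repo_root rather than the current working
    directory, so the check gives the same answer wherever it is launched.
    A rank_file of null (None in Python) means "no rank scores" and is skipped.
    """
    missing = []
    for run in runs:
        for key in ("score_file", "rank_file"):
            rel: Optional[str] = run.get(key)
            if rel is None:
                continue
            if not (repo_root / rel).is_file():
                missing.append(f"{run.get('model_name', '?')}: {key} -> {rel}")
    return missing
```

Run from any directory, an empty result would indicate that every path touched by this commit resolves inside the checkout.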
