Commit 0341855

Add scripts and docs for source collection and scoring
0 parents  commit 0341855

473 files changed

Lines changed: 2744036 additions & 0 deletions

.gitignore

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+.DS_Store
+.env
+venv
+data/collected-sources/claude/*
+DebateQA/

README.md

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+# SourceBench
+
+**URL & content collection:** [`src/source-collection/`](src/source-collection/). Scripts to generate source URLs (e.g. via LLM or search) and scrape page content into the shared JSON format. See that folder’s [README](src/source-collection/README.md) for `get_urls.py`, `collect_sources_from_urls.py`, `collect_sources.py`, and setup.
+
+**Scoring:** [`src/content-scoring/scripts/`](src/content-scoring/scripts/). The main script is `scoring.py`: it scores source pages (e.g. with Qwen) and writes enriched JSON + CSV. See that folder’s [README](src/content-scoring/scripts/README.md) for usage and `--input-file` format.
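
The README says `scoring.py` scores pages and writes "enriched JSON + CSV". The sketch below illustrates that general pattern only; it is not the repository's code. The record fields (`url`, `content`), the output names (`scored.json`, `scored.csv`), and the helpers `enrich`, `write_outputs`, and `dummy_score` are all hypothetical, and `dummy_score` is a stand-in for a real model call such as Qwen. The actual input format is described in the scripts folder's README.

```python
import csv
import json
from pathlib import Path


def dummy_score(record):
    """Hypothetical scorer standing in for a model call; returns named scores in [0, 1]."""
    n_words = len(record["content"].split())
    return {"length": min(n_words / 10, 1.0), "has_text": 1.0 if n_words else 0.0}


def enrich(records, score_fn):
    """Return copies of the records with 'scores' and 'avg_score' fields added."""
    enriched = []
    for record in records:
        scores = score_fn(record)
        avg = sum(scores.values()) / len(scores)
        enriched.append({**record, "scores": scores, "avg_score": avg})
    return enriched


def write_outputs(enriched, out_dir):
    """Write the full enriched records as JSON and a flat per-URL summary as CSV."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    json_path = out_dir / "scored.json"  # hypothetical file name
    csv_path = out_dir / "scored.csv"    # hypothetical file name
    json_path.write_text(
        json.dumps(enriched, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    with csv_path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=["url", "avg_score"])
        writer.writeheader()
        for record in enriched:
            writer.writerow({"url": record["url"], "avg_score": record["avg_score"]})
    return json_path, csv_path
```

Keeping the nested scores in the JSON and only a flat summary in the CSV is one possible split; the repository may divide the two outputs differently.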

data/collected-sources/deepseek-chat-gensee/DeepSeek_Response_Data_chat_gensee.json

Lines changed: 2828 additions & 0 deletions
Large diffs are not rendered by default.

data/collected-sources/deepseek-chat-gensee/DeepSeek_Response_Data_chat_gensee_with_avg_ge_freq.json

Lines changed: 2828 additions & 0 deletions
Large diffs are not rendered by default.

data/collected-sources/deepseek-chat-tavily/DeepSeek_Response_Data_chat_tavily.json

Lines changed: 2567 additions & 0 deletions
Large diffs are not rendered by default.

data/collected-sources/deepseek-chat-tavily/DeepSeek_Response_Data_chat_tavily_with_avg_ge_freq.json

Lines changed: 2567 additions & 0 deletions
Large diffs are not rendered by default.

data/collected-sources/deepseek-reasoning-gensee/DeepSeek_Response_Data_reasoner_gensee.json

Lines changed: 2838 additions & 0 deletions
Large diffs are not rendered by default.

data/collected-sources/deepseek-reasoning-gensee/DeepSeek_Response_Data_reasoner_gensee_with_avg_ge_freq.json

Lines changed: 2838 additions & 0 deletions
Large diffs are not rendered by default.

data/collected-sources/deepseek-reasoning-tavily/DeepSeek_Response_Data_reasoner_tavily.json

Lines changed: 2357 additions & 0 deletions
Large diffs are not rendered by default.

data/collected-sources/deepseek-reasoning-tavily/DeepSeek_Response_Data_reasoner_tavily_with_avg_ge_freq.json

Lines changed: 2357 additions & 0 deletions
Large diffs are not rendered by default.
