Ranking documents for a query using the BM25 score in the document-ranking phase and the Rocchio algorithm in the query-expansion phase.
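The BM25 document-ranking phase can be sketched as follows. This is a minimal, self-contained sketch, not the repository's score.py. The function names bm25_scores and rank, the defaults k1=1.2, b=0.75, k2=100, and the non-negative IDF variant log((N - n + 0.5) / (n + 0.5) + 1) are all assumptions.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75, k2=100.0):
    """Score each document (dict: doc_id -> list of tokens) against a query.

    Assumed form: sum over query terms of
    idf * tf*(k1+1)/(tf + k1*(1-b+b*|D|/avgdl)) * (k2+1)*qtf/(k2+qtf).
    """
    n_docs = len(docs)
    avgdl = sum(len(toks) for toks in docs.values()) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter()
    for toks in docs.values():
        df.update(set(toks))
    qtf = Counter(query_terms)  # term frequency inside the query
    scores = {}
    for doc_id, toks in docs.items():
        tf = Counter(toks)
        # Length normalization for this document.
        norm = k1 * (1 - b + b * len(toks) / avgdl)
        score = 0.0
        for term, q in qtf.items():
            f = tf.get(term, 0)
            if f == 0:
                continue  # term absent from this document contributes nothing
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            score += idf * f * (k1 + 1) / (f + norm) * (k2 + 1) * q / (k2 + q)
        scores[doc_id] = score
    return scores


def rank(scores):
    """Return document IDs sorted by descending score."""
    return sorted(scores, key=scores.get, reverse=True)


docs = {
    "d1": "bm25 ranks documents".split(),
    "d2": "rocchio expands queries".split(),
    "d3": "bm25 and rocchio".split(),
}
print(rank(bm25_scores(["bm25", "ranks"], docs)))  # ['d1', 'd3', 'd2']
```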
- Create a folder named data and put query.txt and doc.txt in ./data.
- Run EE448.ipynb to visualize the output.
- The ranked documents are written to ./data/bm25_score.txt.
Requires Python >= 3.0.
You can get the dataset here or use your own data.
- ./data/query.txt: query_id \t query_text
- ./data/doc.txt: document_id \t document_text
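A minimal sketch of reading these files, assuming one record per line with the ID and text separated by the first tab, and assuming lowercase whitespace tokenization (the repository may preprocess text differently). parse_tsv_lines and read_tsv are hypothetical helpers, not functions from this repository.

```python
def parse_tsv_lines(lines):
    """Parse 'id<TAB>text' lines into a dict: id -> list of lowercase tokens."""
    records = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        rec_id, _, text = line.partition("\t")  # split on the first tab only
        records[rec_id.strip()] = text.lower().split()
    return records


def read_tsv(path):
    """Read a file such as ./data/query.txt or ./data/doc.txt."""
    with open(path, encoding="utf-8") as fh:
        return parse_tsv_lines(fh)
```

For example, read_tsv("./data/query.txt") and read_tsv("./data/doc.txt") each return an ID-to-tokens dictionary. Splitting parsing from file I/O keeps the parser easy to test on in-memory lines.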
To change the number of expansion words, set loopRange in util.py (findNewQuery) to a different value. If the documents are short, use a smaller loopRange.
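A sketch of how a fixed number of expansion words might be chosen from a weighted term vector (for example, the output of a Rocchio update). Here n_terms plays the role that loopRange appears to play in util.py/findNewQuery; that correspondence and the function name top_expansion_terms are assumptions. Short documents offer fewer strongly weighted candidate terms, so a large n_terms tends to pull in weakly related words, which is presumably why a smaller value is suggested for them.

```python
def top_expansion_terms(term_weights, query_terms, n_terms):
    """Return up to n_terms positive-weight terms that are not already in the query."""
    existing = set(query_terms)
    candidates = [(w, t) for t, w in term_weights.items() if t not in existing and w > 0]
    # Highest weight first; ties broken alphabetically for a deterministic result.
    candidates.sort(key=lambda wt: (-wt[0], wt[1]))
    return [t for _, t in candidates[:n_terms]]


weights = {"bm25": 0.9, "ranking": 0.5, "score": 0.7, "noise": -0.1}
expanded = ["bm25"] + top_expansion_terms(weights, ["bm25"], 2)
print(expanded)  # ['bm25', 'score', 'ranking']
```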
You can also set k2 in score.py (bm25) to a larger value.
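For intuition: in the common BM25 formulation that includes k2, each query term's contribution is multiplied by (k2 + 1) * qtf / (k2 + qtf), where qtf is the term's frequency in the query. With k2 = 0 the factor is 1 regardless of qtf; as k2 grows it approaches qtf, so terms repeated in the query count more. Whether score.py uses exactly this factor is an assumption, and query_term_factor is a hypothetical helper.

```python
def query_term_factor(qtf, k2):
    """BM25 query-term-frequency factor: (k2 + 1) * qtf / (k2 + qtf).

    Assumes qtf >= 1 (the term occurs in the query), so the denominator is positive.
    """
    return (k2 + 1) * qtf / (k2 + qtf)


# How the factor for a term appearing 3 times in the query changes with k2.
for k2 in (0, 1, 100, 1000):
    print(k2, round(query_term_factor(3, k2), 3))
# 0 1.0
# 1 1.5
# 100 2.942
# 1000 2.994
```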
Set GAMMA to 0.15 to use both positive and negative feedback, or to 0 to use positive feedback only.
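A minimal sketch of the Rocchio update q' = alpha * q + beta * mean(relevant docs) - gamma * mean(non-relevant docs), with term vectors stored as dicts. The defaults alpha=1.0 and beta=0.75, clipping negative weights to zero, and the function name rocchio are assumptions (common textbook choices); only the GAMMA values 0.15 and 0 come from this README. The relevant set could be, for example, the top-ranked BM25 documents (pseudo-relevance feedback), but how the repository chooses it is not stated here. With gamma=0 the non-relevant documents have no effect, which gives positive-only feedback.

```python
from collections import Counter


def rocchio(query_vec, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return the Rocchio-updated query vector (dict: term -> weight)."""
    new_q = Counter()
    # Keep the original query, scaled by alpha.
    for term, w in query_vec.items():
        new_q[term] += alpha * w
    # Move toward the centroid of the relevant documents.
    for doc in relevant:
        for term, w in doc.items():
            new_q[term] += beta * w / len(relevant)
    # Move away from the centroid of the non-relevant documents.
    for doc in non_relevant:
        for term, w in doc.items():
            new_q[term] -= gamma * w / len(non_relevant)
    # Drop terms whose weight ended up zero or negative.
    return {t: w for t, w in new_q.items() if w > 0}


q = {"bm25": 1.0}
rel = [{"bm25": 1.0, "ranking": 1.0}]
nonrel = [{"cooking": 1.0, "ranking": 1.0}]
print(rocchio(q, rel, nonrel, gamma=0.15))  # positive and negative feedback
print(rocchio(q, rel, nonrel, gamma=0.0))   # positive feedback only
```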
You may try a different score function, such as TF-IDF, to rank documents in score.py.
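As an example of an alternative score function, here is a plain TF-IDF sketch: score(d, q) is the sum over distinct query terms t of tf(t, d) * log(N / df(t)). The function name tfidf_scores and this particular weighting (raw term frequency, no length normalization, query-term repetition ignored) are assumptions; many other TF-IDF variants exist. It returns the same doc_id -> score dictionary shape as a BM25 scorer would, so the two are easy to compare.

```python
import math
from collections import Counter


def tfidf_scores(query_terms, docs):
    """Score each document (dict: doc_id -> list of tokens) with sum of tf * idf."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter()
    for toks in docs.values():
        df.update(set(toks))
    scores = {}
    for doc_id, toks in docs.items():
        tf = Counter(toks)
        # Only terms present in the document contribute; df[t] > 0 whenever tf[t] > 0.
        scores[doc_id] = sum(
            tf[t] * math.log(n_docs / df[t]) for t in set(query_terms) if tf[t] > 0
        )
    return scores


docs = {
    "d1": "bm25 bm25 ranking".split(),
    "d2": "ranking only".split(),
    "d3": "other text".split(),
}
s = tfidf_scores(["bm25", "ranking"], docs)
print(sorted(s, key=s.get, reverse=True))  # ['d1', 'd2', 'd3']
```

Note that a term appearing in every document gets an IDF of log(1) = 0 under this weighting, so it does not affect the ranking.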
