This repository provides a tool that scrapes Wikipedia for any topic and generates a knowledge graph from the scraped articles.
The new Neuralcoref from explosion.ai uses the state-of-the-art clustering algorithm MentionRank to cluster mentions in a document. This algorithm is much more accurate than the previous one, but it is also much slower.
The new version of Neuralcoref is not backward compatible with the old one, so the code in this repository has been adapted to the new version.
However, the end results differ from before: the new version currently produces lower-quality clusters, so the generated graphs are not as good as they used to be. This is a known issue, and the developers are working on it.
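For context, coreference resolution is what lets the pipeline attach pronouns to the entities they refer to, so that extracted triples have explicit subjects. A minimal sketch of that substitution step (the `resolve_clusters` helper and the hand-written cluster are illustrative only, not Neuralcoref's actual API, which produces clusters automatically):

```python
# Illustrative only: clusters are normally produced by Neuralcoref;
# here one is hand-written as character spans mapped to a main mention.
def resolve_clusters(text, clusters):
    """Replace every mention span with its cluster's main mention."""
    # Process spans right-to-left so earlier offsets stay valid.
    replacements = sorted(
        ((start, end, main) for main, spans in clusters for start, end in spans),
        reverse=True,
    )
    for start, end, main in replacements:
        text = text[:start] + main + text[end:]
    return text

text = "The Fed cut rates. It acted quickly."
# One cluster: "It" (characters 19-21) refers to "The Fed".
clusters = [("The Fed", [(19, 21)])]
print(resolve_clusters(text, clusters))  # The Fed cut rates. The Fed acted quickly.
```

After this step, a triple extractor sees "The Fed acted quickly" instead of "It acted quickly", which is why cluster quality directly affects graph quality.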
- Python 3.7
- Wikipedia-API
- Spacy
- Neuralcoref
- Networkx
- spaCy en_core_web_lg
We recommend using conda. Create a new environment from the environment.yml file
in the root of this repository:

```shell
conda env create -f environment.yml
```

Then, activate the environment:

```shell
conda activate spacy_pos_kg
```

Alternatively, you can use virtualenv. Create a new environment from the
requirements.txt file in the root of this repository:
```shell
virtualenv -p python3.7 venv
source venv/bin/activate
pip install -r requirements.txt
```

Below is an example of how to run the code. It scrapes Wikipedia for the text query "2008 recession", generates a knowledge graph, and plots it:

```shell
python demo.py --target "2008 recession" --sub-graph-target "The federal reserve"
```

Output:
The graph was generated using the en_core_web_lg model from spaCy and plotted with networkx and matplotlib.
The sub-graph was generated for the entity "The federal reserve".
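The graph-building and sub-graph steps can be sketched with networkx like this. The triples below are hand-written stand-ins for what the spaCy extraction pipeline would produce, and `demo.py`'s real internals may differ:

```python
import networkx as nx

# Hand-written (subject, relation, object) triples standing in for
# the output of the spaCy-based extraction pipeline.
triples = [
    ("2008 recession", "caused by", "subprime mortgage crisis"),
    ("The federal reserve", "responded to", "2008 recession"),
    ("The federal reserve", "lowered", "interest rates"),
]

# Build a directed graph; the relation becomes an edge attribute.
G = nx.DiGraph()
for subj, rel, obj in triples:
    G.add_edge(subj, obj, label=rel)

# Sub-graph around one entity: the node plus its direct neighbours,
# following edges in either direction.
sub = nx.ego_graph(G, "The federal reserve", undirected=True)
print(sorted(sub.nodes()))
```

Plotting the result is then a matter of `nx.draw(G, with_labels=True)` followed by `matplotlib.pyplot.show()`.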

