Your Swiss Army Knife for Web Scraping & Automation!
House of Scraper is a powerful Python-based toolkit designed to simplify web scraping, automation, and data extraction. Whether you're a developer, data scientist, or hobbyist, this repository provides ready-to-use scripts for scraping popular websites like Netflix, LinkedIn, TikTok, and more!
✅ Multi-Platform Support: Scrape property listings, job postings, and product data from various websites.
✅ Easy-to-Use: Pre-built scripts with clear instructions—just run and extract!
✅ Customizable: Modify scripts to fit your specific scraping needs.
✅ Automation Ready: Integrate with workflows for scheduled scraping.
✅ Data Export: Save scraped data in a semi-structured format (JSON).
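For scheduled scraping, a system scheduler such as cron or Windows Task Scheduler is usually the sturdier choice. If you would rather keep a long-running Python process, a minimal loop could look like the sketch below. run_periodically is a hypothetical helper, not part of this repository; job can be anything that starts a scrape, for example a subprocess.run call of the command shown in the usage steps.

```python
import time
from typing import Callable, List


def run_periodically(
    job: Callable[[], object],
    interval_seconds: float,
    runs: int,
    sleep: Callable[[float], None] = time.sleep,
) -> List[object]:
    """Call `job` `runs` times, waiting `interval_seconds` between calls.

    A failing run is recorded as its exception instead of stopping the schedule.
    """
    results: List[object] = []
    for i in range(runs):
        try:
            results.append(job())
        except Exception as exc:  # keep the schedule alive if one run fails
            results.append(exc)
        if i < runs - 1:  # no wait is needed after the final run
            sleep(interval_seconds)
    return results
```

For example, run_periodically(lambda: subprocess.run(["py", "-m", "scraper", "--module", "tokopedia.shop", "--output", "json"]), 3600, 24) would start one scrape per hour for a day. The sleep parameter is injectable so the loop can be tested without real waiting.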
Requirements:
- Python 3.8+
- Libraries: requests, BeautifulSoup, pandas, cloudscraper
Installation:
- Clone the repo:
  git clone https://github.com/haydarmiezanie/house_of_scraper.git
  cd house_of_scraper
- Install dependencies:
  pip install -r requirements.txt
Usage:
- Run a scraper as code (example for Tokopedia):
  py -m scraper --module "tokopedia.shop" --output json
  Use this mode when you want to run the scraper as a standalone script. It executes the full scraping process with default settings, which is ideal for quick runs or command-line integration. (The py launcher is Windows-specific; on macOS and Linux, use python3 instead.)
- Run a scraper as a function (example for Tokopedia):
  py -m scraper --module "tokopedia.shop"
  Use this mode when you prefer to integrate the scraping functionality into your own Python code. It allows you to call the scraper as a function, offering more flexibility for customization within larger applications.
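If you want to drive a scraper from another Python program without depending on the package's internal function names, one option is to wrap the command-line entry point with the standard subprocess module. This is a minimal sketch: build_scraper_command and run_scraper are hypothetical helpers, and only the -m scraper, --module, and --output arguments come from the commands above. If the package exposes a Python-level entry point you can import, calling that directly avoids the extra process.

```python
import subprocess
import sys
from typing import List, Optional


def build_scraper_command(module: str, output: Optional[str] = "json") -> List[str]:
    """Mirror `py -m scraper --module <module> [--output <format>]` for the current interpreter."""
    cmd = [sys.executable, "-m", "scraper", "--module", module]
    if output is not None:
        cmd += ["--output", output]
    return cmd


def run_scraper(
    module: str, output: Optional[str] = "json", cwd: str = "."
) -> subprocess.CompletedProcess:
    """Run the scraper CLI from the repository root and raise if it exits with an error."""
    return subprocess.run(
        build_scraper_command(module, output),
        cwd=cwd,  # the scraper package must be importable from this directory
        capture_output=True,
        text=True,
        check=True,  # raises CalledProcessError on a non-zero exit code
    )
```

Using sys.executable instead of a hard-coded py or python3 keeps the call on the same interpreter (and virtual environment) as the caller. After result = run_scraper("tokopedia.shop", cwd="path/to/house_of_scraper"), result.stdout holds whatever the CLI printed.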
Each script is modular and adjustable:
- Modify URLs: Change the target website URL in the YAML configuration file.
- Add Data Fields: Extract additional data by editing the parsing logic.
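One way to keep "add a data field" a one-line change is to drive parsing from a mapping of output field names to where each value lives in the raw data. The sketch below assumes each raw item is a nested dict (as returned by a JSON API); FIELDS, get_path, parse_item, and every key path are hypothetical and will differ from the repository's actual parsing logic. For HTML pages, the same pattern works with BeautifulSoup selectors in place of key paths.

```python
from typing import Any, Dict, Mapping, Sequence

# Hypothetical mapping: output field name -> path of keys inside one raw item.
# Adding a data field means adding one entry here.
FIELDS: Dict[str, Sequence[str]] = {
    "name": ("name",),
    "price": ("price", "text"),
    "shop": ("shop", "name"),
    "rating": ("stats", "rating"),  # example of a newly added field
}


def get_path(item: Any, path: Sequence[str], default: Any = None) -> Any:
    """Follow `path` through nested dicts, returning `default` if any key is missing."""
    current = item
    for key in path:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


def parse_item(
    item: Dict[str, Any], fields: Mapping[str, Sequence[str]] = FIELDS
) -> Dict[str, Any]:
    """Turn one raw item into a flat record with one key per configured field."""
    return {name: get_path(item, path) for name, path in fields.items()}
```

Missing values come back as None rather than raising, so one malformed listing does not abort the whole run.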
Scraped data is saved in /result as:
- JSON (for APIs/databases)
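A minimal sketch of writing scraped records to the result folder as a JSON array, assuming the folder is relative to the repository root and files follow a <module>_<UTC timestamp>.json naming scheme. Both the naming scheme and save_results are hypothetical; the repository's own output files may be named differently.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Iterable, Mapping


def save_results(
    records: Iterable[Mapping[str, Any]], module: str, result_dir: str = "result"
) -> Path:
    """Write records to <result_dir>/<module>_<UTC timestamp>.json and return the path."""
    out_dir = Path(result_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    path = out_dir / f"{module.replace('.', '_')}_{stamp}.json"
    with path.open("w", encoding="utf-8") as fh:
        # ensure_ascii=False keeps non-ASCII text (e.g. product names) readable
        json.dump([dict(r) for r in records], fh, ensure_ascii=False, indent=2)
    return path
```

Because pandas is already a dependency, pandas.read_json(path) can load a file in this shape (a JSON array of objects) into a DataFrame for analysis.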
⚠ Use responsibly!
- Respect robots.txt and website terms.
- Add delays (time.sleep()) to avoid overwhelming servers.
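Both guidelines can be combined in a small fetch loop: check each URL against robots.txt with the standard urllib.robotparser module, and sleep a fixed delay plus random jitter between requests. This is a sketch under stated assumptions: make_robots_checker and polite_fetch_all are hypothetical helpers, the 2 s delay and 1 s jitter are arbitrary defaults, and fetch stands in for whatever downloads a page (for example lambda u: requests.get(u, timeout=10).text). In real use you would obtain the robots.txt text from the target site, e.g. via RobotFileParser.set_url() and read().

```python
import random
import time
from typing import Callable, Iterable, List
from urllib.robotparser import RobotFileParser


def make_robots_checker(robots_txt: str, user_agent: str = "*") -> Callable[[str], bool]:
    """Parse robots.txt once and return a function that says whether a URL may be fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return lambda url: parser.can_fetch(user_agent, url)


def polite_fetch_all(
    urls: Iterable[str],
    fetch: Callable[[str], object],
    is_allowed: Callable[[str], bool],
    min_delay: float = 2.0,
    jitter: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[object]:
    """Fetch only allowed URLs, sleeping min_delay plus random jitter between requests."""
    results: List[object] = []
    for url in urls:
        if not is_allowed(url):
            continue  # skip paths the site disallows
        if results:  # wait between requests, not before the first one
            sleep(min_delay + random.uniform(0, jitter))
        results.append(fetch(url))
    return results
```

The random jitter spreads requests out so they do not arrive at a perfectly regular rhythm, and skipped URLs trigger no delay because no request is sent.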
Got questions or suggestions? Open an Issue or reach out:
- 📧 Email: haydarsaja@gmail.com
- 💼 LinkedIn: Haydar Miezanie Abdul Jamil
MIT © Haydar Miezanie Abdul Jamil – Free for personal and commercial use.