🏠 House of Scraper 🚀

Your Swiss Army Knife for Web Scraping & Automation!

House of Scraper is a Python-based toolkit designed to simplify web scraping, automation, and data extraction. Whether you're a developer, data scientist, or hobbyist, this repository provides ready-to-use scripts for scraping popular websites like Netflix, LinkedIn, TikTok, and more!

🌟 Features

✅ Multi-Platform Support: Scrape property listings, job postings, and product data from various websites.
✅ Easy-to-Use: Pre-built scripts with clear instructions—just run and extract!
✅ Customizable: Modify scripts to fit your specific scraping needs.
✅ Automation Ready: Integrate with workflows for scheduled scraping.
✅ Data Export: Save scraped data in a semi-structured format (JSON).

⚡ Quick Start

Prerequisites

Python 3.8+
Libraries: requests, BeautifulSoup, pandas, cloudscraper

Installation

Clone the repo:

git clone https://github.com/haydarmiezanie/house_of_scraper.git
cd house_of_scraper

Install dependencies:
```
pip install -r requirements.txt
```
Run a scraper as code (example for Tokopedia):
```
py -m scraper --module "tokopedia.shop" --output json
```
Use this mode to run the scraper as a standalone script. It executes the full scraping process with default settings, which suits quick runs and command-line integration.
Run a scraper as a function (example for Tokopedia):
```
py -m scraper --module "tokopedia.shop"
```
Use this mode when you prefer to integrate the scraping functionality into your own Python code. It allows you to call the scraper as a function, offering better flexibility for customization within larger applications.
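
The Python-level function exposed by the scraper is not shown in this README, so the sketch below does not guess at it. Instead it wraps the documented command line from Python, using only the `--module` and `--output` flags shown above. The helper names `build_scraper_command` and `run_scraper` are hypothetical, and `sys.executable` stands in for the `py` launcher.

```python
import subprocess
import sys
from typing import List, Optional


def build_scraper_command(module: str, output: Optional[str] = None) -> List[str]:
    """Build the argument list for `python -m scraper` (hypothetical helper)."""
    # sys.executable replaces the `py` launcher so the current interpreter is reused.
    command = [sys.executable, "-m", "scraper", "--module", module]
    if output is not None:
        # Only the flags documented in this README are used.
        command += ["--output", output]
    return command


def run_scraper(module: str, output: Optional[str] = None) -> int:
    """Run the scraper CLI and return its exit code.

    Assumes the current working directory is the repository root.
    """
    completed = subprocess.run(build_scraper_command(module, output), check=False)
    return completed.returncode


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(build_scraper_command("tokopedia.shop", "json"))
```

Calling `run_scraper("tokopedia.shop", "json")` from a scheduler or a larger script would then be one way to automate a run, assuming the command behaves as described above.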

🛠 Customization

Each script is modular and adjustable:

Modify URLs: Change the target website URL in the YAML file.
Add Data Fields: Extract additional data by editing the parsing logic.

Example Custom
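
As a rough illustration of adding a data field, the sketch below keeps the parsing logic in a single mapping so that a new field is one extra entry. This is not the repository's actual parsing code: `FIELDS`, `extract`, `parse_item`, and the shape of the sample payload are all assumptions made for the example.

```python
from typing import Any, Dict, Tuple

# Hypothetical mapping: output field name -> path of keys inside the raw payload.
FIELDS: Dict[str, Tuple[str, ...]] = {
    "name": ("name",),
    "price": ("price", "value"),
    # Newly added field: only this entry changes when more data is needed.
    "shop_city": ("shop", "city"),
}


def extract(raw: Dict[str, Any], path: Tuple[str, ...]) -> Any:
    """Walk nested dicts along `path`; return None if any key is missing."""
    current: Any = raw
    for key in path:
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def parse_item(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Build one output record containing every field listed in FIELDS."""
    return {field: extract(raw, path) for field, path in FIELDS.items()}


if __name__ == "__main__":
    # Made-up payload, used only to exercise the sketch.
    sample = {"name": "Mug", "price": {"value": 25000}, "shop": {"city": "Bandung"}}
    print(parse_item(sample))
```

Returning `None` for a missing key keeps one incomplete item from aborting a whole run, which is a design choice of this sketch rather than documented behaviour.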

📂 Data Output

Scraped data is saved in /result as:

JSON (for APIs/databases)

Example Output
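
The exact file names and record layout written by the scrapers are not documented here, so the following is only a sketch of how a list of records could be written to a JSON file in a result folder. The `save_records` helper, the `<module>.json` naming, and the sample record are assumptions.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List


def save_records(records: List[Dict[str, Any]], module: str, result_dir: str = "result") -> Path:
    """Write records to <result_dir>/<module>.json and return the file path.

    The naming scheme is an assumption for illustration only.
    """
    out_dir = Path(result_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / f"{module}.json"
    with out_path.open("w", encoding="utf-8") as handle:
        # ensure_ascii=False keeps non-ASCII product names readable in the file.
        json.dump(records, handle, ensure_ascii=False, indent=2)
    return out_path


if __name__ == "__main__":
    # Write to a temporary folder so the demo does not touch the real result folder.
    with tempfile.TemporaryDirectory() as tmp:
        path = save_records([{"name": "Mug", "price": 25000}], "tokopedia.shop", result_dir=tmp)
        print(path.read_text(encoding="utf-8"))
```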

🤖 Ethical Scraping & Legal Note

⚠ Use responsibly!

Respect robots.txt and website terms.
Add delays (time.sleep()) to avoid overwhelming servers.
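
The two points above can be sketched with the standard library alone: `urllib.robotparser` checks each URL against robots.txt rules, and `time.sleep()` spaces out requests. The function names, the user agent string, and the one-second default delay are assumptions for illustration, and `fetch` is whatever download function your scraper already uses.

```python
import time
from typing import Callable, Iterable, List
from urllib.robotparser import RobotFileParser


def build_robot_rules(robots_txt: str) -> RobotFileParser:
    """Parse the text of a robots.txt file into a rule checker."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def polite_fetch(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    rules: RobotFileParser,
    user_agent: str = "house-of-scraper-example",  # hypothetical user agent
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Fetch only URLs allowed by robots.txt, pausing between requests."""
    results: List[str] = []
    first_request = True
    for url in urls:
        if not rules.can_fetch(user_agent, url):
            continue  # skip URLs the site asks crawlers not to visit
        if not first_request:
            sleep(delay_seconds)  # pause before every request after the first
        first_request = False
        results.append(fetch(url))
    return results


if __name__ == "__main__":
    rules = build_robot_rules("User-agent: *\nDisallow: /private/\n")
    pages = polite_fetch(
        ["https://example.com/a", "https://example.com/private/b"],
        fetch=lambda url: f"fetched {url}",  # stand-in for a real HTTP call
        rules=rules,
        delay_seconds=0.1,
    )
    print(pages)
```

Passing `sleep` as a parameter makes the delay easy to replace in tests. In real use the robots.txt text would come from the target site itself.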

💬 Need Help?

Got questions or suggestions? Open an Issue or reach out:

📧 Email: haydarsaja@gmail.com
🐦 LinkedIn: Haydar Miezanie Abdul Jamil

📜 License

