A small, easy-to-understand web scraper written in Python 3 that crawls the BooksToScrape demo site and extracts book data (title, price, availability, description, rating, image URL and category). It also downloads book cover images and saves the collected data to a CSV file.
This repository is intended as a learning example for web scraping with Requests and BeautifulSoup.
- Features
- Requirements
- Installation
- Usage
- Output
- Project structure
- Notes on ethics & legality
- Error handling & rate limiting
- Contributing
- License
- Scrapes all categories on BooksToScrape
- Automatically follows pagination
- Extracts details for every book:
- Title
- Price
- Availability
- Product description
- Star rating
- Image URL
- Category
- Downloads book cover images into category-based folders
- Saves all data to a CSV file
- Basic error handling and polite request delays
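To make the extraction step above concrete, here is a small standalone sketch of the general Requests + BeautifulSoup pattern the scraper is built around: fetch a listing page, pull a few fields per book, and follow the "next" pagination link. This is illustrative only, not code copied from scrape.py; the CSS selectors match the BooksToScrape markup.

```python
# Illustrative sketch only -- not the actual scrape.py.
import time
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

START_URL = "https://books.toscrape.com/catalogue/page-1.html"

def scrape_listing(start_url: str) -> None:
    url = start_url
    while url:
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        soup = BeautifulSoup(response.text, "html.parser")

        for article in soup.select("article.product_pod"):
            title = article.h3.a["title"]
            price = article.select_one("p.price_color").get_text(strip=True)
            # The star rating is encoded as a CSS class, e.g. "star-rating Three"
            rating = article.select_one("p.star-rating")["class"][1]
            print(title, price, rating)

        # Follow pagination via the "next" link until it disappears
        next_link = soup.select_one("li.next a")
        url = urljoin(url, next_link["href"]) if next_link else None
        time.sleep(1)  # polite delay between requests

scrape_listing(START_URL)
```

The real scraper also visits each product page to collect the description, availability, image URL, and category, and downloads the cover images; the sketch omits those steps for brevity.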
- Python 3.7+
- requests
- beautifulsoup4
Install the dependencies:

`pip install requests beautifulsoup4`

(Optionally, create a virtual environment before installing.)
- Clone the repository:
  `git clone https://github.com/Asiwaju24/Scraping.git`
  `cd Scraping`
- Install the required packages:
  `pip install requests beautifulsoup4`

Run the scraper:

`python scrape.py`

After the script completes:
- A CSV file named `books_full_scrape.csv` will be created in the project root.
- All book cover images will be saved inside an `images/` folder, grouped by category (e.g. `images/Travel/`).
If you want, run the script inside a virtual environment to avoid impacting your system Python packages.
- CSV file: `books_full_scrape.csv`
  Example CSV columns (header row): `title, price, availability, description, star_rating, image_url, category`
- Images folder: `images/<Category>/<image-files>`
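Once the scrape has finished, the CSV can be inspected with the standard library. A quick sketch, assuming the header row listed above:

```python
# Quick check of the generated CSV (column names taken from the header row above).
import csv

with open("books_full_scrape.csv", newline="", encoding="utf-8") as f:
    reader = csv.DictReader(f)
    for row in reader:
        print(row["title"], row["price"], row["category"])
```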
├── scrape.py
├── books_full_scrape.csv # Generated after running the script
├── images/
│ ├── Travel/
│ ├── Mystery/
│ ├── Fiction/
│ └── ...
└── README.md
The target site, https://books.toscrape.com, is intentionally provided for scraping practice. Respect robots.txt and site owners when scraping real sites. Limit request rate and avoid excessive parallel requests that could cause problems for servers.
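For real sites, Python's standard library can check robots.txt before you fetch a URL. A minimal sketch (the domain and user-agent string here are placeholders):

```python
# Minimal robots.txt check using the standard library (illustrative only).
from urllib.robotparser import RobotFileParser

robots = RobotFileParser()
robots.set_url("https://example.com/robots.txt")
robots.read()

if robots.can_fetch("MyScraperBot/1.0", "https://example.com/some/page.html"):
    print("Allowed to fetch")
else:
    print("Disallowed by robots.txt")
```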
The script includes simple error handling and request delays to be polite to the server. If you extend the scraper or apply it to other sites, add more robust retry logic, exponential backoff, and careful handling of network errors or site structure changes.
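If you adapt the scraper to other sites, a simple retry helper with exponential backoff might look like the following. This is an illustrative sketch, not part of scrape.py:

```python
# Illustrative retry helper with exponential backoff; not part of scrape.py.
import time

import requests

def fetch_with_retries(url: str, max_retries: int = 3, base_delay: float = 1.0) -> requests.Response:
    for attempt in range(max_retries):
        try:
            response = requests.get(url, timeout=10)
            response.raise_for_status()
            return response
        except requests.RequestException as exc:
            if attempt == max_retries - 1:
                raise  # give up after the last attempt
            wait = base_delay * (2 ** attempt)  # 1s, 2s, 4s, ...
            print(f"Request failed ({exc}); retrying in {wait:.0f}s")
            time.sleep(wait)
```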
Contributions and improvements are welcome. Suggested improvements you could add:
- Add a `requirements.txt` or `pyproject.toml`
- Add CLI flags (output filename, request delay, a category filter); one possible layout is sketched after this list
- Add logging and more robust retry/backoff logic
- Add unit tests for parsing functions
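For the CLI-flags suggestion, one possible argparse layout is shown below. The flag names are hypothetical; scrape.py does not currently accept any arguments:

```python
# Hypothetical CLI layout for the flags suggested above; not implemented in scrape.py.
import argparse

parser = argparse.ArgumentParser(description="Scrape books.toscrape.com")
parser.add_argument("--output", default="books_full_scrape.csv",
                    help="path of the CSV file to write")
parser.add_argument("--delay", type=float, default=1.0,
                    help="seconds to wait between requests")
parser.add_argument("--categories", nargs="*", default=None,
                    help="limit scraping to these category names")
args = parser.parse_args()
print(args.output, args.delay, args.categories)
```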
If you make changes, please open a pull request with a clear description of what changed and why.
This project is provided for learning and demonstration purposes. You may reuse or adapt the code for non-commercial or educational uses. Add a formal license (e.g., MIT) if you want to publish this repository for wider reuse.
BooksToScrape: https://books.toscrape.com — a site provided specifically for practicing and testing scraping techniques.