A production-ready boilerplate to collect publicly available Instagram web data (profiles, posts, hashtags) using safe automation patterns, rotating proxies, and human-like delays. Built for agencies, researchers, and growth teams that want reliable scraping with lower block risk.
For discussion, queries, and freelance work — reach out 👆
This repository provides a modular Instagram web scraping starter that focuses on resilience (anti-detect flows, rotating proxies, session reuse) and clarity (typed schema, storage adapters). It’s ideal for analysts, SaaS builders, and agencies that need compliant, rate-aware scraping of public pages.
- Saves time with prebuilt Playwright/Selenium runners.
- Scales from single run to distributed jobs.
- Safer with proxy rotation, backoff, fingerprint & session logic.
| Feature | Details |
|---|---|
| Headless/Visible Browsers | Playwright or Selenium drivers with toggleable headless mode |
| Proxy Rotation | Supports residential/mobile proxies with per-request rotation |
| Session Persistence | Reuse cookies/storage to reduce challenges and CAPTCHAs |
| Human-like Throttling | Randomized delays, jitter, scrolling, and viewport variance |
| Target Modules | Profile, posts, hashtag pages (public data) with parsers |
| Output Formats | JSONL, CSV, SQLite/Postgres adapters |
| Error/Retry Logic | Exponential backoff, soft-fail queues, resumable runs |
| CLI Runner | scrape profiles, scrape hashtag, resume subcommands |
| Dockerized | Reproducible runs with one-line Docker start |
| Env-First Config | .env for proxies, rate limits, storage, headless flags |
- Competitive research and trend tracking
- Social listening for public hashtags
- Creator discovery & lead lists (public info)
- Academic/market research on public engagement
Q: How to remove scraping warning?
A: Scraping warnings (blocks/challenges) often result from aggressive request rates, reused fingerprints, or IP reputation. Reduce concurrency, add randomized delays, persist sessions, rotate high-quality residential/mobile proxies, and lower fetch depth. Clearing cookies blindly can worsen flags—prefer stable sessions per account/profile, rotate user-agents with consistent device signatures, and implement exponential backoff on 4xx/429 responses.
Q: Does Instagram allow web scraping?
A: Accessing or collecting data is governed by Instagram’s Terms and your local laws. This boilerplate is for educational and compliance-oriented uses on publicly available pages. Always review and follow the platform’s terms and applicable regulations before running any scraper.
Q: Can web scraping be detected?
A: Yes. Platforms detect patterns like high request rates, identical fingerprints, datacenter IPs, and scripted navigation. Mitigate via residential/mobile proxies, realistic browser automation (Playwright/Selenium), randomized timings, scroll/viewport simulation, and consistent sessions. Even with safeguards, detection risk can’t be eliminated—only reduced.
10x faster posting schedules
80% engagement increase on group campaigns
Fully automated lead response system
Average Performance Benchmarks:
- Speed: 2x faster than manual posting
- Stability: 99.2% uptime
- Ban Rate: <0.5% with safe automation mode
- Throughput: 100+ posts/hour per session
##Do you have a customize project for us ? Contact Us
- Node.js or Python
- Git
- Docker (optional)
# Clone the repo
git clone https://github.com/yourusername/instagram-web-scraper.git
cd instagram-web-scraper
# Install dependencies
# Node (Playwright)
npm install
npx playwright install
# or Python (Selenium/Playwright)
pip install -r requirements.txt
# Setup environment
cp .env.example .env
# then edit .env to set:
# PROXY_URL= # e.g. http://user:pass@host:port
# DRIVER=playwright # or selenium
# HEADLESS=true
# RATE_MIN_MS=800
# RATE_MAX_MS=2200
# STORAGE_DIR=.storage
# OUT_FORMAT=jsonl # csv|jsonl|sqlite|postgres
# Run (examples)
# Scrape a hashtag page (public)
npm run scrape:hashtag -- --tag "travel" --limit 50
# or
python main.py hashtag --tag "travel" --limit 50{"type":"post","shortcode":"CxyZ12A","likes":1243,"comments":57,"caption":"Sunset shots #travel","timestamp":"2025-10-11T14:22:10Z","author":"@example"}
{"type":"profile","username":"example","followers":10422,"following":312,"posts":87,"bio":"Photographer | Traveler"}MIT License
