
The Ultimate Guide to Web Scraping with Python
Web scraping is an essential skill for anyone working with data. In this guide, you'll learn how to scrape websites using Python, starting from basics to advanced techniques. We'll cover popular libraries like BeautifulSoup, requests, Selenium, and Scrapy. By the end of this guide, you'll know how to extract data from websites, handle pagination, and save it in formats like CSV or JSON. Plus, you'll understand the ethical considerations and best practices for web scraping. Perfect for beginners and advanced users alike!
Death Code - 5/6/2025, 9:39:57 AM
A complete guide in which you'll learn how to do web scraping with Python, from beginner level all the way to advanced.
Introduction
In today's digital age, data is everything. But when that data sits on a website and no API is available, web scraping comes to the rescue. Python makes web scraping remarkably easy, thanks to libraries like BeautifulSoup, requests, Selenium, and Scrapy.
In this blog we'll walk step by step through how you can scrape data from almost any website with Python, without violating any rules.
Tools & Libraries Required
pip install requests beautifulsoup4 lxml
If you need to scrape a JavaScript-heavy site:
pip install selenium
For advanced use cases:
pip install scrapy
Step-by-Step Web Scraping Process
1. Identify the Target Website
First, decide which site you need data from. Example: https://quotes.toscrape.com
2. Inspect the Page Structure
Right-click → Inspect → find the HTML elements where the data lives. Usually <div>, <p>, <span>, etc.
3. Send an HTTP Request
import requests
url = 'https://quotes.toscrape.com'
response = requests.get(url)
print(response.text) # HTML source code
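As a slightly more robust variant of the request above, a small helper can set a timeout and check for HTTP errors before you parse anything. This is a sketch; the User-Agent string is just an illustrative placeholder:

```python
import requests

def fetch_html(url: str) -> str:
    # A descriptive User-Agent and a timeout make the request more polite and robust.
    headers = {'User-Agent': 'learning-scraper/1.0'}
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()  # raise an exception on 4xx/5xx responses
    return response.text

# html = fetch_html('https://quotes.toscrape.com')
```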
4. Parse HTML with BeautifulSoup
from bs4 import BeautifulSoup
soup = BeautifulSoup(response.text, 'lxml')
quotes = soup.find_all('span', class_='text')
for quote in quotes:
    print(quote.text)
5. Handle Pagination (Next Pages)
next_btn = soup.find('li', class_='next')
if next_btn:
    next_link = next_btn.a['href']
    full_url = url + next_link
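Putting steps 3–5 together, the pagination check can drive a loop that walks every page until no "Next" button remains. This is a sketch assuming the quotes.toscrape.com markup; urljoin handles relative links more safely than plain string concatenation:

```python
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

def scrape_all_quotes(base_url='https://quotes.toscrape.com'):
    quotes, url = [], base_url
    while url:
        soup = BeautifulSoup(requests.get(url, timeout=10).text, 'lxml')
        quotes.extend(span.text for span in soup.find_all('span', class_='text'))
        next_btn = soup.find('li', class_='next')
        # Follow the 'Next' link if present, otherwise stop.
        url = urljoin(base_url, next_btn.a['href']) if next_btn else None
    return quotes
```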
6. Save Data (CSV/JSON)
import csv
with open('quotes.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(['Quote'])
    for quote in quotes:
        writer.writerow([quote.text])
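The heading promises JSON too, so here is a minimal sketch of the same idea with the json module; the quotes list is placeholder data standing in for the scraped results:

```python
import json

quotes = ['The world as we have created it is a process of our thinking.']  # placeholder data

with open('quotes.json', 'w', encoding='utf-8') as f:
    # ensure_ascii=False keeps any non-ASCII characters readable in the file
    json.dump([{'quote': q} for q in quotes], f, ensure_ascii=False, indent=2)
```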
Ethics & the Legal Side
- Respect every site's robots.txt
- Don't overload the server; add a delay between requests
- Scraping for learning is fine; scraping to steal data is not
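The "add a delay" advice can be as simple as a time.sleep between requests; the page URLs below are hypothetical stand-ins for whatever you are crawling:

```python
import time

page_urls = [f'https://quotes.toscrape.com/page/{n}/' for n in range(1, 4)]

for url in page_urls:
    # scrape(url)  # your request/parse call would go here
    time.sleep(1.0)  # pause about a second between requests to keep server load low
```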
When JavaScript Renders the Content (Use Selenium)
from selenium import webdriver
from bs4 import BeautifulSoup
browser = webdriver.Chrome()
browser.get('https://quotes.toscrape.com/js/')
soup = BeautifulSoup(browser.page_source, 'lxml')
quotes = soup.find_all('span', class_='text')
for quote in quotes:
    print(quote.text)
browser.quit()  # close the browser when done
Advanced Level: The Scrapy Framework
Scrapy is a powerful and scalable framework for web scraping:
scrapy startproject quotesbot
Then create a spider, define selectors, and run:
scrapy crawl quotes
Bonus Tips
- Use User-Agent spoofing to avoid getting blocked
- Rotate proxies & IPs
- Use headless mode in Selenium
- Monitor the site for structure changes
Final Words
Web scraping is a powerful skill that every data enthusiast, analyst, and developer should have. Python's simple syntax and libraries make it easier still.
How did you find this guide? If you liked it, do share it!