1. Web Scraping with Python | IMDb Rating

Krushi Patel
2 min read · Jul 26, 2021

What is Web Scraping?

Web scraping is an automated method used to extract large amounts of data from websites. The data on websites is unstructured; web scraping helps collect this unstructured data and store it in a structured form. There are different ways to scrape websites, such as online services, APIs, or writing your own code. In this article, we’ll see how to implement web scraping with Python.

Libraries used for Web Scraping

As we all know, Python has different libraries for various purposes. We will be using the following libraries:

  • BeautifulSoup: BeautifulSoup is a Python library for parsing HTML and XML documents. It generates parse trees, which are useful for quickly extracting information.
  • Pandas: Pandas is a library used to manipulate and analyse data. Here it is used to save the extracted data in the desired format.
  • Requests: The requests module allows you to send HTTP requests using Python. The response to an HTTP request is returned as a Response object (content, encoding, status, etc.); see the short sketch after this list.
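
As a quick illustration of how requests and BeautifulSoup fit together (a minimal sketch of mine, not the article’s code), the snippet below sends a request, inspects a few attributes of the Response object, and hands the content to BeautifulSoup:

import requests
from bs4 import BeautifulSoup

# Minimal sketch: fetch the page and look at the Response object's attributes.
response = requests.get("https://www.imdb.com/chart/top/?ref_=nv_mv_250")
print(response.status_code)   # e.g. 200 on success
print(response.encoding)      # character encoding reported by the server
soup = BeautifulSoup(response.content, "html.parser")
print(soup.title.text)        # the page <title>, just to confirm parsing worked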

Step 1: Find the URL that you want to scrape

The URL for this page is https://www.imdb.com/chart/top/?ref_=nv_mv_250

Step 2: Inspecting the Page

To inspect the page, right-click on the element and click “Inspect”.

Step 3: Find the info you would like to extract

Extract the title name, year of release and rating, which sit inside the “td” and “span” tags of each table row.
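
To make the mapping concrete, here is a small sketch of mine against a hypothetical snippet of the table’s markup (the structure is assumed from the selectors used later in the code; the values are only illustrative):

from bs4 import BeautifulSoup

# Hypothetical markup mirroring the classes the article's selectors target.
sample_html = """
<table><tbody class="lister-list">
  <tr>
    <td class="titleColumn">1. <a href="#">The Shawshank Redemption</a>
      <span class="secondaryInfo">(1994)</span></td>
    <td class="ratingColumn imdbRating"><strong>9.2</strong></td>
  </tr>
</tbody></table>
"""

soup = BeautifulSoup(sample_html, "html.parser")
row = soup.find("tbody", {"class": "lister-list"}).find("tr")
print(row.find("td", {"class": "titleColumn"}).a.text)        # title name
print(row.find("span", {"class": "secondaryInfo"}).text)      # year of release
print(row.find("td", {"class": "ratingColumn"}).strong.text)  # rating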

Step 4: Write the code

First, let’s create a Python file. To do this, you can use Google Colab or a Jupyter notebook. I am using Google Colab for this.

Import libraries:

import requests
from bs4 import BeautifulSoup
import pandas as pd

Create empty lists to store the details. In this case, we create three empty lists to store the title name, year and rating.

TitleName=[]
Year=[]
Rating=[]

Open the URL and extract the info from the website.

url = "https://www.imdb.com/chart/top/?ref_=nv_mv_250"
r = requests.get(url).content
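
Optionally, you may want to check that the request succeeded before parsing. The sketch below is my addition and continues from the url variable above; the User-Agent value is just an example, not something the article specifies, but some sites reject requests that lack a browser-like header:

# Optional sketch: verify the status code and pass a browser-like User-Agent.
headers = {"User-Agent": "Mozilla/5.0"}
response = requests.get(url, headers=headers)
if response.status_code != 200:
    raise SystemExit(f"Request failed with status {response.status_code}")
r = response.content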

Using BeautifulSoup’s find() and find_all() functions, we take the information from each table row and store it in variables.

soup = BeautifulSoup(r, "html.parser")
# Each <tr> in the lister-list table body is one movie entry
movies = soup.find("tbody", {"class": "lister-list"}).find_all("tr")

for i in movies:
    title = i.find("td", {"class": "titleColumn"})
    year = i.find("span", {"class": "secondaryInfo"})
    rating = i.find("td", {"class": "ratingColumn"})

Using append(), we store the details in the lists we created before. These lines go inside the loop above.

    TitleName.append(title.text)
    Year.append(year.text)
    Rating.append(rating.text)
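
Since the snippets above are shown separately, here is how they fit together as a single loop. The .strip() calls are my addition to trim the surrounding whitespace; the original code stores the raw text:

soup = BeautifulSoup(r, "html.parser")
movies = soup.find("tbody", {"class": "lister-list"}).find_all("tr")

for i in movies:
    title = i.find("td", {"class": "titleColumn"})
    year = i.find("span", {"class": "secondaryInfo"})
    rating = i.find("td", {"class": "ratingColumn"})
    TitleName.append(title.text.strip())   # .strip() removes extra whitespace
    Year.append(year.text.strip())
    Rating.append(rating.text.strip())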

Step 5: Store the info in a sheet. We store the data in Comma-Separated Values (CSV) format.

df = pd.DataFrame({'Title': TitleName, 'Year': Year, 'Rating': Rating})
df.to_csv('IMDbRating.csv', index=False, encoding='utf-8')
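
As a quick sanity check (my addition, not part of the article), you can read the CSV back with pandas and look at the first few rows:

# Read the file back to confirm it was written as expected.
check = pd.read_csv('IMDbRating.csv')
print(check.head())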

Step 6: Now run the whole code.

All the data is stored in the IMDbRating.csv file.

Scraped data in CSV file format

For the whole code, refer to https://github.com/krushipatel123/Pr-1_DS
