Python script to scroll on a webpage

In this article, we will develop a Python script that scrolls a webpage to simulate real usage behavior. We will use Selenium and ChromeDriver.

0. Introduction

Scrolling a webpage programmatically is a common building block in web automation.

In most cases, people would apply this feature in:

  • The automation of conducting webpage tests;
  • Web-scraping for gathering data from certain websites. (Beware of legal issues when scraping data!)

This article walks through installing the framework and writing the code for this feature.


1. Getting Started - Installing and Preparing Selenium WebDriver for Chrome on macOS

In this article, Selenium is the framework we use, and Selenium WebDriver is the API we use to scroll the webpage.

Selenium is a popular framework for browser automation that can be used from multiple languages, including Python, Java, JavaScript and C#.

Selenium WebDriver is an object-oriented API for automatically controlling browsers. In this article, we use Chrome as the browser.

Here are the major steps:

  • Run the following command in Terminal to install Selenium:
pip3 install selenium
  • If you already use Homebrew, run the following command to install ChromeDriver:
brew install --cask chromedriver
  • Check whether everything is set up by entering:
chromedriver --version

If the installation is ready, you can see the versions listed on your terminal.
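
The same check can also be done from Python before writing any automation code. The sketch below is a minimal example and not part of the original setup steps; the helper name check_setup is hypothetical. It uses only the standard library: importlib.util.find_spec reports whether the selenium package can be imported, and shutil.which reports whether a chromedriver executable is on the PATH.

```python
import importlib.util
import shutil


def check_setup():
    """Report whether the selenium package and the chromedriver binary are found."""
    return {
        # True if "import selenium" would succeed in this interpreter
        "selenium_installed": importlib.util.find_spec("selenium") is not None,
        # Full path to chromedriver, or None if it is not on the PATH
        "chromedriver_path": shutil.which("chromedriver"),
    }


print(check_setup())
```

If either value comes back as False or None, the corresponding installation step above is worth repeating.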

(Other installation methods are easy to find online. Since installation is not the focus of this article, we omit the steps for other platforms.)

After installation, we set things up in Python. Importing webdriver and using it to launch Chrome opens a browser window that the script controls.

# Import the WebDriver API
from selenium import webdriver
# Launch a Chrome browser controlled by the script
driver = webdriver.Chrome()
# Set the size of the browser window (width, height) in pixels
driver.set_window_size(480, 800)
# Load the sample website to be scrolled
driver.get("http://www.opengenus.org/")

We used WebDriver to open Chrome and load the sample website. Running the code above opens a browser window showing the OpenGenus homepage.

Seeing this suggests that we have successfully prepared for writing Python scripts to scroll the webpage in Chrome.
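
The script above leaves the browser open when it finishes. One way to make sure the Chrome window is always closed, even if a later step raises an error, is to wrap the driver in a context manager that calls driver.quit(). The sketch below assumes a hypothetical helper named managed_driver, which is not part of Selenium; it accepts any zero-argument factory such as webdriver.Chrome, so it does not import Selenium itself.

```python
from contextlib import contextmanager


@contextmanager
def managed_driver(factory, width=480, height=800):
    """Create a driver with factory(), size its window, and always quit it."""
    driver = factory()
    try:
        driver.set_window_size(width, height)
        yield driver
    finally:
        # quit() closes the browser windows and ends the driver session
        driver.quit()
```

With Selenium installed, this could be used as "with managed_driver(webdriver.Chrome) as driver:" followed by the driver.get(...) call and the scrolling code from the sections below.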


2. Make it Happen - Scrolling on a webpage using Selenium WebDriver

Depending on actual development needs, we may want to scroll straight to the bottom of a webpage, or to a specific element on it. We will discuss both scenarios in this section.

2.1 Scrolling to the bottom of webpages

The basic logic of scrolling with WebDriver is to get the height of a webpage and scroll to that position. In terms of page height, there are two situations:

  • Finite pages with a footer at the bottom, although some are too long to scroll comfortably by hand, e.g. Amazon;
  • Infinite feeds that keep loading the latest updates, e.g. Facebook, Twitter, YouTube, etc.

For finite ones, we can write the script as follows:

# Use a finite website
driver.get("https://www.amazon.com/")
# JavaScript that scrolls to the full height of the page
js = "window.scrollTo(0, document.body.scrollHeight)"
# Run the JavaScript in the browser through WebDriver
driver.execute_script(js)

Syntax explained:

  • window.scrollTo(x-coord, y-coord) scrolls the webpage (a document) to the given coordinates;
  • document.body.scrollHeight returns the height of the webpage;
  • driver.execute_script() runs the JavaScript passed inside the parentheses in the browser.

Output:

The Amazon webpage is successfully scrolled to the bottom.
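
To confirm from the script itself that the scroll happened, the page height and the final scroll position can be read back with execute_script. This is a minimal sketch; the helper name scroll_to_bottom is hypothetical, and it relies only on the driver having an execute_script(script, *args) method, as Selenium drivers do. Values passed after the script are available in JavaScript as arguments[0], arguments[1], and so on.

```python
def scroll_to_bottom(driver):
    """Scroll to the bottom of the page and return (page_height, scroll_y)."""
    page_height = driver.execute_script("return document.body.scrollHeight;")
    driver.execute_script("window.scrollTo(0, arguments[0]);", page_height)
    # The browser clamps the position, so scroll_y is at most
    # page_height minus the height of the visible viewport.
    scroll_y = driver.execute_script("return window.pageYOffset;")
    return page_height, scroll_y
```

A scroll_y greater than zero indicates that the page actually moved.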

For infinite pages, however, we need a while loop to keep scrolling as new content loads.

# The library for controlling data loading time
import time
# Use an infinite website 
driver.get("https://iq.opengenus.org/")
# Get the screen height
screen_height = driver.execute_script("return window.screen.height;")   

# Use while loop to keep scrolling non-stop
i = 1
# Record the starting time of the loop
start = time.time()
while True:
    # Scroll 1 screen height each time
    driver.execute_script("window.scrollTo(0, {screen_height}*{i});".format(screen_height=screen_height, i=i))  
    i += 1
    # Allow for pause time to load data
    time.sleep(1)
    # Record ending time of the whole loop
    end = time.time()
    # Break the loop with a given time limit
    if round(end - start) > 30:
        break

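Logic explained:

Each pass of the loop scrolls one screen height further down, waits one second so that new content can load, and then checks the elapsed time. Once more than 30 seconds have passed, the loop stops.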

Syntax explained:

  • {screen_height}*{i} is the y-coordinate to scroll to; it is recomputed on every iteration of the loop;
  • time.sleep() pauses the script for the given number of seconds;
  • time.time() returns the current time in seconds; subtracting two such values gives the elapsed time.

Output:

After 30 seconds, the scrolling process paused at Erick's article about Backpatching. (Could continue if we extend the time limit.)
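
A fixed time limit is simple, but it may stop too early or keep running after the page has run out of content. A common alternative, sketched below, is to stop once document.body.scrollHeight no longer grows after a scroll. This is not the method used above; the function name scroll_until_stable and its parameters pause and max_rounds are hypothetical, and the right pause length depends on how quickly the site loads new items.

```python
import time


def scroll_until_stable(driver, pause=1.0, max_rounds=30):
    """Scroll down until the page height stops increasing.

    Returns the number of scrolls performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight;")
    rounds = 0
    while rounds < max_rounds:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        rounds += 1
        # Give the page time to load more content
        time.sleep(pause)
        new_height = driver.execute_script("return document.body.scrollHeight;")
        if new_height == last_height:
            break  # nothing new was loaded, so assume the end was reached
        last_height = new_height
    return rounds
```

The max_rounds cap plays the same role as the 30-second limit: it keeps the loop from running forever on a feed that never ends.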

2.2 Scrolling to certain elements on webpages

We don't always want to scroll to the bottom. Sometimes we need to move to a specific part of the page. In this section, we discuss two methods.

2.2.1 Scroll by pixels

driver.get("http://www.opengenus.org/")
# Scroll Down
driver.execute_script("window.scrollBy(0, 2000)")
# Scroll Up
driver.execute_script("window.scrollBy(0, -2000)")
# Scroll Right
driver.execute_script("window.scrollBy(2000, 0)")
# Scroll Left
driver.execute_script("window.scrollBy(-2000, 0)")

  • With window.scrollBy(x, y) we can scroll by a number of pixels, both vertically and horizontally, relative to the current position.
  • Vertical scrolling uses coordinates of the form (0, Y); horizontal scrolling uses (X, 0). Negative values scroll up or left.
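
The four calls above differ only in their numbers, so they can be wrapped in a small helper. The sketch below is one possible way to do this; the names scroll_by, scroll_down and scroll_up are hypothetical. Instead of building the JavaScript string by hand, it passes the pixel values as extra arguments to execute_script, which exposes them in JavaScript as arguments[0] and arguments[1].

```python
def scroll_by(driver, dx=0, dy=0):
    """Scroll the window by dx pixels horizontally and dy pixels vertically."""
    driver.execute_script("window.scrollBy(arguments[0], arguments[1]);", dx, dy)


def scroll_down(driver, pixels):
    """Scroll down by the given number of pixels."""
    scroll_by(driver, dy=pixels)


def scroll_up(driver, pixels):
    """Scroll up by the given number of pixels."""
    scroll_by(driver, dy=-pixels)
```

For example, scroll_down(driver, 2000) has the same effect as the "Scroll Down" line above.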

2.2.2 Scroll to find targeted elements

from selenium.webdriver.common.by import By

driver.get("http://www.opengenus.org/")
# Locate the target element by its XPath
# (Selenium 4 style; older releases also offered driver.find_element_by_xpath)
element = driver.find_element(By.XPATH, '//*[@id="sec-pricing"]/div/h1')
# Scroll until the element is in view
driver.execute_script("arguments[0].scrollIntoView();", element)

  • To scroll to specific text on a page, we can locate its element through its XPath (XML Path Language) in the page source.
  • In Selenium, an XPath expression identifies a web element by its position in the page's HTML structure.
  • For example, here we scroll the OpenGenus homepage to its last sentence, "For press inquiry....".

How to find the xpath:

  1. Right-click the sentence on the webpage and choose "Inspect" to open the source code panel;
  2. The panel automatically highlights the source code line for this sentence;
  3. Right-click that line and choose the "Copy XPath" option;
  4. Paste it as the XPath string in the find-element call of the script above.

Note that XPath is not the only way to locate an element. Other element-locating functions are described in the WebDriver documentation for Python.
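
Because the scrolling step is the same whichever locator is used, it can be written once and reused. The sketch below assumes a hypothetical helper named scroll_to_element that takes a locator strategy and a value and forwards them to driver.find_element(by, value). The {block: 'center'} option asks the browser to place the element in the middle of the viewport instead of at the top; it is a choice made for this example, not something the article's script uses.

```python
def scroll_to_element(driver, by, value):
    """Find one element with the given locator and scroll it into view."""
    element = driver.find_element(by, value)
    # The found element is passed to JavaScript as arguments[0]
    driver.execute_script("arguments[0].scrollIntoView({block: 'center'});", element)
    return element
```

With "from selenium.webdriver.common.by import By", the same section could then be reached by id with scroll_to_element(driver, By.ID, "sec-pricing"), or by a CSS selector with scroll_to_element(driver, By.CSS_SELECTOR, "#sec-pricing h1"). Both locators are derived from the id in the XPath above and assume the page structure has not changed.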

With this article at OpenGenus, you should now have a clear idea of how to develop a Python script to scroll on a webpage.
