Working with HTTP requests in Python


Reading time: 45 minutes

The internet is made up of a bunch of resources hosted on different servers. The term resource corresponds to any entity on the web, including HTML files, stylesheets, images, videos, and scripts. To access content on the internet, our browser must ask these servers for the resources it wants, and then display these resources to us. This protocol of requests and responses is what lets us view this page in our browser.

After reading this article, you will have an understanding of:

  • What HTTP is and how it works
  • Working with different HTTP requests in Python using the requests library
  • Inspecting the information that web resources send back and using it programmatically

What is HTTP?

HTTP stands for Hypertext Transfer Protocol and is used to structure requests and responses over the internet. HTTP requires data to be transferred from one point to another over the network.

The transfer of resources happens using TCP (Transmission Control Protocol). When we view this webpage, TCP manages the channel between our browser and the server. More generally, TCP manages many types of internet connections in which one computer or device wants to send something to another. HTTP is the command language that the devices on both sides of the connection must follow in order to communicate.

How does HTTP work?

When we type an address such as www.github.com into our browser, we are commanding it to open a TCP channel to the server that responds to that URL (or Uniform Resource Locator). A URL is like our home address or phone number because it describes how to reach us.
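As a rough illustration of what a URL is made of, the standard library's urllib.parse module can split one into its parts. The path and query string below are invented for the example; only the host name comes from the text above.

```python
from urllib.parse import urlsplit

# The path and query string are made up for illustration.
parts = urlsplit("https://www.github.com/search?q=requests")

print(parts.scheme)    # protocol the client should speak
print(parts.hostname)  # server the TCP connection is opened to
print(parts.path)      # resource requested on that server
print(parts.query)     # extra parameters sent along with the request
```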

In this situation, our computer, which is making the request, is called the client. The URL we are requesting is the address that belongs to the server.

Once the TCP connection is established, the client sends an HTTP GET request to the server to retrieve the webpage it should display. In the simplest case, the server closes the TCP connection after it has sent the response (newer versions of HTTP can keep the connection open and reuse it for further requests). If we open the website in our browser again, or if our browser automatically requests something from the server, a new connection is opened which follows the same process described above. GET requests are one kind of HTTP method a client can call.
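To make this exchange concrete, here is a minimal sketch that starts a throwaway server on the local machine, opens a TCP connection to it, and writes a GET request by hand. HelloHandler, raw_get, and the "hello" body are names invented for this example. Python's built-in http.server answers with HTTP/1.0 by default, so it closes the connection after each response.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Tiny local server so the example does not depend on the internet."""

    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the console quiet


def raw_get(host, port, path="/"):
    """Open a TCP connection, send a GET request by hand, return the raw reply."""
    request = f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(request.encode("ascii"))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode("iso-8859-1")


server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: pick any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

reply = raw_get("127.0.0.1", server.server_port)

server.shutdown()
server.server_close()

status_line, _, rest = reply.partition("\r\n")
print(status_line)  # first line of the response: protocol version and status
print(rest)         # response headers, a blank line, then the body
```

Libraries such as Requests do exactly this kind of work for us, which is why we rarely need to touch sockets directly.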

What are the different HTTP verbs?

There are 4 basic HTTP verbs we use in requests to interact with resources in a REST system:

  1. GET — retrieve a specific resource (by id) or a collection of resources
  2. POST — create a new resource
  3. PUT — update a specific resource (by id)
  4. DELETE — remove a specific resource by id
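The four verbs map naturally onto create, read, update, and delete. The sketch below imitates a server with an in-memory dictionary; the handle function, the resources store, and the status codes it picks are assumptions made for illustration, not part of any library. For simplicity, PUT here only replaces resources that already exist.

```python
import itertools

resources = {}             # id -> resource, stands in for a database
_ids = itertools.count(1)  # source of fresh ids


def handle(method, resource_id=None, data=None):
    """Return (status_code, body) for one request against the collection."""
    if method == "POST":                         # create a new resource
        new_id = next(_ids)
        resources[new_id] = data
        return 201, {"id": new_id, **data}
    if method == "GET" and resource_id is None:  # retrieve the whole collection
        return 200, list(resources.values())
    if resource_id not in resources:
        return 404, None
    if method == "GET":                          # retrieve one resource by id
        return 200, resources[resource_id]
    if method == "PUT":                          # replace one resource by id
        resources[resource_id] = data
        return 200, data
    if method == "DELETE":                       # remove one resource by id
        del resources[resource_id]
        return 204, None
    return 405, None                             # verb not supported here


created = handle("POST", data={"name": "first"})
fetched = handle("GET", 1)
updated = handle("PUT", 1, {"name": "renamed"})
deleted = handle("DELETE", 1)
missing = handle("GET", 1)
print(created, fetched, updated, deleted, missing)
```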

HTTP requests in Python

To make HTTP requests in Python, we can use several HTTP libraries like:

  1. GRequests - GRequests allows us to use Requests with Gevent to make asynchronous HTTP requests easily.

  2. httplib2 - The httplib2 module provides methods for accessing web resources via HTTP. It is a comprehensive HTTP client library that supports many features left out of other HTTP libraries, such as HTTPS, authentication, caching, redirects, and compression. HTTPS support is only available if the socket module was compiled with SSL support.

  3. requests - Requests is an HTTP library designed to be easy for humans to use. With the Requests library, we don’t need to add query strings to our URLs manually or form-encode our POST data. We can send HTTP requests to a server and attach form data, headers, multipart files, and so on.
    GitHub repository - https://github.com/requests/requests

  4. urllib3 - urllib3 is a powerful, sanity-friendly HTTP client for Python. It brings many critical features that are missing from the Python standard library, such as thread safety, connection pooling, client-side SSL/TLS verification, and file uploads with multipart encoding.

Of the libraries listed above, Requests is the simplest and most elegant, so we will use it in this article.

To download and install the Requests library, use the following command:

pip install requests

Importing the Requests Module

To work with the Requests library in Python, we must import the appropriate module. We can do this simply by adding the following code at the beginning of our script:

import requests 

Making a Request

When we ping a website or portal for information this is called making a request. That is exactly what the Requests library has been designed to do.

To get a webpage we would do something like the following:

r = requests.get('https://github.com/timeline.json')
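Requests can also build the query string of a URL for us. The sketch below prepares a request without sending it, so we can look at what would go over the wire. The example.org address and the parameter names are placeholders chosen for illustration.

```python
import requests

# Build the request without sending it; the URL and parameters are placeholders.
request = requests.Request(
    "GET",
    "https://example.org/search",
    params={"q": "http", "page": 2},
)
prepared = request.prepare()

print(prepared.method)  # GET
print(prepared.url)     # https://example.org/search?q=http&page=2
```

In everyday code the same params dictionary can be passed straight to requests.get.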

Working with Response Code

Before we do anything with the content of a response, it’s a good idea to check its status code. Requests exposes the code as r.status_code and ships a built-in lookup object, requests.codes, for referring to codes by name.

>>> r = requests.get('https://github.com/timeline.json')
>>> r.status_code
200

>>> r.status_code == requests.codes.ok
True

>>> requests.codes['temporary_redirect']
307

>>> requests.codes.teapot
418

>>> requests.codes['\\o/']
200

Get the Content

After a web server returns a response, we can collect the content we need. We read it from the response object that requests.get returns.

import requests

r = requests.get('https://github.com/timeline.json')
print(r.text)

# The Requests library also comes with a built-in JSON decoder,
# just in case we have to deal with JSON data
print(r.json())
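Because remote endpoints come and go, here is a self-contained sketch of the same idea that talks to a small server started on the local machine. JsonHandler and the JSON document it serves are invented for this example.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class JsonHandler(BaseHTTPRequestHandler):
    """Serves one fixed JSON document; stands in for a real API."""

    def do_GET(self):
        body = json.dumps({"events": ["push", "fork"]}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging


server = HTTPServer(("127.0.0.1", 0), JsonHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

session = requests.Session()
session.trust_env = False  # ignore proxy settings so we really reach localhost
r = session.get(f"http://127.0.0.1:{server.server_port}/", timeout=5)

raw_text = r.text   # response body decoded to a str
decoded = r.json()  # response body parsed into Python objects

session.close()
server.shutdown()
server.server_close()

print(raw_text)
print(decoded["events"])
```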

Working with Headers

Response headers are exposed as a dictionary-like object, r.headers, which we can use to view a server’s response headers. Thanks to how Requests works, we can access the headers using any capitalization we’d like.

If we look up a header that doesn’t exist in the response with r.headers.get(), the value will default to None. Indexing with square brackets raises a KeyError instead.

>>> r.headers
{
    'status': '200 OK',
    'content-encoding': 'gzip',
    'transfer-encoding': 'chunked',
    'connection': 'close',
    'server': 'nginx/1.0.4',
    'x-runtime': '148ms',
    'etag': '"e1ca502697e5c9317743dc078f67693f"',
    'content-type': 'application/json; charset=utf-8'
}

>>> r.headers['Content-Type']
'application/json; charset=utf-8'

>>> r.headers.get('content-type')
'application/json; charset=utf-8'

>>> r.headers.get('X-Random') is None
True

# Get the headers of a given URL
>>> resp = requests.head("http://www.google.com")
>>> print(resp.status_code, resp.text, resp.headers)
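The case-insensitive behaviour can be tried without any network access, because Requests stores headers in its CaseInsensitiveDict class. In this sketch the header mapping is filled in by hand rather than taken from a real response. Note that .get() returns None for a missing header, while indexing with square brackets raises a KeyError.

```python
from requests.structures import CaseInsensitiveDict

# The same kind of mapping Requests uses for r.headers, filled in by hand here.
headers = CaseInsensitiveDict({"Content-Type": "application/json; charset=utf-8"})

print(headers["content-type"])  # lookups ignore capitalization
print(headers.get("X-Random"))  # .get() returns None for a missing header

try:
    headers["X-Random"]         # indexing a missing header raises instead
    lookup_failed = False
except KeyError:
    lookup_failed = True
print(lookup_failed)
```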

Encoding

Requests will automatically decode any content pulled from a server. Most Unicode character sets are decoded seamlessly.

When we make a request to a server, the Requests library makes an educated guess about the encoding of the response, based on the HTTP headers. The guessed encoding is used when we access the r.text attribute.

We can find out which encoding the Requests library is using, and change it if need be, through the r.encoding property of the response.

If we change the encoding value, Requests will use the new encoding every time we access r.text afterwards.

>>> r.encoding
'utf-8'

>>> r.encoding = 'ISO-8859-1'
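Why the encoding matters can be shown without a server. The sketch below decodes the same bytes in two ways, which is roughly what happens to r.text when r.encoding is changed. The sample word is chosen only for illustration.

```python
raw = "café".encode("utf-8")          # bytes as a server might send them

as_utf8 = raw.decode("utf-8")         # decoded with the right encoding
as_latin1 = raw.decode("ISO-8859-1")  # decoded with the wrong encoding

print(as_utf8)    # the original word
print(as_latin1)  # the accented letter turns into two wrong characters
```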

Types of encoding

ASCII is a simple encoding with 128 characters, including the Latin alphabet, digits, punctuation marks, and utility characters.

7 bits are enough to represent any ASCII character. The word "test" in HEX representation would look like: 0x74 0x65 0x73 0x74. Because the encoding has only 128 characters while a byte gives 2^8 = 256 variants, the first bit of any ASCII character stored in a byte is always 0.

UTF-8 is one of the most famous encodings alongside ASCII. It is capable of encoding 1,112,064 characters. Each character takes from 1 to 4 bytes (previously the values could be up to 6 bytes).

The program processing this encoding checks the leading bits of the first byte to determine the character size in bytes. If the byte begins with 0, the character is represented by 1 byte. 110 - 2 bytes, 1110 - 3 bytes, 11110 - 4 bytes.
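The leading-bit rule can be checked with a few lines of Python. The helper name utf8_char_length and the sample characters are assumptions made for this sketch.

```python
def utf8_char_length(first_byte):
    """Return how many bytes a UTF-8 character occupies, judged from its first byte."""
    if first_byte >> 7 == 0b0:      # 0xxxxxxx
        return 1
    if first_byte >> 5 == 0b110:    # 110xxxxx
        return 2
    if first_byte >> 4 == 0b1110:   # 1110xxxx
        return 3
    if first_byte >> 3 == 0b11110:  # 11110xxx
        return 4
    raise ValueError("not a valid first byte of a UTF-8 character")


# Hex view of the ASCII bytes of the word "test"
ascii_hex = " ".join(f"{byte:02x}" for byte in "test".encode("ascii"))

lengths = {}
for char in ["t", "é", "€", "😀"]:
    encoded = char.encode("utf-8")
    lengths[char] = utf8_char_length(encoded[0])
    assert lengths[char] == len(encoded)  # prediction matches the real size

print(ascii_hex)
print(lengths)
```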

Custom Headers

If you want to add custom HTTP headers to a request, you must pass them through a dictionary to the headers parameter.

import json
import requests

url = 'https://api.github.com/some/endpoint'
payload = {'some': 'data'}
headers = {'content-type': 'application/json'}

r = requests.post(url, data=json.dumps(payload), headers=headers)
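Requests also accepts a json parameter that serializes the payload and sets the Content-Type header for us. The sketch below prepares both variants without sending them, so the results can be compared. The endpoint is the placeholder URL from the example above.

```python
import json

import requests

url = "https://api.github.com/some/endpoint"  # placeholder endpoint
payload = {"some": "data"}

# Variant 1: serialize the payload and set the header by hand.
manual = requests.Request(
    "POST",
    url,
    data=json.dumps(payload),
    headers={"content-type": "application/json"},
).prepare()

# Variant 2: let Requests do both via the json parameter.
automatic = requests.Request("POST", url, json=payload).prepare()

print(manual.headers["Content-Type"])
print(automatic.headers["Content-Type"])
print(json.loads(manual.body) == json.loads(automatic.body))
```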

Redirection and History

By default, Requests will automatically perform location redirection for all verbs except HEAD.

GitHub will redirect all HTTP requests to HTTPS automatically. This keeps things secure and encrypted.

You can use the history attribute of the response object to track redirection. It holds the list of responses that were followed to complete the request.

>>> r = requests.get('http://github.com')
>>> r.url
'https://github.com/'

>>> r.status_code
200

>>> r.history
[<Response [301]>]
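The redirect behaviour can be reproduced locally. In this sketch a small server on the local machine redirects /old to /new; RedirectHandler and both paths are invented for the example. The allow_redirects parameter turns automatic redirect handling off.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class RedirectHandler(BaseHTTPRequestHandler):
    """Redirects /old to /new; both paths are made up for this example."""

    def do_GET(self):
        if self.path == "/old":
            self.send_response(301)
            self.send_header("Location", "/new")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            body = b"moved here"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging


server = HTTPServer(("127.0.0.1", 0), RedirectHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_port}"

session = requests.Session()
session.trust_env = False  # skip proxy settings so localhost is reached directly

followed = session.get(base + "/old", timeout=5)
not_followed = session.get(base + "/old", timeout=5, allow_redirects=False)

session.close()
server.shutdown()
server.server_close()

print(followed.url, followed.status_code, followed.history)
print(not_followed.status_code, not_followed.headers["Location"])
```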

Make an HTTP Post Request

You can also handle post requests using the Requests library.

r = requests.post("http://httpbin.org/post")

You can also use the other HTTP methods, such as PUT, DELETE, HEAD, and OPTIONS.

r = requests.put("http://httpbin.org/put")
r = requests.delete("http://httpbin.org/delete")
r = requests.head("http://httpbin.org/get")
r = requests.options("http://httpbin.org/get")
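When a dictionary is passed as data, Requests form-encodes it for us. The sketch below prepares a POST without sending it and prints the resulting header and body. The form fields are made up for illustration.

```python
import requests

# Nothing is sent; prepare() only shows what Requests would put on the wire.
prepared = requests.Request(
    "POST",
    "http://httpbin.org/post",
    data={"username": "bob", "lang": "python"},  # made-up form fields
).prepare()

print(prepared.headers["Content-Type"])  # application/x-www-form-urlencoded
print(prepared.body)                     # username=bob&lang=python
```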

You can use these methods to accomplish a great many things. For instance, you can use a Python script to create a GitHub repo.

import requests, json

github_url = "https://api.github.com/user/repos"
data = json.dumps({'name':'test', 'description':'some test repo'})
# 'user' and '*****' are placeholders; GitHub now expects a personal access
# token here rather than an account password
r = requests.post(github_url, data, auth=('user', '*****'))

print(r.json())

PUT Method

The PUT method completely replaces whatever currently exists at the target URL with something else. With this method, you can create a new resource or overwrite an existing one given you know the exact Request-URI.
A basic PUT in requests looks like:

payload = {'username': 'bob', 'email': 'bob@bob.com'}
r = requests.put("http://somedomain.org/endpoint", data=payload)

We can then check the response status code with:

r.status_code

or the response with:

r.content

Requests has a lot of syntactic sugar and shortcuts that'll make your life easier.
In short, the PUT method is used to create or overwrite a resource at a particular URL that is known by the client.

DELETE Method

The HTTP DELETE request method deletes the specified resource.

Syntax

DELETE /file.html HTTP/1.1 

Request

DELETE /file.html HTTP/1.1

Example

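import requests, json
from requests.auth import HTTPBasicAuth

# data_description and toggl_token are assumed to be defined earlier in the script
payload = {'some':'data'}
headers = {'content-type': 'application/json'}
url = "https://www.toggl.com/api/v6/" + data_description + ".json"
response = requests.delete(url, data=json.dumps(payload), headers=headers,
                           auth=HTTPBasicAuth(toggl_token, 'api_token'))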

Responses

If a DELETE method is successfully applied, there are several response status codes possible:

  • A 202 (Accepted) status code if the action will likely succeed but has not yet been enacted.
  • A 204 (No Content) status code if the action has been enacted and no further information is to be supplied.
  • A 200 (OK) status code if the action has been enacted and the response message includes a representation describing the status.

For example, a 200 response to a DELETE request could look like this:

HTTP/1.1 200 OK
Date: Fri, 21 Jun 2019 07:28:00 GMT

<html>
  <body>
    <h1>File deleted.</h1> 
  </body>
</html>

HTTP response status codes

HTTP response status codes indicate whether a specific HTTP request has been successfully completed. Responses are grouped in five classes: informational responses, successful responses, redirects, client errors, and server errors.
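The class of a response is given by the first digit of its status code. A minimal sketch using the standard library's HTTPStatus enum follows; the CLASSES mapping and the describe helper are names made up for this example.

```python
from http import HTTPStatus

CLASSES = {
    1: "informational",
    2: "successful",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe(code):
    """Return 'code phrase (class)' for a standard HTTP status code."""
    status = HTTPStatus(code)  # raises ValueError for unknown codes
    return f"{code} {status.phrase} ({CLASSES[code // 100]})"


for code in (100, 201, 301, 401, 501):
    print(describe(code))
```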

Informational responses

100 Continue

This interim response indicates that everything so far is OK and that the client should continue with the request or ignore it if it is already finished.

101 Switching Protocols

This code is sent in response to an Upgrade request header by the client, and indicates the protocol the server is switching to.

Successful responses

200 OK

The request has succeeded. The meaning of a success varies depending on the HTTP method:

  • GET: The resource has been fetched and is transmitted in the message body.
  • HEAD: The headers are returned without any message body.
  • PUT or POST: The resource describing the result of the action is transmitted in the message body.
  • TRACE: The message body contains the request message as received by the server.

201 Created

The request has succeeded and a new resource has been created as a result of it. This is typically the response sent after a POST request, or after some PUT requests.

202 Accepted

The request has been received but not yet acted upon. It is non-committal, meaning that there is no way in HTTP to later send an asynchronous response indicating the outcome of processing the request. It is intended for cases where another process or server handles the request, or for batch processing.

Redirection messages

300 Multiple Choices

The request has more than one possible response. The user-agent or user should choose one of them. There is no standardized way of choosing one of the responses.

301 Moved Permanently

This response code means that the URI of the requested resource has been changed permanently. The new URI is usually given in the response.

302 Found

This response code means that the URI of the requested resource has been changed temporarily. New changes in the URI might be made in the future. Therefore, this same URI should be used by the client in future requests.

Client error responses

400 Bad Request

This response means that the server could not understand the request due to invalid syntax.

401 Unauthorized

Although the HTTP standard specifies "unauthorized", semantically this response means "unauthenticated". That is, the client must authenticate itself to get the requested response.

Server error responses

500 Internal Server Error

The server has encountered a situation it doesn't know how to handle.

501 Not Implemented

The request method is not supported by the server and cannot be handled. The only methods that servers are required to support (and therefore that must not return this code) are GET and HEAD.

Errors and Exceptions

There are a number of exceptions and error codes you need to be familiar with when using the Requests library in Python.

  • If there is a network problem, like a DNS failure or a refused connection, the Requests library will raise a ConnectionError exception.
  • If you call r.raise_for_status() on a response with an unsuccessful status code (4xx or 5xx), Requests will raise an HTTPError exception.
  • If a request times out, a Timeout exception will be raised.
  • If and when a request exceeds the preconfigured number of maximum redirections, then a TooManyRedirects exception will be raised.

All exceptions that Requests explicitly raises inherit from requests.exceptions.RequestException.
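Here is a minimal sketch of handling these exceptions. The helpers free_port and outcome are invented for the example, and the 404 response is built by hand only so that raise_for_status() can be shown without contacting a real server.

```python
import socket

import requests


def free_port():
    """Ask the OS for a port number, then release it so nothing listens there."""
    with socket.socket() as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


def outcome(action):
    """Run a callable and name the Requests exception it raised, if any."""
    try:
        action()
        return "ok"
    except requests.exceptions.ConnectionError:
        return "connection error"
    except requests.exceptions.Timeout:
        return "timeout"
    except requests.exceptions.HTTPError:
        return "http error"
    except requests.exceptions.RequestException:
        return "other requests error"  # base class catches everything else


session = requests.Session()
session.trust_env = False  # ignore proxy settings so the refusal comes from localhost

# Nothing listens on this port, so the connection is refused.
refused = outcome(lambda: session.get(f"http://127.0.0.1:{free_port()}/", timeout=2))
session.close()

# A Response built by hand, only to demonstrate raise_for_status() offline.
fake = requests.Response()
fake.status_code = 404
bad_status = outcome(fake.raise_for_status)

print(refused, bad_status)
```

Catching the more specific exceptions first and RequestException last keeps the handling precise while still covering anything unexpected.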
