Reading time: 45 minutes
The internet is made up of a large number of resources hosted on different servers. The term resource refers to any entity on the web, including HTML files, stylesheets, images, videos, and scripts. To access content on the internet, our browser must ask these servers for the resources it wants and then display these resources to us. This protocol of requests and responses is what enables us to view this page in our browser.
After reading this article, you will understand:
- What HTTP is and how it works
- How to make different kinds of HTTP requests in Python using the requests library
- How to inspect the information exchanged with web resources and use it programmatically
What is HTTP?
HTTP stands for Hypertext Transfer Protocol and is used to structure requests and responses over the internet. HTTP requires a way to transfer data from one point to another over the network.
That transfer happens using TCP (Transmission Control Protocol). While we view this webpage, TCP manages the channel between our browser and the server. TCP is used to manage many types of internet connections in which one computer or device wants to send something to another. HTTP is the command language that the devices on both sides of the connection must follow in order to communicate.
How does HTTP work?
When we type an address such as www.github.com into our browser, we are commanding it to open a TCP channel to the server that responds to that URL (or Uniform Resource Locator). A URL is like a home address or phone number: it describes how to reach the server that holds the resource.
In this situation, our computer, which is making the request, is called the client. The URL we are requesting is the address that belongs to the server.
Once the TCP connection is established, the client sends an HTTP GET request to the server to retrieve the webpage it should display, and the server sends back a response. In the original HTTP/1.0 model, the server closes the TCP connection after sending the response, so opening the website again, or any request our browser makes automatically, opens a new connection that follows the same process. HTTP/1.1 instead keeps the connection open by default so that it can be reused for further requests. GET requests are one kind of HTTP method a client can call.
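To make the exchange described above concrete, here is a minimal sketch that speaks HTTP directly over a TCP socket using only Python's standard library. The helper names build_get_request, parse_status_line and fetch are our own, and the sketch assumes a plain-HTTP server listening on port 80. The request sends the header Connection: close, so the client can simply read until the server closes the connection.

```python
import socket
from typing import Tuple


def build_get_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET request as raw bytes."""
    lines = [
        f"GET {path} HTTP/1.1",  # request line: method, path, protocol version
        f"Host: {host}",         # required in HTTP/1.1
        "Connection: close",     # ask the server to close the connection after responding
        "",                      # an empty line ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_status_line(raw_response: bytes) -> Tuple[str, int, str]:
    """Split the first line of a raw response into (version, status code, reason)."""
    status_line = raw_response.split(b"\r\n", 1)[0].decode("ascii")
    version, code, reason = status_line.split(" ", 2)
    return version, int(code), reason


def fetch(host: str, path: str = "/", port: int = 80) -> bytes:
    """Open a TCP connection, send a GET request and read the whole response."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_get_request(host, path))
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # an empty read means the server closed the connection
                break
            chunks.append(chunk)
    return b"".join(chunks)
```

Calling parse_status_line(fetch("example.com")) would return a tuple such as ('HTTP/1.1', 200, 'OK') if the server answers successfully; the exact values depend on the server. Real clients, including the Requests library used below, also handle HTTPS, redirects, chunked bodies and much more.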
What are the different HTTP verbs?
There are 4 basic HTTP verbs we use in requests to interact with resources in a REST system:
- GET — retrieve a specific resource (by id) or a collection of resources
- POST — create a new resource
- PUT — update a specific resource (by id)
- DELETE — remove a specific resource by id
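To see how the four verbs map onto create, read, update and delete operations, here is a toy sketch of the server side that keeps resources in a Python dictionary. The class InMemoryResourceStore and its handle method are hypothetical, not part of any library, and the status codes it returns (201 after creating, 204 after deleting, 404 for unknown ids, 405 for unsupported methods) follow common REST conventions rather than a fixed rule.

```python
import itertools


class InMemoryResourceStore:
    """A toy, in-memory 'server' mapping the four REST verbs to CRUD operations."""

    def __init__(self):
        self._items = {}
        self._ids = itertools.count(1)  # the server assigns ids for POST

    def handle(self, method, resource_id=None, body=None):
        """Return an (HTTP status code, response body) pair for one request."""
        if method == "GET":
            if resource_id is None:  # no id: return the whole collection
                return 200, list(self._items.values())
            if resource_id in self._items:
                return 200, self._items[resource_id]
            return 404, None
        if method == "POST":  # create a new resource; the server picks the id
            new_id = next(self._ids)
            self._items[new_id] = dict(body, id=new_id)
            return 201, self._items[new_id]
        if method == "PUT":  # replace (or create) the resource at a known id
            existed = resource_id in self._items
            self._items[resource_id] = dict(body, id=resource_id)
            return (200 if existed else 201), self._items[resource_id]
        if method == "DELETE":
            if self._items.pop(resource_id, None) is None:
                return 404, None
            return 204, None
        return 405, None  # Method Not Allowed


store = InMemoryResourceStore()
print(store.handle("POST", body={"name": "alice"}))  # (201, {'name': 'alice', 'id': 1})
print(store.handle("PUT", 1, {"name": "bob"}))       # (200, {'name': 'bob', 'id': 1})
print(store.handle("DELETE", 1))                     # (204, None)
```

Note that PUT replaces the stored resource as a whole, which is why the old "alice" data disappears entirely rather than being merged.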
HTTP requests in Python
To make HTTP requests in Python, we can use several HTTP libraries, such as:
- GRequests: allows us to use Requests with gevent to make asynchronous HTTP requests easily.
- httplib2: provides methods for accessing web resources via HTTP. It is a comprehensive HTTP client library that supports many features left out of other HTTP libraries, such as HTTP and HTTPS, authentication, caching, redirects, and compression. HTTPS support is only available if the socket module was compiled with SSL support.
- Requests: an HTTP library designed to be easy for humans to use. With Requests, we don't need to manually add query strings to our URLs or form-encode our POST data. We can send HTTP requests to a server and attach form data, headers, multipart files, and other content. GitHub repository: https://github.com/requests/requests
- urllib3: a powerful, sanity-friendly HTTP client for Python. urllib3 brings many critical features that are missing from the Python standard libraries, such as thread safety, connection pooling, client-side SSL/TLS verification, and file uploads with multipart encoding.
Of the libraries listed above, Requests is the simplest and most elegant, so we will use it throughout this article.
To download and install the Requests library, use the following command:
pip install requests
Importing the Requests Module
To work with the Requests library in Python, we must import the appropriate module. We can do this simply by adding the following code at the beginning of our script:
import requests
Making a Request
When we ask a website or web service for information, we are making a request. That is exactly what the Requests library has been designed to do.
To get a webpage we would do something like the following:
r = requests.get('https://github.com/timeline.json')
Working with Response Codes
Before we do anything else with a response, it's a good idea to check its status code, which Requests exposes as r.status_code. Requests also provides a built-in lookup object, requests.codes, that lets us refer to status codes by name:
>>> r = requests.get('https://github.com/timeline.json')
>>> r.status_code
200
>>> r.status_code == requests.codes.ok
True
>>> requests.codes['temporary_redirect']
307
>>> requests.codes.teapot
418
>>> requests.codes['\\o/']
200
Get the Content
After a web server returns a response, we can read the content it sent back through attributes of the response object, such as r.text for the body as a string.
import requests
r = requests.get('https://github.com/timeline.json')
print(r.text)
# The Requests library also comes with a built-in JSON decoder,
# just in case we have to deal with JSON data (it raises an exception if the body is not valid JSON)
print(r.json())
Working with Headers
Response headers are exposed as r.headers, a dictionary-like object. Because Requests stores them in a case-insensitive dictionary, we can access the headers using any capitalization we'd like.
If a header doesn't exist in the response, r.headers.get() returns None, while indexing it with square brackets raises a KeyError.
>>> r.headers
{
'status': '200 OK',
'content-encoding': 'gzip',
'transfer-encoding': 'chunked',
'connection': 'close',
'server': 'nginx/1.0.4',
'x-runtime': '148ms',
'etag': '"e1ca502697e5c9317743dc078f67693f"',
'content-type': 'application/json; charset=utf-8'
}
>>> r.headers['Content-Type']
'application/json; charset=utf-8'
>>> r.headers.get('content-type')
'application/json; charset=utf-8'
>>> print(r.headers.get('X-Random'))
None
# Get only the headers of a given URL with a HEAD request (the response has no body)
resp = requests.head("http://www.google.com")
print(resp.status_code, resp.text, resp.headers)
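The case-insensitive lookups come from requests.structures.CaseInsensitiveDict, the class Requests uses for r.headers. This sketch builds one by hand, with header values copied from the example response above, so it runs without any network access.

```python
from requests.structures import CaseInsensitiveDict

# Header values copied from the example response above
headers = CaseInsensitiveDict({
    "Content-Type": "application/json; charset=utf-8",
    "ETag": '"e1ca502697e5c9317743dc078f67693f"',
})

print(headers["content-type"])   # application/json; charset=utf-8
print(headers.get("X-Random"))   # None: .get() returns None for a missing header
try:
    headers["X-Random"]          # square brackets raise for a missing header
except KeyError:
    print("KeyError for X-Random")
```

Using r.headers.get() is therefore the safer choice whenever a header may be absent.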
Encoding
Requests automatically decodes content pulled from a server, and most Unicode character sets are decoded seamlessly.
When we make a request, the Requests library makes an educated guess about the encoding of the response based on its HTTP headers. The guessed encoding is used when we access r.text.
We can find out which encoding Requests is using, and change it if need be, through the r.encoding property.
If we change the encoding value, Requests will use the new encoding whenever we access r.text afterwards.
>>> print(r.encoding)
utf-8
>>> r.encoding = 'ISO-8859-1'
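r.text is essentially r.content (the raw bytes) decoded with r.encoding. This standard-library sketch shows what happens when the same bytes are decoded with the correct encoding and with a wrong one; the word "café" is just an arbitrary example containing a non-ASCII character.

```python
# What a server might send for the text "café" when it uses UTF-8
raw = "café".encode("utf-8")           # b'caf\xc3\xa9' (é takes two bytes)

as_utf8 = raw.decode("utf-8")          # 'café': the right encoding
as_latin1 = raw.decode("ISO-8859-1")   # 'cafÃ©': each byte read as its own character

print(len(as_utf8), len(as_latin1))    # 4 5
```

Garbled output like 'cafÃ©' (often called mojibake) is a sign that r.encoding should be set to the correct value before reading r.text.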
Types of encoding
ASCII is a simple encoding with 128 characters, including the Latin alphabet, digits, punctuation marks, and control characters.
7 bits are enough to represent any ASCII character. The word "test" in hexadecimal looks like: 0x74 0x65 0x73 0x74. When ASCII is stored one character per byte, the first bit of every byte is always 0, because the encoding has only 128 characters while a byte gives 2^8 = 256 variants.
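We can check these numbers with Python's built-in str.encode and bytes.hex:

```python
word = "test"
encoded = word.encode("ascii")   # one byte per character

print(encoded.hex(" "))          # 74 65 73 74
# Show each byte as 8 binary digits: the leading bit is always 0 in ASCII
print([format(byte, "08b") for byte in encoded])
```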
UTF-8 is, alongside ASCII, one of the best-known encodings. It can encode all 1,112,064 valid Unicode code points. Each character takes 1 to 4 bytes (the original design allowed sequences of up to 6 bytes).
A program processing UTF-8 looks at the leading bits of a character's first byte to determine its size: a byte starting with 0 is a 1-byte character, 110 starts a 2-byte character, 1110 a 3-byte character, and 11110 a 4-byte character. The remaining bytes of a multi-byte character all start with 10.
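The rule above can be written as a small function that inspects the first byte. utf8_sequence_length is a hypothetical helper for illustration; it only looks at the leading byte and does not validate the rest of the sequence (real decoders also reject overlong and out-of-range encodings).

```python
def utf8_sequence_length(first_byte: int) -> int:
    """Return how many bytes a UTF-8 character occupies, judging by its first byte."""
    if first_byte >> 7 == 0b0:       # 0xxxxxxx
        return 1
    if first_byte >> 5 == 0b110:     # 110xxxxx
        return 2
    if first_byte >> 4 == 0b1110:    # 1110xxxx
        return 3
    if first_byte >> 3 == 0b11110:   # 11110xxx
        return 4
    # 10xxxxxx is a continuation byte, never the start of a character
    raise ValueError(f"{first_byte:#04x} cannot start a UTF-8 character")


# "A", "é", "€" and an emoji, written as escapes to keep the source ASCII-only
for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    encoded = ch.encode("utf-8")
    print(encoded.hex(" "), utf8_sequence_length(encoded[0]))
```

This prints 41 1, c3 a9 2, e2 82 ac 3 and f0 9f 98 80 4, one pair per line.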
Custom Headers
If you want to add custom HTTP headers to a request, pass them as a dictionary to the headers parameter.
import json
import requests
url = 'https://api.github.com/some/endpoint'
payload = {'some': 'data'}
headers = {'content-type': 'application/json'}
r = requests.post(url, data=json.dumps(payload), headers=headers)
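To see exactly what a request will send, we can build it with requests.Request and call .prepare(), which fills in the headers and body without contacting the server. This sketch reuses the placeholder endpoint above and compares the manual approach with the json= parameter that reasonably recent versions of Requests provide, which serializes the payload and sets the Content-Type header for us.

```python
import json

import requests

url = "https://api.github.com/some/endpoint"
payload = {"some": "data"}

# Manual approach: serialize the payload ourselves and set the header explicitly
manual = requests.Request(
    "POST", url, data=json.dumps(payload), headers={"content-type": "application/json"}
).prepare()

# Alternative: let Requests serialize the payload and set Content-Type itself
automatic = requests.Request("POST", url, json=payload).prepare()

print(manual.headers["Content-Type"])     # application/json
print(automatic.headers["Content-Type"])  # application/json
print(automatic.body)                     # the JSON-encoded payload
```

Preparing a request this way is also handy for debugging, since nothing is sent over the network until the prepared request is passed to a Session.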
Redirection and History
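By default, Requests automatically follows redirects (such as a Location redirection) for every HTTP verb except HEAD. Redirect handling can be turned off by passing allow_redirects=False.
GitHub redirects all HTTP requests to HTTPS automatically, which keeps traffic secure and encrypted.
You can use the history attribute of the response object to track redirection. It is a list of the Response objects that were created to complete the request, ordered from oldest to most recent.
>>> r = requests.get('http://github.com')
>>> r.url
'https://github.com/'
>>> r.status_code
200
>>> r.history
[<Response [301]>]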
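To watch redirect handling without relying on GitHub, this sketch starts a throwaway local server with Python's http.server module. The handler class RedirectHandler, the paths /old and /new and the function redirect_demo are all hypothetical; the server answers /old with a 301 redirect to /new. The allow_redirects=False argument tells Requests not to follow the redirect, and session.trust_env = False keeps proxy settings from the environment out of the way.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class RedirectHandler(BaseHTTPRequestHandler):
    """Hypothetical test server: /old permanently redirects to /new."""

    def do_GET(self):
        if self.path == "/old":
            self.send_response(301)
            self.send_header("Location", "/new")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            body = b"hello from /new"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):  # keep the console quiet
        pass


def redirect_demo():
    server = HTTPServer(("127.0.0.1", 0), RedirectHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = f"http://127.0.0.1:{server.server_port}"
    try:
        with requests.Session() as session:
            session.trust_env = False  # ignore proxy settings from the environment
            followed = session.get(base + "/old", timeout=5)
            not_followed = session.get(base + "/old", allow_redirects=False, timeout=5)
        return followed, not_followed
    finally:
        server.shutdown()
        server.server_close()


followed, not_followed = redirect_demo()
# Final URL ends in /new with status 200; history holds the intermediate 301 response
print(followed.url, followed.status_code, followed.history)
# Without following, we get the 301 itself and can read its Location header
print(not_followed.status_code, not_followed.headers["Location"])
```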
Make an HTTP Post Request
You can also handle post requests using the Requests library.
r = requests.post('http://httpbin.org/post')
Requests also supports the other HTTP methods, such as PUT, DELETE, HEAD, and OPTIONS:
r = requests.put("http://httpbin.org/put")
r = requests.delete("http://httpbin.org/delete")
r = requests.head("http://httpbin.org/get")
r = requests.options("http://httpbin.org/get")
You can use these methods to accomplish a great many things. For instance, you can use a Python script to create a GitHub repository:
import requests
github_url = "https://api.github.com/user/repos"
data = {'name': 'test', 'description': 'some test repo'}
# '*****' stands for a personal access token: GitHub no longer accepts account passwords for API calls
r = requests.post(github_url, json=data, auth=('user', '*****'))
print(r.json())
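When auth is given as a (username, password) tuple, Requests uses HTTP Basic authentication, which sends the credentials base64-encoded in an Authorization header. This offline sketch prepares the same kind of request without sending it so the header can be inspected; "user" and "secret-token" are placeholder credentials. Since base64 is an encoding rather than encryption, Basic authentication should only be used over HTTPS.

```python
import base64

import requests

github_url = "https://api.github.com/user/repos"

# Prepare (but do not send) the request; "user" and "secret-token" are placeholders
prepared = requests.Request(
    "POST",
    github_url,
    json={"name": "test", "description": "some test repo"},
    auth=("user", "secret-token"),
).prepare()

# A (username, password) tuple becomes "Basic " + base64("username:password")
expected = "Basic " + base64.b64encode(b"user:secret-token").decode("ascii")
print(prepared.headers["Authorization"])
print(prepared.headers["Authorization"] == expected)  # True
```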
PUT Method
The PUT method completely replaces whatever currently exists at the target URL with something else. With this method, you can create a new resource or overwrite an existing one, as long as you know the exact Request-URI.
A basic PUT in requests looks like:
import requests
payload = {'username': 'bob', 'email': 'bob@bob.com'}
r = requests.put("http://somedomain.org/endpoint", data=payload)
We can then check the response status code with:
r.status_code
or the response with:
r.content
Requests has a lot of syntactic sugar and shortcuts that'll make your life easier.
In short, the PUT method is used to create or overwrite a resource at a particular URL that is known by the client.
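Passing a dictionary through data= makes Requests form-encode it. This sketch prepares the PUT request above without sending it, so we can see the body and Content-Type header it would carry; the endpoint is the same placeholder URL.

```python
import requests

payload = {"username": "bob", "email": "bob@bob.com"}

# Prepare (but do not send) the PUT request to see what data=payload puts on the wire
prepared = requests.Request("PUT", "http://somedomain.org/endpoint", data=payload).prepare()

print(prepared.method)                   # PUT
print(prepared.headers["Content-Type"])  # application/x-www-form-urlencoded
print(prepared.body)                     # username=bob&email=bob%40bob.com
```

If the server expects JSON instead, passing json=payload would send the same data JSON-encoded with a Content-Type of application/json.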
DELETE Method
The HTTP DELETE request method deletes the specified resource.
Syntax
DELETE /file.html HTTP/1.1
Example
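import requests
from requests.auth import HTTPBasicAuth
toggl_token = 'YOUR_API_TOKEN'         # placeholder for your Toggl API token
data_description = 'time_entries/123'  # hypothetical resource path
payload = {'some': 'data'}
url = "https://www.toggl.com/api/v6/" + data_description + ".json"
# json= serializes the payload and sets Content-Type: application/json
response = requests.delete(url, json=payload, auth=HTTPBasicAuth(toggl_token, 'api_token'))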
Responses
If a DELETE method is successfully applied, there are several response status codes possible:
- A 202 (Accepted) status code if the action will likely succeed but has not yet been enacted.
- A 204 (No Content) status code if the action has been enacted and no further information is to be supplied.
- A 200 (OK) status code if the action has been enacted and the response message includes a representation describing the status.
HTTP/1.1 200 OK
Date: Fri, 21 Jun 2019 07:28:00 GMT
<html>
<body>
<h1>File deleted.</h1>
</body>
</html>
HTTP response status codes
HTTP response status codes indicate whether a specific HTTP request has been successfully completed. Responses are grouped in five classes: informational responses, successful responses, redirects, client errors, and server errors.
Informational responses
100 Continue
This interim response indicates that everything so far is OK and that the client should continue with the request or ignore it if it is already finished.
101 Switching Protocols
This code is sent in response to an Upgrade request header by the client, and indicates the protocol the server is switching to.
Successful responses
200 OK
The request has succeeded. The meaning of a success varies depending on the HTTP method:
GET: The resource has been fetched and is transmitted in the message body.
HEAD: The headers are returned in the response, without a message body.
PUT or POST: The resource describing the result of the action is transmitted in the message body.
TRACE: The message body contains the request message as received by the server.
201 Created
The request has succeeded and a new resource has been created as a result of it. This is typically the response sent after a POST request, or after some PUT requests.
202 Accepted
The request has been received but not yet acted upon. It is non-committal, meaning that there is no way in HTTP to later send an asynchronous response indicating the outcome of processing the request. It is intended for cases where another process or server handles the request, or for batch processing.
Redirection messages
300 Multiple Choices
The request has more than one possible response. The user agent or user should choose one of them. There is no standardized way of choosing one of the responses.
301 Moved Permanently
This response code means that the URI of the requested resource has been changed permanently. The new URI is usually given in the Location header of the response.
302 Found
This response code means that the URI of the requested resource has been changed temporarily. Further changes in the URI might be made in the future, so the client should keep using this same URI in future requests.
Client error responses
400 Bad Request
This response means that the server could not understand the request due to invalid syntax.
401 Unauthorized
Although the HTTP standard specifies "unauthorized", semantically this response means "unauthenticated". That is, the client must authenticate itself to get the requested response.
Server error responses
500 Internal Server Error
The server has encountered a situation it doesn't know how to handle.
501 Not Implemented
The request method is not supported by the server and cannot be handled. The only methods that servers are required to support (and therefore that must not return this code) are GET and HEAD.
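Since the class of a response is given by the first digit of its status code, a small helper can label any code. describe_status is a hypothetical function; it takes reason phrases from the standard library's http.HTTPStatus enum and falls back to "Unknown" for codes that enum does not define.

```python
from http import HTTPStatus

# The five classes, keyed by the first digit of the status code
CLASSES = {
    1: "informational response",
    2: "successful response",
    3: "redirect",
    4: "client error",
    5: "server error",
}


def describe_status(code: int) -> str:
    """Return a label such as '404 Not Found (client error)' for a status code."""
    status_class = CLASSES.get(code // 100)
    if status_class is None:
        raise ValueError(f"{code} is outside the 100-599 range")
    try:
        phrase = HTTPStatus(code).phrase  # standard reason phrase, if the code is known
    except ValueError:
        phrase = "Unknown"
    return f"{code} {phrase} ({status_class})"


for code in (101, 204, 302, 401, 501):
    print(describe_status(code))
```

This prints, one per line: 101 Switching Protocols (informational response), 204 No Content (successful response), 302 Found (redirect), 401 Unauthorized (client error) and 501 Not Implemented (server error).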
Errors and Exceptions
There are a number of exceptions and error codes you need to be familiar with when using the Requests library in Python.
- If there is a network problem, such as a DNS failure or a refused connection, the Requests library will raise a ConnectionError exception.
- Requests does not raise an HTTPError for unsuccessful status codes on its own; calling Response.raise_for_status() raises an HTTPError if the response has a 4xx or 5xx status code.
- If a request times out, a Timeout exception will be raised.
- If and when a request exceeds the preconfigured number of maximum redirections, then a TooManyRedirects exception will be raised.
All exceptions that Requests explicitly raises inherit from requests.exceptions.RequestException.
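Because every Requests exception inherits from RequestException, we can catch specific errors first and fall back to the base class. In this sketch, fetch_or_report and unused_local_port are hypothetical helpers; the demo requests a local port that has nothing listening on it, so the connection is refused and ConnectionError is raised. Note that importing ConnectionError from requests.exceptions shadows Python's built-in exception of the same name in this module.

```python
import socket

import requests
from requests.exceptions import (
    ConnectionError,  # shadows the built-in ConnectionError in this module
    HTTPError,
    RequestException,
    Timeout,
    TooManyRedirects,
)

# Every exception listed above derives from RequestException
for exc_type in (ConnectionError, HTTPError, Timeout, TooManyRedirects):
    print(exc_type.__name__, issubclass(exc_type, RequestException))


def unused_local_port() -> int:
    """Ask the OS for a free local port, then release it so nothing is listening there."""
    with socket.socket() as sock:
        sock.bind(("127.0.0.1", 0))
        return sock.getsockname()[1]


def fetch_or_report(url: str) -> str:
    """Return the response body, or the name of the Requests exception that occurred."""
    try:
        with requests.Session() as session:
            session.trust_env = False    # ignore proxy settings from the environment
            response = session.get(url, timeout=3)
            response.raise_for_status()  # turn 4xx/5xx status codes into HTTPError
            return response.text
    except Timeout:                      # specific exceptions first...
        return "Timeout"
    except ConnectionError:
        return "ConnectionError"
    except RequestException as exc:      # ...then anything else raised by Requests
        return type(exc).__name__


print(fetch_or_report(f"http://127.0.0.1:{unused_local_port()}/"))  # ConnectionError
```

Catching Timeout before ConnectionError matters because some timeouts (ConnectTimeout) are subclasses of both.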