Validate email in Python [3 methods]

In this article at OpenGenus, we will explore three methods of validating email addresses in Python: regular expressions, the email_validator library, and an SMTP connection, focusing on the standard email format.

Table of contents:

  1. Introduction
  2. Regular Expression and Email
  3. Examples
  4. Solution
  5. Quiz

Introduction

The task is to check whether the given email string is valid or invalid. We can use different methods to implement this in Python:

  1. Using 're' library
  2. Using 'email_validator' library
  3. Using SMTP connection

Regular Expression and Email

A regular expression (shortened as regex or regexp), sometimes referred to as a rational expression, is a sequence of characters that specifies a match pattern in text.

  • All characters, except those having special meaning in regex, match themselves. E.g., the regex x matches the substring "x" and the regex @ matches "@".
  • These characters have special meaning in regex: ., +, *, ?, ^, $, (, ), [, ], {, }, |, \. To match a character that has special meaning in regex, prefix it with a backslash (\) as an escape sequence. E.g., \. matches "." and \( matches "(". Likewise, the regex \\ matches "\".
  • Strings are matched by combining a sequence of characters (called sub-expressions). E.g., the regex Saturday matches "Saturday".
  • The [...], known as a character class, encloses a list of characters. It matches any SINGLE character in the list. For example, [0-9] matches any SINGLE character between 0 and 9, where the dash (-) denotes the range. Adding + to match one or more digits, the regex [0-9]+ finds the substrings "00123", "456" and "0" in the input "abc00123xyz456_0".
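The rules above can be tried out with a short sketch using re.findall. The sample strings and variable names here are illustrative assumptions, not part of the original examples.

```python
import re

# A literal character matches itself.
literal_hits = re.findall(r"x", "xyz max")

# An unescaped "." is a wildcard; "\." matches only a real period.
wildcard_hits = re.findall(r"a.c", "abc a.c")
escaped_hits = re.findall(r"a\.c", "abc a.c")

# [0-9] matches a single digit; [0-9]+ matches a run of digits.
single_digits = re.findall(r"[0-9]", "a1b22")
digit_runs = re.findall(r"[0-9]+", "abc00123xyz456_0")

print(literal_hits)
print(wildcard_hits)
print(escaped_hits)
print(single_digits)
print(digit_runs)
```

Comparing wildcard_hits with escaped_hits shows why the period in an email regex has to be escaped.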

Email (electronic mail) is the exchange of computer-stored messages from one user to one or more recipients via the internet. An email address identifies an email box to which messages are delivered.

  • An email address is a string separated into two parts by the @ symbol: a “local_part” and a domain, that is, local_part@domain.
  • The local_part is made up of a subset of ASCII characters and is followed by the domain. If the domain is a domain name rather than an IP address, then the SMTP client uses the domain name to look up the mail exchange IP address.
  • The general format of an email address is local_part@domain, e.g. jsmith@example.com or jsmith@[192.168.1.2].
  • Commonly allowed characters: letters (a-z), numbers, underscores, periods, and dashes. An underscore, period, or dash must be followed by one or more letters or numbers.
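As a minimal sketch of the local_part@domain structure described above, the helper below splits an address at its last @ symbol. The function name split_address is a hypothetical choice for illustration. It only separates the two parts and does not check which characters they contain.

```python
def split_address(address):
    """Return (local_part, domain), or None if either part is missing."""
    # rpartition splits at the LAST "@"; if there is no "@",
    # the separator slot comes back as an empty string.
    local_part, at_sign, domain = address.rpartition("@")
    if not at_sign or not local_part or not domain:
        return None
    return local_part, domain


print(split_address("jsmith@example.com"))
print(split_address("jsmith@[192.168.1.2]"))
print(split_address("robsmith.gmail.com"))
```

Splitting at the last @ rather than the first is a design choice made here so that the domain is always the text after the final separator.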

Examples

Consider the following email examples :

Email               | Valid | Reason
robsmith@gmail.com  | Yes   | Follows standard email format
rob.smith@gmail.com | Yes   | Email can have a period
robsmith.gmail.com  | No    | Email does not have an @ symbol
robsmith_@gmail.c   | No    | Domain name must be a valid one (minimum 2 characters e.g. robsmith_@gmail.cc)

From the examples in the table we can derive a regular expression which can be used to validate an email :
\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b

  • \b asserts a word boundary, ensuring that the email address is not part of a larger word.
  • [A-Za-z0-9._%+-]+ matches one or more of the following characters: uppercase letters, lowercase letters, digits, period (.), underscore (_), percent (%), plus (+), or hyphen (-). This represents the local part of the email address before the @ symbol.
  • @ matches the @ symbol.
  • [A-Za-z0-9.-]+ matches one or more of the following characters: uppercase letters, lowercase letters, digits, period (.), or hyphen (-). This represents the domain name (excluding the top-level domain) of the email address.
  • \. matches a period (.) character. It is escaped with a backslash (\) because the period is a special character in regex.
  • [A-Za-z]{2,} matches two or more uppercase or lowercase letters. This represents the top-level domain (e.g., com, org, edu) of the email address.
  • \b asserts a word boundary, ensuring that the email address is not part of a larger word.
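To see which piece of the pattern consumes which part of an address, here is a sketch that rewrites the same regex with re.VERBOSE and named groups. The group names (local, domain, tld) and the helper parse_email are assumptions made for illustration. The \b anchors are left out because fullmatch already requires the whole string to match.

```python
import re

# The same pattern, split over several lines with one named group per piece.
EMAIL_PATTERN = re.compile(
    r"""
    (?P<local>[A-Za-z0-9._%+-]+)   # local part before the @ symbol
    @                              # the @ symbol itself
    (?P<domain>[A-Za-z0-9.-]+)     # domain name, excluding the top-level domain
    \.                             # the period before the top-level domain
    (?P<tld>[A-Za-z]{2,})          # top-level domain, two or more letters
    """,
    re.VERBOSE,
)


def parse_email(address):
    """Return a dict of the matched pieces, or None if the address does not match."""
    match = EMAIL_PATTERN.fullmatch(address)
    return match.groupdict() if match else None


print(parse_email("rob.smith@gmail.com"))
print(parse_email("robsmith_@gmail.c"))
```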

Solution

1. Using 're' library

# Import the re (regular expression) library
import re

# Define the regex
regex = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'

# Add the example emails to a list
email_list = ['robsmith@gmail.com', 'rob.smith@gmail.com', 'robsmith.gmail.com', 'robsmith_@gmail.c']

for email in email_list:
    # Match the whole string against the regex
    if re.fullmatch(regex, email):
        print(email, ": Valid")
    else:
        print(email, ": Invalid")

Output

robsmith@gmail.com : Valid
rob.smith@gmail.com : Valid
robsmith.gmail.com : Invalid
robsmith_@gmail.c : Invalid

Time and Space Complexity

Time Complexity   : O(n) , n is the number of email addresses in the email_list, treating the length of each address as bounded
Space Complexity  : O(1) , which means it requires a constant amount of additional space

Explanation

  • The code uses the re library in Python to match and validate email addresses based on a regular expression pattern.
  • In this code, we define the regular expression pattern r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b', which follows common patterns for email addresses.
  • The re.fullmatch() function is used to match the entire string against the regular expression pattern. If it's a valid email address, it will print "Valid," and if it's invalid, it will print "Invalid."
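The difference between matching the entire string and finding a match somewhere inside it can be sketched as follows. The helper name is_valid_email and the sample text are hypothetical. The pattern is the one defined in the solution, compiled once so it can be reused.

```python
import re

EMAIL_REGEX = re.compile(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b')


def is_valid_email(address):
    """True only if the WHOLE string matches the pattern."""
    return EMAIL_REGEX.fullmatch(address) is not None


text = "Contact: rob.smith@gmail.com, thanks"

# fullmatch rejects the sentence because of the surrounding words.
print(is_valid_email(text))

# search looks for the pattern anywhere inside the string instead.
found = EMAIL_REGEX.search(text)
print(found.group())
```

For validating user input, fullmatch is the safer choice because extra text before or after the address makes the check fail.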

2. Using 'email_validator' library

# Import the email_validator library
from email_validator import validate_email, EmailNotValidError

# Add the example emails to a list
email_list = ['robsmith@gmail.com', 'rob.smith@gmail.com', 'robsmith.gmail.com', 'robsmith_@gmail.c']

for email in email_list:
    # Use try/except: validate_email raises EmailNotValidError for an invalid address
    try:
        validate_email(email)
        print(email, ": Valid")
    except EmailNotValidError:
        print(email, ": Invalid")

Output

robsmith@gmail.com : Valid
rob.smith@gmail.com : Valid
robsmith.gmail.com : Invalid
robsmith_@gmail.c : Invalid

Time and Space Complexity

Time Complexity   : O(n) , n is the number of email addresses in the email_list
Space Complexity  : O(1) , which means it requires a constant amount of additional space

Explanation

  • The code uses the email_validator library in Python to validate email addresses. This library provides a more robust and reliable approach to validating email addresses compared to regular expressions.
  • In this code, we import the validate_email function and the EmailNotValidError exception from the email_validator library. The validate_email function is used to validate the email address.
  • Inside the loop, we try to validate each email address using validate_email(email). If the email is valid, the code will continue executing and print "Valid." If the email is invalid, an EmailNotValidError exception will be raised, and the code will print "Invalid."

3. Using SMTP connection

# Import the libraries smtplib and dns.resolver (from the dnspython package)
import smtplib

import dns.resolver

def validate_email(email):
    # Check if "@" symbol is present in the email
    if '@' not in email:
        return False

    # Split the email address to extract the domain
    domain = email.split('@')[1]

    # Use try/except to handle exceptions
    try:
        # Query the MX records of the domain
        # (resolve() replaces query(), which is deprecated in dnspython 2.x)
        records = dns.resolver.resolve(domain, 'MX')
        mx_record = str(records[0].exchange)

        # Connect to the SMTP server of the domain
        # (the 10-second timeout is an assumed value so a silent server cannot hang the check)
        server = smtplib.SMTP(timeout=10)
        server.set_debuglevel(0)
        server.connect(mx_record)
        server.helo(server.local_hostname)
        server.mail('me@domain.com')

        # Check the response code for the email address
        code, message = server.rcpt(str(email))
        server.quit()

        # If response code is 250, the server accepted the recipient
        return code == 250

    except dns.resolver.NXDOMAIN:
        # The domain does not exist
        return False

    except (smtplib.SMTPConnectError,
            smtplib.SMTPServerDisconnected,
            smtplib.SMTPResponseException):
        # The SMTP conversation failed
        return False

    except Exception:
        # Any other error (for example a timeout or missing MX records) counts as invalid
        return False


email_list = ['robsmith@gmail.com', 'rob.smith@gmail.com', 'robsmith.gmail.com', 'robsmith_@gmail.c']

# Iterate over the email list and validate each email address
for email in email_list:

    if validate_email(email):

        print(email, ": Valid")

    else:

        print(email, ": Invalid")

Output

robsmith@gmail.com : Invalid
rob.smith@gmail.com : Invalid
robsmith.gmail.com : Invalid
robsmith_@gmail.c : Invalid

Time and Space Complexity

Time Complexity   : O(n) , n is the number of email addresses in the email_list; each check waits on DNS and SMTP round trips, so network latency dominates the running time
Space Complexity  : O(1) , which means it requires a constant amount of additional space

Explanation

  • The code uses the smtplib and dns.resolver libraries to validate email addresses. It performs DNS MX record lookup and connects to the SMTP server of the domain to check the validity of the email address.

  • It does the following steps :

    • Checks if the "@" symbol is present in the email address. If not, it returns False indicating an invalid email.
    • Extracts the domain from the email address.
    • Queries the MX records of the domain using dns.resolver.
    • Connects to the SMTP server of the domain using smtplib.SMTP.
    • Sends a MAIL FROM command to the server using server.mail.
    • Checks the response code of the RCPT TO command for the email address.
    • If the response code is 250, meaning the server accepted the recipient, it returns True indicating a valid email. Otherwise, it returns False.
    • Handles exceptions for DNS resolution errors and SMTP connection errors so that invalid email addresses are handled gracefully.
    • In case of any other exception, it returns False.
  • The code then iterates over the email_list and calls the validate_email function for each email address. It prints "Valid" if the email is valid and "Invalid" otherwise.

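  • In this run it prints "Invalid" for every email. An address can follow the email regex and still be rejected because the mailbox doesn't exist. The result also depends on the network: many networks block outbound connections on port 25, and any connection error is reported as "Invalid" too.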
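The function above treats every reply other than 250 as invalid. As a sketch of a finer-grained reading, SMTP reply codes fall into classes by their first digit: 2xx means the command succeeded, 4xx is a temporary failure, and 5xx is a permanent rejection. The helper name classify_rcpt_reply and its return labels are hypothetical, chosen only for this illustration.

```python
def classify_rcpt_reply(code):
    """Map an SMTP reply code for RCPT TO onto a coarse outcome label."""
    if 200 <= code < 300:
        return "accepted"            # e.g. 250: recipient accepted
    if 400 <= code < 500:
        return "temporary failure"   # e.g. 450: mailbox unavailable for now, retry later
    if 500 <= code < 600:
        return "rejected"            # e.g. 550: mailbox unavailable
    return "unexpected"


for reply_code in (250, 450, 550):
    print(reply_code, ":", classify_rcpt_reply(reply_code))
```

Separating temporary failures from rejections would let a caller retry later instead of marking the address as invalid straight away.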

Quiz

Question 1

Which of the following is an invalid email?

robsmith@gmail.com
rob.smith@gmail.com
rob_smith@g-mail.com
robsmith_@gmail.c

Answer: robsmith_@gmail.c. Domain name must be a valid one (minimum 2 characters e.g. robsmith_@gmail.cc).

Question 2

Regular expressions are also called?

regex
regexp
rational expression
All of the above

Answer: All of the above. A regular expression (shortened as regex or regexp), sometimes referred to as a rational expression, is a sequence of characters that specifies a match pattern in text.

Question 3

An email address consists of two parts?

True
False

Answer: True. An email is a string separated into two parts by @ symbol, a “local_part” and a domain, that is local_part@domain.

MATHANKUMAR V

Mathankumar V is the Winner of Smart India Hackathon (2022) and Software Developer, Intern at OpenGenus. He is pursuing BE in Computer Science from Dr. Mahalingam College of Engineering and Technology

Improved & Reviewed by: OpenGenus Tech Review Team