In this article at OpenGenus, we will explore several ways of validating email addresses in Python: regular expressions, the email_validator library, and an SMTP connection check, focusing on the standard email format.
Table of contents:
- Introduction
- Regular Expression and Email
- Examples
- Solution
- Quiz
Introduction
The task is to check whether the given email string is valid or invalid. We can use different methods to implement this in Python:
- Using 're' library
- Using 'email_validator' library
- Using SMTP connection
Regular Expression and Email
A regular expression (shortened as regex or regexp), sometimes referred to as a rational expression, is a sequence of characters that specifies a match pattern in text.
- All characters, except those having special meaning in regex, match themselves. E.g., the regex x matches the substring "x" and the regex @ matches "@".
- These characters have special meaning in regex: ., +, *, ?, ^, $, (, ), [, ], {, }, |, \. To match a character that has special meaning in regex, you need to escape it with a backslash (\) prefix. E.g., \. matches "." and the regex \( matches "(". You also need to use the regex \\ to match "\".
- Strings can be matched by combining a sequence of characters (called sub-expressions). E.g., the regex Saturday matches "Saturday".
- The [...], known as a character class, encloses a list of characters. It matches any SINGLE character in the list. For example, [0-9] matches any SINGLE character between 0 and 9, where the dash (-) denotes a range. Followed by the + quantifier (one or more), [0-9]+ matches the substrings "00123", "456" and "0" in the input "abc00123xyz456_0".
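The rules above can be tried out directly with Python's built-in re module. The following is a small sketch; the sample strings and variable names are chosen only for illustration.

```python
import re

# A literal character matches itself.
literal_hits = re.findall(r"x", "xyx")
print(literal_hits)          # ['x', 'x']

# A special character must be escaped with a backslash to match literally.
period_hits = re.findall(r"\.", "a.b.c")
print(period_hits)           # ['.', '.']

# [0-9] matches one digit at a time; adding + matches whole runs of digits.
text = "abc00123xyz456_0"
single_digits = re.findall(r"[0-9]", text)
digit_runs = re.findall(r"[0-9]+", text)
print(single_digits)         # ['0', '0', '1', '2', '3', '4', '5', '6', '0']
print(digit_runs)            # ['00123', '456', '0']
```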
An Email (electronic mail) is the exchange of computer-stored messages from one user to one or more recipients via the internet. An email address identifies an email box to which messages are delivered.
- An email address is a string separated into two parts by the @ symbol: a "local_part" and a domain, that is, local_part@domain.
- The local_part consists of a subset of ASCII characters and is followed by the domain. If the domain is a domain name rather than an IP address, the SMTP client uses the domain name to look up the mail exchange IP address.
- The general format of an email address is local_part@domain, e.g. jsmith@[192.168.1.2] or jsmith@example.com.
- Allowed characters: letters (a-z), numbers, underscores, periods, and dashes. An underscore, period, or dash must be followed by one or more letters or numbers.
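The local_part@domain structure described above can be illustrated with a short sketch. The helper name split_email is hypothetical, and splitting at the last @ is an assumption made here so that the domain is always the text after the final @ symbol.

```python
def split_email(address):
    """Split an address into (local_part, domain) at the last '@'."""
    local_part, separator, domain = address.rpartition("@")
    # rpartition returns an empty separator when no '@' is present.
    if not separator or not local_part or not domain:
        raise ValueError(f"not in local_part@domain form: {address!r}")
    return local_part, domain


print(split_email("jsmith@example.com"))      # ('jsmith', 'example.com')
print(split_email("jsmith@[192.168.1.2]"))    # ('jsmith', '[192.168.1.2]')
```

This only separates the two parts; it does not check which characters they contain.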
Examples
Consider the following email examples:
Email | Valid | Reason
---|---|---
robsmith@gmail.com | Yes | Follows standard email format
rob.smith@gmail.com | Yes | Email can have a period
robsmith.gmail.com | No | Email does not have an @ symbol
robsmith_@gmail.c | No | Top-level domain must have at least 2 characters (e.g. robsmith_@gmail.cc)
From the examples in the table we can derive a regular expression which can be used to validate an email:
\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b
- \b asserts a word boundary, ensuring that the email address is not part of a larger word.
- [A-Za-z0-9._%+-]+ matches one or more of the following characters: uppercase letters, lowercase letters, digits, period (.), underscore (_), percent (%), plus (+), or hyphen (-). This represents the local part of the email address before the @ symbol.
- @ matches the @ symbol.
- [A-Za-z0-9.-]+ matches one or more of the following characters: uppercase letters, lowercase letters, digits, period (.), or hyphen (-). This represents the domain name (excluding the top-level domain) of the email address.
- \. matches a period (.) character. It is escaped with a backslash (\) because the period is a special character in regex.
- [A-Za-z]{2,} matches two or more uppercase or lowercase letters. This represents the top-level domain (e.g., com, org, edu) of the email address.
- \b asserts a word boundary, ensuring that the email address is not part of a larger word.
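To make the breakdown above easier to follow, the same pattern can be written in verbose mode with one named group per component. This is only a sketch: the group names local, domain and tld and the variable names are introduced here for illustration and are not part of the original pattern.

```python
import re

# Same pattern as above, spread over several lines with re.VERBOSE.
# The named groups (local, domain, tld) are illustrative additions.
EMAIL_PATTERN = re.compile(
    r"""
    \b
    (?P<local>[A-Za-z0-9._%+-]+)   # local part before the @ symbol
    @
    (?P<domain>[A-Za-z0-9.-]+)     # domain name without the top-level domain
    \.
    (?P<tld>[A-Za-z]{2,})          # top-level domain, two or more letters
    \b
    """,
    re.VERBOSE,
)

match = EMAIL_PATTERN.fullmatch("rob.smith@gmail.com")
parts = match.groupdict()
print(parts)   # {'local': 'rob.smith', 'domain': 'gmail', 'tld': 'com'}
```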
Solution
1. Using 're' library

    # import the re (regular expression) library
    import re

    # define our regex
    regex = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'

    # add our example emails to a list
    email_list = ['robsmith@gmail.com', 'rob.smith@gmail.com', 'robsmith.gmail.com', 'robsmith_@gmail.c']

    for email in email_list:
        # match the whole email string against the regex
        if re.fullmatch(regex, email):
            print(email, ": Valid")
        else:
            print(email, ": Invalid")

Output

    robsmith@gmail.com : Valid
    rob.smith@gmail.com : Valid
    robsmith.gmail.com : Invalid
    robsmith_@gmail.c : Invalid
Time and Space Complexity
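Time Complexity: O(n), where n is the number of email addresses in email_list. This treats each regex match as a single step; the cost of one match also grows with the length of the address.
Space Complexity: O(1), since the addresses are checked one at a time and only a constant amount of additional space is needed.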
Explanation
- The code uses the re library in Python to match and validate email addresses based on a regular expression pattern.
- In this code, we define the regular expression pattern r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b', which follows common patterns for email addresses.
- The re.fullmatch() function is used to match the entire string against the regular expression pattern. If the string is a valid email address, the code prints "Valid"; otherwise it prints "Invalid".
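Because the pattern only describes the general shape of an address, it also accepts some strings that the stricter rule stated earlier (an underscore, period, or dash must be followed by a letter or number) would reject. The short check below illustrates this; the helper name is_valid and the sample addresses are made up for the example.

```python
import re

regex = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'


def is_valid(email):
    """Return True if the whole string matches the regex."""
    return re.fullmatch(regex, email) is not None


# Consecutive periods are accepted: the character class allows any mix.
print(is_valid("rob..smith@gmail.com"))   # True
# An underscore directly before the @ symbol is accepted as well.
print(is_valid("robsmith_@gmail.cc"))     # True
# A space is not in the character class, so this is rejected.
print(is_valid("rob smith@gmail.com"))    # False
```

If such addresses need to be rejected, the pattern has to be tightened or combined with one of the other methods.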
2. Using 'email_validator' library

    # import the email_validator library
    from email_validator import validate_email, EmailNotValidError

    # add our example emails to a list
    email_list = ['robsmith@gmail.com', 'rob.smith@gmail.com', 'robsmith.gmail.com', 'robsmith_@gmail.c']

    for email in email_list:
        # use try/except: validate_email raises an exception for invalid emails
        try:
            validate_email(email)
            print(email, ": Valid")
        except EmailNotValidError:
            print(email, ": Invalid")

Output

    robsmith@gmail.com : Valid
    rob.smith@gmail.com : Valid
    robsmith.gmail.com : Invalid
    robsmith_@gmail.c : Invalid
Time and Space Complexity
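Time Complexity: O(n), where n is the number of email addresses in email_list. By default validate_email also checks deliverability with a DNS lookup for the domain, so in practice the running time is dominated by network latency.
Space Complexity: O(1), since the addresses are checked one at a time and only a constant amount of additional space is needed.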
Explanation
- The code uses the email_validator library in Python to validate email addresses. This library provides a more robust and reliable approach to validating email addresses compared to regular expressions.
- In this code, we import the validate_email function and the EmailNotValidError exception from the email_validator library. The validate_email function is used to validate the email address.
- Inside the loop, we try to validate each email address using validate_email(email). If the email is valid, the code will continue executing and print "Valid." If the email is invalid, an EmailNotValidError exception will be raised, and the code will print "Invalid."
3. Using SMTP connection

    # import smtplib (standard library) and dns.resolver (from the dnspython package)
    import smtplib
    import dns.resolver

    def validate_email(email):
        # Check if "@" symbol is present in the email
        if '@' not in email:
            return False
        # Split the email address to extract the domain
        domain = email.split('@')[1]
        # use try/except to handle exceptions
        try:
            # Query the MX records of the domain
            # (resolve() replaces the deprecated query() in dnspython 2.x)
            records = dns.resolver.resolve(domain, 'MX')
            mx_record = str(records[0].exchange)
            # Connect to the SMTP server of the domain
            server = smtplib.SMTP()
            server.set_debuglevel(0)
            server.connect(mx_record)
            server.helo(server.local_hostname)
            server.mail('me@domain.com')
            # Check the response code for the email address
            code, message = server.rcpt(str(email))
            server.quit()
            # A response code of 250 means the server accepted the recipient
            return code == 250
        except (dns.resolver.NXDOMAIN, smtplib.SMTPConnectError,
                smtplib.SMTPServerDisconnected, smtplib.SMTPResponseException):
            # DNS lookup or SMTP conversation failed
            return False
        except Exception:
            # Any other error is also treated as invalid
            return False

    email_list = ['robsmith@gmail.com', 'rob.smith@gmail.com', 'robsmith.gmail.com', 'robsmith_@gmail.c']

    # Iterate over the email list and validate each email address
    for email in email_list:
        if validate_email(email):
            print(email, ": Valid")
        else:
            print(email, ": Invalid")

Output

    robsmith@gmail.com : Invalid
    rob.smith@gmail.com : Invalid
    robsmith.gmail.com : Invalid
    robsmith_@gmail.c : Invalid
Time and Space Complexity
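Time Complexity: O(n), where n is the number of email addresses in email_list. Each check involves a DNS lookup and an SMTP connection, so in practice the running time is dominated by network latency.
Space Complexity: O(1), since the addresses are checked one at a time and only a constant amount of additional space is needed.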
Explanation
- The code uses the smtplib and dns.resolver libraries to validate email addresses. It looks up the MX records of the domain and connects to the domain's SMTP server to check whether the address is accepted.
- It performs the following steps:
  - Checks if the "@" symbol is present in the email address. If not, it returns False, indicating an invalid email.
  - Extracts the domain from the email address.
  - Queries the MX records of the domain using dns.resolver.
  - Connects to the SMTP server of the domain using smtplib.SMTP.
  - Sends a MAIL FROM command to the server using server.mail.
  - Checks the response code of the RCPT TO command for the email address.
  - If the response code is 250, meaning the server accepted the recipient, it returns True, indicating a valid email. Otherwise, it returns False.
  - Handles DNS resolution errors and SMTP connection errors so that invalid email addresses are reported gracefully.
  - In case of any other exception, it returns False.
- The code then iterates over the email_list and calls the validate_email function for each email address. It prints "Valid" if the email is valid and "Invalid" otherwise.
- It prints "Invalid" for all four addresses in this run. The last two are malformed, and the first two, although they match the email regex, were not confirmed by the mail server because those mailboxes do not exist. Note that a failed connection is also reported as "Invalid" by this code, for example when outbound SMTP traffic is blocked, so a negative result is not always conclusive.
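The code above reduces the RCPT TO reply to a yes/no answer. As a sketch of a slightly finer-grained interpretation, the helper below maps a reply code to a coarse verdict based on the standard SMTP reply code classes. The function name classify_rcpt_code and the verdict strings are hypothetical and are not part of smtplib.

```python
def classify_rcpt_code(code):
    """Map an SMTP RCPT TO reply code to a coarse verdict (illustrative only)."""
    if code in (250, 251):
        # 250: requested action completed; 251: user not local, will forward
        return "accepted"
    if 400 <= code < 500:
        # 4xx replies are transient failures; retrying later may succeed
        return "temporary failure"
    if 500 <= code < 600:
        # 5xx replies are permanent failures, e.g. 550 mailbox unavailable
        return "rejected"
    return "unknown"


print(classify_rcpt_code(250))   # accepted
print(classify_rcpt_code(450))   # temporary failure
print(classify_rcpt_code(550))   # rejected
```

Even an "accepted" verdict is only a hint, since some servers accept every recipient and report problems later.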