Let's dive right into the most useful data structure from the Python collections module, the defaultdict. The default dictionary is Python's intelligent hash table.
In here we shall talk about:
- A brief introduction to Python3 dictionaries
- How do dictionaries work?
- What is a defaultdict?
- How is it different from the regular dictionary in Python?
- The default_factory attribute
- The __missing__ function
- defaultdict used as a frequency counter
- Other important data structures from the Python collections module
A brief introduction to Python3 dictionaries
You may already have come across them in other languages under a different name. They make a comeback in Python as dictionaries, but we can think of them as *drum rolls* hash maps or hash tables.
Dictionaries are a collection of objects. Historically they had no particular order; since Python 3.7 they preserve insertion order, but their elements are still accessed by key rather than by position.
Dictionaries are defined by a list of comma-separated key-value pairs.
dictionary = {
    key1: value1,
    key2: value2,
    .
    .
    .
    key3: value3,
}
Comparing dictionaries with lists shows us a few similarities:
- Both are mutable.
- Both are dynamic. Items can be added or deleted from them as and when needed.
- Both can be nested. It is possible to create a list of lists, and likewise a dictionary that has dictionaries as its values. A dictionary of lists and a list of dictionaries can be created as well.
And a few differences:
- List elements are accessed by their position in the list, via indexing.
- And since we assume a Dictionary to be a collection without any particular order, positions of elements aren't certain and thus elements are accessed via keys.
Fun fact: Searching for a value in a list takes $\mathcal{O}(n)$ time, whereas looking up a key in a dictionary takes $\mathcal{O}(1)$ time on average. Jump on to the next section to know how.
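The similarities and differences listed above can be sketched in a few lines. The names scores_list, scores_dict and nested are made up for this illustration and do not come from the article.

```python
# Hypothetical sample data, used only for illustration.
scores_list = [89, 78, 95]
scores_dict = {"ross": 89, "chandler": 78, "joey": 95}

# A list element is reached by its position, a dictionary value by its key.
first_score = scores_list[0]
ross_score = scores_dict["ross"]

# Both are mutable and dynamic: items can be added or removed at any time.
scores_list.append(68)
scores_dict["monica"] = 68
del scores_dict["joey"]

# Both can be nested, for example a dictionary of lists.
nested = {"ross": [89, 78, 95], "monica": [98]}

# Membership tests: the list is scanned element by element,
# while the dictionary hashes the key and goes straight to its slot.
in_list = 95 in scores_list
in_dict = "ross" in scores_dict
```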
How do dictionaries work?
In Python, dictionaries are implemented as resizable hash tables. The most common operation for dictionaries is lookup, and compared to B-trees, hash tables have better lookup performance and a simpler implementation.
The built-in hash() function in Python calculates a hash code for each key in the dictionary. This hash code is used to calculate a location in an internal array where the key-value pair is stored, and the same calculation is repeated later to look the key up. If the key were a mutable object, its value could change, and thus its hash value could also change, so the entry could no longer be found at its original location. This is why dictionary keys must be hashable.
If we're successful in storing keys that all have different hash values, it'd mean that retrieving a key would take a constant time $\mathcal{O}(1)$.
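A small sketch of the hashing rule described above: immutable built-ins can be hashed, while a mutable list cannot and is therefore rejected as a key. The variable names are illustrative only.

```python
# Immutable built-ins are hashable; hash() returns an int for them.
tuple_hash = hash((1, 2, 3))
# Equal keys hash equally within one interpreter run.
same_hash = hash((1, 2, 3)) == tuple_hash

# A mutable list has no hash value.
try:
    hash([1, 2, 3])
    list_is_hashable = True
except TypeError:
    list_is_hashable = False

# For the same reason a list cannot be used as a dictionary key.
try:
    bad = {[1, 2, 3]: "value"}
    message = ""
except TypeError as exc:
    message = str(exc)  # the message mentions the unhashable type
```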
Take a look at the references section at the end of the article to know what the Python3 documentation has to say about this.
What is a defaultdict?
A default dictionary is a dictionary that automatically assigns default values to keys, if queried keys are not present.
# how to import defaultdict in Python
from collections import defaultdict
help(defaultdict)
defaultdict(default_factory[, ...]) --> dict with default factory
The default factory is called without arguments to produce a new value when a key is not present, in __getitem__ only. A defaultdict compares equal to a dict with the same items. All remaining arguments are treated the same as if they were passed to the dict constructor, including keyword arguments.
Creating a defaultdict is quite simple:
# examples of empty defaultdict initializations
default_dictionary1 = defaultdict(list)
default_dictionary2 = defaultdict(int)
default_dictionary3 = defaultdict(lambda: "some default string")
default_dictionary4 = defaultdict(None)
# examples of non-empty defaultdict initializations
default_dictionary5 = defaultdict(int, {
    "france": 45,
    "germany": 25,
    "india": 64,
})
The signature of defaultdict() is as follows:
defaultdict(default_factory[, ...]) --> dict with default factory
We shall learn more about the default_factory argument soon. The rough idea is that the value returned by calling default_factory() becomes the default value of any key that was not already present in the dictionary.
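To make the role of the factory concrete, here is a minimal sketch of what a lookup of a missing key returns for a few factories. The names by_list, by_int and by_lambda are chosen for this example only.

```python
from collections import defaultdict

by_list = defaultdict(list)
by_int = defaultdict(int)
by_lambda = defaultdict(lambda: "some default string")

# Looking up a missing key calls the factory with no arguments
# and stores the result under that key.
list_default = by_list["missing"]    # list() gives []
int_default = by_int["missing"]      # int() gives 0
str_default = by_lambda["missing"]   # the lambda's return value

# The factory is called afresh for every new key, so mutable
# defaults such as lists are not shared between keys.
by_list["a"].append(1)
separate = by_list["b"]              # a fresh, empty list
```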
Available attributes of a defaultdict are:
defaultdict.default_factory
# member 'default_factory' of 'collections.defaultdict' objects
Available methods in defaultdict are:
defaultdict.clear()
# D.clear() -> None. Remove all items from D.
defaultdict.copy()
# D.copy() -> a shallow copy of D.
defaultdict.default_factory()
# Factory for default value called by __missing__().
defaultdict.fromkeys(iterable, value=None, /)
# Create a new dictionary with keys from iterable and values set to value.
defaultdict.get(key, default=None, /)
# Return the value for key if key is in the dictionary, else default
defaultdict.items()
# D.items() -> a set-like object providing a view on D's items
defaultdict.keys()
# D.keys() -> a set-like object providing a view on D's keys
defaultdict.pop()
# D.pop(k[,d]) -> v, remove specified key and return the corresponding value.
# If key is not found, d is returned if given, otherwise KeyError is raised
defaultdict.popitem()
# D.popitem() -> (k, v), remove and return some (key, value) pair as a 2-tuple;
# but raise KeyError if D is empty.
defaultdict.setdefault(key, default=None, /)
# Insert key with a value of default if key is not in the dictionary.
# Return the value for key if key is in the dictionary, else default.
defaultdict.update()
# D.update([E, ]**F) -> None. Update D from dict/iterable E and F.
# If E is present and has a .keys() method, then does: for k in E: D[k] = E[k].
# If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v.
# In either case, this is followed by: for k in F: D[k] = F[k]
defaultdict.values()
# D.values() -> an object providing a view on D's values
How is it different from the regular dictionary in Python?
The difference from a regular dictionary arises when dealing with keys that are not present in the dictionary.
# an incomplete list of scores on a test
regular_dictionary = {
    "ross": [89,78,95],
    "chandler": [68,77,89],
    "joey": [89,88],
    "monica": [98],
}
default_dictionary = defaultdict(list,{
    "ross": [89,78,95],
    "chandler": [68,77,89],
    "joey": [89,88],
    "monica": [98],
})
# correct way of updating rachel's list of scores in a regular dictionary
if "rachel" in regular_dictionary:
    regular_dictionary["rachel"].extend([77,79])
else:
    regular_dictionary["rachel"] = [77,79]
# correct way of updating phoebe's list of scores in a default dictionary
default_dictionary["phoebe"].extend([95,83,79])
When a key is absent in a regular dictionary and code is written to retrieve it, like regular_dictionary["phoebe"], it generates a KeyError. Whereas, when a key is absent in a default dictionary and code is written to retrieve it, like default_dictionary["rachel"], it returns the value produced by calling the default_factory of the defaultdict, which in this case is an empty list: []
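The contrast can be checked with a short sketch that uses a reduced version of the score data. One detail worth noticing is that the lookup on the default dictionary also inserts the missing key. The setdefault() call at the end is one possible way to get similar behaviour from a regular dictionary; it is shown as an alternative, not as the article's method.

```python
from collections import defaultdict

regular_dictionary = {"monica": [98]}
default_dictionary = defaultdict(list, {"monica": [98]})

# A missing key in a regular dictionary raises KeyError.
try:
    regular_dictionary["phoebe"]
    raised = False
except KeyError:
    raised = True

# The same kind of lookup on a defaultdict returns the default ...
value = default_dictionary["rachel"]
# ... and, as a side effect, stores the key in the dictionary.
keys_after_lookup = sorted(default_dictionary)

# Alternative for a regular dictionary: setdefault() inserts the
# default only when the key is absent, then returns the stored value.
regular_dictionary.setdefault("rachel", []).extend([77, 79])
```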
Question
default_d = defaultdict(list)
regular_d = dict()
sample_list = [1,2,3]
sample_tuple = (1,2,3)
Which case won't generate an error?
Dictionary keys must be hashable, such as int, str or tuple. Lists are unhashable and thus cannot be a key to a dictionary. No such restrictions are put on dictionary values.
Since regular_d doesn't contain the key (1,2,3), looking it up raises a KeyError. default_d, on the other hand, initializes such a key by itself, with an empty list [] as its default value.
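The sketch below tries each combination of dictionary and key from the question setup. Which four lookups the question had in mind is an assumption here, and the helper lookup_outcome is a hypothetical name introduced only to record what happens.

```python
from collections import defaultdict

default_d = defaultdict(list)
regular_d = dict()
sample_list = [1, 2, 3]
sample_tuple = (1, 2, 3)

def lookup_outcome(mapping, key):
    """Return the looked-up value, or the name of the exception raised."""
    try:
        return mapping[key]
    except (KeyError, TypeError) as exc:
        return type(exc).__name__

# Assumed cases: every pairing of the two dictionaries and the two keys.
outcomes = {
    "regular_d[sample_list]": lookup_outcome(regular_d, sample_list),
    "regular_d[sample_tuple]": lookup_outcome(regular_d, sample_tuple),
    "default_d[sample_list]": lookup_outcome(default_d, sample_list),
    "default_d[sample_tuple]": lookup_outcome(default_d, sample_tuple),
}

# Lists are fine as values, only keys have to be hashable.
regular_d["scores"] = sample_list
```

Only the lookup of the tuple key on default_d succeeds: the list keys fail with TypeError on both dictionaries, and the tuple key fails with KeyError on the regular one.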
The default_factory attribute
This is the first argument to the defaultdict constructor and it's used by the __missing__() method. If the argument to the constructor is absent, default_factory is initialized as None.
defaultdict(default_factory, **kwargs)
# first argument to the constructor
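A brief sketch of how the attribute behaves in practice. The names no_factory, counts and seeded are illustrative only.

```python
from collections import defaultdict

no_factory = defaultdict()        # no argument: default_factory is None
counts = defaultdict(int)

factory_is_none = no_factory.default_factory is None
factory_is_int = counts.default_factory is int

# With no factory, a missing key raises KeyError as in a plain dict.
try:
    no_factory["missing"]
    raised = False
except KeyError:
    raised = True

# The attribute is writable, so the default can be changed later.
counts.default_factory = lambda: 100
new_default = counts["fresh"]

# Keyword arguments after the factory become initial items,
# exactly as they would for the dict constructor.
seeded = defaultdict(int, france=45, germany=25)
```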
The __missing__ function
When a defaultdict (say def_d) is used to retrieve a value with a key (say 'K') that doesn't exist, the dunder function __missing__() is called by the dunder function __getitem__() of def_d.
If the default_factory for def_d was set to None, it raises a KeyError.
If the default_factory is defined as something other than None, it calls default_factory() to produce a value, inserts the resulting key-value pair in def_d and returns the value.
Pseudo code:
def __missing__(self, key):
    # Called by __getitem__ for missing key
    if self.default_factory is None:
        raise KeyError((key,))
    self[key] = value = self.default_factory()
    return value
Note:
Only the method __getitem__() can call __missing__(). No other operation is allowed to call it. This means that the get() method, when executed on a defaultdict, will, like normal dictionaries, return None as a default value rather than using the default_factory attribute to initialize the default.
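The note can be verified with a short sketch. It reuses the placeholder names def_d and 'K' from the description above; the other variable names are illustrative.

```python
from collections import defaultdict

def_d = defaultdict(list)

# get() bypasses __missing__, so the factory is never consulted.
via_get = def_d.get("K")               # None
via_get_default = def_d.get("K", 0)    # the explicit fallback
present_after_get = "K" in def_d       # nothing was inserted

# Square-bracket access goes through __getitem__, which calls __missing__.
via_getitem = def_d["K"]               # the factory's value, []
present_after_getitem = "K" in def_d   # the key now exists
```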
defaultdict used as a frequency counter
# dictionary of number of days in each month of a non-leap year
months = {
    "jan": 31, "feb": 28, "mar": 31, "apr": 30,
    "may": 31, "jun": 30, "jul": 31, "aug": 31,
    "sep": 30, "oct": 31, "nov": 30, "dec": 31,
}
freq_of_days = defaultdict(int)
for month, days in months.items():
    freq_of_days[days] += 1
print(sorted(freq_of_days.items()))
Output: [(28, 1), (30, 4), (31, 7)]
Other important data structures from the Python collections module
- Named Tuple
- OrderedDict
- Counter
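As a pointer to how one of these relates to defaultdict, the sketch below repeats the month-length frequency count with Counter and compares the result with the defaultdict(int) version. It is a side-by-side illustration under the same sample data, not a recommendation of one over the other.

```python
from collections import Counter, defaultdict

months = {
    "jan": 31, "feb": 28, "mar": 31, "apr": 30,
    "may": 31, "jun": 30, "jul": 31, "aug": 31,
    "sep": 30, "oct": 31, "nov": 30, "dec": 31,
}

# Frequency count with defaultdict(int): missing keys start at 0.
freq_with_defaultdict = defaultdict(int)
for days in months.values():
    freq_with_defaultdict[days] += 1

# Counter builds the same tally directly from an iterable.
freq_with_counter = Counter(months.values())

same_result = dict(freq_with_defaultdict) == dict(freq_with_counter)
most_common_length = freq_with_counter.most_common(1)
```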
With this article at OpenGenus, you must have the complete idea of defaultdict in Python3. Enjoy.