Serialization in Python using Pickle


Reading time: 30 minutes | Coding time: 10 minutes

In this article at OpenGenus, we will look at what pickling and unpickling mean in Python, the interface of the Python pickle module, and some pickling and unpickling examples.

Often, we will want to save our work in Python and come back to it later. However, that work might be a machine learning model or some other complex object in Python. How do we save complex Python objects? Python has a module for this purpose called pickle. We can use pickle to write a binary file that contains all the information about a Python object. Later, we can load that pickle file and reconstruct the object in Python.

Consider this list in Python:

pickle_example = ['opengenus', {'a': 23, 'b': True}, (1, 2, 3), [['dogs', 'cats'], None]]

If we try to write the above list to a text file directly, we will get an error:

# we can't save this as text
with open('./data/pickle_example.txt', 'w') as f:
    f.write(pickle_example)

OUTPUT

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-64da7289329e> in <module>
      3 # we can't save this as text
      4 with open('./data/pickle_example.txt', 'w') as f:
----> 5     f.write(pickle_example)

TypeError: write() argument must be str, not list

The solution is to pickle the list:

#we can save it as pickle
import pickle
with open('./data/pickle_example.pkl', 'wb') as f:
    pickle.dump(pickle_example, f)

with open('./data/pickle_example.pkl', 'rb') as f:
    reloaded_example = pickle.load(f)
reloaded_example

OUTPUT

['opengenus', {'a': 23, 'b': True}, (1, 2, 3), [['dogs', 'cats'], None]]

#the reloaded example is same as original
reloaded_example == pickle_example

The pickle module implements binary protocols for serializing and de-serializing a Python object structure.

Pickling

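It is the process of converting a Python object into a byte stream so that it can be stored in a file. To perform pickling, we use pickle.dump(obj, file, protocol=None), where obj is the object to pickle, file is a file opened for binary writing, and protocol is optional.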

Example of pickling:

import pickle
RegisterList = ['1005a', '1001b', '1003c', '1002d']
with open('picklefile.txt', 'wb') as fh:
   pickle.dump(RegisterList, fh)

When the above code is executed, the byte representation of the list object is stored in the file picklefile.txt.

Unpickling

It is the process of retrieving the data from the pickle. To perform unpickling, we will use pickle.load(file).

Example of Unpickling:

import pickle
# open the file in binary read mode; the with block closes it afterwards
with open("picklefile.txt", "rb") as Unpickle_Data:
    Text_Data = pickle.load(Unpickle_Data)
print(Text_Data)

The Python console shows the list object read from the file:

['1005a', '1001b', '1003c', '1002d']
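When no file is needed, the same round trip can be done in memory. The sketch below assumes the standard pickle.dumps() and pickle.loads() functions, which produce and consume a bytes object instead of working with a file. The variable names are illustrative only.

```python
import pickle

register_list = ['1005a', '1001b', '1003c', '1002d']

# dumps() returns the pickle as a bytes object instead of writing to a file
payload = pickle.dumps(register_list)

# loads() rebuilds an equal, but separate, object from those bytes
restored = pickle.loads(payload)

print(type(payload))
print(restored == register_list)
print(restored is register_list)
```

The restored list compares equal to the original, but it is a new object, so changing one does not affect the other.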

Pickle Protocols

There are currently 6 different protocols which can be used for pickling. The higher the protocol used, the more recent the version of Python needed to read the pickle produced.

Protocol version 0: It is the original, human-readable protocol. It was called "text mode".

Example 1(a):

import pickle
pickle.dumps(345786, protocol=0)

OUTPUT

b'I345786\n.'

Example 1(b):

pickle.dumps(74.98,protocol=0)

OUTPUT

b'F74.98\n.'

Example 1(c):

pickle.dumps((13,39,65), protocol=0)

OUTPUT

b'(I13\nI39\nI65\ntp0\n.'

Example 1(d):

pickle.dumps(['abc',89,9.8], protocol=0)

OUTPUT

b'(lp0\nVabc\np1\naI89\naF9.8\na.'

Protocol version 1: It is the old binary format. It was called "binary mode".

Example:

import pickle
pickle.dumps(2**4, protocol=1)

OUTPUT

b'K\x10.'

Protocol version 2: It was introduced in Python 2.3. It provides much more efficient pickling of new-style classes.

Example 1(a):

import pickle

class Search:
    pass
w = pickle.dumps(Search(), protocol=2)
print(w)

OUTPUT

b'\x80\x02c__main__\nSearch\nq\x00)\x81q\x01.'

Example 1(b):

import pickletools
pickletools.dis(w)

OUTPUT

    0: \x80 PROTO      2
    2: c    GLOBAL     '__main__ Search'
   19: q    BINPUT     0
   21: )    EMPTY_TUPLE
   22: \x81 NEWOBJ
   23: q    BINPUT     1
   25: .    STOP
highest protocol among opcodes = 2

Protocol version 3: It was introduced in Python 3.0. It has explicit support for bytes objects and cannot be unpickled by Python 2.x.

Example:

s=pickle.dumps(b'DATA SCIENCE',protocol=3)
print(s)
pickletools.dis(s)

OUTPUT

b'\x80\x03C\x0cDATA SCIENCEq\x00.'
    0: \x80 PROTO      3
    2: C    SHORT_BINBYTES b'DATA SCIENCE'
   16: q    BINPUT     0
   18: .    STOP
highest protocol among opcodes = 3

Protocol version 4: It was introduced in Python 3.4. It adds support for very large objects, pickling more kinds of objects, and some data format optimizations, such as a more compact encoding for short Unicode strings (the SHORT_BINUNICODE opcode in the example below).

Example:

s=pickle.dumps('@#$%',protocol=4)
print(s)
pickletools.dis(s)

OUTPUT

b'\x80\x04\x95\x08\x00\x00\x00\x00\x00\x00\x00\x8c\x04@#$%\x94.'
    0: \x80 PROTO      4
    2: \x95 FRAME      8
   11: \x8c SHORT_BINUNICODE '@#$%'
   17: \x94 MEMOIZE    (as 0)
   18: .    STOP
highest protocol among opcodes = 4

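Protocol version 5: It was introduced in Python 3.8. It adds support for out-of-band data buffers: large buffers can be handed over separately from the main pickle stream, which then carries only the metadata needed to reconstruct them. The pickle.PickleBuffer class wraps a buffer so that it can be sent out of band. This reduces unnecessary memory copies.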
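To make the out-of-band idea concrete, here is a minimal sketch that assumes Python 3.8 or later. It uses the buffer_callback argument of pickle.dumps() and the buffers argument of pickle.loads() together with pickle.PickleBuffer. The 100,000-byte bytearray is only a stand-in for a large array.

```python
import pickle

# A writable buffer standing in for a large array (illustrative data)
data = bytearray(b"x" * 100_000)

# In-band: the buffer contents are copied into the pickle itself
in_band = pickle.dumps(pickle.PickleBuffer(data), protocol=5)

# Out-of-band: buffer_callback receives the buffer, so the pickle
# only carries a small marker instead of the 100,000 bytes
buffers = []
out_of_band = pickle.dumps(
    pickle.PickleBuffer(data), protocol=5, buffer_callback=buffers.append
)

# The consumer must supply the same buffers, in order, when unpickling
restored = pickle.loads(out_of_band, buffers=buffers)

print(len(in_band) > len(data))
print(len(out_of_band) < 100)
print(bytes(memoryview(restored)) == bytes(data))
```

In a real application the collected buffers would travel through some separate channel, such as shared memory, which is where the saving in memory copies comes from.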

  • Pickle protocol opcodes never change; only new ones are introduced. This makes sure old pickles continue to be readable forever.
  • If an older unpickler tries to read a pickle generated by a newer protocol, it will either work or explicitly give you an error.
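One way to see how a reader can tell which protocol it has been given is to look at the first bytes of the pickle. The sketch below relies on the PROTO opcode (byte 0x80, followed by the version number) that protocols 2 and above write at the start, as the pickletools output earlier shows. The sample dictionary and the headers dictionary are illustrative names.

```python
import pickle

sample = {'name': 'opengenus', 'values': [1, 2, 3], 'ok': True}
headers = {}

for protocol in range(pickle.HIGHEST_PROTOCOL + 1):
    payload = pickle.dumps(sample, protocol=protocol)
    # Any protocol the running interpreter can write, it can also read back
    assert pickle.loads(payload) == sample
    # Protocols 2 and above start with the PROTO opcode (0x80) and the version
    headers[protocol] = payload[:2] == bytes([0x80, protocol])
    print(protocol, len(payload), headers[protocol])
```

Protocols 0 and 1 predate the PROTO opcode, so they are recognised by their opcodes alone.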

To find the highest and default protocol versions of your Python installation, use the following constants defined in the pickle module:

  1. pickle.HIGHEST_PROTOCOL

This is an integer value representing the highest protocol available in your Python version.

import pickle
pickle.HIGHEST_PROTOCOL

OUTPUT (on Python 3.4 to 3.7; Python 3.8, which added protocol 5, reports 5):

4

  2. pickle.DEFAULT_PROTOCOL

This is an integer value representing the default protocol used for pickling. Its value may be lower than the highest protocol.

import pickle
pickle.DEFAULT_PROTOCOL

OUTPUT (on Python 3.0 to 3.7; Python 3.8 raised the default to 4):

3

What can be pickled and unpickled?

  • None, True, and False
  • integers, floating point numbers, complex numbers
  • strings, bytes, bytearrays
  • tuples, lists, sets, and dictionaries containing only picklable objects
  • functions (both built-in and user-defined) defined at the top level of a module (using def, not lambda)
  • classes defined at the top level of a module
  • instances of such classes whose __dict__ or the result of calling __getstate__() is picklable
  • Any other object is not picklable, and is called unpicklable.
  • Note: A function's code is not pickled. Only the function's name is pickled, along with the name of the module in which the function is defined.
  • Functions are pickled by name reference, not by value. This is why lambda functions cannot be pickled.
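The "by name reference" rule can be checked with a short sketch. It uses math.sqrt as an example of a function defined at the top level of a module, and a lambda called square, which is an illustrative name. The exact exception type raised for the lambda depends on the Python version, so the sketch catches both types it is known to raise.

```python
import math
import pickle

# Functions are stored as a module name plus a function name
payload = pickle.dumps(math.sqrt)
print(b'math' in payload, b'sqrt' in payload)

# Unpickling looks the name up again, so we get the very same function object
print(pickle.loads(payload) is math.sqrt)

# A lambda has no importable name, so pickling it fails
square = lambda x: x * x
try:
    pickle.dumps(square)
    lambda_pickled = True
except (pickle.PicklingError, AttributeError) as exc:
    lambda_pickled = False
    print('cannot pickle lambda:', exc)
```

Because only the name is stored, the module that defines the function must also be importable in the environment where the pickle is loaded.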

Pickle Exceptions

The module defines 3 primary exceptions:

  • pickle.PickleError: The base class for the other pickling exceptions. It inherits from Exception.
  • pickle.PicklingError: Raised when an unpicklable object is encountered during pickling.
  • pickle.UnpicklingError: Raised when there is a problem unpickling an object, such as data corruption or an access violation.
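As a rough illustration of handling these exceptions, the sketch below wraps pickle.loads() in a hypothetical helper named safe_loads (not part of the pickle module). It also catches EOFError, which is what empty input raises instead of UnpicklingError.

```python
import pickle

def safe_loads(data):
    """Hypothetical helper: return the unpickled object, or None on failure."""
    try:
        return pickle.loads(data)
    except (pickle.UnpicklingError, EOFError) as exc:
        # Corrupted data raises UnpicklingError; empty input raises EOFError
        print('could not unpickle:', type(exc).__name__)
        return None

print(safe_loads(pickle.dumps([1, 2, 3])))
print(safe_loads(b'\x00this is not a pickle'))
print(safe_loads(b''))

# Both specific exceptions derive from the common base class
print(issubclass(pickle.PicklingError, pickle.PickleError))
print(issubclass(pickle.UnpicklingError, pickle.PickleError))
```

Catching these exceptions only deals with damaged data. It does not make data from an untrusted source safe to load.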

Advantage

  • The advantage of using pickle is that it can serialize pretty much any Python object without any extra code.
  • It writes any single object only once and stores later occurrences as references to it, which makes it effective for storing recursive structures like graphs.
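The second point can be seen in a small sketch with a shared list and a list that contains itself. The variable names are illustrative only.

```python
import pickle

# One list object referenced twice, plus a list that contains itself
shared = ['dogs', 'cats']
pair = [shared, shared]
cycle = []
cycle.append(cycle)

restored_pair = pickle.loads(pickle.dumps(pair))
restored_cycle = pickle.loads(pickle.dumps(cycle))

# The shared list is written once, so both slots point to one object again
print(restored_pair[0] is restored_pair[1])

# The self-reference survives the round trip instead of recursing forever
print(restored_cycle[0] is restored_cycle)
```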

Disadvantage

  • Pickle can be slower and can produce larger serialized values than many of the alternatives.
  • Another reason not to use pickle is that unpickling malicious data can cause security issues, including arbitrary code execution.
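One common way to reduce the security risk is to restrict which globals an unpickler may load by overriding Unpickler.find_class(). The sketch below follows that pattern. The allow-list and the names RestrictedUnpickler and restricted_loads are illustrative choices, and this approach limits the risk but is not a complete defence against malicious data.

```python
import builtins
import io
import pickle

# Illustrative allow-list; adjust to the types your data really needs
SAFE_BUILTINS = {'range', 'complex', 'set', 'frozenset', 'slice'}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Only allow a few harmless classes from the builtins module
        if module == 'builtins' and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        # Refuse everything else
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")

def restricted_loads(data):
    """Like pickle.loads(), but with the allow-list above applied."""
    return RestrictedUnpickler(io.BytesIO(data)).load()

print(restricted_loads(pickle.dumps([1, 2, range(15)])))

# A hand-written pickle that would call os.system if loaded normally
malicious = b"cos\nsystem\n(S'echo hello world'\ntR."
try:
    restricted_loads(malicious)
    blocked = False
except pickle.UnpicklingError as exc:
    blocked = True
    print('blocked:', exc)
```

For data that comes from an untrusted source, a format that cannot execute code, such as JSON, is usually the safer choice.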

Keep in mind

  • Pickle is a binary serialization format. It is not easily readable by humans and is Python-specific.
  • The dump() function of the pickle module performs pickling, and the load() function performs unpickling of Python data.
  • pickle.dump() requires two arguments: first the object to pickle and second the file object. The protocol is an optional third argument. pickle.load() takes one file object as an argument.
  • The file must be opened in binary mode: 'wb' for writing with dump() and 'rb' for reading with load().
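The points about argument order and file mode can be tried out with a throwaway file. This is only a sketch: the temporary directory and the file name example.pkl are assumptions made for the illustration.

```python
import os
import pickle
import tempfile

data = {'a': 23, 'b': True}
path = os.path.join(tempfile.mkdtemp(), 'example.pkl')  # throwaway location

# Text mode ('w') fails because pickle writes bytes, not str
try:
    with open(path, 'w') as f:
        pickle.dump(data, f)
    text_mode_ok = True
except TypeError as exc:
    text_mode_ok = False
    print('text mode failed:', exc)

# Binary mode works; the object comes first, then the file, then the protocol
with open(path, 'wb') as f:
    pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)

with open(path, 'rb') as f:
    reloaded = pickle.load(f)

print(reloaded == data)
```

Passing pickle.HIGHEST_PROTOCOL gives the most recent format the running interpreter supports, at the cost of not being readable by older Python versions.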

Conclusion

Pickle is an important tool for data scientists. Data processing and training machine learning models can take a long time, so it is useful to save checkpoints. The pickle module can be used to serialize and deserialize Python objects to and from files. It is a quick and easy way to store and transfer Python objects.

Pandas also provides the DataFrame.to_pickle method and the pandas.read_pickle function.