Use Series in Pandas


Reading time: 30 minutes | Coding time: 10 minutes

In this post, we will look at Series, which is present in the pandas library. It is very popular among python users due to its vast functionality and usability. We will dive straight to what series is, what are its uses along with some basic coding implementation.

Capture-2

Introduction

Before understanding what Series is, we must look at the pandas first. As stated on pandas homepage-
Pandas is an open-source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language. It is very easy to install pandas with simple pip command typed in command prompt(cmd)-

pip install pandas

You are most likely to encounter pandas whenever you're dealing with any data analysis or machine learning using python. Now, after having a good intuition about pandas lets look at the Series.

Pandas Series is a one-dimensional labeled array capable of holding data of any type.
The data types can be integer, string, float, python objects and many more. Series is used for storing some information.

In simple words, Series can be inferred as a column in an excel sheet. Below, we can see we have different columns like Name, Age and City,. Each of them represents a Series. And, on the left, we can see some numbers which are index. By default, these are number but they can be of other data types also.

series

The axis labels are collectively called index. Labels need not be unique but must be a hashable type. The object supports both integer and label-based indexing and provides a host of methods for performing operations involving the index.

Now, we will look at the various parameters of Series constructor and some basic operations on Series.

  • data - contains the data to be stored in Series | array-like, Iterable, dict, or scalar value
  • index - values for index | must be hashable & of same length as data | Default(if no index passed)- np.arange(n) | Non-unique index values also allowed | array-like or Index (1d)
  • dtype - data type for the output Series | str, numpy.dtype, or ExtensionDtype, Optional(inferred from data)
  • copy - Copy the input data | bool, Default-False

1. Creating an empty series

import pandas as pd
import numpy as np
ser = pd.Series()
print(ser)

OUTPUT:

Series([], dtype: float64)

Since, we've not passed any data, the output is an empty Series.

2. Creating series using numpy array

data = np.array(['a','k','s','h','a','t'])
ser = pd.Series(data)
print(ser)

OUTPUT:

0    a
1    k
2    s
3    h
4    a
5    t
dtype: object

Note, that the dtype in output is object since we have passed string as our data. And, index is automatically determined by pandas.

3. Creating series using list

data = [10,15,20,25]
ser = pd.Series(data)
print(ser)

OUTPUT:

0    10
1    15
2    20
3    25
dtype: int64

Note, that the dtype in output is int64 since we have integers as our data.

4. Creating series using dictionary

data = {'a' : 2, 'b' : 4, 'c' : 6}
ser = pd.Series(data)
print(ser)

OUTPUT:

a    2
b    4
c    6
dtype: int64

5. describe() method of Series

data = [-1,10,15,20,25,100]
ser = pd.Series(data)
print(ser.describe())

OUTPUT:

count      6.000000
mean      28.166667
std       36.306565
min       -1.000000
25%       11.250000
50%       17.500000
75%       23.750000
max      100.000000
dtype: float64

describe() method returns the descriptive statistics of our data. In case of object data, it will return-

count - total number of values
unique - number of unique values
top - most frequent value
freq - frequency of most frequent value

6. Accessing element of Series

data = list(['akshat','bob','alexa','akansha'])
ser = pd.Series(data)
print(ser[2])

OUTPUT-

alexa

Code:

print(ser[0:2])

OUTPUT-

0    akshat
1       bob
dtype: object

7. Accessing element using loc and iloc

data = list(['akshat','bob','alexa','akansha'])
ser = pd.Series(data,index=[100,200,300,400])
print(ser.loc[400])  

OUTPUT-

akansha

Code:

print(ser.iloc[:3]) # get first 3 elements

OUTPUT-

100    akshat
200       bob
300     alexa
dtype: object

ser.loc[400] returns element with index 400 and ser.iloc[:3] returns first 3 elements

8. Checking for null values, getting maximum and minimum and index of maximum element

data = [-1,10,15,100,20,25,np.nan]
ser = pd.Series(data)

print('null value present:',ser.isnull().any())

ser.dropna(inplace=True)

print('maximum value is   ',ser.max())
print('minimum value is   ',ser.min())
print('index with maximum value ', ser.argmax())

OUTPUT-

null value present: True
maximum value is    100.0
minimum value is    -1.0
index with maximum value  3

9. Arithmetic operations

data1 = [0,2,4,6,8]
data2 = [1,3,5,7,9]

ser1 = pd.Series(data1)
ser2 = pd.Series(data2)

print('adding two series\n')
print(ser1+ser2)

OUTPUT-

adding two series

0     1
1     5
2     9
3    13
4    17
dtype: int64

Similarly, we can perform subtraction, multiplication and division.

10. Using apply() method to call a custom query function

Suppose, we want to get all odd numbers from a given series. Then, one possible solution is to use apply() method which requires you to pass some function that will be applied to every element of Series.

data = [4,16,10,15,100,20,25]
ser = pd.Series(data)
print(ser[ser.apply(lambda x : x%2!=0)])

OUTPUT-

3    15
6    25
dtype: int64

Here ser.apply(lambda x : x%2!=0) will return a boolean array-

0    False
1    False
2    False
3     True
4    False
5    False
6     True
dtype: bool<br>

ser[ser.apply(lambda x : x%2!=0)] will return those elements where it will find True

Conclusion

In this post, we looked at pandas Series and some of its basic operations.
There exist, a whole bunch of attributes and methods of Series which can't be covered in a single post, so you can read more through the pandas official documentation :)

Although, Series can be not only be studied individually, since in real life, the data consists of multiple columns/attributes which are stored as pandas DataFrame, so multiple columns are also studied together. A DataFrame is considered as a collection of Series.

df-1