Sort by row and column in Pandas DataFrame
Table of Contents
- Pandas DataFrame
- Sort DataFrame rows based on index
- Sort DataFrame rows based on a single column
- Sort DataFrame rows based on multiple columns
- Sort DataFrame rows based on columns in Descending Order
- Sort DataFrame columns based on index
- Sort columns of a DataFrame based on a single row
- Sort columns of a DataFrame in Descending Order based on a single row
- Sort columns of a DataFrame based on multiple rows
- N largest value
- N smallest value
- Conclusion
Pandas DataFrame
DataFrame is the two-dimensional data structure of Pandas. It consists of labeled rows and columns.
Let’s create a simple one to understand the basic features of a DataFrame. To create a DataFrame, we can pass a Python dictionary to its constructor.
Code:
import pandas as pd
df = pd.DataFrame({
"Name": ["Arjuna", "Bhishma", "Krishna", "Karna"],
"Age": [30, 100, 80, 35]
})
print(df)
| | Name | Age |
|---|---|---|
| 0 | Arjuna | 30 |
| 1 | Bhishma | 100 |
| 2 | Krishna | 80 |
| 3 | Karna | 35 |
The dictionary keys become the column names and the values become the data stored in the DataFrame. In the above example, we have a DataFrame with two columns and four rows.
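To make the idea of labeled rows and columns concrete, here is a small sketch that inspects the labels of the same DataFrame. The variable names column_labels, row_labels and shape are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Arjuna", "Bhishma", "Krishna", "Karna"],
    "Age": [30, 100, 80, 35],
})

# Column labels come from the dictionary keys.
column_labels = list(df.columns)
# With no explicit index, the rows are labeled 0, 1, 2, ...
row_labels = list(df.index)
# shape is (number of rows, number of columns).
shape = df.shape

print(column_labels)
print(row_labels)
print(shape)
```

The row labels (the index) and the column labels are what sort_index() works on in the sections that follow.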
Sort DataFrame rows based on index
If we want to sort rows of a DataFrame object by index, just call sort_index().
Code:
import pandas as pd
df = pd.DataFrame([(2,3),(4,2),(1,8),(9,1)], index=[3,1,0,2], columns = ['c1','c2'])
sorted_df=df.sort_index()
print(sorted_df)
In the above example, the rows of the original DataFrame are sorted by their index labels. The sorted DataFrame now looks like this:
| | c1 | c2 |
|---|---|---|
| 0 | 1 | 8 |
| 1 | 4 | 2 |
| 2 | 9 | 1 |
| 3 | 2 | 3 |
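sort_index() also accepts an ascending argument. As a small sketch on the same data, passing ascending=False reverses the order of the index labels. The name desc_df is only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [(2, 3), (4, 2), (1, 8), (9, 1)],
    index=[3, 1, 0, 2],
    columns=["c1", "c2"],
)

# Sort rows by index label, largest label first.
desc_df = df.sort_index(ascending=False)
print(desc_df)

# sort_index() returns a new DataFrame; df keeps its original row order.
print(list(df.index))
```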
Sort DataFrame rows based on a single column
We can sort all the rows of a DataFrame based on a single column by passing the column name to the by argument of sort_values().
Code:
import pandas as pd
k = {"a": [4, 1, 1, 2], "b": [1, 4, 2, 6], "c": [3, 1, 6, 5]}
df = pd.DataFrame(k)
# Sort rows by the values in column "a" (ascending by default)
sorted_df = df.sort_values(by="a")
print(sorted_df)
sort_values() sorts a DataFrame by one or more of its columns.
In the above example, the rows of the DataFrame are sorted based on the column named a (the dictionary key). The sorted DataFrame now looks like this:
| | a | b | c |
|---|---|---|---|
| 1 | 1 | 4 | 1 |
| 2 | 1 | 2 | 6 |
| 3 | 2 | 6 | 5 |
| 0 | 4 | 1 | 3 |
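Rows 1 and 2 both have the value 1 in column a. If the relative order of such tied rows matters, one option is to request a stable sorting algorithm through the kind argument of sort_values(). The following is a minimal sketch using kind="mergesort"; the name stable_df is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"a": [4, 1, 1, 2], "b": [1, 4, 2, 6], "c": [3, 1, 6, 5]})

# A stable sort keeps tied rows in their original relative order,
# so row 1 stays ahead of row 2 (both have a == 1).
stable_df = df.sort_values(by="a", kind="mergesort")
print(stable_df)
```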
Sort DataFrame rows based on multiple columns
What if some rows have the same value in one column and we want to use a second column to order those rows?
We can sort all the rows of a DataFrame based on multiple columns by passing a list of column names to the by argument. Rows are ordered by the first column, and ties are broken by the next column in the list.
Code:
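import pandas as pd
k = {"a": [4, 1, 1, 2], "b": [1, 4, 2, 6], "c": [3, 1, 6, 5]}
df = pd.DataFrame(k)
# Sort rows by column "a" first, then by column "b" to break ties
sorted_df = df.sort_values(by=["a", "b"])
print(sorted_df)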
In the above example, the rows of the DataFrame are sorted based on the columns a and b. Rows 1 and 2 share the value 1 in column a, so column b decides their order. The sorted DataFrame now looks like this:
| | a | b | c |
|---|---|---|---|
| 2 | 1 | 2 | 6 |
| 1 | 1 | 4 | 1 |
| 3 | 2 | 6 | 5 |
| 0 | 4 | 1 | 3 |
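The ascending argument of sort_values() can also be a list with one boolean per column in by, which allows a different direction for each column. A small sketch on the same data, sorting a in ascending order and b in descending order (the name mixed_df is only illustrative):

```python
import pandas as pd

df = pd.DataFrame({"a": [4, 1, 1, 2], "b": [1, 4, 2, 6], "c": [3, 1, 6, 5]})

# "a" ascending; within equal values of "a", "b" descending.
mixed_df = df.sort_values(by=["a", "b"], ascending=[True, False])
print(mixed_df)
```

Here the two rows with a equal to 1 appear with the larger b value first.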
Sort DataFrame rows based on columns in descending order
We can sort all the rows of a DataFrame in descending order by passing ascending=False along with the by argument.
Code:
import pandas as pd
k = {"a": [4, 1, 1, 2], "b": [1, 4, 2, 6], "c": [3, 1, 6, 5]}
df = pd.DataFrame(k)
# Sort rows by column "a", largest value first
sorted_df = df.sort_values(by="a", ascending=False)
print(sorted_df)
In the above example, the rows of the DataFrame are sorted in descending order based on the column named a. The sorted DataFrame now looks like this:
| | a | b | c |
|---|---|---|---|
| 0 | 4 | 1 | 3 |
| 3 | 2 | 6 | 5 |
| 1 | 1 | 4 | 1 |
| 2 | 1 | 2 | 6 |
Sort DataFrame columns based on index
If we want to sort the columns of a DataFrame object by their labels, just call sort_index() and pass the argument axis=1.
Code:
import pandas as pd
df = pd.DataFrame([(2,3),(4,2),(1,8),(9,1)], index=[3,1,0,2], columns = ['c2','c1'])
sorted_df=df.sort_index(axis=1)
print(sorted_df)
In the above example, the columns of the original DataFrame are sorted by their labels, while the rows keep their original order. The sorted DataFrame now looks like this:
| | c1 | c2 |
|---|---|---|
| 3 | 3 | 2 |
| 1 | 2 | 4 |
| 0 | 8 | 1 |
| 2 | 1 | 9 |
Sort columns of a DataFrame based on a single row
We can sort all the columns of a DataFrame using a single row by passing the row's index label to the by argument together with axis=1.
Code:
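import pandas as pd
matrix = [(5, 4, 3, 2), (1, 4, 2, 6), (3, 1, 6, 5)]
df = pd.DataFrame(matrix, index=list('abc'))
# Reorder the columns so that the values in row 'b' are ascending
sorted_df = df.sort_values(by='b', axis=1)
print(sorted_df)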
In the above example, all the columns of the DataFrame are sorted based on the single row with index label 'b'. The sorted DataFrame now looks like this:
| | 0 | 2 | 1 | 3 |
|---|---|---|---|---|
| a | 5 | 3 | 4 | 2 |
| b | 1 | 2 | 4 | 6 |
| c | 3 | 6 | 1 | 5 |
Sort columns of a DataFrame in descending order based on a single row
We can sort all the columns of a DataFrame in descending order using a single row by passing the row's index label to the by argument together with axis=1 and ascending=False.
Code:
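import pandas as pd
matrix = [(5, 4, 3, 2), (1, 4, 2, 6), (3, 1, 6, 5)]
df = pd.DataFrame(matrix, index=list('abc'))
# Reorder the columns so that the values in row 'b' are descending
sorted_df = df.sort_values(by='b', axis=1, ascending=False)
print(sorted_df)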
In the above example, all the columns of the DataFrame are sorted in descending order based on the single row with index label 'b'. The sorted DataFrame now looks like this:
| | 3 | 1 | 2 | 0 |
|---|---|---|---|---|
| a | 2 | 4 | 3 | 5 |
| b | 6 | 4 | 2 | 1 |
| c | 5 | 1 | 6 | 3 |
Sort columns of a DataFrame based on multiple rows
We can sort all the columns of a DataFrame using multiple rows by passing a list of row index labels to the by argument together with axis=1.
Code:
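import pandas as pd
matrix = [(5, 4, 3, 2), (1, 4, 2, 6), (3, 1, 6, 5)]
df = pd.DataFrame(matrix, index=list('abc'))
# Reorder the columns by row 'a' first, then by row 'b' to break ties
sorted_df = df.sort_values(by=['a', 'b'], axis=1)
print(sorted_df)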
In the above example, all the columns of the DataFrame are sorted based on the rows with index labels 'a' and 'b'. Row 'a' has no repeated values here, so it alone determines the order; row 'b' would only be used to break ties. The sorted DataFrame now looks like this:
| | 3 | 2 | 1 | 0 |
|---|---|---|---|---|
| a | 2 | 3 | 4 | 5 |
| b | 6 | 2 | 4 | 1 |
| c | 5 | 6 | 1 | 3 |
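To see the second row actually break ties, here is a small sketch with a different, made-up matrix in which row 'a' contains repeated values. The data and the name tie_df are only illustrative and are not part of the example above.

```python
import pandas as pd

# Row 'a' has ties (two 2s and two 1s), so row 'b' decides their order.
matrix = [(2, 1, 2, 1), (4, 3, 1, 2), (3, 1, 6, 5)]
df = pd.DataFrame(matrix, index=list("abc"))

tie_df = df.sort_values(by=["a", "b"], axis=1)
print(tie_df)
```

Columns 3 and 1 both hold 1 in row 'a', and column 3 comes first because its value in row 'b' (2) is smaller than that of column 1 (3). The same reasoning places column 2 before column 0.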
N largest value
DataFrame provides the nlargest() method to get the n rows with the largest values in a given column.
Code:
import pandas as pd
d = {"a": [1,2,3,4,5], "b":[2,3,4,5,6]}
df = pd.DataFrame(d)
result = df.nlargest(3, "a")
print(result)
The result contains the three rows with the largest values in column 'a' of this DataFrame:
| | a | b |
|---|---|---|
| 4 | 5 | 6 |
| 3 | 4 | 5 |
| 2 | 3 | 4 |
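For this data, nlargest() selects the same rows as sorting in descending order and keeping the first n rows. The sketch below compares the two approaches; the names top3 and top3_sorted are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [2, 3, 4, 5, 6]})

# Three rows with the largest values in column "a".
top3 = df.nlargest(3, "a")

# The same selection written as a descending sort followed by head().
top3_sorted = df.sort_values(by="a", ascending=False).head(3)

print(top3)
print(top3_sorted)
```

Using nlargest() states the intent more directly when only a few top rows are needed.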
N smallest value
DataFrame provides the nsmallest() method to get the n rows with the smallest values in a given column.
Code:
import pandas as pd
d = {"a": [1,2,3,4,5], "b":[2,3,4,5,6]}
df = pd.DataFrame(d)
result = df.nsmallest(2, "b")
print(result)
The result contains the two rows with the smallest values in column 'b' of this DataFrame:
| | a | b |
|---|---|---|
| 0 | 1 | 2 |
| 1 | 2 | 3 |
Conclusion
In this article at OpenGenus, we learned how to sort the rows and columns of a DataFrame object in Pandas using sort_index and sort_values, and how to pick the n largest or smallest rows using nlargest and nsmallest.