Underrated Pandas functions

Mar 10 ^edited

The aim of this page is to highlight some underrated python function from the one of the most used libraries: Pandas.
The idea is to save you some time, preventing having to reinvent the wheel, and maybe help make cleaner, shorter and more concise code. The different functions are presented by broad category, and accompanied by application examples and eventually combinations that you could add to your python function toolbox.

Underrated Pandas functions

crosstab

The crosstab function is powerful tool from the Pandas library, for creating cross-tabulation tables from a Dataframe.It allows you to easily summarize and visualize the relationships between categorical variables revealing patterns and frequencies at a glance. Just like your Excel Pivot table, specify what you want to see in columns,rows and which measure should be displayed with a given aggregation function (ex:sum).


# Python code example: given a Sales data sheet stating the number of sales per quarter 
import panda as pd
data={'Product':['Ref1','Ref2','Ref3','Ref4','Ref3','Ref2','Ref2','Ref3'],
'Quarter':['Q1','Q1','Q2','Q1','Q2','Q3','Q2','Q1'],
'Level':[100,200,300,200,500,200,150,225]}
df = pd.DataFrame(data)
# display original data in df
print('Display DataFrame df data')
print(df)

corsstab=pd.crosstab(index=df['Product'],columns=df['Quarter'], values=df['Level'],aggfunc='sum')
# display crosstab result
print('Display DataFrame crosstab data')
print(crosstab)

Output:



Display DataFrame df data

  Product Quarter  Level

0    Ref1      Q1    100

1    Ref2      Q1    200

2    Ref3      Q2    300

3    Ref4      Q1    200

4    Ref3      Q2    500

5    Ref2      Q3    200

6    Ref2      Q2    150

7    Ref3      Q1    225




Display DataFrame crosstab data
Quarter     Q1     Q2     Q3

Product                     

Ref1     100.0    NaN    NaN

Ref2     200.0  150.0  200.0

Ref3     225.0  800.0    NaN

Ref4     200.0    NaN    NaN

To learn more about this function, please check the official documentation Here.

melt

The melt in Pandas, is used to transform a structured dataframe by melting it into a long-form format. In short it transposes a set of columns into rows, with a new column holding the variable names and another the column. It helps reducing the number of columns and this data reshaping can simplify the analysis.



# Python code example: given a Sales data sheet stating the number of sales per quarter and region (N,S,W,E)
import panda as pd
data={'Product':['Ref1','Ref2','Ref3','Ref4','Ref3','Ref2','Ref2','Ref3'],
'Q1':[100,200,0,200,0,0,0,225],
'Q2':[0,0,300,0,500,0,150,0],
'Q3':[0,0,0,0,0,200,0,0],
'Region':['S','N','W','S','E','W','N','S']}

df = pd.DataFrame(data)
# display original data in df
print('Display DataFrame df data')
print(df)

melt=df.melt(id_vars=['Product','Region'],value_vars=['Q1','Q2','Q3'])
# display original data in df
print('Display melt data, with quarters displayed in one column')
print(melt)

Output:


Display DataFrame df data
  Product   Q1   Q2   Q3 Region
0    Ref1  100    0    0      S
1    Ref2  200    0    0      N
2    Ref3    0  300    0      W
3    Ref4  200    0    0      S
4    Ref3    0  500    0      E
5    Ref2    0    0  200      W
6    Ref2    0  150    0      S
7    Ref3  225    0    0      S

Display melt data, with quarters displayed in one column
   Product Region variable  value
0     Ref1      S       Q1    100
1     Ref2      N       Q1    200
2     Ref3      W       Q1      0
3     Ref4      S       Q1    200
4     Ref3      E       Q1      0
5     Ref2      W       Q1      0
6     Ref2      S       Q1      0
7     Ref3      S       Q1    225
8     Ref1      S       Q2      0
9     Ref2      N       Q2      0
10    Ref3      W       Q2    300
11    Ref4      S       Q2      0
12    Ref3      E       Q2    500
13    Ref2      W       Q2      0
14    Ref2      S       Q2    150
15    Ref3      S       Q2      0
16    Ref1      S       Q3      0
17    Ref2      N       Q3      0
18    Ref3      W       Q3      0
19    Ref4      S       Q3      0
20    Ref3      E       Q3      0
21    Ref2      W       Q3    200
22    Ref2      S       Q3      0
23    Ref3      S       Q3      0

To learn more about this function, please check the official documentation Here.

Underrated Pandas functions