The pivot table is a powerful tool to summarize and present data in Excel. However, in Python, Pandas has a function which allows you to quickly convert a DataFrame to a pivot table.

This function is very useful but sometimes it can be tricky to remember how to use it to get the data formatted in a way you need.

Read in the data

import pandas as pd
import numpy as np

Read in the sales funnel data into DataFrame.

df = pd.read_excel("../sales-funnel.xlsx")
df.head()

	Account	Name	Rep	Manager	Product	Quantity	Price	Status
0	714466	Trantow-Barrows	Craig Booker	Debra Henley	CPU	1	30000	presented
1	714466	Trantow-Barrows	Craig Booker	Debra Henley	Software	1	10000	presented
2	714466	Trantow-Barrows	Craig Booker	Debra Henley	Maintenance	2	5000	pending
3	737550	Fritsch, Russel and Anderson	Craig Booker	Debra Henley	CPU	1	35000	declined
4	146832	Kiehn-Spinka	Daniel Hilton	Debra Henley	CPU	2	65000	won

For convenience sake, I define the status column as a category and set the order we’d like to view. This isn’t strictly required but helps us keep the order we want as we work through analyzing the data.

df["Status"] = df["Status"].astype("category")
df["Status"].cat.set_categories(["won","pending","presented","declined"],inplace=True)

Pivot the data

As we build up the pivot table, I think it’s easiest to take it one step at a time. Add items one at a time and check each step to verify you are getting the results you expect.

The simplest pivot table must have a dataframe and an index.

pd.pivot_table(df,index=["Name"])

	Account	Price	Quantity
Name
Barton LLC	740150.0	35000.0	1.000000
Fritsch, Russel and Anderson	737550.0	35000.0	1.000000
Herman LLC	141962.0	65000.0	2.000000
Jerde-Hilpert	412290.0	5000.0	2.000000
Kassulke, Ondricka and Metz	307599.0	7000.0	3.000000
Keeling LLC	688981.0	100000.0	5.000000
Kiehn-Spinka	146832.0	65000.0	2.000000
Koepp Ltd	729833.0	35000.0	2.000000
Kulas Inc	218895.0	25000.0	1.500000
Purdy-Kunde	163416.0	30000.0	1.000000
Stokes LLC	239344.0	7500.0	1.000000
Trantow-Barrows	714466.0	15000.0	1.333333

You can have multiple indexes as well. In fact, most of the pivot_table args can take multiple values via a list.

pd.pivot_table(df,index=["Name","Rep","Manager"])

			Account	Price	Quantity
Name	Rep	Manager
Barton LLC	John Smith	Debra Henley	740150.0	35000.0	1.000000
Fritsch, Russel and Anderson	Craig Booker	Debra Henley	737550.0	35000.0	1.000000
Herman LLC	Cedric Moss	Fred Anderson	141962.0	65000.0	2.000000
Jerde-Hilpert	John Smith	Debra Henley	412290.0	5000.0	2.000000
Kassulke, Ondricka and Metz	Wendy Yule	Fred Anderson	307599.0	7000.0	3.000000
Keeling LLC	Wendy Yule	Fred Anderson	688981.0	100000.0	5.000000
Kiehn-Spinka	Daniel Hilton	Debra Henley	146832.0	65000.0	2.000000
Koepp Ltd	Wendy Yule	Fred Anderson	729833.0	35000.0	2.000000
Kulas Inc	Daniel Hilton	Debra Henley	218895.0	25000.0	1.500000
Purdy-Kunde	Cedric Moss	Fred Anderson	163416.0	30000.0	1.000000
Stokes LLC	Cedric Moss	Fred Anderson	239344.0	7500.0	1.000000
Trantow-Barrows	Craig Booker	Debra Henley	714466.0	15000.0	1.333333

This is interesting but not particularly useful. What we probably want to do is look at this by Manager and Director. It’s easy enough to do by changing the index.

pd.pivot_table(df,index=["Manager","Rep"])

		Account	Price	Quantity
Manager	Rep
Debra Henley	Craig Booker	720237.0	20000.000000	1.250000
	Daniel Hilton	194874.0	38333.333333	1.666667
	John Smith	576220.0	20000.000000	1.500000
Fred Anderson	Cedric Moss	196016.5	27500.000000	1.250000
Fred Anderson	Wendy Yule	614061.5	44250.000000	3.000000

Now we start to get a glimpse of what a pivot table can do for us.

For this purpose, the Account and Quantity columns aren’t really useful. Let’s remove it by explicitly defining the columns we care about using the values field.

pd.pivot_table(df,index=["Manager","Rep"],values=["Price"])

		Price
Manager	Rep
Debra Henley	Craig Booker	20000
	Daniel Hilton	38333
	John Smith	20000
Fred Anderson	Cedric Moss	27500
Fred Anderson	Wendy Yule	44250

The Price column automatically averages the data but we can do a count or a sum. Adding them is simple using aggfunc.

pd.pivot_table(df,index=["Manager","Rep"],values=["Price"],aggfunc=np.sum)

		Price
Manager	Rep
Debra Henley	Craig Booker	80000
	Daniel Hilton	115000
	John Smith	40000
Fred Anderson	Cedric Moss	110000
Fred Anderson	Wendy Yule	177000

aggfunc can take a list of functions. Let’s try a mean using the numpy functions and len to get a count.

pd.pivot_table(df,index=["Manager","Rep"],values=["Price"],aggfunc=[np.mean,len])

		mean	len
		Price	Price
Manager	Rep
Debra Henley	Craig Booker	20000	4
	Daniel Hilton	38333	3
	John Smith	20000	2
Fred Anderson	Cedric Moss	27500	4
Fred Anderson	Wendy Yule	44250	4

If we want to see sales broken down by the Products, the columns variable allows us to define one or more columns.

pd.pivot_table(df,index=["Manager","Rep"],values=["Price"],
               columns=["Product"],aggfunc=[np.sum])

		sum
		Price
	Product	CPU	Maintenance	Monitor	Software
Manager	Rep
Debra Henley	Craig Booker	65000.0	5000.0	NaN	10000.0
	Daniel Hilton	105000.0	NaN	NaN	10000.0
	John Smith	35000.0	5000.0	NaN	NaN
Fred Anderson	Cedric Moss	95000.0	5000.0	NaN	10000.0
Fred Anderson	Wendy Yule	165000.0	7000.0	5000.0	NaN

The NaN’s are a bit distracting. If we want to remove them, we could use fill_value to set them to 0.

pd.pivot_table(df,index=["Manager","Rep"],values=["Price"],
               columns=["Product"],aggfunc=[np.sum],fill_value=0)

		sum
		Price
	Product	CPU	Maintenance	Monitor	Software
Manager	Rep
Debra Henley	Craig Booker	65000	5000	0	10000
	Daniel Hilton	105000	0	0	10000
	John Smith	35000	5000	0	0
Fred Anderson	Cedric Moss	95000	5000	0	10000
Fred Anderson	Wendy Yule	165000	7000	5000	0

It would be useful to add the quantity as well. Add Quantity to the values list.

pd.pivot_table(df,index=["Manager","Rep"],values=["Price","Quantity"],
               columns=["Product"],aggfunc=[np.sum],fill_value=0)

		sum
		Price				Quantity
	Product	CPU	Maintenance	Monitor	Software	CPU	Maintenance	Monitor	Software
Manager	Rep
Debra Henley	Craig Booker	65000	5000	0	10000	2	2	0	1
	Daniel Hilton	105000	0	0	10000	4	0	0	1
	John Smith	35000	5000	0	0	1	2	0	0
Fred Anderson	Cedric Moss	95000	5000	0	10000	3	1	0	1
Fred Anderson	Wendy Yule	165000	7000	5000	0	7	3	2	0

What’s interesting is that you can move items to the index to get a different visual representation. We can add the Products to the index.

pd.pivot_table(df,index=["Manager","Rep","Product"],
               values=["Price","Quantity"],aggfunc=[np.sum],fill_value=0)

			sum
			Price	Quantity
Manager	Rep	Product
Debra Henley	Craig Booker	CPU	65000	2
		Maintenance	5000	2
		Software	10000	1
	Daniel Hilton	CPU	105000	4
	Daniel Hilton	Software	10000	1
	John Smith	CPU	35000	1
	John Smith	Maintenance	5000	2
Fred Anderson	Cedric Moss	CPU	95000	3
		Maintenance	5000	1
		Software	10000	1
	Wendy Yule	CPU	165000	7
		Maintenance	7000	3
		Monitor	5000	2

For this data set, this representation makes more sense. Now, what if I want to see some totals? margins=True does that for us.

pd.pivot_table(df,index=["Manager","Rep","Product"],
               values=["Price","Quantity"],
               aggfunc=[np.sum,np.mean],fill_value=0,margins=True)

			sum		mean
			Price	Quantity	Price	Quantity
Manager	Rep	Product
Debra Henley	Craig Booker	CPU	65000.0	2.0	32500.000000	1.000000
		Maintenance	5000.0	2.0	5000.000000	2.000000
		Software	10000.0	1.0	10000.000000	1.000000
	Daniel Hilton	CPU	105000.0	4.0	52500.000000	2.000000
	Daniel Hilton	Software	10000.0	1.0	10000.000000	1.000000
	John Smith	CPU	35000.0	1.0	35000.000000	1.000000
	John Smith	Maintenance	5000.0	2.0	5000.000000	2.000000
Fred Anderson	Cedric Moss	CPU	95000.0	3.0	47500.000000	1.500000
		Maintenance	5000.0	1.0	5000.000000	1.000000
		Software	10000.0	1.0	10000.000000	1.000000
	Wendy Yule	CPU	165000.0	7.0	82500.000000	3.500000
		Maintenance	7000.0	3.0	7000.000000	3.000000
		Monitor	5000.0	2.0	5000.000000	2.000000
All			522000.0	30.0	30705.882353	1.764706

Let’s move the analysis up a level and look at our pipeline at the manager level. Notice how the Status is ordered based on our earlier category definition.

pd.pivot_table(df,index=["Manager","Status"],values=["Price"],
               aggfunc=[np.sum],fill_value=0,margins=True)

		sum
		Price
Manager	Status
Debra Henley	won	65000.0
	pending	50000.0
	presented	50000.0
	declined	70000.0
Fred Anderson	won	172000.0
	pending	5000.0
	presented	45000.0
	declined	65000.0
All		522000.0

A really handy feature is the ability to pass a dictionary to the aggfunc so you can perform different functions on each of the values you select.

pd.pivot_table(df,index=["Manager","Status"],columns=["Product"],values=["Quantity","Price"],
               aggfunc={"Quantity":len,"Price":np.sum},fill_value=0)

		Price				Quantity
	Product	CPU	Maintenance	Monitor	Software	CPU	Maintenance	Monitor	Software
Manager	Status
Debra Henley	won	65000	0	0	0	1	0	0	0
	pending	40000	10000	0	0	1	2	0	0
	presented	30000	0	0	20000	1	0	0	2
	declined	70000	0	0	0	2	0	0	0
Fred Anderson	won	165000	7000	0	0	2	1	0	0
	pending	0	5000	0	0	0	1	0	0
	presented	30000	0	5000	10000	1	0	1	1
	declined	65000	0	0	0	1	0	0	0

You can provide a list of aggfunctions to apply to each value too:

table = pd.pivot_table(df,index=["Manager","Status"],columns=["Product"],values=["Quantity","Price"],
               aggfunc={"Quantity":len,"Price":[np.sum,np.mean]},fill_value=0)
table

		Price								Quantity
		mean				sum				len
	Product	CPU	Maintenance	Monitor	Software	CPU	Maintenance	Monitor	Software	CPU	Maintenance	Monitor	Software
Manager	Status
Debra Henley	won	65000	0	0	0	65000	0	0	0	1	0	0	0
	pending	40000	5000	0	0	40000	10000	0	0	1	2	0	0
	presented	30000	0	0	10000	30000	0	0	20000	1	0	0	2
	declined	35000	0	0	0	70000	0	0	0	2	0	0	0
Fred Anderson	won	82500	7000	0	0	165000	7000	0	0	2	1	0	0
	pending	0	5000	0	0	0	5000	0	0	0	1	0	0
	presented	30000	0	5000	10000	30000	0	5000	10000	1	0	1	1
	declined	65000	0	0	0	65000	0	0	0	1	0	0	0

It can look daunting to try to pull this all together at once but as soon as you start playing with the data and slowly add the items, you can get a clear understanding for how it works.

Pivot Table Filtering

Once you have generated your data, it is in a DataFrame so you can filter on it using your normal DataFrame functions.

table.query('Manager == ["Debra Henley"]')

		Price								Quantity
		mean				sum				len
	Product	CPU	Maintenance	Monitor	Software	CPU	Maintenance	Monitor	Software	CPU	Maintenance	Monitor	Software
Manager	Status
Debra Henley	won	65000	0	0	0	65000	0	0	0	1	0	0	0
	pending	40000	5000	0	0	40000	10000	0	0	1	2	0	0
	presented	30000	0	0	10000	30000	0	0	20000	1	0	0	2
	declined	35000	0	0	0	70000	0	0	0	2	0	0	0

table.query('Status == ["pending","won"]')

		Price								Quantity
		mean				sum				len
	Product	CPU	Maintenance	Monitor	Software	CPU	Maintenance	Monitor	Software	CPU	Maintenance	Monitor	Software
Manager	Status
Debra Henley	won	65000	0	0	0	65000	0	0	0	1	0	0	0
Debra Henley	pending	40000	5000	0	0	40000	10000	0	0	1	2	0	0
Fred Anderson	won	82500	7000	0	0	165000	7000	0	0	2	1	0	0
Fred Anderson	pending	0	5000	0	0	0	5000	0	0	0	1	0	0

This is a brief walk-through on how to use pivot tables on your data sets.

Kaichen's Blog

Always like this.

Replace data manipulation in Excel with Python

Read in the data

Pivot the data

Pivot Table Filtering