Analyze the data for an independent gaming company for their most recent fantasy game Heroes of Pymoli as a Lead Analyst.
Like many others in its genre, the game is free-to-play, but players are encouraged to purchase optional items that enhance their playing experience. As a first task, the company would like to generate a report that breaks down the game's purchasing data into meaningful insights.
The final report should include each of the following:
- Total Number of Players
#get the count
playerCount = df.SN.nunique()
#create table
playerCount_df = pd.DataFrame()
playerCount_df["Total Players"] = [playerCount]
#print
playerCount_df- Number of Unique Items
- Average Purchase Price
- Total Number of Purchases
- Total Revenue
#get the number
uniqItem = df["Item Name"].nunique()
avePrice = df.Price.mean()
numPurchasing = df["Purchase ID"].count()
totRev = df.Price.sum()
#create summary table
purchasingAnalysis = pd.DataFrame()
purchasingAnalysis["Number of Unique Items"] = [uniqItem]
purchasingAnalysis["Average Price"] = [avePrice]
purchasingAnalysis["Number of Purchases"] = [numPurchasing]
purchasingAnalysis["Total Revenue"] = [totRev]
#format the number
purchasingAnalysis["Average Price"] = purchasingAnalysis["Average Price"].map("${:.2f}".format)
purchasingAnalysis["Total Revenue"] = purchasingAnalysis["Total Revenue"].map("${:.2f}".format)
purchasingAnalysis- Percentage and Count of Male Players
- Percentage and Count of Female Players
- Percentage and Count of Other / Non-Disclosed
people_df = df.groupby(['SN','Gender']).size().reset_index().rename(columns={0:'count'})
#make the columns first
genderCount = people_df.Gender.value_counts()
genderPerc = genderCount / len(people_df)
#make the table
genderTable = pd.DataFrame()
genderTable["Total Count"] = genderCount
genderTable["Percentage of Players"] = genderPerc * 100
#format
genderTable["Percentage of Players"] = genderTable["Percentage of Players"].map("{:.2f}%".format)
#print
genderTable- The below each broken by gender
- Purchase Count
- Average Purchase Price
- Total Purchase Value
- Average Purchase Total per Person by Gender
#get the columns
genderSize = df.groupby("Gender").size()
genderAvgPrice = df.groupby("Gender").Price.mean()
genderTotPrice = df.groupby("Gender").Price.sum()
genderAvgPerPerson = df.groupby("Gender").Price.sum() / genderTable["Total Count"]
#make the table
genderPurchasingTable = pd.DataFrame()
genderPurchasingTable["Purchase Count"] = genderSize
genderPurchasingTable["Average Purchase Price"] = genderAvgPrice
genderPurchasingTable["Total Purchase Value"] = genderTotPrice
genderPurchasingTable["Avg Total Purchase per Person"] = genderAvgPerPerson
#format
genderPurchasingTable["Average Purchase Price"] = genderPurchasingTable["Average Purchase Price"].map("${:,.2f}".format)
genderPurchasingTable["Total Purchase Value"] = genderPurchasingTable["Total Purchase Value"].map("${:,.2f}".format)
genderPurchasingTable["Avg Total Purchase per Person"] = genderPurchasingTable["Avg Total Purchase per Person"].map("${:,.2f}".format)
#print
genderPurchasingTable- The below each broken into bins of 4 years (i.e. <10, 10-14, 15-19, etc.)
- Purchase Count
- Average Purchase Price
- Total Purchase Value
- Average Purchase Total per Person by Age Group
# Create bins in which to place values based upon player ages
bins = [0, 9, 14, 19, 24, 29, 34, 39, 50]
# Create labels for these bins
group_labels = [" <10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]
# Slice the data and place it into bins
df["Age Group"] = pd.cut(df["Age"], bins, labels=group_labels).astype(str)
df.head()
people_df2 = df.groupby(['SN','Age Group']).size().reset_index().rename(columns={0:'count'})
people_df2
# Create bins in which to place values based upon player ages
bins = [0, 9, 14, 19, 24, 29, 34, 39, 50]
# Create labels for these bins
group_labels = [" <10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]
# Slice the data and place it into bins
df["Age Group"] = pd.cut(df["Age"], bins, labels=group_labels).astype(str)
ageGroupTable = people_df2.groupby("Age Group").size().reset_index().rename(columns={0:'Total Count'})
ageGroupTable["Percentage of Players"] = ageGroupTable["Total Count"] / len(people_df2) * 100
#format
ageGroupTable["Percentage of Players"] = ageGroupTable["Percentage of Players"].map("{:.2f}%".format)
ageGroupTable = ageGroupTable.set_index("Age Group")
ageGroupTable
#get the columns
ageSize = df.groupby("Age Group").size()
ageAvgPrice = df.groupby("Age Group").Price.mean()
ageTotPrice = df.groupby("Age Group").Price.sum()
ageAvgPerPerson = df.groupby("Age Group").Price.sum() / ageGroupTable["Total Count"] # NOTE THE AGE COUNT USE HERE - HAVE TO GET UNIQUE PEOPLE
#make the table
agePurchasingTable = pd.DataFrame()
agePurchasingTable["Purchase Count"] = ageSize
agePurchasingTable["Average Purchase Price"] = ageAvgPrice
agePurchasingTable["Total Purchase Value"] = ageTotPrice
agePurchasingTable["Avg Total Purchase per Person"] = ageAvgPerPerson
#format
agePurchasingTable["Average Purchase Price"] = agePurchasingTable["Average Purchase Price"].map("${:,.2f}".format)
agePurchasingTable["Total Purchase Value"] = agePurchasingTable["Total Purchase Value"].map("${:,.2f}".format)
agePurchasingTable["Avg Total Purchase per Person"] = agePurchasingTable["Avg Total Purchase per Person"].map("${:,.2f}".format)
#print
agePurchasingTable
- Identify the the top 5 spenders in the game by total purchase value, then list (in a table):
- SN
- Purchase Count
- Average Purchase Price
- Total Purchase Value
#get the columns
sNSize = df.groupby("SN").size()
sNAvgPrice = df.groupby("SN").Price.mean()
sNTotPrice = df.groupby("SN").Price.sum()
#make the table
sNPurchasingTable = pd.DataFrame()
sNPurchasingTable["Purchase Count"] = sNSize
sNPurchasingTable["Average Purchase Price"] = sNAvgPrice
sNPurchasingTable["Total Purchase Value"] = sNTotPrice
#format
sNPurchasingTable["Average Purchase Price"] = sNPurchasingTable["Average Purchase Price"].map("${:,.2f}".format)
sNPurchasingTable["Total Purchase Value"] = sNPurchasingTable["Total Purchase Value"].map("${:,.2f}".format)
#print
sNPurchasingTable = sNPurchasingTable.sort_values(by="Total Purchase Value", ascending=False)
sNPurchasingTable.head()
- Identify the 5 most popular items by purchase count, then list (in a table):
- Item ID
- Item Name
- Purchase Count
- Item Price
- Total Purchase Value
#get the columns
itemSize = df.groupby(["Item ID", "Item Name"]).size()
itemAvgPrice = df.groupby(["Item ID", "Item Name"]).Price.mean()
itemTotPrice = df.groupby(["Item ID", "Item Name"]).Price.sum()
#make the table
itemPurchasingTable = pd.DataFrame()
itemPurchasingTable["Purchase Count"] = itemSize
itemPurchasingTable["Item Price"] = itemAvgPrice
itemPurchasingTable["Total Purchase Value"] = itemTotPrice
#sort
itemPurchasingTable = itemPurchasingTable.sort_values(by="Purchase Count", ascending=False)
#format
itemPurchasingTable["Item Price"] = itemPurchasingTable["Item Price"].map("${:,.2f}".format)
itemPurchasingTable["Total Purchase Value"] = itemPurchasingTable["Total Purchase Value"].map("${:,.2f}".format)
#print
itemPurchasingTable.head()
- Identify the 5 most profitable items by total purchase value, then list (in a table):
- Item ID
- Item Name
- Purchase Count
- Item Price
- Total Purchase Value
#get the columns
itemSize = df.groupby(["Item ID", "Item Name"]).size()
itemAvgPrice = df.groupby(["Item ID", "Item Name"]).Price.mean()
itemTotPrice = df.groupby(["Item ID", "Item Name"]).Price.sum()
#make the table
itemPurchasingTable = pd.DataFrame()
itemPurchasingTable["Purchase Count"] = itemSize
itemPurchasingTable["Item Price"] = itemAvgPrice
itemPurchasingTable["Total Purchase Value"] = itemTotPrice
#sort
itemPurchasingTable = itemPurchasingTable.sort_values(by="Total Purchase Value", ascending=False)
#format
itemPurchasingTable["Item Price"] = itemPurchasingTable["Item Price"].map("${:,.2f}".format)
itemPurchasingTable["Total Purchase Value"] = itemPurchasingTable["Total Purchase Value"].map("${:,.2f}".format)
#print
itemPurchasingTable.head()










