Active Product Sales Analysis using Matplotlib in Python
Last Updated :
10 Sep, 2024
Every company that sells online or runs a dedicated e-commerce website wants to understand what its customers need and when they buy, so that it can improve its chances of making a sale. A large transaction dataset can be analyzed to find out which time of day has the highest user activity in terms of transactions.
In this post, we will use pandas and Matplotlib to draw insights from such a dataset. The Transaction Date column lets us identify the busiest time (hour) of the day. You can access the entire dataset here.
Stepwise Implementation
Step 1:
First, we import the required libraries and then load the dataset into a DataFrame.
Python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
Order_Details = pd.read_csv('Order_details(masked).csv')
Step 2:
Convert the Transaction Date column to datetime and store the result in a new column called Time. By default the values are displayed in the pattern YYYY-MM-DD HH:MM:SS; if the dates in the file are stored in a different pattern, pd.to_datetime accepts a format argument to describe it. Since we are mainly interested in hours, we extract them into an Hour column with the built-in .dt.hour accessor:
Python
# here we have taken Transaction
# date column
Order_Details['Time'] = pd.to_datetime(Order_Details['Transaction Date'])
# After that we extracted hour
# from Transaction date column
Order_Details['Hour'] = (Order_Details['Time']).dt.hour
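To see what these two lines do without loading the full dataset, here is a minimal sketch on a tiny hand-made frame. The sample timestamps and the name `sample` are invented for illustration; only the column name `Transaction Date` comes from the article, and the real file may store dates in a different pattern.

```python
import pandas as pd

# Hypothetical sample rows; the real dataset's timestamps may look different
sample = pd.DataFrame({
    "Transaction Date": [
        "2020-07-18 20:47:00",
        "2020-08-15 00:12:00",
        "2020-11-18 20:05:00",
    ]
})

# Parse the strings into datetime64 values
sample["Time"] = pd.to_datetime(sample["Transaction Date"])

# The .dt accessor exposes datetime components; hour is an integer from 0 to 23
sample["Hour"] = sample["Time"].dt.hour

print(sample["Hour"].tolist())  # [20, 0, 20]
```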
Step 3:
Next, we need the "n" busiest hours. value_counts() returns how often each hour occurs, sorted from most to least frequent, so its first "n" entries are the busiest hours. We take the index (the hours) and the values (the number of transactions) and convert each to a list with tolist().
Python
# n =24 in this case, can be modified
# as per need to see top 'n' busiest hours
timemost1 = Order_Details['Hour'].value_counts().index.tolist()[:24]
timemost2 = Order_Details['Hour'].value_counts().values.tolist()[:24]
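The behaviour of value_counts() is easier to follow on a handful of values. The sketch below uses a made-up Series named `hours` in place of Order_Details['Hour'] and keeps the top two hours instead of all 24.

```python
import pandas as pd

# Hypothetical hour values standing in for Order_Details['Hour']
hours = pd.Series([20, 21, 20, 9, 20, 21])

# value_counts() sorts by frequency, most frequent first
counts = hours.value_counts()

n = 2  # number of busiest hours to keep
top_hours = counts.index.tolist()[:n]
top_counts = counts.values.tolist()[:n]

print(top_hours)   # [20, 21]
print(top_counts)  # [3, 2]
```

Calling value_counts() once and reusing the result, as done here, avoids counting the column twice.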
Step 4:
Finally, we stack the indices (hour) and frequencies together to yield the final result.
Python
tmost = np.column_stack((timemost1,timemost2))
print(" Hour Of Day" + "\t" + "Cumulative Number of Purchases \n")
print('\n'.join('\t\t'.join(map(str, row)) for row in tmost))
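As a small illustration of what np.column_stack produces, the sketch below stacks two short lists with made-up values. Each input list becomes one column, so every row pairs an hour with its count.

```python
import numpy as np

# Hypothetical hours and purchase counts, already ordered by frequency
hours = [20, 21, 9]
purchases = [3, 2, 1]

# Each list becomes a column of a 2-D array with shape (3, 2)
table = np.column_stack((hours, purchases))

print(table.shape)  # (3, 2)
for hour, count in table:
    print(f"{hour}\t\t{count}")
```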
Step 5:
Before plotting, the counts must be ordered by hour rather than by frequency. To do so, we take the hourly frequencies, sort them by hour, and build a matching list of hours for the x-axis:
Python
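timemost = Order_Details['Hour'].value_counts()
# Hours 0 to 23 for the x-axis; range(0, 23) would stop at 22
timemost1 = list(range(24))
# Order the counts by hour and use 0 for hours with no transactions,
# so both sequences always have 24 entries
timemost2 = timemost.sort_index().reindex(timemost1, fill_value=0)
timemost2 = pd.DataFrame(timemost2)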
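A detail worth checking at this step: value_counts() only returns hours that actually occur in the data, so a quiet hour would be missing and the x and y sequences could end up with different lengths. The following is a minimal sketch of one way to guard against that, using made-up hour values; the names `hours`, `all_hours` and `full` are illustrative only.

```python
import pandas as pd

# Hypothetical data in which most hours never occur
hours = pd.Series([0, 0, 5, 23, 23, 23])

# Count occurrences and order them by hour
counts = hours.value_counts().sort_index()

# Reindex against every hour of the day, filling the gaps with 0
all_hours = list(range(24))
full = counts.reindex(all_hours, fill_value=0)

print(len(counts))  # 3
print(len(full))    # 24
print(full.loc[5], full.loc[6], full.loc[23])  # 1 0 3
```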
Step 6:
For data visualization, we will use Matplotlib, as it is one of the most convenient and commonly used plotting libraries. You are free to choose any other library, such as ggplot or Seaborn, to plot the data graphically.
The commands below put the hours on the X-axis and the number of transactions on the Y-axis, and set other aspects of the line chart such as the color, font and labels.
Python
plt.figure(figsize=(20, 10))
plt.title('Sales Happening Per Hour (Spread Throughout The Week)',
          fontdict={'fontname': 'monospace', 'fontsize': 30}, y=1.05)
plt.ylabel("Number Of Purchases Made", fontsize=18, labelpad=20)
plt.xlabel("Hour", fontsize=18, labelpad=20)
plt.plot(timemost1, timemost2, color='m')
plt.grid()
plt.show()
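For readers who want to try the chart without the dataset, here is a self-contained sketch of a similar plot. The purchase counts are invented, the output file name is hypothetical, and the Agg backend is chosen only so that the script also runs on a machine without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no display needed
import matplotlib.pyplot as plt

# Hypothetical purchase counts for hours 0 to 23 (not taken from the dataset)
hours = list(range(24))
purchases = [5, 3, 2, 1, 1, 2, 4, 8, 12, 15, 18, 20,
             22, 21, 19, 18, 20, 24, 28, 33, 38, 40, 30, 15]

fig, ax = plt.subplots(figsize=(10, 5))
line, = ax.plot(hours, purchases, color="m", marker="o")
ax.set_xticks(hours)  # one tick per hour
ax.set_xlabel("Hour")
ax.set_ylabel("Number Of Purchases Made")
ax.set_title("Sales Happening Per Hour")
ax.grid(True)

# Write the chart to a file instead of opening a window
fig.savefig("sales_per_hour.png")
```

Setting one tick per hour makes it easier to read off the exact hour at which the line peaks.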
The plot indicates that sales peak prominently in the late evening hours. This insight can feed into business decisions, for example by promoting a product specifically during that time.
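The peak can also be located programmatically rather than read off the chart. The sketch below assumes a Series of counts indexed by hour; the numbers are invented and are not results from the dataset.

```python
import pandas as pd

# Hypothetical purchase counts indexed by hour of day
counts = pd.Series({18: 28, 19: 33, 20: 38, 21: 40, 22: 30})

peak_hour = counts.idxmax()  # index label (hour) of the largest count
peak_count = counts.max()    # the largest count itself

print(peak_hour, peak_count)  # 21 40
```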