E-Commerce Dashboard

In this project, I worked on a comprehensive examination of online transactional data, of an e-commerce. I utilized Python for preliminary exploration, SQL for investigation and querying the data, and Power BI for visualization, this work showcases a blend of data analytics and visual representation, with a focus on UI design.

The main goal here is to understand, clean, and structure the dataset through an exploratory data analysis (EDA), as we’ll transition to SQL for more advanced analysis and Power BI for creating an e-commerce dashboard visualization in the subsequent phases of the project.

This project dives into a UK-based e-commerce retailer’s sales data over a year.
The database is available here: Link.

Contents

Exploratory Data Analysis (Python)

I will use the following Python libraries:

  • pandas: A data manipulation and analysis library. I used it for loading the dataset into a DataFrame, a two-dimensional tabular datastructure with labeled axes. It also provides useful functions for preprocessing data, such as converting data types, handling missing values,and grouping data.
  • NumPy: A library for numerical computations.
  • matplotlib: A data visualization library.

1. Importing and Initial Inspection of the Dataset

The first step is always importing the necessary Python libraries and loading the dataset.

The dataset contains 536,350 rows and 8 columns.
The available columns are: TransactionNo, Date, ProductNo, ProductName, Price, Quantity, CustomerNo, and Country.

2. Handling Missing Data

In this step, I check for missing values to prepare the data for the next steps.

The CustomerNo column has 55 missing values, a very small portion of the dataset.
All other columns are complete without any missing values.
I decided to handle the missing values replacing the missing values with “Unkown”.

3. Handline negative values: Splitting the Dataset

As it is described in the Content part in the available description of the database:
Quantity (numeric): the quantity of each product per transaction. Negative values related to cancelled transactions.

I think would be useful to split the data into two subsets: cancelled and non-cancelled transactions.

The dataset has been divided into two:
Cancelled Transactions: 8,585 entries
Non-Cancelled Transactions: 527,765 entries

4. Handling Outliers

Now in this next step of the EDA I will look for outliers, using some boxplots to have a clear view of the distribution of the data.

Boxplot for Price:
A majority of the product prices cluster within a relatively low range.
Several data points on the higher end suggest products with significantly higher prices than most, possibly indicating premium items.

Boxplot for Quantity:
Most transactions involve a modest quantity of items.
Data points on the higher end might represent bulk orders or larger purchases.

The products with the highest prices appear to be furniture and home decor items. These items naturally tend to be more expensive.
The list suggests that these aren’t outliers, but genuine premium products.

The quantities are high, but they appear genuine given the nature of the products.
There’s no immediate indication that these are data errors.


With the exploratory data analysis complete, the sales data is ready for further analysis.
The next stages involve using SQL for detailed data analysis and Power BI for clear visual representation.

Data Mining/Querying (SQL)

Now, with SQL, my main goal is to dig deeper into transactional behaviors.

My goal is to inspect sales trends, discover high-performing products, learn about customer segmentation, and uncover trends in cancelled transactions.

Some of the key issues to be addressed include: How have sales trends changed over the months? Which products are frequently purchased? How do cancellations impact overall sales?

1. Creation and Alteration of Tables

Creation

CREATE TABLE sales_data (
TransactionNo VARCHAR(10),
Date DATE,
ProductNo VARCHAR(10),
ProductName TEXT,
Price DECIMAL(10, 2),
Quantity INT,
CustomerNo VARCHAR(10),
Country TEXT);
CREATE TABLE cancelled_sales_data (
TransactionNo VARCHAR(10),
Date DATE,
ProductNo VARCHAR(10),
ProductName TEXT,
Price DECIMAL(10, 2),
Quantity INT,
CustomerNo VARCHAR(10),
Country TEXT);

Alteration: Let’s add some columns

ALTER TABLE sales_data
ADD COLUMN Year INT,
ADD COLUMN Month INT,
ADD COLUMN Day INT,
ADD COLUMN Sales DECIMAL(10,2);

UPDATE sales_data
SET
Year = EXTRACT(YEAR FROM Date),
Month = EXTRACT(MONTH FROM Date),
Day = EXTRACT(DAY FROM Date),
Sales = Price * Quantity;
ALTER TABLE cancelled_sales_data
ADD COLUMN Year INT,
ADD COLUMN Month INT,
ADD COLUMN Day INT,
ADD COLUMN Sales DECIMAL(10,2);

UPDATE cancelled_sales_data
SET
Year = EXTRACT(YEAR FROM Date),
Month = EXTRACT(MONTH FROM Date),
Day = EXTRACT(DAY FROM Date),
Sales = Price * Quantity;

2. Query for Non-Cancelled Transactions

Monthly sales Analysis

Description: This query calculates the total sales for each month.
Question: How much did the store earn in sales each month?
Query:
SELECT Year, Month, SUM(Sales) AS TotalMonthlySales FROM sales_data
GROUP BY Year, Month
ORDER BY Year, Month;

a. Total sales for each month
yearmonthtotalmonthlysales
12018124415415.52
2201914559856.37
3201923335017.18
4201934398401.60
5201943589497.88
6201954578965.08
7201964494730.81
8201974593867.06
9201984758356.02
10201996628303.06
112019107237417.36
122019117861197.12
132019122514949.28

Comments:
The highest sales were in November 2019 with a total of £7,861,197.12.
December 2019 witnessed a significant drop in sales to £2,514,949.28, caused by the fact that in our dataset we have only data until 09/12/2019.

Number of transactions per month

Description: This query calculates the number of transactions that occurred each month.
Question: How many transactions took place each month?
Query:
SELECT Year, Month, COUNT(DISTINCT TransactionNo) AS TotalTransactions FROM sales_data
GROUP BY Year, Month
ORDER BY Year, Month;

b. Number of transactions per month
yearmonthtotaltransactions
12018121552
2201911081
3201921096
4201931442
5201941235
6201951670
7201961528
8201971452
9201981341
10201991818
112019102005
122019112753
13201912817

Comments:
The highest number of transactions occurred in November 2019 with a total of 2753 transactions.
December 2019 had the lowest transactions, which is consistent with the drop in sales, and explained by the presence of data only until 09/12/2019.

Most sold products

Description: This query identifies the top 10 products with the highest quantities sold.
Question: Which products were the most popular among our customers?
Query:
SELECT ProductNo, ProductName, SUM(Quantity) AS TotalQuantitySold FROM sales_data
GROUP BY ProductNo, ProductName
ORDER BY TotalQuantitySold DESC
LIMIT 10;

c. Top 10 most sold products
productnoproductnametotalquantitysold
123843Paper Craft Little Birdie80995
223166Medium Ceramic Top Storage Jar78033
322197Popcorn Holder56921
484077World War 2 Gliders Asstd Designs55047
585099BJumbo Bag Red Retrospot48478
685123ACream Hanging Heart T-Light Holder37956
721212Pack Of 72 Retrospot Cake Cases36515
884879Assorted Colour Bird Ornament36493
923084Rabbit Night Light30788
1022492Mini Paint Set Vintage26633

Comments:
“Paper Craft Little Birdie” was the most sold product with a quantity of 80,995.
This is followed by “Medium Ceramic Top Storage Jar” with a quantity of 78,033.

Products generating the most revenue

Description: This query identifies the products that have generated the highest revenue.
Question: Which products have brought in the most revenue?
Query:
SELECT ProductNo, ProductName, SUM(Sales) AS TotalProductRevenue FROM sales_data
GROUP BY ProductNo, ProductName
ORDER BY TotalProductRevenue DESC
LIMIT 10;

d. Products generating the most revenue
productnoproductnametotalproductrevenue
123843Paper Craft Little Birdie1002718.10
223166Medium Ceramic Top Storage Jar881990.18
322197Popcorn Holder587433.94
484077World War 2 Gliders Asstd Designs569735.39
585123ACream Hanging Heart T-Light Holder484592.69
684879Assorted Colour Bird Ornament421318.74
721212Pack Of 72 Retrospot Cake Cases391485.03
823084Rabbit Night Light329029.89
922423Regency Cakestand 3 Tier307483.85
1085099BJumbo Bag Red Retrospot297205.04

Comments:
“Paper Craft Little Birdie” generated the highest revenue of £1,002,718.10.
This emphasizes its popularity and potential as a best-selling item.

Top Customers by Revenue

Description: This query lists the top 10 customers who have contributed the most in terms of sales revenue.
Question: Who are our top 10 customers by sales amount?
Query:
SELECT CustomerNo, SUM(Sales) AS TotalSpent FROM sales_data
GROUP BY CustomerNo
ORDER BY TotalSpent DESC
LIMIT 10;

e. Customers generating the highest revenue
customernototalspent
114646.02112282.03
216446.01002741.57
314911.0914633.52
412415.0900545.54
518102.0897137.36
617450.0891438.53
712346.0840113.80
814156.0694965.02
913694.0646116.78
1017511.0639006.19

Comments:
The customer with the ID 14646 contributed the most with a total spending of £2,112,282.03.
This indicates potential high-value customers who might benefit from loyalty programs.

Country-wise distribution of customers

Description: This query shows the distribution of customers based on their countries.
Question: How are our customers distributed across different countries?
Query:
SELECT Country, COUNT(DISTINCT CustomerNo) AS NumberOfCustomers FROM sales_data
GROUP BY Country
ORDER BY NumberOfCustomers DESC;

f. Country-wise distribution of customers
countrynumberofcustomers
1United Kingdom4306
2Germany91
3France87
4Spain23
5Belgium22
6Switzerland21
7Portugal18
8EIRE13
9Italy12
10Finland12
11Australia9
12Denmark9
13Netherlands9
14Japan8
15Norway8
16Sweden7
17Unspecified7
18Channel Islands7
19Austria6
20Cyprus6
21Poland5
22Israel4
23Hong Kong4
24USA4
25Canada4
26Greece2
27United Arab Emirates2
28Malta2
29Bahrain2
30European Community1
31Iceland1
32Lebanon1
33Lithuania1
34RSA1
35Saudi Arabia1
36Singapore1
37Brazil1
38Czech Republic1

Comments:
Majority of customers are from the United Kingdom, with a total count of 4306.
Germany and France follow with 91 and 87 customers respectively. This shows the primary market is in the UK.

Sales distribution by country

Description: This query calculates the total sales amount from each country.
Question: Which countries contribute the most to the sales of the ecommerce?
Query:
SELECT Country, SUM(Sales) AS CountryTotalSales FROM sales_data
GROUP BY Country
ORDER BY CountryTotalSales DESC;

g. Sales distribution by country
countrycountrytotalsales
1United Kingdom52524658.47
2Netherlands2151553.59
3EIRE1713410.95
4Germany1371543.27
5France1330652.89
6Australia995607.91
7Sweden401879.89
8Switzerland361969.25
9Japan293155.44
10Spain281012.27
11Belgium272131.88
12Norway188612.52
13Portugal176110.40
14Finland120972.15
15Denmark101083.99
16Channel Islands95932.24
17Italy78536.24
18Austria69147.26
19Singapore63480.95
20Cyprus62346.75
21Israel54943.36
22Iceland38321.90
23USA32868.65
24Unspecified32699.05
25Canada30490.91
26Hong Kong27835.50
27Poland19793.11
28Malta12717.23
29United Arab Emirates11966.26
30Greece9666.53
31Lithuania8188.08
32Czech Republic7674.44
33European Community6210.23
34Lebanon5692.32
35Brazil4652.27
36RSA4259.83
37Bahrain3226.86
38Saudi Arabia969.50

Comments:
The United Kingdom dominates in sales with a massive £52,524,658.47.
The Netherlands comes second but with a significantly lower total of £2,151,553.59.

Day with Maximum Sales

Description: This query identifies the single day with the highest total sales.
Question: On which day did the ecommerce achieve the highest sales?
Query:
SELECT Date, SUM(Sales) AS TotalSales FROM sales_data
GROUP BY Date
ORDER BY TotalSales DESC
LIMIT 1;

h. Day with Maximum Sales
datetotalsales
112/9/20191106069.97

Comments:
The highest sales in a single day were on December 9, 2019, with total sales of £1,106,069.97.
As explained before, this is the last day available in the dataset, further investigations about the transaction of this day could be executed.

Peak Sales Month

Description: This query identifies the month with the highest sales.
Question: In which month did the ecommerce achieve peak sales?
Query:
SELECT Month, SUM(Sales) AS TotalSales FROM sales_data
GROUP BY Month
ORDER BY TotalSales DESC
LIMIT 3;

i. Peak Sales Month
monthtotalsales
1117861197.12
2107237417.36
3126930364.80

Comments:

November 2019 was the peak month with sales amounting to £7,861,197.12.

Transactions with High Quantity

Description: This query lists transactions where the quantity of products is significantly above average.
Question: Which transactions had unusually high quantities of products?
Query:
WITH AvgQuantity AS (SELECT AVG(Quantity) AS AvgQty FROM sales_data)
SELECT TransactionNo, ProductNo, Quantity
FROM sales_data, AvgQuantity
WHERE Quantity > 5 * AvgQty
ORDER BY Quantity DESC
LIMIT 20;

j. Transactions with High Quantity
transactionnoproductnoquantity
15814832384380995
25414312316674215
35788418482612540
4573008840774800
5554868221974300
6544612220533906
7560599180073186
8550461211083114
9540815211083114
10573995160143000
11536830840772880
12562439848792880
13554272219772700
14543057840772592
15573003230842400
16544152180072400
17539101226932400
18543669226932400
19547037219672160
20574294219152100

Comments:
Transaction 581483 with product 23843 had an exceptionally high quantity of 80,995. We have already examinate in the EDA the possible reason.

Product Price Distribution

Description: This query categorizes products into different price ranges.
Question: How many products fall into each price range?
Query:
SELECT
CASE
WHEN Price BETWEEN 0 AND 10 THEN ‘Low’
WHEN Price BETWEEN 11 AND 50 THEN ‘Medium’
ELSE ‘High’
END AS PriceRange,
COUNT(DISTINCT ProductNo) AS NumberOfProducts
FROM sales_data
GROUP BY PriceRange
ORDER BY NumberOfProducts DESC;

k. Product Price Distribution
pricerangenumberofproducts
1Medium3510
2Low2688
3High1052

Comments:
Most products fall within the medium price range with a total of 3510 products.
This indicates the store has a diverse product pricing strategy.

Return Customer Analysis

Description: This query identifies customers who have made multiple transactions, suggesting they return to make purchases.
Question: Who are our returning customers based on the frequency of their transactions?
Query:
SELECT CustomerNo, COUNT(DISTINCT TransactionNo) AS TotalTransactions
FROM sales_data
GROUP BY CustomerNo
HAVING COUNT(DISTINCT TransactionNo) > 5
ORDER BY COUNT(DISTINCT TransactionNo) DESC
LIMIT 10;

l. Return Customer Analysis
customernototaltransactions
112748.0207
214911.0198
317841.0125
413089.098
515311.091
614606.090
712971.086
814646.073
913408.062
1016029.062

Comments:

Customer 12748 has the highest number of transactions with a total of 207, suggesting he is a loyal customer.

3. Query for Cancelled Transactions

Cancelled Transactions Count

Description: This query counts the number of transactions that were cancelled.
Question: How many transactions were cancelled?
Query:
SELECT COUNT(DISTINCT TransactionNo) AS CancelledTransactionsCount
FROM cancelled_sales_data;

a. Cancelled Transactions Count
cancelledtransactionscount
13414

Comments:
There was a total of 3414 cancelled transactions, which suggests potential areas of improvement in the sales process or product quality.

Products with Most Cancellations

Description: This query identifies the products that are most frequently cancelled.
Question: Which products are cancelled the most by customers?
Query:
SELECT productno, COUNT(DISTINCT TransactionNo) AS CancelledCount
FROM cancelled_sales_data
GROUP BY Productno
ORDER BY CancelledCount DESC
LIMIT 10;

b. Products with Most Cancellations
productnocancelledcount
122423180
22296087
32272073
42123257
52269954
62219750
72266647
885099B43
92184342
102269742

Comments:

Product 22423 was cancelled the most, with a total of 180 cancellations. This raises a flag and demands a deeper dive to understand the reason for these cancellations.

Customers with most cancellations

Description: This query identifies customers who have cancelled the most transactions.
Question: Which customers have cancelled the most transactions?
Query:
SELECT CustomerNo, COUNT(DISTINCT TransactionNo) AS CancelledTransactions
FROM cancelled_sales_data
GROUP BY CustomerNo
ORDER BY CancelledTransactions DESC
LIMIT 5;

c. Customers with most cancellations
customernocancelledtransactions
11784145
21491143
31460635
4Unknown35
51531127

Comments:
Customer 17841 had the highest number of cancellations at 45. This might indicate dissatisfaction or other issues that should be addressed.


Findings from SQL Exploration

Monthly sales showed a fluctuating pattern, with months like November having more sales.
Products such as ‘Paper Craft Little Birdie’ and ‘Medium Ceramic Top Storage Jar’ emerged as bestsellers, both in terms of quantity and revenue.
The UK dominated the customer base, with a significant number of sales.
I identified recurring customers, with some of them making more than 200 transactions, emphasizing the importance of customer loyalty.
On the downside, there was a considerable number of cancelled transactions, which offered insights into potentially lost revenues and areas of improvement.

Answers to Initial Questions

We’ve gained insights into sales trends over the months, identifying peak sales periods.
We’ve recognized top-performing products and those that get frequently cancelled.
We’ve understood the distribution of the customer base and identified loyal customers based on their transaction frequency.

E-Commerce Dashboard (PowerBI)

For the visualization of the ecommerce’s data I decided to use the most famous and widely
used software in the field: Microsoft Power BI.

The goals is to transform analytical findings into informative visuals, with the aim of a better data
interpretation and better decision making, focusing on an appealing User Interface (UI).

What to expect?
Sales Trajectory: I showed the sales trends over time, offering a panoramic view of
business performance.
Top-Performing Products: Visualization tools highlighted the products driving a good
number of sales.
Customer Segmentation: Visual tools allowed for an in-depth comparison of regular
versus occasional shoppers.
Impact of Cancellations: A targeted analysis of cancellations showed their financial
repercussions.

Disclaimer:
For a better comprehension of the data I decided to filter until 30/11/2019.
So I can analyze all the year in its entirely (01/12/2018-30/11/2019).
This could cause some difference with the SQL Query of the previous document.

Let’s dive inside the e-commerce dashboard now.

Page 1: Overview

Sales, Transactions, and Quantity Cards: A comprehensive illustration to show the cumulative metrics across these categories. The line graphs has the goal of showing the the day-by-day trends, days of peak performance and the year growth. The value indicated is the total for every category. The line graphs show the trend of the respective categories.

Products by Revenue: A list of the top 10 revenue-generating products, “Medium Ceramic Top Storage Jar” is leading the ranks with £880.346 of revenues.

Sales by Country: A map viz that, using bubble visuals (with each bubble’s dimension indicative of a country’s contribution to total sales) shows the sales by country, here the influence of UK is clear.

Page 2: Canceled Transactions

Transaction Trends: Line graph highlighting the monthly trends of canceled transactions, offering insights into potential operational challenges.

Transactions vs Canceled Transactions: Percentage of Transactions Canceled, 14,8% of every transaction is canceled.

Revenue Impact Analysis: Contrasting missed revenues with total sales provides a perspective on the financial implications of cancellations.

Page 3: Customer Insights

Customer Count Evolution: A timeline showcasing the growth of the unique customer count over the year.

Transaction History: An interactive visualization tool that enables a deep dive into the history of customer transactions, informing about buying behaviors.

Geographical Distribution: Viz that shows the percentage of the customers’ nation, with a clearly dominant presence of UK customers.

Top Customers’ Analysis: A comparison of the top revenue-generating customers with their associated canceled transactions, offering insights into customer reliability. We can observe that customer 14911 created a huge amount of missed revenues, more than 50% of the total.

Page 4: Product Insights

Top 5 Product Analysis: A deep focus into the top 5 revenue-generating products, examining metrics like price range, total quantity sold, and revenue. The associated line graphs chart the sales trend of each product (monthly-based). The price range is high and it depends on the quantity of the specific order/transaction, with discounts on big orders.

Product Impact on Revenues: A visual representation of the total product count, emphasizing the significant revenue contribution of the top 5 products, analyzed in detail before.

Cancellation Impact: Spotlight on the top 5 products with the most cancellations, again here “Medium Ceramic Top Storage Jar” is leading the ranks.


Conclusion

Important points to acknowledge after the analysis are:
– UK Dominance, But Potential Awaits: The UK is the primary market for the e-commerce platform right now. This dominance is promising, but it also suggests potential growth opportunities in other regions. Tapping into these areas could help the platform expand its reach and audience.
– Concerns with Canceled Orders: The e-commerce platform has a significant number of canceled orders. It’s crucial to identify the reasons behind these cancellations. Addressing issues, whether they are delivery-related or tied to product satisfaction, could lead to increased sales and more satisfied customers.
– Gaining Insights into Customer Preferences: The data reveals much about customer preferences and behaviors. Leveraging this information can aid the e-commerce platform in offering tailored deals, establishing loyalty programs, or refining its service. Working with these insights can enhance the overall shopping experience for the users.
– Product Performance Varies: Some products on the platform sell exceptionally well, while others stay behind. The e-commerce platform might consider further promoting the popular ones, or perhaps revisiting and enhancing the ones with slower sales.

In essence, through tools like Python, SQL, and Power BI, a comprehensive picture of the e-commerce operations has emerged.
These insights can start to build the way for refining strategies and ensuring the platform’s continued success in the always evolving online shopping realm.