Vending Machines

This project presents an exploratory data analysis (EDA) and an interactive dashboard idea centered around a unique dataset from five different vending machines scattered between four various strategic locations in Central New Jersey, USA. The dataset originates from vending machines owned and operated by the publisher of the public available dataset, available here.

These machines are strategically located in places with varying foot traffic and consumer demographics:
– GuttenPlans, a frozen dough specialist company.
– EB Public Library, a public library with high visitor counts.
– Brunswick Sq Mall, a shopping center.
– Earle Asphalt, a construction engineering firm.
Each location has one or two designated vending machine, equipped with an e-port device that stores every transaction’s details. The data, collected from 01/01/2022 to 31/12/2022, is then transferred during the night to an electronic data aggregator.

Through a series of steps starting from data cleaning, preprocessing, and exploratory analysis, this project will uncover the consumption patterns that define each location, guiding strategies for product optimization and enhancing customer satisfaction.
After preparing the data, the focus will be on creating a dashboard that could be used in similar real life situations and can help business owner to make informed decisions.

The code and the dashboard are available for download at the end of the page.


Exploratory Data Analysis (Python)

We will use the following Python libraries:

  • pandas: A data manipulation and analysis library. We use it for loading the dataset into a DataFrame.
  • matplotlib: A data visualization library. We use it for creating plots to visualize the data.
  • seaborn: A data visualization library based on matplotlib. We use it for creating more complex plots.

Importing and Overview of the Dataset

I will start by importing pandas library and loading the dataset.
The data is read into a pandas DataFrame, a two-dimensional tabular data structure with labeled axes, which is a common structure for statistical data.

From the “About this file” section in kaggle, is possible to have a preliminary understanding of the attributes:

  • Status : Represents if the machine data is successfully processed
  • Device ID : Unique electronic identifier ( also called as ePort) for the vending machine. A machine is allocated a unique ePrt * device
  • Location : Indicates location of the vending machine
  • Machine : User-friendly machine name
  • Product : Product vended from the machine
  • Category : Carbonated / Food / Non-carbonated / Water
  • Transaction : Unique identifier for every transaction
  • TransDate : The Date & time of transaction
  • Type : Type of transaction ( Cash / Credit )
  • RCoil : Coil # used to vend the product
  • RPrice : Price of the Product
  • RQty : Quantity sold. This is usually one but machines can be configured to sell more items in a single transaction
  • MCoil : Mapped coil # used to vend the product ( from toucan )
  • MPrice : Mapped price of the Product
  • MQty : Mapped quantity sold. This is usually one but machines can be configured to sell more items in a single transaction
  • LineTotal : Total sale per transaction
  • TransTotal : Represents total of all transactions that will show up on the Credit Card. A user could vend a drink for $3 and a snack for $1.5 making a total of $4.50
  • Prcd Date : Date when the transaction was processed by SeedLive ( an entity that is used to aggregate all transactions electronically )

Preliminary Data Exploration

Missing Values: There are some missing values in the ‘Product’ and ‘Category’ columns, and a few in ‘MPrice’. Specifically, ‘Product’ has 6 missing entries, ‘Category’ has 267 missing entries, and ‘MPrice’ has 3 missing entries.
Duplicate Entries: There are no duplicate entries in the dataset.
Consistency in Categorization: The ‘Category’ column contains four unique values: ‘Carbonated’, ‘Food’, ‘Non Carbonated’, ‘Water’, and some missing values. The ‘Type’ column contain two payment types: ‘Credit’ and ‘Cash’.
Data Types: Most columns have the appropriate data type, but ‘TransDate’ and ‘Prcd Date’ should be converted to datetime format for better analysis.
Outliers: There doesn’t seem to be any noticeable outliers in the numeric data that suggest data entry errors. The data describes transaction amounts and product prices within reasonable ranges.

Data Cleaning & Preprocessing

I decided to delete “Product” and “MPrice” missing rows.
For “Category” missing values I will list all the unique products with an empty category and manually assign them a category.

After the result seems acceptable to manually map the categories for the products.

Exploratory Data Analysis (EDA)

Now I will perform an Exploratory Data Analysis (EDA) to better understand the dataset using different visualizations. For this analysis, I will look at:
1. Sales Distribution by Category: Understanding what types of products are most popular.
2. Transaction Count by Location: Seeing which locations have the most activity.
3. Average Transaction Total by Location: To get a sense of spending patterns across different locations.
4. Sales Over Time: To identify any trends or patterns in sales activity over time. Top Selling Products: Identifying the most popular products across all locations.

The first analysis gives insight into the Sales Distribution by Category. It’s clear that the Food category generate the most sales, that can indicate higher prices for this type of category, customer preferences and product popularity.

The second analysis shows the Transaction Count by Location. This shows which locations are busiest in terms of the number of transactions, giving an idea of foot traffic and vending machine usage across different places. It is important to note that, despite having two different vending machines, the Brunswick Sq. Mall has a little number of transactions in comparison to GuttenPlans and EB Public Library with only one vending machine for location.

After observing again the categories, I decided to create a new category split, between Food and Drinks, that I think will be helpful for a better understanding of the products sold by the vending machines, and for more possibilities later on when I will work on the dashboard.

Also after the split between categories, food is still first in the sales ranking.

I will now proceed with the rest of the analysis.

As is clear from the statistical data generated before, the average transaction of the data is around $2. This line graph shows the distribution, that is expected, considering the mean.

The sales over time showed how sales fluctuated across different months. During summer months (June, July, August) there is a peak of sales, more than $2000/monthly, the rest of the year is always less than $1800/monthly. Is important to notice that the vending machines in the EB Public Library started being functional from March/April, and not in the first months of the year.

This graph shows that the most sold product is Monster Energy Original, followed by Coca Cola – Zero Sugar (also present in the 6th place in the ranking with the version Regular), third place is for Red Bull – Original. The carbonated category, despite having half the sales quantity of the food category, lead the rankings.

Dashboard Creation (PowerBI)

Now that the dataset is cleaned and that I have a better understanding of the distribution of the values inside it, I will proceed to create a dashboard to showcase a general overview of the business and details for every vending machines.

Changes to the model

First of all I removed the categories not needed for this part of the project (Status, Device ID, MCoil, MPrice, MQty, TransTotal, Prcd Date)

After, using Power Query, I changed the type of some attributes (numbers and date).

I also created a measure for having an attribute that can shows the revenue by day:

The idea for this dashboard is also to provide a map for every location, to make the dashboard more visually appealing. Using a dataset based on a real-life business for this project I could easily find the locations and the coordinates, to use them for creating the maps.
I decided so to add a table called “geo” that includes the coordinates of the various locations.
This is how the final model looks like:

UI/UX Design

I decided to have an Homepage and a single page for every vending machine, and I wanted to include in every page a picture of the vending machine.
Having no access to the real pictures of them I decided to use generative Artificial Intelligence to create the images, that is possible to see here:

I created the background and the icons using Microsoft PowerPoint:

Final Result

That’s the final result:

Filters, Bookmarks and Navigation

In every page is present a Category and Product filter, activable by clicking the icon at the top left of the page. This is the appearance after clicking on it:

I created it using bookmarks, for every page there are two different bookmarks:
One with the regular design.
Another one with the filter design, and the rest of the page clearer, plus a change in the icon in the top left.
Every time that the user clicks on the icon, will switch the visual (bookmark)

The other interactive tools present in the dashboard (obviously for the funcitionality of PowerBI every visualization is interactive and the user can filter clicking on a line or a slice of the pie chart) are:
– Page Navigators: I inserted arrows for navigating between the pages.
– Home Navigator: I inserted an “Home” button at the top right of the page to rapidly come back to the first page, the homepage.

The raw files, the code and the PBI file are available here: Google Drive folder.

Special thanks to the owner of the dataset, Aniruddh Singh.

If you want to discover dataset like this take a look at my article: Finding The Best Dataset For Data Analytics Practice And Portfolio.