Market basket analysis (MBA) is an analytical technique used to predict customers' future purchase decisions. It studies customers' historical buying patterns and preferences to predict which items they are likely to buy along with those already in their basket (or cart). It is also known as “Affinity Analysis” or “Association Rule Mining”.
Application
There are a number of applications for Market Basket Analysis. Some of them are:
- Store Layout: Based on MBA, you can position related products close together in your retail store to increase sales. For example, a customer buying bread is likely to also buy cheese or jam, so placing these items near each other helps shoppers notice them or remember to buy them.
- Inventory Management: MBA helps you predict customers' future purchases over a period of time. Using your historical sales data, you can predict which items you are likely to run short of, which helps you keep stock at optimal quantities.
- Content Placement: Online publishers and bloggers can use it to display the content that users are most likely to read next. This helps reduce bounce rate, improve engagement, and improve performance in search results. It is also used in recommendation engines to suggest relevant content to the user.
Concepts
Association Rules
Market Basket Analysis is most often carried out with the Apriori algorithm. The outcome of the analysis is a set of association rules. Let’s take an example to understand the concept better. Consider the following dataset:
Transaction | Item 1 | Item 2 | Item 3 |
---|---|---|---|
1 | Milk | Sugar | Tea Powder |
2 | Milk | Sugar | Tea Powder |
3 | Milk | Sugar | Tea Powder |
4 | Milk | Sugar | |
5 | Milk | Sugar | |
For this dataset, we can write the following Association Rules:
- Rule 1: If Milk is purchased, then Sugar is also purchased, and vice versa.
- Rule 2: If Milk and Sugar are purchased, then Tea Powder is also purchased in 60% of the transactions.
This example is extremely small. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
Note that association rules are written in “IF-THEN” format. The IF part is also called the “antecedent” and the THEN part the “consequent”.
Support
Support is the probability that a transaction contains the itemset under analysis. It is the fraction of transactions in the dataset that contain that product or set of products. The higher the support, the more popular the product or product bundle.
E.g. The support of “IF Milk & Sugar THEN Tea Powder” is 3/5 transactions, or 60% of the total transactions.
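As a quick check of these numbers, here is a minimal base R sketch that computes support for the toy dataset in the table above. The `baskets` list and the `support()` helper are hypothetical names introduced for illustration; they are not part of any package.

```r
# Toy dataset from the table above: one character vector per transaction
baskets <- list(
  c("Milk", "Sugar", "Tea Powder"),
  c("Milk", "Sugar", "Tea Powder"),
  c("Milk", "Sugar", "Tea Powder"),
  c("Milk", "Sugar"),
  c("Milk", "Sugar")
)

# Hypothetical helper: fraction of transactions that contain every item in `items`
support <- function(items, transactions) {
  contains_all <- vapply(
    transactions,
    function(basket) all(items %in% basket),
    logical(1)
  )
  mean(contains_all)
}

support(c("Milk", "Sugar", "Tea Powder"), baskets)  # 3 of 5 transactions
support(c("Milk", "Sugar"), baskets)                # 5 of 5 transactions
support("Tea Powder", baskets)                      # 3 of 5 transactions
```

The support of a rule is the support of the antecedent and consequent taken together, which is why the first call passes all three items.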
Confidence
Confidence is the conditional probability that a customer who buys product A will also buy product B. It expresses how reliable the rule is: the higher the confidence, the stronger the rule.
It is calculated as the ratio of the probability that the antecedent and consequent occur together to the probability that the antecedent occurs. For example, the confidence of the rule “IF Milk & Sugar THEN Tea Powder” can be worked out as follows:
- The number of transactions that include Milk & Sugar (antecedent) and Tea Powder (consequent) is 3.
- The number of transactions that contain Milk & Sugar (antecedent), with or without Tea Powder, is 5.
- P(Milk & Sugar AND Tea Powder) / P(Milk & Sugar) = 3/5 = 60%
Hence we can say that the association rule has a confidence of 60%.
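The same calculation can be sketched in base R. The `baskets` list and the `support()` and `confidence()` helpers are hypothetical names used only for this illustration.

```r
# Toy dataset: one character vector per transaction
baskets <- list(
  c("Milk", "Sugar", "Tea Powder"),
  c("Milk", "Sugar", "Tea Powder"),
  c("Milk", "Sugar", "Tea Powder"),
  c("Milk", "Sugar"),
  c("Milk", "Sugar")
)

# Hypothetical helper: fraction of transactions that contain every item in `items`
support <- function(items, transactions) {
  mean(vapply(transactions, function(basket) all(items %in% basket), logical(1)))
}

# Hypothetical helper: P(antecedent AND consequent) / P(antecedent)
confidence <- function(lhs, rhs, transactions) {
  support(c(lhs, rhs), transactions) / support(lhs, transactions)
}

confidence(c("Milk", "Sugar"), "Tea Powder", baskets)  # 3/5
confidence("Tea Powder", c("Milk", "Sugar"), baskets)  # 3/3 for the reversed rule
```

Note that confidence is not symmetric: reversing the antecedent and consequent changes the denominator, so the two rules get different values.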
Lift Ratio
Lift measures how much better the rule is at finding the consequent than a random selection of transactions would be. Generally, a lift ratio greater than one suggests the rule is useful.
- A lift greater than 1 indicates that the presence of A increases the probability that product B occurs in the same transaction.
- A lift smaller than 1 indicates that the presence of A decreases the probability that product B occurs in the same transaction.
Lift (A => B) = Confidence (A => B) / Support (B)
Confidence alone does not tell you whether the association between A and B is just a coincidence, because a consequent that is popular on its own yields high confidence for almost any antecedent. Lift corrects for this by measuring the strength of the association between the items. In market basket analysis, we choose rules with a lift greater than one, because for those rules the presence of one product increases the probability of the other product(s) in the same transaction. Rules with high confidence are those where the probability of the right-hand side (RHS) items appearing is high given the presence of the left-hand side (LHS) items.
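To make the formula concrete, the sketch below computes lift in base R. A lift of exactly 1 means the antecedent makes no difference to the probability of the consequent, and that is what the toy dataset from the Association Rules section produces. The helper functions and the second dataset (`other`) are made up for illustration.

```r
# Hypothetical helper: fraction of transactions that contain every item in `items`
support <- function(items, transactions) {
  mean(vapply(transactions, function(basket) all(items %in% basket), logical(1)))
}

# Hypothetical helper: confidence of the rule divided by support of the consequent
lift <- function(lhs, rhs, transactions) {
  confidence <- support(c(lhs, rhs), transactions) / support(lhs, transactions)
  confidence / support(rhs, transactions)
}

# Toy dataset from the Association Rules section
toy <- list(
  c("Milk", "Sugar", "Tea Powder"),
  c("Milk", "Sugar", "Tea Powder"),
  c("Milk", "Sugar", "Tea Powder"),
  c("Milk", "Sugar"),
  c("Milk", "Sugar")
)
lift(c("Milk", "Sugar"), "Tea Powder", toy)  # 0.6 / 0.6 = 1

# Made-up baskets in which Jam is only ever bought together with Bread
other <- list(
  c("Bread", "Jam"),
  c("Bread", "Jam"),
  c("Bread"),
  c("Milk"),
  c("Milk")
)
lift("Bread", "Jam", other)  # (2/3) / (2/5), about 1.67
```

In the toy dataset Milk and Sugar appear in every transaction, so knowing they are in the basket tells you nothing new about Tea Powder, even though the rule has 60% confidence.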
Market Basket Analysis in R
Let’s consider the following problem statement:
A marketer wants to know which products are purchased together, either in pairs or as larger groups of items, so that they can plan cross-selling activities.
The dataset used in the example is called groceries.csv and can be downloaded here. First, you will need to install the “arules” package in R.
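# install.packages("arules") # Run once if the package is not installed yet.
library(arules)
# Read one transaction per line, with items separated by commas.
groc <- read.transactions("groceries.csv", sep = ",")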
If you import the dataset as an ordinary CSV table, the items of each transaction will be spread across multiple columns. To avoid this, we use the read.transactions() function (part of the “arules” package), which reads the groceries file directly into a sparse matrix.
Here are some commands that you can try out to carry out MBA:
itemFrequency(groc) # Examine the frequency (support) of each item in the data.
itemFrequencyPlot(groc) # Plot the frequency of items purchased.
itemFrequencyPlot(groc, support = 0.2) # Plot only items with at least 20% support.
itemFrequencyPlot(groc, topN = 5) # Plot the 5 most frequent items.
We use the apriori() function and pass it a list of parameters: the minimum support and the minimum confidence, and optionally the minimum number of items in a rule (minlen). This will help us understand the transaction patterns in the dataset.
groc.apriori <- apriori(groc, parameter=list(support=0.001, confidence=0.75))
summary(groc.apriori)
The summary output gives summary statistics for the support, confidence, and lift of the rules. We can now look at the rules themselves using the inspect() function.
inspect(groc.apriori[1:10]) # Show the first 10 rules.
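To connect the apriori() output with the hand calculations in the Concepts section, here is a minimal sketch that mines rules from the five-transaction toy dataset instead of groceries.csv. The object names (`baskets`, `toy`, `toy.rules`) and the thresholds are assumptions chosen for illustration, not values recommended for real data.

```r
library(arules)

# Toy dataset from the Concepts section: one character vector per transaction
baskets <- list(
  c("Milk", "Sugar", "Tea Powder"),
  c("Milk", "Sugar", "Tea Powder"),
  c("Milk", "Sugar", "Tea Powder"),
  c("Milk", "Sugar"),
  c("Milk", "Sugar")
)

# Coerce the list into the sparse transactions format used by arules
toy <- as(baskets, "transactions")

# minlen = 2 skips rules with an empty antecedent; thresholds are illustrative
toy.rules <- apriori(
  toy,
  parameter = list(support = 0.5, confidence = 0.5, minlen = 2),
  control = list(verbose = FALSE)
)

# List the rules, highest confidence first
inspect(sort(toy.rules, by = "confidence"))

# Support, confidence, and lift for every rule, as a data frame
quality(toy.rules)
```

Every rule in this toy example should come out with a lift of 1, because Milk and Sugar appear in all five transactions and so no item changes the probability of another.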
To visualize the results, you can use arulesViz:
library(arulesViz)
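# Newer arulesViz releases may expect engine = "interactive" in place of interactive = TRUE.
plot(groc.apriori, method = "graph", interactive = TRUE, shading = NA)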
Conclusion
Market basket analysis is an unsupervised machine learning technique for finding patterns in transactional data. It can be a very powerful tool for analyzing the purchasing patterns of consumers.