Bharat S Raj

Introduction to Market Basket Analysis

Market basket analysis (MBA) is an analytical technique used to predict customers’ future purchase decisions. It studies customers’ historical buying patterns and preferences to predict which items they are likely to purchase along with the items already in their basket (or cart). It is also known as “Affinity Analysis” or “Association Rule Mining”.

Applications

There are a number of applications for Market Basket Analysis. Some of them are:

Concepts

Association Rules

Market Basket Analysis is most commonly performed with the “Apriori Algorithm”, and the outcome of this analysis is a set of association rules. Let’s take an example to understand the concept better. Consider the following dataset:

Transaction | Item 1 | Item 2 | Item 3
1           | Milk   | Sugar  | Tea Powder
2           | Milk   | Sugar  | Tea Powder
3           | Milk   | Sugar  | Tea Powder
4           | Milk   | Sugar  |
5           | Milk   | Sugar  |

For this dataset, we can write the following association rules:

This example is extremely small. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.

Note that association rules are written in “IF-THEN” format. The IF part is called the “antecedent” and the THEN part the “consequent”.

Support

Support is the fraction of transactions in the dataset that contain a given product or set of products, i.e. the estimated probability that a transaction contains them. The higher the support, the more popular the product or product bundle.

For example, the support of “IF Milk & Sugar THEN Tea Powder” is 3/5, or 60% of all transactions, because milk, sugar and tea powder appear together in 3 of the 5 transactions.
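To make the calculation concrete, here is a minimal base-R sketch (no packages required) that encodes the five example transactions as a list of item vectors and computes the support of an itemset. The names `baskets` and `support()` are hypothetical helpers for illustration, not functions from arules.

```r
# The five example transactions, one character vector per basket
baskets <- list(
  c("Milk", "Sugar", "Tea Powder"),
  c("Milk", "Sugar", "Tea Powder"),
  c("Milk", "Sugar", "Tea Powder"),
  c("Milk", "Sugar"),
  c("Milk", "Sugar")
)

# Support: fraction of transactions that contain every item in `itemset`
support <- function(itemset, transactions) {
  contains_all <- vapply(transactions,
                         function(t) all(itemset %in% t),
                         logical(1))
  mean(contains_all)
}

support(c("Milk", "Sugar", "Tea Powder"), baskets)  # 0.6, i.e. 3 of 5 transactions
support(c("Milk", "Sugar"), baskets)                # 1, every basket has both
```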

Confidence

Confidence is the conditional probability that a customer who buys product A will also buy product B. It expresses how reliable the rule is: the higher the confidence, the stronger the rule.

It is calculated as the support of the antecedent and consequent together divided by the support of the antecedent. For example, the confidence of the rule “IF Milk & Sugar THEN Tea Powder” is

Confidence (Milk & Sugar → Tea Powder) = Support (Milk, Sugar, Tea Powder) / Support (Milk, Sugar) = (3/5) / (5/5) = 0.6

Hence this association rule has a confidence of 60%.
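The same calculation can be sketched in base R. This assumes the transactions are stored as a list of item vectors; `baskets`, `support()` and `confidence()` are hypothetical names used only for illustration.

```r
baskets <- list(
  c("Milk", "Sugar", "Tea Powder"),
  c("Milk", "Sugar", "Tea Powder"),
  c("Milk", "Sugar", "Tea Powder"),
  c("Milk", "Sugar"),
  c("Milk", "Sugar")
)

# Fraction of transactions that contain every item in `itemset`
support <- function(itemset, transactions) {
  mean(vapply(transactions, function(t) all(itemset %in% t), logical(1)))
}

# Confidence of "IF lhs THEN rhs" = support(lhs and rhs together) / support(lhs)
confidence <- function(lhs, rhs, transactions) {
  support(union(lhs, rhs), transactions) / support(lhs, transactions)
}

confidence(c("Milk", "Sugar"), "Tea Powder", baskets)  # 0.6
confidence("Tea Powder", c("Milk", "Sugar"), baskets)  # 1
```

Note that confidence is not symmetric: the reversed rule “IF Tea Powder THEN Milk & Sugar” has a confidence of 1 here, because every basket that contains tea powder also contains milk and sugar.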

Lift Ratio

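Lift measures how much more likely the consequent is to appear in a transaction that contains the antecedent than in a randomly selected transaction. Generally, a lift ratio greater than one suggests that the rule has some applicability, meaning the items occur together more often than they would if they were independent.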

Lift (A → B) = Confidence (A → B) / Support (B)

Confidence alone does not tell us whether the association between A and B is stronger than chance, whereas lift measures the strength of the association between the items. In market basket analysis, we usually choose rules with a lift greater than one, because for such rules the presence of one product increases the probability of the other product(s) appearing in the same transaction. Rules with higher confidence are those where the probability of the items on the right-hand side (RHS, the consequent) is high given the presence of the items on the left-hand side (LHS, the antecedent).
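A small base-R sketch shows why lift is needed on top of confidence. It reuses the five example transactions; `support()`, `confidence()` and `lift()` are hypothetical helper names, and lift is computed exactly as in the formula Lift (A → B) = Confidence (A → B) / Support (B).

```r
baskets <- list(
  c("Milk", "Sugar", "Tea Powder"),
  c("Milk", "Sugar", "Tea Powder"),
  c("Milk", "Sugar", "Tea Powder"),
  c("Milk", "Sugar"),
  c("Milk", "Sugar")
)

# Fraction of transactions that contain every item in `itemset`
support <- function(itemset, transactions) {
  mean(vapply(transactions, function(t) all(itemset %in% t), logical(1)))
}

# support(lhs and rhs together) / support(lhs)
confidence <- function(lhs, rhs, transactions) {
  support(union(lhs, rhs), transactions) / support(lhs, transactions)
}

# Lift (A → B) = Confidence (A → B) / Support (B)
lift <- function(lhs, rhs, transactions) {
  confidence(lhs, rhs, transactions) / support(rhs, transactions)
}

lift(c("Milk", "Sugar"), "Tea Powder", baskets)  # 1
```

Even though the rule “IF Milk & Sugar THEN Tea Powder” has 60% confidence, its lift is exactly 1: tea powder is in 60% of all baskets anyway, and since every basket contains milk and sugar, knowing that a basket contains them tells us nothing new about tea powder.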

Market Basket Analysis in R

Let’s consider the following problem statement:

A marketer wants to know which products are purchased together, or whether certain products are bought together as a group, so that they can plan cross-selling activities.

The dataset used in this example is called groceries.csv and can be downloaded here. First, you will need to install the “arules” package in R.

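# install.packages("arules")  # run once if the package is not installed
library(arules)
# Each line of the file is one transaction; items are separated by commas
groc <- read.transactions("groceries.csv", sep = ",")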

If you import the file with an ordinary CSV reader such as read.csv(), the items of each transaction are spread across multiple columns. To avoid this, we use the read.transactions() function from the arules package, which reads the groceries file directly into a sparse transactions matrix.
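To illustrate the format, note that in a basket-format file each line is one transaction with a variable number of comma-separated items. The base-R sketch below uses the five example transactions written in that format (not the real groceries.csv), splits each line into a vector of items, and builds a transactions-by-items 0/1 matrix, the dense counterpart of the sparse matrix that read.transactions() creates. The variable names are illustrative only.

```r
# Five example transactions in basket format (one transaction per line)
raw_lines <- c("Milk,Sugar,Tea Powder",
               "Milk,Sugar,Tea Powder",
               "Milk,Sugar,Tea Powder",
               "Milk,Sugar",
               "Milk,Sugar")

# Split each line on commas: one character vector of items per transaction
baskets <- lapply(strsplit(raw_lines, ",", fixed = TRUE), trimws)

# Dense transactions-by-items 0/1 matrix (arules stores this sparsely)
items <- sort(unique(unlist(baskets)))
incidence <- t(vapply(baskets,
                      function(b) as.integer(items %in% b),
                      integer(length(items))))
colnames(incidence) <- items
incidence
```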

Here are some commands you can try to explore the data:

itemFrequency(groc)  # Examine the relative frequency (support) of each item in the data
itemFrequencyPlot(groc)  # Plot the frequency of the items purchased
itemFrequencyPlot(groc, support = 0.2)  # Plot only items that appear in at least 20% of transactions
itemFrequencyPlot(groc, topN = 5)  # Plot the 5 most frequent items
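itemFrequency() reports, for each item, the fraction of transactions that contain it. As a rough base-R analogue on the five example transactions, the sketch below computes the same quantity; `item_freq` is a hypothetical name, and the last line only approximates what the support and topN arguments do for the plot.

```r
baskets <- list(
  c("Milk", "Sugar", "Tea Powder"),
  c("Milk", "Sugar", "Tea Powder"),
  c("Milk", "Sugar", "Tea Powder"),
  c("Milk", "Sugar"),
  c("Milk", "Sugar")
)

items <- sort(unique(unlist(baskets)))

# Fraction of transactions containing each item (named numeric vector)
item_freq <- vapply(items,
                    function(i) mean(vapply(baskets, function(b) i %in% b, logical(1))),
                    numeric(1))
item_freq <- sort(item_freq, decreasing = TRUE)
item_freq

# Keep items with a frequency of at least 0.2, then the 5 most frequent of those
head(item_freq[item_freq >= 0.2], 5)
```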

We use the apriori() function and pass it a list of parameters: the minimum support and the minimum confidence (a minimum rule length can also be set with minlen). This helps us understand the transaction patterns in the dataset.

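# Mine rules with support of at least 0.1% and confidence of at least 75%
groc.apriori <- apriori(groc, parameter = list(support = 0.001, confidence = 0.75))
summary(groc.apriori)  # overview of the rules and their quality measures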
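arules’ apriori() calls a compiled C implementation, but the idea behind the frequent-itemset search can be shown in base R. The sketch below is a simplified, textbook version of that search, not the arules code: it starts from frequent single items, joins frequent k-itemsets into (k+1)-item candidates, drops any candidate with an infrequent k-item subset (every subset of a frequent itemset must itself be frequent), and keeps candidates whose support meets the threshold. apriori() then turns frequent itemsets into rules that meet the confidence threshold; that step is omitted here. apriori_sketch() is a hypothetical name.

```r
baskets <- list(
  c("Milk", "Sugar", "Tea Powder"),
  c("Milk", "Sugar", "Tea Powder"),
  c("Milk", "Sugar", "Tea Powder"),
  c("Milk", "Sugar"),
  c("Milk", "Sugar")
)

support <- function(itemset, transactions) {
  mean(vapply(transactions, function(t) all(itemset %in% t), logical(1)))
}

# Simplified level-wise Apriori search (illustration only)
apriori_sketch <- function(transactions, min_support) {
  is_frequent <- function(s) support(s, transactions) >= min_support
  items <- sort(unique(unlist(transactions)))
  current <- Filter(is_frequent, as.list(items))  # frequent 1-itemsets
  frequent <- current
  k <- 1
  while (length(current) > 1) {
    # Join step: merge pairs of frequent k-itemsets into (k+1)-item candidates
    pairs <- combn(length(current), 2, simplify = FALSE)
    candidates <- lapply(pairs, function(p) sort(union(current[[p[1]]], current[[p[2]]])))
    candidates <- unique(Filter(function(cand) length(cand) == k + 1, candidates))
    # Prune step: drop candidates that have an infrequent k-item subset
    keys <- vapply(current, paste, character(1), collapse = "|")
    candidates <- Filter(function(cand) {
      subsets <- combn(cand, k, simplify = FALSE)
      all(vapply(subsets, paste, character(1), collapse = "|") %in% keys)
    }, candidates)
    # Count step: keep candidates whose support meets the threshold
    current <- Filter(is_frequent, candidates)
    frequent <- c(frequent, current)
    k <- k + 1
  }
  frequent
}

frequent_itemsets <- apriori_sketch(baskets, min_support = 0.5)
length(frequent_itemsets)  # 7
```

With min_support = 0.5 all seven non-empty itemsets of the example data are frequent; raising the threshold to 0.7 leaves only Milk, Sugar and {Milk, Sugar}, because tea powder appears in just 60% of the baskets.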

The summary output reports the number of rules, the distribution of rule lengths and summary statistics for quality measures such as support, confidence and lift. We can now look at the individual rules using the inspect() function.

inspect(groc.apriori[1:10])  # print the first 10 rules with their quality measures
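inspect() prints each rule as lhs => rhs together with its quality measures. To show where those columns come from, the base-R sketch below brute-forces every rule with a single item on the right-hand side for the five example transactions and sorts the result by lift. Brute force is only feasible for a handful of items and stands in for the pruned search that apriori() performs; rules_sketch() is a hypothetical helper, not an arules function.

```r
baskets <- list(
  c("Milk", "Sugar", "Tea Powder"),
  c("Milk", "Sugar", "Tea Powder"),
  c("Milk", "Sugar", "Tea Powder"),
  c("Milk", "Sugar"),
  c("Milk", "Sugar")
)

support <- function(itemset, transactions) {
  mean(vapply(transactions, function(t) all(itemset %in% t), logical(1)))
}

# Enumerate rules {lhs} => {rhs} with one consequent item, filter, sort by lift
rules_sketch <- function(transactions, min_support, min_confidence) {
  items <- sort(unique(unlist(transactions)))
  out <- list()
  for (size in 2:length(items)) {
    for (itemset in combn(items, size, simplify = FALSE)) {
      s <- support(itemset, transactions)
      if (s < min_support) next
      for (rhs in itemset) {
        lhs <- setdiff(itemset, rhs)
        conf <- s / support(lhs, transactions)
        if (conf < min_confidence) next
        out[[length(out) + 1]] <- data.frame(
          lhs = paste0("{", paste(lhs, collapse = ","), "}"),
          rhs = paste0("{", rhs, "}"),
          support = s,
          confidence = conf,
          lift = conf / support(rhs, transactions),
          stringsAsFactors = FALSE
        )
      }
    }
  }
  if (length(out) == 0) return(data.frame())
  rules <- do.call(rbind, out)
  rules[order(-rules$lift, -rules$confidence), , drop = FALSE]
}

rules_sketch(baskets, min_support = 0.5, min_confidence = 0.75)
```

On this toy data six rules pass the thresholds, all with lift 1, which is expected because milk and sugar appear in every basket; rules predicting tea powder are dropped because their confidence is only 60%.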

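To visualize the results, you can use the arulesViz package:

# install.packages("arulesViz")  # run once if the package is not installed
library(arulesViz)
plot(groc.apriori, method = "graph", interactive = TRUE, shading = NA)  # draw the rules as a graph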

Conclusion

Market basket analysis is an unsupervised machine learning technique that is useful for finding patterns in transactional data, and it can be a powerful tool for analyzing the purchasing patterns of consumers.