Market Basket Analysis
Affinity analysis in data mining is the search for stable groups of events that occur together in a given subject area. It relies on finding association rules, which describe patterns in the relationships between events.
In retail, this technique is used to perform market basket analysis, identifying stable sets of products that supermarket customers buy in a single purchase (for example, "potatoes, onions and salad", "pasta and ketchup", "beer and chips", "tea and baked goods", etc.). It helps optimize the product range and store layout to "encourage" buyers through cross-selling or up-selling.
The method is also successfully used in other areas, e.g., for studying web page visits, analysing and predicting telecommunication equipment failures, in medicine, etc.
This example demonstrates consumer basket analysis of a retail chain that sells household chemicals.
Algorithm Description
1. Data Import
The dataset contains information from 5,000 receipts. A receipt with a list of purchased items is considered a transaction, and each item in the receipt is an element of the transaction.
Name | Caption |
---|---|
id | ID |
item | Item |
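To make the transaction structure concrete, the following minimal Python sketch groups rows of the two-column format (id, item) into one basket per receipt. The sample rows and the function name `group_transactions` are hypothetical and are not taken from the dataset.

```python
from collections import defaultdict

# Hypothetical rows in the same long format as the dataset: (id, item).
rows = [
    (1, "Laundry detergent"),
    (1, "Fabric conditioner"),
    (2, "Paper towels"),
    (2, "Air freshener"),
    (2, "Laundry detergent"),
    (3, "Laundry detergent"),
]


def group_transactions(rows):
    """Collect the items of each receipt id into a set (one set per transaction)."""
    baskets = defaultdict(set)
    for receipt_id, item in rows:
        baskets[receipt_id].add(item)
    return dict(baskets)


transactions = group_transactions(rows)
```

Using a set per receipt means a product listed twice on the same receipt counts once, which is the usual convention for association rule mining.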
2. Discovering association rules
To search for the association rules, we use the FP-growth algorithm.
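For readers unfamiliar with FP-growth, here is a minimal, generic Python sketch of the idea: build a prefix tree of transactions with items ordered by frequency, then mine it recursively through conditional pattern bases. It is not the implementation used by the Association Rules node. The names `Node`, `build_tree`, and `fp_growth`, the sample baskets, and the `min_count` threshold are all illustrative assumptions.

```python
from collections import defaultdict


class Node:
    """One FP-tree node: an item, its parent, a count, and child nodes."""

    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}


def build_tree(weighted_transactions, min_count):
    """Build an FP-tree from (items, weight) pairs.

    Returns the header table (item -> nodes holding it) and item counts.
    """
    counts = defaultdict(int)
    for items, weight in weighted_transactions:
        for item in set(items):
            counts[item] += weight
    frequent = {i: c for i, c in counts.items() if c >= min_count}

    root = Node(None, None)
    header = defaultdict(list)
    for items, weight in weighted_transactions:
        # Keep frequent items only, most frequent first (ties broken by name).
        ordered = sorted((i for i in set(items) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += weight
            node = child
    return header, frequent


def fp_growth(weighted_transactions, min_count, suffix=frozenset()):
    """Yield (itemset, count) for every itemset with count >= min_count."""
    header, frequent = build_tree(weighted_transactions, min_count)
    for item, count in frequent.items():
        itemset = suffix | {item}
        yield itemset, count
        # Conditional pattern base: the prefix paths that lead to `item`.
        conditional = []
        for node in header[item]:
            path = []
            parent = node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                conditional.append((path, node.count))
        yield from fp_growth(conditional, min_count, itemset)


# Hypothetical baskets; each transaction has weight 1.
baskets = [
    {"Laundry detergent", "Fabric conditioner"},
    {"Laundry detergent", "Air freshener"},
    {"Laundry detergent", "Fabric conditioner", "Air freshener"},
    {"Fabric conditioner"},
]
frequent_itemsets = dict(fp_growth([(b, 1) for b in baskets], min_count=2))
```

The result maps each frequent itemset (a `frozenset`) to the number of baskets that contain it. Unlike Apriori-style approaches, the data is scanned only to build trees, and no candidate itemsets are generated and tested against the full transaction list.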
The loaded transactions are fed into the Input Data Source port of the Association Rules node.
We set up the Association Rules node as follows:
- Field ID: Assign the usage type Transaction
- Field item: Usage type Item
- Checkbox Exclude items with support greater than maximum: Marked
- Maximum support, %: 20
- Checkbox Exclude single sets: Marked
- Minimum rule confidence, %: 25
- Maximum number of consequences: 2
Whenever you change any settings, retrain the model.
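As a rough illustration of what these thresholds mean, the Python sketch below applies them as plain filters. It rests on several assumptions: that the maximum support setting drops items occurring in more than 20% of transactions, that single-item sets produce no rules, and that the last setting caps the number of items on the right-hand side of a rule. The node's actual behavior may differ. All names, baskets, and counts are hypothetical.

```python
from itertools import combinations

# Values taken from the node settings above; the constant names are hypothetical.
MAX_SUPPORT = 0.20          # "Maximum support, %": 20
MIN_CONFIDENCE = 0.25       # "Minimum rule confidence, %": 25
MAX_CONSEQUENT_ITEMS = 2    # "Maximum number of consequences": 2


def drop_too_common_items(transactions, max_support):
    """Remove items that occur in more than max_support of all transactions."""
    n = len(transactions)
    counts = {}
    for basket in transactions:
        for item in basket:
            counts[item] = counts.get(item, 0) + 1
    keep = {item for item, c in counts.items() if c / n <= max_support}
    return [basket & keep for basket in transactions]


def generate_rules(itemset_counts, min_confidence, max_consequent_items):
    """Split each multi-item set into antecedent -> consequent rules.

    itemset_counts must also contain the count of every antecedent,
    which holds when it lists all frequent itemsets.
    """
    rules = []
    for itemset, count in itemset_counts.items():
        if len(itemset) < 2:  # a single-item set cannot form a rule
            continue
        largest = min(max_consequent_items, len(itemset) - 1)
        for size in range(1, largest + 1):
            for consequent in combinations(sorted(itemset), size):
                antecedent = itemset - frozenset(consequent)
                confidence = count / itemset_counts[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, frozenset(consequent), confidence))
    return rules


# Hypothetical baskets: "Trash bags" appears in 4 of 10, above the 20% limit.
baskets = [
    {"Trash bags", "Sponges"}, {"Trash bags", "Gloves"}, {"Trash bags"},
    {"Trash bags", "Soap"}, {"Bleach"}, {"Polish"}, {"Dish soap"},
    {"Toothpaste"}, {"Shampoo"}, {"Air freshener"},
]
filtered = drop_too_common_items(baskets, MAX_SUPPORT)

# Made-up itemset counts, not results from the dataset.
itemset_counts = {
    frozenset({"Paper towels"}): 60,
    frozenset({"Air freshener"}): 30,
    frozenset({"Paper towels", "Air freshener"}): 12,
}
rules = generate_rules(itemset_counts, MIN_CONFIDENCE, MAX_CONSEQUENT_ITEMS)
```

With these made-up counts, Air freshener → Paper towels has confidence 12/30 and is kept, while Paper towels → Air freshener has confidence 12/60 and falls below the 25% threshold.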
Interpretation of Results
The output ports of the Association Rules node provide the following data:
- One port contains the itemsets that are most commonly found in transactions (frequent itemsets).
- Another port contains the identified association rules and their indicators: support, confidence, and lift.
- A further port contains the input transactions to which the identified rules apply.
To present the results, we use the Table visualizer, which we set up for each port.
The Association rules table displays the discovered association rules and their indicators: support, confidence, and lift. This is the information that describes customer behavior. In the resulting list, we can see trivial patterns (for example, Fabric conditioner → Laundry detergent) as well as non-obvious ones (e.g., Paper towels → Air freshener).
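The three indicators can be illustrated with a short Python sketch that uses their standard textbook definitions. Support is the share of transactions containing both sides of the rule, confidence is that share relative to transactions containing the antecedent, and lift is confidence divided by the share of transactions containing the consequent. The function name `rule_metrics` and the sample baskets are hypothetical, and the exact calculation used by the node is assumed to follow these definitions.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    Assumes the antecedent and the consequent each occur in at least one
    transaction; otherwise the divisions below would fail.
    """
    n = len(transactions)
    both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    with_antecedent = sum(1 for t in transactions if antecedent <= t)
    with_consequent = sum(1 for t in transactions if consequent <= t)

    support = both / n
    confidence = both / with_antecedent
    lift = confidence / (with_consequent / n)
    return support, confidence, lift


# Hypothetical baskets, not taken from the dataset.
baskets = [
    {"Fabric conditioner", "Laundry detergent"},
    {"Fabric conditioner", "Laundry detergent"},
    {"Laundry detergent"},
    {"Paper towels"},
]
support, confidence, lift = rule_metrics(
    baskets, {"Fabric conditioner"}, {"Laundry detergent"}
)
```

A lift above 1 means the consequent appears more often in baskets containing the antecedent than in baskets overall. A lift close to 1 suggests the two sides are roughly independent, so the rule carries little information even if its confidence is high.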
Analysts should review each discovered rule and select the ones that are truly valuable.
Download and open the file in Megaladata. If necessary, you can install the free Megaladata Community Edition.