In this blog post, we will explore an interesting topic in data science, and in data mining especially: Association Rule Learning. Association Rule Learning (ARL) is a popular technique for uncovering hidden patterns in large datasets. It is often applied to market basket data to identify which items are frequently bought together, helping businesses better understand their customers' purchasing behavior.
We will use the Apriori algorithm, a popular method in ARL, to identify associations between items in our dataset, and we will run it using Python's mlxtend library.
Let's dive in!
First, let's install the required Python libraries using pip:
pip install mlxtend pandas
For this example, we will use a sample dataset containing a list of transactions made at a grocery store. Each transaction comprises items purchased by a customer.
import pandas as pd

transactions = [
    ['Milk', 'Bread', 'Eggs'],
    ['Milk', 'Cheese', 'Bread'],
    ['Milk', 'Cheese', 'Butter', 'Bread'],
    ['Eggs', 'Cheese', 'Butter'],
    ['Milk', 'Butter', 'Bread', 'Eggs'],
    ['Milk', 'Cheese', 'Bread', 'Eggs'],
    ['Bread', 'Eggs', 'Butter'],
]

# Fix the column order explicitly: iterating over a set yields items in an
# arbitrary order, which could silently attach the wrong labels to the columns.
items = ['Milk', 'Bread', 'Eggs', 'Cheese', 'Butter']

# Convert transactions to a boolean-valued DataFrame for ARL
df = pd.DataFrame(
    [[item in transaction for item in items] for transaction in transactions],
    columns=items,
)
Now, let's implement the Apriori algorithm using the mlxtend library. We set the minimum support level to 0.5, meaning that an itemset needs to appear in at least 50% of transactions to be considered.
from mlxtend.frequent_patterns import apriori

# use_colnames=True reports item names instead of column indices
frequent_itemsets = apriori(df, min_support=0.5, use_colnames=True)
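To make the support threshold and the level-wise search concrete, here is a small pure-Python sketch of the Apriori idea on the same transactions. It is an illustration only, not mlxtend's implementation; `support` and `naive_apriori` are hypothetical helper names introduced for this example.

```python
from itertools import combinations

transactions = [
    ['Milk', 'Bread', 'Eggs'],
    ['Milk', 'Cheese', 'Bread'],
    ['Milk', 'Cheese', 'Butter', 'Bread'],
    ['Eggs', 'Cheese', 'Butter'],
    ['Milk', 'Butter', 'Bread', 'Eggs'],
    ['Milk', 'Cheese', 'Bread', 'Eggs'],
    ['Bread', 'Eggs', 'Butter'],
]
baskets = [frozenset(t) for t in transactions]


def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def naive_apriori(min_support):
    """Hypothetical helper: level-wise frequent itemset search."""
    items = sorted({item for basket in baskets for item in basket})
    level = [frozenset([item]) for item in items]
    frequent = {}
    k = 1
    while level:
        # Keep only the candidates that meet the support threshold.
        kept = {c: support(c) for c in level if support(c) >= min_support}
        frequent.update(kept)
        # Join frequent k-itemsets into (k+1)-candidates, then prune any
        # candidate with an infrequent k-subset (the Apriori property).
        k += 1
        candidates = {a | b for a in kept for b in kept if len(a | b) == k}
        level = [
            c for c in candidates
            if all(frozenset(s) in kept for s in combinations(c, k - 1))
        ]
    return frequent


frequent = naive_apriori(min_support=0.5)
ordered = sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0])))
for itemset, sup in ordered:
    print(sorted(itemset), round(sup, 3))
```

On this dataset, seven itemsets clear the 0.5 threshold: the five single items plus the pairs {Milk, Bread} and {Bread, Eggs}. The only three-item candidate, {Milk, Bread, Eggs}, is pruned without counting because its subset {Milk, Eggs} is not frequent.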
With the frequent itemsets found, we can now generate association rules using the association_rules function from the mlxtend.frequent_patterns module. In this example, we set the minimum confidence level to 0.7.
from mlxtend.frequent_patterns import association_rules

rules = association_rules(frequent_itemsets, metric='confidence', min_threshold=0.7)
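The metrics attached to each rule follow the standard definitions: confidence(A -> C) = support(A and C) / support(A), and lift(A -> C) = confidence(A -> C) / support(C). The sketch below recomputes them in plain Python for two rules from the grocery data; `rule_metrics` is a hypothetical helper written for this illustration, not an mlxtend function.

```python
transactions = [
    ['Milk', 'Bread', 'Eggs'],
    ['Milk', 'Cheese', 'Bread'],
    ['Milk', 'Cheese', 'Butter', 'Bread'],
    ['Eggs', 'Cheese', 'Butter'],
    ['Milk', 'Butter', 'Bread', 'Eggs'],
    ['Milk', 'Cheese', 'Bread', 'Eggs'],
    ['Bread', 'Eggs', 'Butter'],
]
baskets = [set(t) for t in transactions]


def support(items):
    """Fraction of transactions that contain every item in `items`."""
    return sum(set(items) <= basket for basket in baskets) / len(baskets)


def rule_metrics(antecedent, consequent):
    """Hypothetical helper: (support, confidence, lift) for antecedent -> consequent."""
    joint = support(set(antecedent) | set(consequent))
    confidence = joint / support(antecedent)
    lift = confidence / support(consequent)
    return joint, confidence, lift


for antecedent, consequent in [(['Milk'], ['Bread']), (['Eggs'], ['Bread'])]:
    sup, conf, lift = rule_metrics(antecedent, consequent)
    print(antecedent, '->', consequent,
          'support', round(sup, 3),
          'confidence', round(conf, 3),
          'lift', round(lift, 3))
```

A lift above 1 means the consequent is more likely when the antecedent is present than it is overall; a lift below 1 means it is less likely, even when confidence looks high.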
Finally, let's take a look at the discovered association rules and their corresponding support, confidence, and lift values:
print(rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']])
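For the sample transactions above, the output should look like this (row order may differ between runs):
  antecedents consequents   support  confidence      lift
0      (Milk)     (Bread)  0.714286    1.000000  1.166667
1     (Bread)      (Milk)  0.714286    0.833333  1.166667
2      (Eggs)     (Bread)  0.571429    0.800000  0.933333
Based on the generated rules, we can infer that every transaction containing Milk also contains Bread (confidence 1.0), and about 83% of transactions containing Bread also contain Milk. The lift of about 1.17 in both directions indicates a mild positive association between the two. About 80% of transactions containing Eggs also contain Bread, but the lift of about 0.93 is below 1: Bread already appears in 6 of the 7 transactions, so buying Eggs makes Bread slightly less likely than its baseline, despite the high confidence.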
Through Association Rule Learning, we can unveil hidden patterns within our dataset that can help businesses make informed decisions. In this blog post, we used the Apriori algorithm to discover frequent itemsets and association rules from a grocery dataset. By analyzing these rules, businesses can devise strategies to optimize marketing campaigns, product recommendations, and inventory management.
Keep exploring, and happy data mining!