Unveiling Hidden Patterns With Association Rule Learning In Python

Introduction

In this blog post, we will explore an interesting topic in data science, and in data mining in particular: Association Rule Learning. Association Rule Learning (ARL) is a popular technique for uncovering hidden patterns in large datasets. It is often used for market basket analysis, identifying which items are frequently bought together so that businesses can better understand their customers' purchasing behavior.

We will use the Apriori algorithm, a popular ARL method, to identify associations between items in our dataset, and we will run it with Python's mlxtend library.

Let's dive in!

Installing Libraries

First, let's install the required Python libraries using pip:

pip install mlxtend pandas

Loading the Dataset

For this example, we will use a sample dataset containing a list of transactions made at a grocery store. Each transaction comprises items purchased by a customer.

import pandas as pd
transactions = [
    ['Milk', 'Bread', 'Eggs'],
    ['Milk', 'Cheese', 'Bread'],
    ['Milk', 'Cheese', 'Butter', 'Bread'],
    ['Eggs', 'Cheese', 'Butter'],
    ['Milk', 'Butter', 'Bread', 'Eggs'],
    ['Milk', 'Cheese', 'Bread', 'Eggs'],
    ['Bread', 'Eggs', 'Butter'],
]
# Fix the column order explicitly: iterating over a set has no guaranteed order,
# so the column labels must come from the same list used to build each row.
items = ['Milk', 'Bread', 'Eggs', 'Cheese', 'Butter']
# Convert transactions to a boolean-valued (one-hot) DataFrame for ARL:
# one row per transaction, one True/False column per item.
df = pd.DataFrame(
    [[item in transaction for item in items] for transaction in transactions],
    columns=items,
)
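Apriori filters itemsets by support: the fraction of transactions that contain every item in the itemset. As a quick sanity check before running the library, the following minimal sketch counts support by hand on the same seven transactions and compares it with a 50% threshold. The `support` helper is hypothetical and not part of mlxtend. With seven transactions, an itemset needs at least four occurrences to pass.

```python
from fractions import Fraction

transactions = [
    ['Milk', 'Bread', 'Eggs'],
    ['Milk', 'Cheese', 'Bread'],
    ['Milk', 'Cheese', 'Butter', 'Bread'],
    ['Eggs', 'Cheese', 'Butter'],
    ['Milk', 'Butter', 'Bread', 'Eggs'],
    ['Milk', 'Cheese', 'Bread', 'Eggs'],
    ['Bread', 'Eggs', 'Butter'],
]

def support(itemset, transactions):
    """Hypothetical helper: fraction of transactions containing every item in `itemset`."""
    wanted = set(itemset)
    hits = sum(1 for t in transactions if wanted <= set(t))
    return Fraction(hits, len(transactions))

threshold = Fraction(1, 2)  # 50% minimum support
for candidate in [('Milk',), ('Cheese',), ('Milk', 'Bread'), ('Cheese', 'Bread')]:
    s = support(candidate, transactions)
    print(candidate, s, 'frequent' if s >= threshold else 'not frequent')
# ('Milk',) 5/7 frequent
# ('Cheese',) 4/7 frequent
# ('Milk', 'Bread') 5/7 frequent
# ('Cheese', 'Bread') 3/7 not frequent
```

Exact fractions avoid floating-point noise when comparing against the threshold; mlxtend itself reports supports as floats.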

Applying the Apriori Algorithm

Now, let's run the Apriori algorithm with the mlxtend library. We set the minimum support to 0.5, meaning that an itemset must appear in at least 50% of transactions (here, at least 4 of the 7) to be considered frequent.

from mlxtend.frequent_patterns import apriori
# Keep itemsets that appear in at least 50% of transactions;
# use_colnames=True reports item names instead of column indices.
frequent_itemsets = apriori(df, min_support=0.5, use_colnames=True)
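Under the hood, Apriori works level by level: it keeps the single items that meet the minimum support, joins the survivors into larger candidates, and prunes any candidate with an infrequent subset (if an itemset is infrequent, none of its supersets can be frequent). The following is a simplified, self-contained sketch of that idea, not mlxtend's actual implementation; the function name `apriori_sketch` is hypothetical.

```python
from itertools import combinations

transactions = [
    ['Milk', 'Bread', 'Eggs'],
    ['Milk', 'Cheese', 'Bread'],
    ['Milk', 'Cheese', 'Butter', 'Bread'],
    ['Eggs', 'Cheese', 'Butter'],
    ['Milk', 'Butter', 'Bread', 'Eggs'],
    ['Milk', 'Cheese', 'Bread', 'Eggs'],
    ['Bread', 'Eggs', 'Butter'],
]

def apriori_sketch(transactions, min_support):
    """Hypothetical helper: return {frozenset(itemset): support} for every frequent itemset."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def keep_frequent(candidates):
        # Count how many baskets contain each candidate; keep those meeting the threshold.
        kept = {}
        for cand in candidates:
            s = sum(1 for b in baskets if cand <= b) / n
            if s >= min_support:
                kept[cand] = s
        return kept

    # Level 1: every individual item is a candidate.
    current = keep_frequent({frozenset([item]) for b in baskets for item in b})
    frequent = dict(current)
    k = 2
    while current:
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a frequent itemset must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = keep_frequent(candidates)
        frequent.update(current)
        k += 1
    return frequent

itemsets = apriori_sketch(transactions, min_support=0.5)
for itemset, s in sorted(itemsets.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), round(s, 3))
# ['Bread'] 0.857
# ['Butter'] 0.571
# ['Cheese'] 0.571
# ['Eggs'] 0.714
# ['Milk'] 0.714
# ['Bread', 'Eggs'] 0.571
# ['Bread', 'Milk'] 0.714
```

Here the only 3-item candidate, {Milk, Bread, Eggs}, is pruned without counting because its subset {Milk, Eggs} appears in only 3 of 7 transactions. Library implementations are considerably more optimized; this version favors readability.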

Discovering Association Rules

With the frequent itemsets found, we can now generate association rules using the association_rules function from the mlxtend.frequent_patterns module. In this example, we set the minimum confidence level to 0.7.

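from mlxtend.frequent_patterns import association_rules
# Keep only rules whose confidence is at least 0.7.
# Some recent mlxtend releases also require the number of transactions,
# e.g. association_rules(frequent_itemsets, num_itemsets=len(df), ...).
# If this call raises a TypeError about num_itemsets, pass it as shown.
rules = association_rules(frequent_itemsets, metric='confidence', min_threshold=0.7)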
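Conceptually, each rule A → C comes from splitting a frequent itemset into an antecedent A and a consequent C, keeping the split when confidence = support(A ∪ C) / support(A) reaches the threshold; lift then divides confidence by support(C). The following simplified sketch, not mlxtend's implementation, shows this with a hypothetical `generate_rules` helper. The support values are counted by hand from the seven sample transactions, and only itemsets with support of at least 0.5 are listed.

```python
from itertools import combinations

# Supports of the frequent itemsets (min_support = 0.5), counted by hand from the seven transactions.
supports = {
    frozenset({'Milk'}): 5 / 7,
    frozenset({'Bread'}): 6 / 7,
    frozenset({'Eggs'}): 5 / 7,
    frozenset({'Cheese'}): 4 / 7,
    frozenset({'Butter'}): 4 / 7,
    frozenset({'Milk', 'Bread'}): 5 / 7,
    frozenset({'Bread', 'Eggs'}): 4 / 7,
}

def generate_rules(supports, min_confidence):
    """Hypothetical helper: return (antecedent, consequent, support, confidence, lift) tuples."""
    rules = []
    for itemset, s_union in supports.items():
        if len(itemset) < 2:
            continue  # a rule needs at least one item on each side
        # Try every non-empty proper subset of the itemset as the antecedent.
        for r in range(1, len(itemset)):
            for ante in combinations(sorted(itemset), r):
                a = frozenset(ante)
                c = itemset - a
                # Subsets of a frequent itemset are frequent, so both lookups succeed.
                confidence = s_union / supports[a]
                lift = confidence / supports[c]
                if confidence >= min_confidence:
                    rules.append((sorted(a), sorted(c), s_union, confidence, lift))
    return rules

for ante, cons, s, conf, lift in generate_rules(supports, min_confidence=0.7):
    print(ante, '->', cons, round(s, 3), round(conf, 3), round(lift, 3))
# ['Bread'] -> ['Milk'] 0.714 0.833 1.167
# ['Milk'] -> ['Bread'] 0.714 1.0 1.167
# ['Eggs'] -> ['Bread'] 0.571 0.8 0.933
```

Bread → Eggs is dropped because its confidence is only 4/6 ≈ 0.67, below the 0.7 threshold.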

Analyzing the Results

Finally, let's take a look at the discovered association rules and their corresponding support, confidence, and lift values:

print(rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']])

This prints the following (rows derived from the same itemset, such as the first two, may appear in either order):

   antecedents  consequents   support  confidence      lift
0       (Milk)      (Bread)  0.714286    1.000000  1.166667
1      (Bread)       (Milk)  0.714286    0.833333  1.166667
2       (Eggs)      (Bread)  0.571429    0.800000  0.933333

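Based on the generated rules, we can infer that:

  • Every customer who buys milk also buys bread (100% confidence). The lift of 1.17, which is greater than 1, indicates a mild positive association: buying milk makes buying bread somewhat more likely than its baseline rate of 6 in 7 transactions.
  • Conversely, 83% of customers who buy bread also buy milk, with the same lift of 1.17.
  • 80% of customers who buy eggs also buy bread, but the lift of 0.93 is below 1. Because bread already appears in 86% of all transactions, buying eggs makes bread slightly less likely than average, so high confidence alone does not imply a meaningful association.
  • Cheese and butter do not appear in any rule, because no itemset containing them together with another item reaches 50% support.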
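For reference, the three metrics in the table follow the standard definitions below, where N is the number of transactions and T_i is the set of items in transaction i. The worked lines plug in counts from the seven sample transactions: milk appears in 5, bread in 6, eggs in 5, milk with bread in 5, and eggs with bread in 4.

```latex
\begin{aligned}
\operatorname{supp}(X) &= \frac{\lvert \{\, i : X \subseteq T_i \,\} \rvert}{N}, \qquad
\operatorname{supp}(A \Rightarrow C) = \operatorname{supp}(A \cup C) \\
\operatorname{conf}(A \Rightarrow C) &= \frac{\operatorname{supp}(A \cup C)}{\operatorname{supp}(A)}, \qquad
\operatorname{lift}(A \Rightarrow C) = \frac{\operatorname{conf}(A \Rightarrow C)}{\operatorname{supp}(C)}
  = \frac{\operatorname{supp}(A \cup C)}{\operatorname{supp}(A)\,\operatorname{supp}(C)} \\[6pt]
\operatorname{conf}(\text{Milk} \Rightarrow \text{Bread}) &= \frac{5/7}{5/7} = 1, \qquad
\operatorname{lift} = \frac{1}{6/7} = \frac{7}{6} \approx 1.17 \\
\operatorname{conf}(\text{Eggs} \Rightarrow \text{Bread}) &= \frac{4/7}{5/7} = 0.8, \qquad
\operatorname{lift} = \frac{0.8}{6/7} = \frac{14}{15} \approx 0.93
\end{aligned}
```

The second form of lift shows why it is symmetric in A and C, and why a lift of exactly 1 means the items co-occur no more often than they would if purchases were independent.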

Conclusion

Through Association Rule Learning, we can unveil hidden patterns within our dataset that can help businesses make informed decisions. In this blog post, we used the Apriori algorithm to discover frequent itemsets and association rules from a grocery dataset. By analyzing these rules, businesses can devise strategies to optimize marketing campaigns, product recommendations, and inventory management.

Keep exploring, and happy data mining!