Customers-Segmentation-with-RFM

Saturday. January 01, 2022 - 19 mins

Customers-Segmentation-with-RFM

png

RFM Segmentation

The RFM method was introduced by Bult and Wansbeek in 1995 and has been successfully used by marketers since.
It analyzes customers’ behavior on three parameters:
Recency: How recent is the last purchase of the customer.
Frequency: How often the customer makes a purchase.
Monetary: How much money does the customer spends.
RFM Analysis is a technique used to segment customer behavior. RFM is a data science analytics application.
It is a rule-based method, not a machine learning model.
It helps to determine marketing and sales strategies based on customers’ purchasing habits.

Advantages & Disadvantages

The advantages of RFM is that it is easy to implement and it can be used for different types of business. It helps craft better marketing campaigns and improves CRM and customer’s loyalty.
The disadvantages are that it may not apply in industries where customers are usually one time buyers. It is based on historical data and won’t give much insight about prospects.

RFM Metrics

Recency Calculation

We create the recency (R) variable from the date variable we have. What we do is to subtract the last shopping date of each customer from the specified last shopping day.

Here, the perception of size and smallness is different for the Recency score. That is, a value of 1 for Recency (last shopped 1 day ago) is the best value for us, while a value of 80 is worse than 1.

Frequency Calculation

Frequency (F) consists of the total number of purchases made by each customer.

The point to be noted here is that each unique invoice number can be multiplexed and we need to count it as singular.

Monetary Calculation

In the Monetary (M) part, we calculate the money that the customer earns for us in this time period.

In our sample dataset, there was no such variable that we could take directly, and we calculated the total amount of each purchase by multiplying the number of products with the unit prices.

Methodology

To get the RFM score of a customer, we need to first calculate the R, F and M scores on a scale from 1 (worst) to 5 (best).

calculate Recency = number of days since last purchase
calculate Freqency = number of purchases during the studied period (usually one year)
calculate Monetary = total amount of purchases made during the studied period
find quintiles for each of these dimensions
give a grade to each dimension depending in which quintiles it stands
combine R, F and M scores to get the RFM score
map RF scores to segments

For this example, I will use the Online Retail dataset.

We have a Bussines problem.

An e-commerce company segments its customers and determine marketing strategies according to segments wants.
Customer segments with common behaviors Income increase by doing marketing studies in particular thinks it will.
For example, retaining customers that are very lucrative for the company different campaigns for new customers. Campaigns are wanted.

Dataset Story

The dataset named Online Retail II is a UK-based online sales company. Store’s sales between 01/12/2009 - 09/12/2011 contains.
The product catalog of this company includes souvenirs.
The vast majority of the company’s customers are corporate customers.

RESOURCES AND THANKS

Let’s move on to our example

Necessary Import

!pip install openpyxl
import datetime as dt
import pandas as pd
import matplotlib.pyplot as plt

Collecting openpyxl
  Downloading openpyxl-3.0.9-py2.py3-none-any.whl (242 kB)
     |████████████████████████████████| 242 kB 288 kB/s            
[?25hCollecting et-xmlfile
  Downloading et_xmlfile-1.1.0-py3-none-any.whl (4.7 kB)
Installing collected packages: et-xmlfile, openpyxl
Successfully installed et-xmlfile-1.1.0 openpyxl-3.0.9
[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv[0m

df_ = pd.read_excel("../input/online-retail-ii/online_retail_II.xlsx", sheet_name="Year 2010-2011")
df = df_.copy()

df.head()

	Invoice	StockCode	Description	Quantity	InvoiceDate	Price	Customer ID	Country
0	536365	85123A	WHITE HANGING HEART T-LIGHT HOLDER	6	2010-12-01 08:26:00	2.55	17850.0	United Kingdom
1	536365	71053	WHITE METAL LANTERN	6	2010-12-01 08:26:00	3.39	17850.0	United Kingdom
2	536365	84406B	CREAM CUPID HEARTS COAT HANGER	8	2010-12-01 08:26:00	2.75	17850.0	United Kingdom
3	536365	84029G	KNITTED UNION FLAG HOT WATER BOTTLE	6	2010-12-01 08:26:00	3.39	17850.0	United Kingdom
4	536365	84029E	RED WOOLLY HOTTIE WHITE HEART.	6	2010-12-01 08:26:00	3.39	17850.0	United Kingdom

Task 1

Understanding and Preparing Data

Let’s examine the descriptive statistics of the dataset

df.shape

(541910, 8)

df.columns

Index(['Invoice', 'StockCode', 'Description', 'Quantity', 'InvoiceDate',
       'Price', 'Customer ID', 'Country'],
      dtype='object')

df.describe().T

	count	mean	std	min	25%	50%	75%	max
Quantity	541910.0	9.552234	218.080957	-80995.00	1.00	3.00	10.00	80995.0
Price	541910.0	4.611138	96.759765	-11062.06	1.25	2.08	4.13	38970.0
Customer ID	406830.0	15287.684160	1713.603074	12346.00	13953.00	15152.00	16791.00	18287.0

Are there any missing observations in the dataset? If yes, how many missing observations in which variable?

df.isnull().sum()

Invoice             0
StockCode           0
Description      1454
Quantity            0
InvoiceDate         0
Price               0
Customer ID    135080
Country             0
dtype: int64

Let’s remove the missing observations from the data set

df.dropna(inplace=True)

What is the number of unique products?

df.Description.nunique()

How many of each product are there?

df.Description.value_counts().head(10)

WHITE HANGING HEART T-LIGHT HOLDER    2070
REGENCY CAKESTAND 3 TIER              1905
JUMBO BAG RED RETROSPOT               1662
ASSORTED COLOUR BIRD ORNAMENT         1418
PARTY BUNTING                         1416
LUNCH BAG RED RETROSPOT               1358
SET OF 3 CAKE TINS PANTRY DESIGN      1232
POSTAGE                               1197
LUNCH BAG  BLACK SKULL.               1126
PACK OF 72 RETROSPOT CAKE CASES       1080
Name: Description, dtype: int64

Let’s sort the 5 most ordered products from most to least

df.groupby("Description").agg({"Quantity": "sum"}).sort_values("Quantity", ascending=False).head()

	Quantity
Description
WORLD WAR 2 GLIDERS ASSTD DESIGNS	53215
JUMBO BAG RED RETROSPOT	45066
ASSORTED COLOUR BIRD ORNAMENT	35314
WHITE HANGING HEART T-LIGHT HOLDER	34147
PACK OF 72 RETROSPOT CAKE CASES	33409

The ‘C’ in the invoices shows the canceled transactions. Let’s remove the canceled transactions from the dataset.

df = df[~df["Invoice"].astype(str).str.contains("C", na=False)]

df.head()

	Invoice	StockCode	Description	Quantity	InvoiceDate	Price	Customer ID	Country
0	536365	85123A	WHITE HANGING HEART T-LIGHT HOLDER	6	2010-12-01 08:26:00	2.55	17850.0	United Kingdom
1	536365	71053	WHITE METAL LANTERN	6	2010-12-01 08:26:00	3.39	17850.0	United Kingdom
2	536365	84406B	CREAM CUPID HEARTS COAT HANGER	8	2010-12-01 08:26:00	2.75	17850.0	United Kingdom
3	536365	84029G	KNITTED UNION FLAG HOT WATER BOTTLE	6	2010-12-01 08:26:00	3.39	17850.0	United Kingdom
4	536365	84029E	RED WOOLLY HOTTIE WHITE HEART.	6	2010-12-01 08:26:00	3.39	17850.0	United Kingdom

We want the Quentity and Price variables to be greater than 0

df = df[(df['Quantity'] > 0)]
df = df[(df['Price'] > 0)]

Let’s create a variable called ‘TotalPrice’ that represents the total earnings per invoice

df["TotalPrice"] = df["Quantity"] * df["Price"]

Task 2

Calculating RFM metrics

df["InvoiceDate"].max()                  # Timestamp('2011-12-09 12:50:00')   we look at the date of the last invoice transaction
                                         # Because we will determine the date when we will do our analysis.
today_date = dt.datetime(2011, 12, 11)   # We determine the date of the analysis
rfm = df.groupby('Customer ID').agg({'InvoiceDate': lambda InvoiceDate: (today_date - InvoiceDate.max()).days, # recency
                                     'Invoice': lambda Invoice: Invoice.nunique(),                             # frequency
                                     'TotalPrice': lambda TotalPrice: TotalPrice.sum()})                       # monetary
rfm.head()
rfm.columns = ['recency', 'frequency', 'monetary'] # updating our variable names
rfm = rfm[rfm["monetary"] > 0] # We filter out winners with more than 0

Task 3

Generating RFM scores

rfm["recency_score"] = pd.qcut(rfm["recency"], 5, labels=[5,4,3,2,1])
rfm["frequency_score"] = pd.qcut(rfm["frequency"].rank(method="first"),5,labels=[1,2,3,4,5,])
rfm["monetary_score"] = pd.qcut(rfm["monetary"],5, labels=[1,2,3,4,5])

rfm["RFM_SCORE"] = (rfm["recency_score"].astype(str)+rfm["frequency_score"].astype(str)) 
# We did not consider Monetary because we will do a segmentation operation. We calculate this process over recency and frequency.

Task 4

Defining RFM scores as segments

seg_map = {
    r'[1-2][1-2]': 'hibernating',
    r'[1-2][3-4]': 'at_Risk',
    r'[1-2]5': 'cant_loose',
    r'3[1-2]': 'about_to_sleep',
    r'33': 'need_attention',
    r'[3-4][4-5]': 'loyal_customers',
    r'41': 'promising',
    r'51': 'new_customers',
    r'[4-5][2-3]': 'potential_loyalists',
    r'5[4-5]': 'champions'
}
rfm["SEGMENT"] = rfm["RFM_SCORE"].replace(seg_map,regex=True)

rfm.head()

	recency	frequency	monetary	recency_score	frequency_score	monetary_score	RFM_SCORE	SEGMENT
Customer ID
12346.0	326	1	77183.60	1	1	5	11	hibernating
12347.0	3	7	4310.00	5	5	5	55	champions
12348.0	76	4	1797.24	2	4	4	24	at_Risk
12349.0	19	1	1757.55	4	1	4	41	promising
12350.0	311	1	334.40	1	1	2	11	hibernating

Task 5

Action Time

Let’s choose 3 segments that we find important. These three segments;
Both in terms of action decisions,
Both in terms of the structure of the segments (mean RFM values) let’s comment.

rfm[["SEGMENT", "recency", "frequency", "monetary"]].groupby("SEGMENT").agg(["mean", "count","max"])

	recency			frequency			monetary
	mean	count	max	mean	count	max	mean	count	max
SEGMENT
about_to_sleep	53.312500	352	72	1.161932	352	2	471.994375	352	6207.67
at_Risk	153.785835	593	374	2.876897	593	6	1084.535297	593	44534.30
cant_loose	132.968254	63	373	8.380952	63	34	2796.155873	63	10254.18
champions	6.361769	633	13	12.413902	633	209	6857.963918	633	280206.02
hibernating	217.605042	1071	374	1.101774	1071	2	488.643307	1071	77183.60
loyal_customers	33.608059	819	72	6.479853	819	63	2864.247791	819	124914.53
need_attention	52.427807	187	72	2.326203	187	3	897.627861	187	12601.83
new_customers	7.428571	42	13	1.000000	42	1	388.212857	42	3861.00
potential_loyalists	17.398760	484	33	2.010331	484	3	1041.222004	484	168472.50
promising	23.510638	94	33	1.000000	94	1	294.007979	94	1757.55

rfm["SEGMENT"].value_counts().plot(kind='barh', rot=5, fontsize=20)
plt.show()

png

MY SELECTED SEGMENTS AND ACTION SUGGESTIONS

champions Champions. They want to listen to champions league music all the time. they’re like haalaand who even set their alarms to this music on their phones.

Our best customers are in this group.
It consists of the group whose last shopping dates are the closest and who visit us most frequently.
As an average of us; They earned 6857.96392 units of money
Frequency of visits 12,41390
They last shopped 6.36177 days ago.

rfm[rfm["SEGMENT"] == "champions"].head()

	recency	frequency	monetary	recency_score	frequency_score	monetary_score	RFM_SCORE	SEGMENT
Customer ID
12347.0	3	7	4310.00	5	5	5	55	champions
12362.0	4	10	5226.23	5	5	5	55	champions
12364.0	8	4	1313.10	5	4	4	54	champions
12381.0	5	5	1845.31	5	4	4	54	champions
12417.0	4	9	3649.10	5	5	5	55	champions

rfm[rfm["SEGMENT"] == "champions"].describe().T

	count	mean	std	min	25%	50%	75%	max
recency	633.0	6.361769	3.683300	1.00	3.00	5.00	10.00	13.00
frequency	633.0	12.413902	16.451672	3.00	5.00	8.00	14.00	209.00
monetary	633.0	6857.963918	20339.763842	201.12	1451.28	2612.96	4954.84	280206.02

Action Suggestions

When we add new products to our catalog » let’s send a message saying “hello handsome/beautiful I have something new for you”
will apply a discount from us to their last purchase when they reach the specific purchase. »
If they bring us new customers, we can prepare special campaigns for them »

loyal_customers Our loyal friends. They are in the 2nd most traded group but one of the most precious to us. In an important situation, they can be our airbag.

So they seem to have certain habits we can trust.
As an average of us; They earned 2864.24779 units of money
Frequency of visits 6.47985
They last shopped 33,60806 days ago.

rfm[rfm["SEGMENT"] == "loyal_customers"].head()

	recency	frequency	monetary	recency_score	frequency_score	monetary_score	RFM_SCORE	SEGMENT
Customer ID
12352.0	37	8	2506.04	3	5	5	35	loyal_customers
12359.0	58	4	6372.58	3	4	5	34	loyal_customers
12370.0	52	4	3545.69	3	4	5	34	loyal_customers
12380.0	22	4	2724.81	4	4	5	44	loyal_customers
12388.0	16	6	2780.66	4	4	5	44	loyal_customers

rfm[rfm["SEGMENT"] == "loyal_customers"].describe().T

	count	mean	std	min	25%	50%	75%	max
recency	819.0	33.608059	15.577050	15.00	20.000	30.00	44.000	72.00
frequency	819.0	6.479853	4.545669	3.00	4.000	5.00	8.000	63.00
monetary	819.0	2864.247791	6007.061883	36.56	991.795	1740.48	3052.905	124914.53

Action Suggestions

We can define short-term discounts on the products they buy most often
If they bring us new customers, we can prepare special campaigns for them

at risk Woowo we haven’t seen these friends for a long time, let’s get their attention a little bit

As an average of us; They earned 1084.53530 units of money
Frequency of visits 2.87858
They last shopped 153.78583 days ago.

rfm[rfm["SEGMENT"] == "at_Risk"].head()

	recency	frequency	monetary	recency_score	frequency_score	monetary_score	RFM_SCORE	SEGMENT
Customer ID
12348.0	76	4	1797.24	2	4	4	24	at_Risk
12383.0	185	5	1850.56	1	4	4	14	at_Risk
12393.0	73	4	1582.60	2	4	4	24	at_Risk
12399.0	120	4	1108.65	2	4	4	24	at_Risk
12409.0	79	3	11072.67	2	3	5	23	at_Risk

rfm[rfm["SEGMENT"] == "at_Risk"].describe().T

	count	mean	std	min	25%	50%	75%	max
recency	593.0	153.785835	68.618828	73.0	96.00	139.00	195.00	374.0
frequency	593.0	2.876897	0.951540	2.0	2.00	3.00	3.00	6.0
monetary	593.0	1084.535297	2562.073355	52.0	412.78	678.25	1200.62	44534.3

Action Suggestions

“Hey “NAME” sir/ma’am, we missed you so much. Let’s send you an e-mail if you want to take a look at our special offers.
They didn’t bring us bad income. Let’s remind ourselves of them. We can inform you about existing campaigns.

Customers-Segmentation-with-RFM

Customers-Segmentation-with-RFM

RFM Segmentation

Advantages & Disadvantages

RFM Metrics

Recency Calculation

Frequency Calculation

Monetary Calculation

Methodology

We have a Bussines problem.

Dataset Story

RESOURCES AND THANKS

Let’s move on to our example

Necessary Import

Task 1

Understanding and Preparing Data

Task 2

Calculating RFM metrics

Task 3

Generating RFM scores

Task 4

Defining RFM scores as segments

Task 5

Action Time

MY SELECTED SEGMENTS AND ACTION SUGGESTIONS

If they bring us new customers, we can prepare special campaigns for them »

If they bring us new customers, we can prepare special campaigns for them