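[Case Co-creation] Decoding Retail Customer Value: An Intelligent Segmentation and Precision Outreach System Based on Deep Clustering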

Author: Huawei Cloud Developer Alliance (华为云开发者联盟)

2025-12-22
Guizhou

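For the latest updates on this case, see "[Case Co-creation] Decoding Retail Customer Value: An Intelligent Segmentation and Precision Outreach System Based on Deep Clustering". Claim your Huawei Developer Space and try it hands-on!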

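1 Overview

1.1 Case Introduction

As the fast-moving consumer goods industry adopts intelligent technology, China's fresh-food retail market reached 5.2 trillion yuan in 2023 (source: Analysys), and consumer demand is becoming more scenario-driven and health-oriented. Traditional supermarkets face two problems at once:

Underdeveloped customer value: the industry realizes only 42% of customer lifetime value (LTV) (McKinsey, 2024 study), and product assortments are not precisely matched to customers;

High operating costs: at one well-known fresh-food platform, per-item promotion costs account for 18% of sales, and the spoilage rate is more than 2.3 times the industry average.

Against this backdrop, customer segmentation based on purchasing behavior has become a strategic priority. Building multidimensional customer profiles with unsupervised learning is now a core lever for retailers to improve inventory turnover and cross-selling.

By combining clustering optimization with business scenarios, the project delivers value on three fronts:

Technical innovation: t-SNE is applied to visualize the distribution of high-dimensional features, combined with the DBSCAN algorithm to identify outlier noise points;

Business breakthroughs:

• Discovered an "organic food loyalist" segment (22% of customers, 78% monthly repurchase rate);

• Identified "price-sensitive households" (high basket value, low purchase frequency) and designed a bulk-pack bundle promotion strategy for them;

Commercial return: turnover of slow-moving items is expected to improve by 40% within six months, with precision-marketing ROI exceeding 1:6.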
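The t-SNE and DBSCAN combination named above can be illustrated with a minimal sketch. Note that the code listed in section 3.3 uses PCA and agglomerative clustering instead, so this is only an illustration on synthetic data, not the project's actual pipeline. The feature matrix, `eps`, `min_samples`, and `perplexity` values are all assumptions chosen for the example.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Synthetic stand-in for customer features (assumption, not the case dataset):
# two dense groups plus five points placed far away from both.
group_a = rng.normal(loc=0.0, scale=0.3, size=(100, 5))
group_b = rng.normal(loc=5.0, scale=0.3, size=(100, 5))
outliers = rng.uniform(low=20.0, high=30.0, size=(5, 5))
features = np.vstack([group_a, group_b, outliers])

# Standardize so that no single feature dominates the distance computation.
scaled = StandardScaler().fit_transform(features)

# DBSCAN assigns the label -1 to points that belong to no dense region.
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(scaled)
noise_mask = labels == -1
n_clusters = len(set(labels) - {-1})

# t-SNE projects the 5-dimensional features to 2D for visualization only;
# clustering was done on the scaled features, not on the embedding.
embedding = TSNE(n_components=2, perplexity=30, random_state=42).fit_transform(scaled)

print("clusters found:", n_clusters)
print("noise points:", int(noise_mask.sum()))
print("embedding shape:", embedding.shape)
```

The `embedding` array can be passed to a scatter plot colored by `labels` to see where the noise points fall relative to the dense groups. On real customer data, `eps` and `min_samples` would need tuning.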

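1.2 Intended Audience

Retailers
Individual developers
University students

1.3 Case Duration

The case is expected to take about 60 minutes in total.

1.4 Case Workflow

Notes:

Log in to the Developer Space and start a Notebook;
Write, run, and debug the code in the Notebook.

1.5 Resource Overview

This case is expected to cost 0 yuan.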

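2 Resource and Environment Preparation

2.1 Starting the Notebook

Start the Notebook by following section 2.2 of the case "DeepSeek模型API调用及参数调试（开发者空间Notebook版）" (DeepSeek Model API Calls and Parameter Tuning, Developer Space Notebook Edition).

2.2 Installing Dependencies

Enter the following code in a new Notebook cell and run it to install all dependencies.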

!pip install numpy
!pip install pandas
!pip install matplotlib
!pip install scikit-learn
!pip install seaborn
!pip install yellowbrick


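Note: if the installation above fails, install from a mirror in mainland China using the current package names, with the following command: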

!pip install -i https://pypi.tuna.tsinghua.edu.cn/simple numpy pandas matplotlib scikit-learn seaborn yellowbrick


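When installation finishes, pip reports every package that was installed successfully, as shown in the figure below. Run pip list to check all installed third-party packages: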

pip list

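The same check can be done programmatically, which is convenient when only the six packages from this case matter. This is a minimal sketch using the standard library's `importlib.metadata`; the helper name `installed_versions` is hypothetical.

```python
from importlib import metadata

# Distribution names as passed to pip in section 2.2.
REQUIRED = ["numpy", "pandas", "matplotlib", "scikit-learn", "seaborn", "yellowbrick"]


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(REQUIRED).items():
    print(f"{name:15s} {version or 'NOT INSTALLED'}")
```

Any package reported as NOT INSTALLED can be installed again with one of the pip commands above.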

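3 Running the Code and Viewing Results

3.1 Importing the Required Libraries

numpy (np): a third-party library for n-dimensional array processing and numerical computing.

pandas (pd): a third-party library for data analysis and data manipulation.

matplotlib: a plotting library for Python; this case uses it for both 2D charts and 3D scatter plots.

scikit-learn: an open-source machine learning library for Python that provides classification, regression, clustering, and dimensionality reduction algorithms, along with data preprocessing and model evaluation tools. It is widely used in data analysis and artificial intelligence.

seaborn: a statistical data visualization library built on Matplotlib.

yellowbrick: a Python library for visualizing machine learning models and evaluating their performance.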

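3.2 Data Loading and Preprocessing

Data preparation: download the dataset from the link below and upload it through the Notebook.

https://case-aac4.obs.cn-north-4.myhuaweicloud.com/1_marketing_campaign.csv

The data downloaded to the local machine:

Drag the file into the file panel on the left side of the Notebook, into the same directory as the code. A successful upload looks like the figure below: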
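As an alternative to the manual download and drag-and-drop upload, the file could be fetched from within the Notebook. The sketch below assumes the Notebook has outbound network access, which the case does not state. The helper names `fetch_dataset` and `looks_tab_separated` are hypothetical. The second helper checks the one property the later loading code relies on, namely that the file is tab-separated despite its .csv extension.

```python
import urllib.request
from pathlib import Path

DATA_URL = "https://case-aac4.obs.cn-north-4.myhuaweicloud.com/1_marketing_campaign.csv"


def fetch_dataset(url=DATA_URL, dest="1_marketing_campaign.csv"):
    """Download url to dest unless dest already exists, then return the path."""
    path = Path(dest)
    if not path.exists():
        urllib.request.urlretrieve(url, path)
    return path


def looks_tab_separated(path):
    """Return True if the header line of the file contains tab characters."""
    with open(path, encoding="utf-8") as handle:
        header = handle.readline()
    return "\t" in header


# Example call (not executed here because it needs network access):
# path = fetch_dataset()
# print(path, "tab-separated:", looks_tab_separated(path))
```

Because an existing file is never overwritten, running the cell again after a manual upload is harmless.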

3.3 Running the Code

Enter the following code in a new Notebook cell and run it:

# Import the libraries
import sys
import warnings

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib import colors
from mpl_toolkits.mplot3d import Axes3D  # registers the 3D projection on older Matplotlib versions
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import LabelEncoder, StandardScaler
from yellowbrick.cluster import KElbowVisualizer

if not sys.warnoptions:
    warnings.simplefilter("ignore")
np.random.seed(42)

# Load the dataset (the file is tab-separated despite the .csv extension)
data = pd.read_csv("1_marketing_campaign.csv", sep="\t")
print("Number of datapoints:", len(data))
print(data.head())

# Information on features
data.info()

# Remove rows with missing values
data = data.dropna()
print("The total number of data-points after removing the rows with missing values are:", len(data))

# Parse the enrolment date (day comes first in the source format)
data["Dt_Customer"] = pd.to_datetime(data["Dt_Customer"], dayfirst=True)
dates = []
for i in data["Dt_Customer"]:
    i = i.date()
    dates.append(i)

# Dates of the newest and oldest recorded customer
print("The newest customer's enrolment date in the records:", max(dates))
print("The oldest customer's enrolment date in the records:", min(dates))

# Create a feature "Customer_For": time enrolled relative to the newest customer
days = []
d1 = max(dates)  # taking it to be the newest customer
for i in dates:
    delta = d1 - i
    days.append(delta)
data["Customer_For"] = days
# Timedeltas are converted to integers (nanoseconds)
data["Customer_For"] = pd.to_numeric(data["Customer_For"], errors="coerce")

print("Total categories in the feature Marital_Status:\n", data["Marital_Status"].value_counts(), "\n")
print("Total categories in the feature Education:\n", data["Education"].value_counts())

# Feature engineering
# Age of the customer, using 2021 as the reference year
data["Age"] = 2021 - data["Year_Birth"]

# Total spending on various items
data["Spent"] = (data["MntWines"] + data["MntFruits"] + data["MntMeatProducts"]
                 + data["MntFishProducts"] + data["MntSweetProducts"] + data["MntGoldProds"])

# Derive the living situation from marital status ("Alone" stays as it is)
data["Living_With"] = data["Marital_Status"].replace({
    "Married": "Partner", "Together": "Partner", "Absurd": "Alone", "Widow": "Alone",
    "YOLO": "Alone", "Divorced": "Alone", "Single": "Alone"})

# Total children living in the household
data["Children"] = data["Kidhome"] + data["Teenhome"]

# Total members in the household
data["Family_Size"] = data["Living_With"].replace({"Alone": 1, "Partner": 2}) + data["Children"]

# Parenthood flag
data["Is_Parent"] = np.where(data.Children > 0, 1, 0)

# Segment education levels into three groups
data["Education"] = data["Education"].replace({
    "Basic": "Undergraduate", "2n Cycle": "Undergraduate", "Graduation": "Graduate",
    "Master": "Postgraduate", "PhD": "Postgraduate"})

# Rename spending columns for clarity
data = data.rename(columns={
    "MntWines": "Wines", "MntFruits": "Fruits", "MntMeatProducts": "Meat",
    "MntFishProducts": "Fish", "MntSweetProducts": "Sweets", "MntGoldProds": "Gold"})

# Drop redundant features
to_drop = ["Marital_Status", "Dt_Customer", "Z_CostContact", "Z_Revenue", "Year_Birth", "ID"]
data = data.drop(to_drop, axis=1)
print(data.describe())

# Set up color preferences
sns.set(rc={"axes.facecolor": "#FFF9ED", "figure.facecolor": "#FFF9ED"})
cmap = colors.ListedColormap(["#682F2F", "#9E726F", "#D6B2B1", "#B9C0C9", "#9F8A78", "#F3AB60"])

# Pair plot of selected features (pairplot creates its own figure)
To_Plot = ["Income", "Recency", "Customer_For", "Age", "Spent", "Is_Parent"]
print("Relative Plot Of Some Selected Features: A Data Subset")
sns.pairplot(data[To_Plot], hue="Is_Parent", palette=["#682F2F", "#F3AB60"])
plt.show()

# Drop the outliers by setting a cap on Age and Income
data = data[data["Age"] < 90]
data = data[data["Income"] < 600000]
print("The total number of data-points after removing the outliers are:", len(data))

# Select the numeric columns
numeric_data = data.select_dtypes(include=[float, int])

# Compute the correlation matrix
corrmat = numeric_data.corr()

# Draw the heatmap
plt.figure(figsize=(20, 20))
sns.heatmap(corrmat, annot=True, cmap="coolwarm", center=0)
plt.title("Correlation Matrix Heatmap")
plt.show()

# Get the list of categorical variables
s = (data.dtypes == "object")
object_cols = list(s[s].index)
print("Categorical variables in the dataset:", object_cols)

# Label-encode the object dtypes
LE = LabelEncoder()
for i in object_cols:
    data[i] = data[[i]].apply(LE.fit_transform)
print("All features are now numerical")

# Create a copy of the data
ds = data.copy()

# Create a subset by dropping the features on deals accepted and promotions
cols_del = ["AcceptedCmp3", "AcceptedCmp4", "AcceptedCmp5", "AcceptedCmp1",
            "AcceptedCmp2", "Complain", "Response"]
ds = ds.drop(cols_del, axis=1)

# Scale the features
scaler = StandardScaler()
scaler.fit(ds)
scaled_ds = pd.DataFrame(scaler.transform(ds), columns=ds.columns)
print("All features are now scaled")

# Scaled data to be used for reducing the dimensionality
print("Dataframe to be used for further modelling:")
print(scaled_ds.head())

# Run PCA to reduce the features to 3 dimensions
pca = PCA(n_components=3)
pca.fit(scaled_ds)
PCA_ds = pd.DataFrame(pca.transform(scaled_ds), columns=["col1", "col2", "col3"])
print(PCA_ds.describe().T)

# A 3D projection of the data in the reduced dimension
x = PCA_ds["col1"]
y = PCA_ds["col2"]
z = PCA_ds["col3"]
fig = plt.figure(figsize=(10, 8))
ax = fig.add_subplot(111, projection="3d")
ax.scatter(x, y, z, c="maroon", marker="o")
ax.set_title("A 3D Projection Of Data In The Reduced Dimension")
plt.show()

# Quick examination of the elbow method to find the number of clusters
print("Elbow Method to determine the number of clusters to be formed:")
Elbow_M = KElbowVisualizer(KMeans(), k=10)
Elbow_M.fit(PCA_ds)
Elbow_M.show()

# Fit the Agglomerative Clustering model and predict clusters
AC = AgglomerativeClustering(n_clusters=4)
yhat_AC = AC.fit_predict(PCA_ds)
PCA_ds["Clusters"] = yhat_AC
# Add the Clusters feature to the original dataframe
data["Clusters"] = yhat_AC

# Plot the clusters
fig = plt.figure(figsize=(10, 8))
ax = plt.subplot(111, projection="3d", label="bla")
ax.scatter(x, y, z, s=40, c=PCA_ds["Clusters"], marker="o", cmap=cmap)
ax.set_title("The Plot Of The Clusters")
plt.show()

# Count plot of the clusters
pal = ["#682F2F", "#B9C0C9", "#9F8A78", "#F3AB60"]
pl = sns.countplot(x=data["Clusters"], palette=pal)
pl.set_title("Distribution Of The Clusters")
plt.show()

# Cluster profile based on income and spending
pl = sns.scatterplot(data=data, x=data["Spent"], y=data["Income"], hue=data["Clusters"], palette=pal)
pl.set_title("Cluster's Profile Based On Income And Spending")
plt.legend()
plt.show()

# Spending distribution per cluster
plt.figure()
pl = sns.swarmplot(x=data["Clusters"], y=data["Spent"], color="#CBEDDD", alpha=0.5)
pl = sns.boxenplot(x=data["Clusters"], y=data["Spent"], palette=pal)
plt.show()

# Create a feature with the sum of accepted promotions
data["Total_Promos"] = (data["AcceptedCmp1"] + data["AcceptedCmp2"] + data["AcceptedCmp3"]
                        + data["AcceptedCmp4"] + data["AcceptedCmp5"])

# Count of total campaigns accepted
plt.figure()
pl = sns.countplot(x=data["Total_Promos"], hue=data["Clusters"], palette=pal)
pl.set_title("Count Of Promotion Accepted")
pl.set_xlabel("Number Of Total Accepted Promotions")
plt.show()

# Number of deals purchased
plt.figure()
pl = sns.boxenplot(y=data["NumDealsPurchases"], x=data["Clusters"], palette=pal)
pl.set_title("Number of Deals Purchased")
plt.show()

# Joint distribution of spending against each personal attribute, per cluster
# (jointplot creates its own figure)
Personal = ["Kidhome", "Teenhome", "Customer_For", "Age", "Children",
            "Family_Size", "Is_Parent", "Education", "Living_With"]
for i in Personal:
    sns.jointplot(x=data[i], y=data["Spent"], hue=data["Clusters"], kind="kde", palette=pal)
    plt.show()
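The code above fixes n_clusters=4 after inspecting the elbow plot. One common way to cross-check such a choice is to compare silhouette scores across several cluster counts; this step is not part of the original case. The sketch below is an illustration on synthetic data: `X` is a stand-in for `PCA_ds`, generated with four explicitly placed centers, and the range of `k` values is an assumption.

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the 3-column PCA_ds (assumption, not the case data):
# four well-separated groups in three dimensions.
centers = [[0, 0, 0], [10, 0, 0], [0, 10, 0], [0, 0, 10]]
X, _ = make_blobs(n_samples=400, centers=centers, cluster_std=1.0, random_state=42)

# Fit agglomerative clustering for each candidate k and record the silhouette score.
scores = {}
for k in range(2, 8):
    labels = AgglomerativeClustering(n_clusters=k).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
for k, score in scores.items():
    print(f"k={k}: silhouette={score:.3f}")
print("Highest silhouette at k =", best_k)
```

To apply this to the case data, `X` would be replaced by the three PCA columns of `PCA_ds`, excluding the `Clusters` column. Real customer data is rarely this cleanly separated, so the silhouette curve is usually flatter and should be read together with the elbow plot.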