Research on a Customer Churn Early-Warning Model for Securities Firms Based on High-Dimensional Feature Factors

高天辰; 曲浩; 王菲菲; 周静

Customer Churn Model Based on High-Dimensional Factors—A Study on Securities Companies

Abstract

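Abstract: As competition in the securities industry intensifies, retaining existing customers and preventing potential churn has become one of the major concerns of managers in the industry. This paper studies customer churn in the securities industry. First, drawing on the industry's actual business context, we explore how customer churn should be defined. Second, we propose an independence screening method based on high-dimensional feature factors. Finally, using the screened factors, we build a daily and a weekly customer churn early-warning model. The results show that asset factors and non-asset factors are significant predictors of customer churn, whereas composite factors are not. The out-of-sample AUC of the daily model averages above 0.95, indicating good predictive accuracy. The weekly model predicts about as well as the daily model while offering lower computational cost, higher prediction efficiency, and greater stability. The findings can support strategy analysis for customer recovery: firms can segment customers and design different recovery strategies for customers at different levels of churn risk. The proposed early-warning model also performed well when tested in the firm's real operating environment.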

Abstract: The securities industry is widely recognized as one of the most data-intensive sectors and is characterized by diverse business scenarios. Because of stringent regulation and high entry barriers, new securities companies emerge slowly, yet competition within the industry is intensifying. To expand market share, securities companies have employed various strategies to attract new customers and have invested considerable effort in personalized marketing services that strengthen their customer relationship management. Within customer relationship management, however, a top priority is retaining existing customers and preventing potential customer churn.

This research focuses on customer churn in the securities sector. The data consist of user-level transaction records provided by a prominent domestic securities company. We first inspect the data and categorize the raw variables into asset variables and non-asset variables. After cleaning the original data and retaining the variables relevant to later analysis, we define customer churn based on the stability of the churn status, drawing on practical experience within the securities industry. We then investigate how the number of trading days, the number of logins, and asset levels affect the churn state, ultimately arriving at a viable and effective definition of customer churn. With the response variable established, the next crucial task is to identify meaningful factors among hundreds of raw variables. To address this, we propose an independence screening method based on high-dimensional features. We first divide the churn factors into asset factors and non-asset factors, represented by 8 and 4 aspects, respectively, and derive a series of indicators from these aspects. We also calculate ratios or products of asset and non-asset factors to obtain 10 composite churn factors. We then screen the churn factors with the univariate AUC method, also taking data quality and the actual business context into account. As a result, 8 asset-type factors, 21 non-asset-type factors, and 2 composite factors are selected for modeling.
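The univariate AUC screening step can be illustrated with a minimal sketch. It assumes each factor is scored on its own by the pairwise (Mann-Whitney) form of the AUC and kept when its direction-free AUC, max(AUC, 1 - AUC), reaches a threshold. The factor names, toy values, and the 0.6 threshold are hypothetical and not taken from the paper, which also weighs data quality and business context.

```python
from typing import Dict, List, Sequence


def univariate_auc(values: Sequence[float], labels: Sequence[int]) -> float:
    """AUC of a single factor used directly as a score (Mann-Whitney form).

    Over all (churned, retained) pairs, counts how often the churned
    customer has the larger factor value; ties count as one half.
    """
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one churned and one retained customer")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def screen_factors(
    factors: Dict[str, Sequence[float]],
    labels: Sequence[int],
    threshold: float = 0.6,  # hypothetical cut-off, not from the paper
) -> List[str]:
    """Keep factors whose direction-free AUC meets the threshold."""
    kept = []
    for name, values in factors.items():
        auc = univariate_auc(values, labels)
        if max(auc, 1.0 - auc) >= threshold:
            kept.append(name)
    return kept


# Toy data: 1 = churned, 0 = retained. Factor names are hypothetical.
labels = [1, 1, 1, 0, 0, 0]
factors = {
    "max_market_cap": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],  # lower for churners
    "login_count": [2.0, 1.0, 3.0, 5.0, 4.0, 6.0],     # lower for churners
    "noise": [1.0, 4.0, 5.0, 2.0, 3.0, 6.0],           # weakly informative
}
selected = screen_factors(factors, labels, threshold=0.6)
```

Using max(AUC, 1 - AUC) keeps factors that separate churners in either direction, so a factor that is systematically lower for churned customers is not discarded.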

Building on the logistic regression framework, we propose separate daily and weekly customer churn models, both based on the screened features. The results demonstrate that both asset-type and non-asset-type variables contribute significantly to customer churn prediction. The daily customer churn prediction model achieves an AUC above 0.95, indicating strong predictive accuracy. Compared with the daily model, the weekly model has lower computational cost, higher prediction efficiency, and greater stability. The model also sheds light on the mechanisms behind customer churn. For example, the coefficient of the maximum value of stock fund option market capitalization is negative, indicating an inverse relationship between this maximum value and customer churn. A lower maximum value might reflect poor investment performance or dissatisfaction with current market conditions, either of which would increase the likelihood of churn.
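The sign interpretation above follows from the standard logistic regression form, sketched here for reference. The notation is an assumption made for illustration, not the paper's: y_i = 1 denotes that customer i churns, x_{ij} is the value of the j-th screened factor, and p is the number of factors.

```latex
\[
P(y_i = 1 \mid x_i)
  = \frac{1}{1 + \exp\!\left(-\beta_0 - \sum_{j=1}^{p} \beta_j x_{ij}\right)},
\qquad
\log \frac{P(y_i = 1 \mid x_i)}{1 - P(y_i = 1 \mid x_i)}
  = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij}.
\]
```

Holding the other factors fixed, a one-unit increase in x_{ij} multiplies the odds of churn by exp(β_j). A negative β_j therefore gives a multiplier below 1, so larger values of that factor correspond to lower churn odds.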

Regarding age, the probability of customer churn follows the pattern: 60 and above > 50 to 60 > 40 to 50 > under 40. This suggests that older customers are more prone to churn. Customers aged 50 to 60 are relatively likely to churn, possibly because they are approaching retirement and give more weight to financial and retirement planning. Customers aged 60 and above are the most susceptible of all age groups. This could be attributed to changes in financial planning and service requirements after retirement, or to other factors that make them more likely to transfer their investments or switch service providers.

Our findings can support strategy analysis for customer recovery. Using the proposed daily customer churn model, one can calculate a predicted churn probability for each customer and sort the sample in descending order of that probability. We introduce the coverage-capture rate curve to evaluate the model's predictive accuracy in practical business scenarios and, based on the modeling results, classify customers into ten categories. In real-world operations, companies can develop specific recovery strategies for customers with different churn risks. For instance, they can concentrate marketing budgets on high-risk customers through face-to-face visits, phone follow-ups, and other direct communication. With targeted insight into customer status and preferences, they can then take appropriate measures, such as reducing commissions or offering more convenient services, to lower the risk of churn.
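The ranking and grouping described above can be sketched as follows. This assumes "coverage" means the share of customers contacted, taken from the top of the ranking, and "capture rate" means the share of all truly churned customers found within that share. The paper's exact definition may differ, and the function name and toy data are hypothetical.

```python
from typing import List, Sequence, Tuple


def capture_curve(
    probs: Sequence[float],
    labels: Sequence[int],
    n_groups: int = 10,
) -> List[Tuple[float, float]]:
    """Return (coverage, capture rate) after each of n_groups ranked groups."""
    total_churn = sum(labels)
    if total_churn == 0:
        raise ValueError("no churned customers in the sample")
    # Sort customers by predicted churn probability, highest first.
    ranked = sorted(zip(probs, labels), key=lambda t: t[0], reverse=True)
    n = len(ranked)
    curve = []
    for g in range(1, n_groups + 1):
        cutoff = (g * n) // n_groups  # customers in the top g groups
        captured = sum(y for _, y in ranked[:cutoff])
        curve.append((cutoff / n, captured / total_churn))
    return curve


# Toy data: predicted probabilities and true labels (1 = churned).
probs = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
curve = capture_curve(probs, labels, n_groups=5)
```

In this toy sample, contacting the top 20% of customers reaches two of the three churners, and the top 40% reaches all of them. A curve that rises this steeply is what justifies concentrating the recovery budget on the highest-risk groups.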

Finally, we deployed the proposed customer churn model based on high-dimensional features in the enterprise's online production environment, which enables real-time identification of customer churn status. In collaboration with a partner company, we tested the model on over four million valid customers from September to December 2020. To validate the accuracy of the online model, we randomly sampled 5% of these customers as the testing sample. In online testing the model performed well, with a prediction accuracy of 39.2% and a recall rate of 93.6%. For the enterprise, a high recall rate is particularly important because it means that a larger share of the customers who actually churn are captured by the model.
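For readers less familiar with these metrics, the sketch below computes precision and recall from predicted and true labels. It assumes that the reported "prediction accuracy" refers to precision, the share of flagged customers who actually churn; the text does not confirm this. The labels are toy values, not the paper's data.

```python
from typing import Sequence, Tuple


def precision_recall(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Precision and recall for the positive class (1 = churned)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Guard against empty denominators in degenerate samples.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy labels: four true churners, six customers flagged by the model.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 1, 0, 0, 0]
precision, recall = precision_recall(y_true, y_pred)
```

Here three of the four churners are flagged (recall 0.75) while only half of the flagged customers churn (precision 0.5). A model tuned to miss few churners will typically flag many customers who stay, which is the trade-off the paragraph above describes.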
