Abstract:
The securities industry is widely recognized as one of the most data-intensive sectors and spans diverse business scenarios. Because of stringent regulation and high entry barriers, new securities companies grow slowly, yet competition within the industry is intensifying. To expand their market share, securities companies have employed various strategies to attract new customers and have invested considerable effort in personalized marketing services to strengthen their customer relationship management. Within customer relationship management, however, the top priority is retaining existing customers and preventing potential churn.
This research focuses on customer churn in the securities sector. The data consist of user-level transaction records provided by a prominent domestic securities company. We first inspect the data and categorize the raw variables into asset variables and non-asset variables. After cleaning the original data and selecting the variables relevant to later analysis, we define customer churn based on the stability of the churn status, drawing on practical experience within the securities industry. We then investigate how the number of trading days, the number of logins, and asset levels affect the churn state, and arrive at a viable and effective definition of customer churn. With the response variable established, the next crucial task is to identify meaningful impact factors among hundreds of raw variables. To address this, we propose an independent screening method based on high-dimensional features. We divide the churn factors into asset variables and non-asset variables, represented by 8 and 4 aspects, respectively, and derive a series of indicators from these aspects. We also calculate ratios or products of asset and non-asset factors to obtain 10 composite churn factors. We then screen the churn factors with the univariate AUC method, taking data quality and the actual business context into account. As a result, 8 asset-type factors, 21 non-asset-type factors, and 2 composite factors are selected for modeling.
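The univariate AUC screening step can be illustrated with a minimal sketch. It assumes that each candidate factor is scored on its own by how well it ranks churned customers against retained ones, and that a factor is kept when its AUC is far enough from 0.5 in either direction. The function names, the 0.6 threshold, and the toy factor names are hypothetical; the screening described above additionally weighs data quality and business context, which this sketch does not model.

```python
def auc(scores, labels):
    """Rank-based AUC: probability that a churned customer (label 1)
    scores higher than a retained one (label 0); ties count as half.
    Pairwise O(n^2) version, meant only for small illustrations."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def screen_features(features, labels, threshold=0.6):
    """Keep each factor whose univariate AUC is at least `threshold`
    away from chance in either direction (threshold is hypothetical)."""
    kept = {}
    for name, values in features.items():
        a = auc(values, labels)
        # An AUC near 0 is as informative as one near 1: the factor
        # simply ranks churners low instead of high.
        if max(a, 1.0 - a) >= threshold:
            kept[name] = a
    return kept


# Toy data: 1 = churned, 0 = retained (made-up values).
labels = [1, 1, 0, 0]
features = {
    "login_days": [1, 2, 8, 9],  # churners log in less often
    "noise": [1, 9, 2, 8],       # unrelated to churn
}
print(screen_features(features, labels))
```

Folding the AUC around 0.5 lets the same rule retain factors that are negatively related to churn, such as activity counts.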
Building on the logistic regression framework, we propose separate daily and weekly customer churn models, both based on the screened features. The results show that both asset-type and non-asset-type variables significantly influence customer churn prediction. The daily churn prediction model achieves an AUC above 0.95, indicating strong predictive performance. Compared with the daily model, the weekly average model has lower computational cost, higher prediction efficiency, and greater stability. The model also reveals mechanisms underlying customer churn. For example, the coefficient of the maximum value of stock fund option market capitalization is negative, indicating an inverse association between this maximum value and customer churn. A lower maximum value might reflect poor investment performance or dissatisfaction with the current market situation, and thus a higher likelihood of churn.
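The sign interpretation above follows from the standard form of logistic regression, written out here as a reminder. The symbols are assumptions introduced for illustration: $Y_i = 1$ denotes that customer $i$ churns, $x_{ij}$ is the value of the $j$-th screened factor, and $\beta_j$ is its coefficient.

```latex
\[
p_i = \Pr(Y_i = 1 \mid x_i)
    = \frac{1}{1 + \exp\!\bigl\{-\bigl(\beta_0 + \sum_{j} \beta_j x_{ij}\bigr)\bigr\}},
\qquad
\log\frac{p_i}{1 - p_i} = \beta_0 + \sum_{j} \beta_j x_{ij}.
\]
% Holding the other factors fixed, a one-unit increase in x_{ij}
% multiplies the churn odds p_i / (1 - p_i) by exp(beta_j).
% If beta_j < 0, then exp(beta_j) < 1, so larger values of the factor
% correspond to lower churn odds.
```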
Regarding age, the probability of customer churn follows the pattern: 60 years and above > 50 to 60 years > 40 to 50 years > below 40 years. This suggests that older customers are more prone to churn. Customers aged between 50 and 60 are more likely to churn than younger customers, possibly because they are approaching retirement age and pay more attention to financial planning and retirement. Customers aged 60 and above are the most susceptible to churn. This could be attributed to changes in financial planning and service requirements after retirement, or to other factors that make them more likely to transfer their investments or switch service providers.
Our findings can inform companies' strategies for winning back customers. With the proposed daily customer churn model, the predicted churn probability can be calculated for each customer and the samples sorted in descending order of predicted value. We introduce the coverage-capture rate curve to evaluate the model's prediction accuracy in practical business scenarios. Based on the modeling results, we classify customers into ten categories. In real-world operations, companies can develop specific recovery strategies for customers with different churn risks. For instance, they can concentrate marketing budgets on high-risk customers through face-to-face visits, phone follow-ups, and other direct communication. With targeted insight into customer status and preferences, they can take appropriate measures, such as reducing commissions or offering more convenient services, to reduce the risk of churn effectively.
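The ranking and ten-way grouping can be sketched as follows. The sketch assumes that "coverage" means the share of customers contacted, taken from the top of the ranking, and that "capture rate" means the share of all true churners found within that share; this reading and the function name are assumptions, and the data are toy values.

```python
def capture_curve(probs, labels, n_groups=10):
    """Return (coverage, capture_rate) pairs for the top 1/n, 2/n, ...
    of customers ranked by predicted churn probability."""
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    total_churn = sum(labels)
    if total_churn == 0:
        raise ValueError("at least one churned customer is required")
    # Highest predicted churn risk first.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    n = len(order)
    curve = []
    for g in range(1, n_groups + 1):
        cutoff = (g * n) // n_groups  # customers covered so far
        captured = sum(labels[i] for i in order[:cutoff])
        curve.append((g / n_groups, captured / total_churn))
    return curve


# Toy example: 10 customers, 3 of whom actually churned (label 1).
probs = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
for coverage, capture in capture_curve(probs, labels):
    print(f"top {coverage:.0%} of customers captures {capture:.0%} of churners")
```

A curve that rises steeply in the first few groups means a retention budget spent on the highest-risk groups reaches most of the customers who would otherwise churn.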
Finally, we applied the proposed customer churn model based on high-dimensional features in the enterprise's online production environment, where it enables real-time identification of customer churn status. To this end, we conducted tests in collaboration with a partner company on over four million valid customers from September to December 2020. To validate the accuracy of the online model, we randomly sampled 5% of these customers as the testing sample. In online testing the model performed well, with a prediction accuracy of 39.2% and a recall rate of 93.6%. For the enterprise, recall is particularly crucial, because a higher recall rate means that a larger share of the truly churned customers in the original sample is captured.
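The relationship between the two reported figures can be illustrated with a short calculation. The sketch assumes that the 39.2% figure is meant as precision (the share of flagged customers who truly churn), an interpretation the text does not confirm. The counts below are hypothetical, chosen only so that the resulting ratios match the reported percentages; they are not taken from the study.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    if tp + fp == 0 or tp + fn == 0:
        raise ValueError("precision or recall is undefined for these counts")
    return tp / (tp + fp), tp / (tp + fn)


# Hypothetical confusion counts (NOT from the study).
tp = 936   # churners correctly flagged
fn = 64    # churners missed
fp = 1452  # retained customers flagged by mistake

precision, recall = precision_recall(tp, fp, fn)
print(f"precision = {precision:.1%}, recall = {recall:.1%}")
```

Under this reading, the model flags many customers who would have stayed but misses few who leave, which suits a setting where contacting a retained customer costs less than losing a churner.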