Every company claims to be undergoing digital transformation. However, there is a growing gap between software complexity and the technical skills of top management. This gap is most obvious in machine learning projects such as forecasting or credit scoring. Given this gap and the research nature of every ML project, it is no wonder that only 25% of machine learning projects end successfully.
Train and test split
The first time I heard about this gap was at university. My teacher was involved in a commercial forecasting project with another data scientist. Both came up with their own forecasting models, but management could not decide which model was better. So everyone gathered in one room to discuss the results, and management picked the model whose author yelled louder.
Today we split the data into train and test subsets. The train subset is used to fit the model and the test subset to evaluate it. We choose the model that performs best on the test subset.
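The split described above can be sketched in a few lines. This is a minimal illustration, not the method used in the project from the story: the toy data, the two candidate models (`fit_constant`, `fit_line`) and the error metric are all assumptions made up for the example.

```python
import random


def train_test_split(rows, test_share=0.25, seed=42):
    """Shuffle rows reproducibly and cut them into train and test subsets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_share))
    return shuffled[n_test:], shuffled[:n_test]


def mean_abs_error(model, rows):
    """Average absolute error of model(x) against the observed y."""
    return sum(abs(model(x) - y) for x, y in rows) / len(rows)


def fit_constant(train):
    """Candidate A (hypothetical): always predict the mean training target."""
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y


def fit_line(train):
    """Candidate B (hypothetical): least-squares line through the training points."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in train)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in train) / var_x
    return lambda x: mean_y + slope * (x - mean_x)


# Toy data, made up for illustration: y grows roughly linearly with x.
data = [(x, 2.0 * x + (1 if x % 2 else -1)) for x in range(40)]
train, test = train_test_split(data)

# Both candidates are fitted on the train subset only...
candidates = {"constant": fit_constant(train), "line": fit_line(train)}
# ...and compared on the test subset, which neither of them has seen.
test_errors = {name: mean_abs_error(m, test) for name, m in candidates.items()}
best_name = min(test_errors, key=test_errors.get)
print(best_name, test_errors)
```

The point is that the decision rule is fixed before anyone enters the meeting room: whichever candidate has the lowest error on unseen data wins, regardless of who argues louder.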
Why do we build Scoring models?
After graduation, I faced this reality myself. There are plenty of graphs and tables that describe the quality of a scoring model. However, they are not what sells the project. Even profit and loss calculations, which most managers understand, are too complex and rest on many assumptions that are not convincing.
1. One of my colleagues was asked: why do we waste resources on scoring projects? The answer that sold the project was: because our competitors do, and we will fall far behind them if we don’t.
2. Another colleague replied that we need to find the drivers, factors, and client profiles in order to manage our credit portfolio, and that a scorecard is one of the tools for understanding those drivers and factors.
3. One of my managers said that scoring models are too complex and asked me to describe them in as few words as possible. I said that the scoring model compares every new client with the old ones: it finds a cluster of old clients who are very similar to the new client in terms of gender, income, and number of closed and active credits, and it calculates the probability that the new credit will be repaid from the average performance of the old clients in that cluster. That was more than enough to convince the manager.
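The explanation given to the manager in point 3 can be turned into a toy example. The sketch below is a k-nearest-neighbours illustration of that description, not the way a production scorecard is necessarily fitted. The `Client` fields, the `distance` function, the `income_scale` parameter and all client records are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Client:
    gender: int             # 0 or 1, illustrative encoding
    income: float           # monthly income, arbitrary currency
    closed_credits: int
    active_credits: int
    repaid: bool = False    # known outcome, meaningful for old clients only


def distance(a, b, income_scale=1000.0):
    """Crude dissimilarity; income is rescaled so it does not dominate."""
    return (abs(a.gender - b.gender)
            + abs(a.income - b.income) / income_scale
            + abs(a.closed_credits - b.closed_credits)
            + abs(a.active_credits - b.active_credits))


def repayment_probability(new_client, old_clients, k=3):
    """Share of repaid credits among the k most similar old clients."""
    nearest = sorted(old_clients, key=lambda c: distance(new_client, c))[:k]
    return sum(c.repaid for c in nearest) / len(nearest)


# Made-up history of old clients.
old_clients = [
    Client(0, 3000, 2, 1, True),
    Client(0, 3200, 3, 1, True),
    Client(1, 2900, 2, 0, False),
    Client(1, 800, 0, 3, False),
    Client(0, 900, 0, 2, False),
    Client(1, 3100, 1, 1, False),
]
new_client = Client(0, 3100, 2, 1)

probability = repayment_probability(new_client, old_clients)
print(round(probability, 2))
```

Here the three most similar old clients include two who repaid, so the new client gets an estimated repayment probability of two thirds.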
Engage managers in a shared game
4. Once the model is ready, managers like to play around with it to see how it fits their professional experience. One way to engage management at an early stage of the project is to show them a list of factors ordered by importance, as measured by Information Value. People like to question the list or suggest their own factors, which builds trust in the project.
5. Another engaging method is to calculate the model score for a well-known person. That can be one of the managers, or a returning customer who was rejected by the model despite a successful history with the company. The model helps to understand what has changed in the customer’s behavior so that they are no longer eligible for credit. The table below visualizes the score calculation and shows which factors can be changed to increase the final score.
| salary assignments | >= 200, no data | 193 | 45 |
| duration in months | <= 16 | 163 | 43 |
| credit history | at another bank | 169 | 30 |
| | all paid fully | 127 | 53 |
| installment / income | <= 0.3 | 151 | 51 |
| savings account | >= 500, unknown | 165 | 29 |
This model awards more points to clients who already have credit at another bank. So opening a credit at another bank will increase your overall score and your chances of being approved by the model.
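A score calculation of this kind is easy to reproduce for any person in the room. The sketch below assumes an additive scorecard in which each factor contributes the points of the bin the client falls into. The point values, and the bins `< 500` and `> 0.3`, are hypothetical and are not taken from the table above.

```python
# Hypothetical points per factor and bin; NOT the values from the article's table.
SCORECARD = {
    "credit history": {"at another bank": 60, "all paid fully": 40},
    "savings account": {">= 500, unknown": 50, "< 500": 20},
    "installment / income": {"<= 0.3": 45, "> 0.3": 15},
}


def total_score(client):
    """Sum the points of the bin the client falls into for every factor."""
    return sum(SCORECARD[factor][bin_] for factor, bin_ in client.items())


def best_improvements(client):
    """For each factor, the best bin and the points gained by moving into it."""
    gains = {}
    for factor, bin_ in client.items():
        bins = SCORECARD[factor]
        best_bin = max(bins, key=bins.get)
        gain = bins[best_bin] - bins[bin_]
        if gain > 0:
            gains[factor] = (best_bin, gain)
    return gains


# A made-up client profile.
client = {
    "credit history": "all paid fully",
    "savings account": "< 500",
    "installment / income": "<= 0.3",
}

score = total_score(client)
improvements = best_improvements(client)
print(score)
for factor, (bin_, gain) in improvements.items():
    print(f"{factor}: move to '{bin_}' for +{gain} points")
```

Listing the possible gains per factor is what lets a manager see which changes in a customer's profile would move the final score the most.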
A/B tests, of course
6. The most convincing argument, and the hardest to oppose, is the A/B test. Once the model is ready and approved by management, we can split our clients into groups A and B, say by odd and even ID numbers in the database. One group follows the current rules, and the other is approved or rejected by the new scoring model. Over time, the sum of payments in each group shows which strategy wins.
7. A simplified version of the A/B test above compares Gini statistics on new customers whose data was not used in training. The idea is similar to the train and test split. This test is less strict, but who cares about statistical significance when one has to sell a scoring project with limited resources?
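Both checks from points 6 and 7 fit in a short sketch. It assumes that odd IDs go to group A and even IDs to group B, and that Gini is computed as 2 * AUC - 1 over pairs of good and bad clients. The client IDs, payments, labels and scores are made up for illustration.

```python
def ab_group(client_id):
    """Assumed convention: odd IDs go to group A, even IDs to group B."""
    return "A" if client_id % 2 else "B"


def payments_by_group(payments):
    """Sum of payments per group; payments maps client ID to amount paid."""
    totals = {"A": 0.0, "B": 0.0}
    for client_id, amount in payments.items():
        totals[ab_group(client_id)] += amount
    return totals


def gini(labels, scores):
    """Gini = 2 * AUC - 1, with AUC counted over all (good, bad) pairs."""
    goods = [s for y, s in zip(labels, scores) if y == 1]
    bads = [s for y, s in zip(labels, scores) if y == 0]
    # A pair counts as a win when the good client outscores the bad one;
    # ties count as half a win.
    wins = sum((g > b) + 0.5 * (g == b) for g in goods for b in bads)
    auc = wins / (len(goods) * len(bads))
    return 2 * auc - 1


# Made-up payments observed over the test period.
payments = {101: 1200.0, 102: 1500.0, 103: 0.0,
            104: 1100.0, 105: 900.0, 106: 1300.0}
totals = payments_by_group(payments)

# Made-up outcomes (1 = repaid) and model scores for new customers
# whose data was not used in training.
labels = [1, 1, 1, 0, 1, 0, 0, 1]
scores = [720, 680, 650, 640, 600, 590, 560, 700]
holdout_gini = gini(labels, scores)

print(totals, round(holdout_gini, 3))
```

Splitting by ID parity only behaves like a random split if IDs are unrelated to client quality, which is worth checking before trusting the comparison.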