Every company declares a digital transformation. However, there is a growing gap between software complexity and the technical skills of top management. This gap is especially evident in machine learning projects such as forecasting or credit scoring. Given this gap and the research nature of every ML project, it is no wonder that only 25% of machine learning projects succeed.
Train and test split
The first time I heard about this gap was at university. My teacher was involved in a commercial forecasting project with another data scientist. Each came up with their own forecasting model, but management could not decide which model was better. So everyone gathered in one room to discuss the results. Managers picked the model whose author yelled louder.
Today we split data into train and test subsets. The training subset is used to fit the model, and the test subset is used to evaluate it. We choose the model that performs best on the test subset.
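A minimal sketch of that selection rule, using only the Python standard library. The data, the two candidate models (`fit_mean`, `fit_line`), the 30% test share, and the error metric are all assumptions chosen for illustration, not the models from the story above.

```python
import random

def train_test_split(rows, test_share=0.3, seed=42):
    """Shuffle rows reproducibly and split them into train and test subsets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_share))
    return shuffled[n_test:], shuffled[:n_test]

def fit_mean(train):
    """Candidate 1: always predict the average target seen in training."""
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y

def fit_line(train):
    """Candidate 2: least-squares straight line fitted on the training subset."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

def mean_absolute_error(model, rows):
    """Average absolute gap between predictions and observed targets."""
    return sum(abs(model(x) - y) for x, y in rows) / len(rows)

# Hypothetical data: (feature, target) pairs following y = 2x + 1
data = [(x, 2 * x + 1) for x in range(100)]
train, test = train_test_split(data)

# Fit every candidate on the training subset only
models = {"mean": fit_mean(train), "line": fit_line(train)}

# Evaluate on the held-out test subset and keep the best performer
test_errors = {name: mean_absolute_error(m, test) for name, m in models.items()}
best = min(test_errors, key=test_errors.get)
print(best, test_errors)
```

The point of the split is that the choice between models rests on data neither author touched during fitting, not on who argues louder.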
Why do we build Scoring models?
After graduation, I faced reality myself. There are a lot of graphs and tables that describe scoring model quality. However, this is not what sells the project. Even profit and loss calculations, which most managers understand, are too complex and rest on many assumptions that managers do not find convincing.
1. One of my colleagues was asked why we waste resources on scoring projects. The answer that sold the project was: because our competitors do. We will fall far behind the competition if we don’t.
2. Another colleague replied that we need to find the drivers, factors, and client profiles to manage our credit portfolio, and a scorecard is one of the tools for understanding them.
3. One of my managers said that scoring models are too complex and asked me to describe them in as few words as possible. I replied that the scoring model compares every new client with the old ones: it finds clusters of old clients that are very similar to the new client in terms of gender, income, and number of closed and active credits, then calculates the probability of repayment based on the average performance of old clients in that cluster. That was more than enough to convince the manager.
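The explanation in item 3 can be turned into a toy sketch: find the past clients most similar to the new one and average their repayment outcomes. This mirrors the simplified pitch only; it is not a claim about how a production scorecard is fitted. The distance weights, the `k=3` neighbourhood size, the field names, and the sample clients are all hypothetical.

```python
def repayment_probability(new_client, old_clients, k=3):
    """Estimate repayment probability from the k most similar past clients."""
    def distance(a, b):
        # Gender mismatch counts as 1, income is compared in thousands,
        # and credit counts are compared directly (assumed weights).
        return (
            (a["gender"] != b["gender"])
            + abs(a["income"] - b["income"]) / 1000
            + abs(a["closed_credits"] - b["closed_credits"])
            + abs(a["active_credits"] - b["active_credits"])
        )

    nearest = sorted(old_clients, key=lambda c: distance(new_client, c))[:k]
    # Share of similar past clients who repaid (repaid is 1 or 0)
    return sum(c["repaid"] for c in nearest) / len(nearest)

# Hypothetical history of past clients
old_clients = [
    {"gender": "F", "income": 3000, "closed_credits": 2, "active_credits": 1, "repaid": 1},
    {"gender": "F", "income": 3200, "closed_credits": 2, "active_credits": 0, "repaid": 1},
    {"gender": "F", "income": 2800, "closed_credits": 1, "active_credits": 1, "repaid": 0},
    {"gender": "M", "income": 900, "closed_credits": 0, "active_credits": 3, "repaid": 0},
    {"gender": "M", "income": 1000, "closed_credits": 0, "active_credits": 2, "repaid": 0},
]
new_client = {"gender": "F", "income": 3100, "closed_credits": 2, "active_credits": 1}

probability = repayment_probability(new_client, old_clients)
print(round(probability, 2))
```

Here the three nearest past clients are the first three in the list, two of whom repaid, so the estimate is two thirds.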
Engage managers in a shared game
4. Once the model is ready, managers like to play around with it to see how it fits their professional experience. One way to engage management at an early stage of the project is to show them a list of factors ranked by importance, based on Information Value. People like to question the list or suggest their own factors, which builds trust in the project.
5. Another great way to engage managers is to calculate the model score for a well-known person. That can be one of the managers, or a returning customer whom the model rejected despite a successful history with the company. The model helps to understand what has changed in the customer’s behavior and why they are no longer eligible for credit. The table below shows the score calculation and how changing the factors increases the final score.
factor | value | score | population, % |
Salary assignments for 1 year | >= 200 | 193 | 45 |
Salary assignments for 1 year | < 200 | 94 | 55 |
Duration in months | <= 16 | 163 | 43 |
Duration in months | > 16 and <= 33 | 124 | 40 |
Duration in months | > 33 | 77 | 17 |
Credit history | credits at another bank | 169 | 30 |
Credit history | credits paid fully till now | 127 | 53 |
Credit history | no credits taken, all credits paid back, delay in paying off in the past | 90 | 18 |
Purpose | car (used), radio/television, retraining | 160 | 38 |
Purpose | other | 114 | 62 |
Installment/income | <= 30% | 151 | 51 |
Installment/income | > 30% | 109 | 49 |
Savings account | >= 500, unknown | 165 | 29 |
Savings account | < 500 | 119 | 71 |
This model awards more points to clients who already have credits at another bank. So opening a credit elsewhere will increase your score and your odds of getting credit.
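The table lends itself to a small what-if calculator. The sketch below assumes the final score is the plain sum of the points for each factor's bin, which the table suggests but does not state. The points are copied from the table; the middle duration bin is read as "more than 16, up to 33", and the sample client is hypothetical.

```python
# Points per bin, copied from the table: factor -> {bin label: points}
SCORECARD = {
    "Salary assignments for 1 year": {">= 200": 193, "< 200": 94},
    "Duration in months": {"<= 16": 163, "> 16 and <= 33": 124, "> 33": 77},
    "Credit history": {
        "credits at another bank": 169,
        "credits paid fully till now": 127,
        "no credits taken, all credits paid back, delay in paying off in the past": 90,
    },
    "Purpose": {"car (used), radio/television, retraining": 160, "other": 114},
    "Installment/income": {"<= 30%": 151, "> 30%": 109},
    "Savings account": {">= 500, unknown": 165, "< 500": 119},
}

def total_score(client):
    """Sum the points of the bin each factor falls into (assumed additive)."""
    return sum(SCORECARD[factor][bin_label] for factor, bin_label in client.items())

# Hypothetical client described by one bin per factor
client = {
    "Salary assignments for 1 year": ">= 200",
    "Duration in months": "> 16 and <= 33",
    "Credit history": "credits paid fully till now",
    "Purpose": "other",
    "Installment/income": "<= 30%",
    "Savings account": "< 500",
}

# What-if: the same client, but with credits at another bank
what_if = {**client, "Credit history": "credits at another bank"}

print(total_score(client))   # 828
print(total_score(what_if))  # 870
```

Changing a single factor and watching the total move by 42 points is the kind of experiment managers can run themselves, which is what makes the table engaging.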
A/B tests, of course
6. The most powerful argument, and the hardest to oppose, is the A/B test. We can split our clients into groups A and B, say by odd or even ID numbers in the database. Group A follows the current rules. Group B is approved or rejected by the new scoring model. Over time, the sum of payments in each group shows which strategy wins.
7. A simpler version of the A/B test above compares Gini statistics on new customers whose data was not used in training. The idea is similar to the train and test split. This test is less strict, but who cares about statistical significance when one has to sell the scoring project with limited resources?
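Both tests fit in a few lines. The sketch assumes odd IDs go to group A and even IDs to group B (the text only says "odd or even"), and computes Gini through the standard relation Gini = 2 * AUC - 1, with AUC obtained by comparing every pair of a repaying and a defaulting client. Function names, payments, scores, and outcomes are hypothetical.

```python
def assign_group(client_id):
    """Odd IDs follow the current rules (A); even IDs use the new model (B)."""
    return "A" if client_id % 2 == 1 else "B"

def payments_by_group(payments):
    """Sum payments per group; payments is a list of (client_id, amount)."""
    totals = {"A": 0.0, "B": 0.0}
    for client_id, amount in payments:
        totals[assign_group(client_id)] += amount
    return totals

def gini(scores, repaid):
    """Gini = 2 * AUC - 1, where AUC is the share of (good, bad) pairs
    in which the repaying client has the higher score (ties count half)."""
    goods = [s for s, r in zip(scores, repaid) if r == 1]
    bads = [s for s, r in zip(scores, repaid) if r == 0]
    wins = sum((g > b) + 0.5 * (g == b) for g in goods for b in bads)
    auc = wins / (len(goods) * len(bads))
    return 2 * auc - 1

# Item 6: hypothetical payments collected from both groups
payments = [(1, 100.0), (2, 120.0), (3, 80.0), (4, 90.0)]
totals = payments_by_group(payments)

# Item 7: hypothetical scores for new customers unseen in training,
# with their observed outcomes (1 = repaid, 0 = defaulted)
repaid = [1, 1, 0, 1, 0]
new_model_scores = [870, 828, 700, 650, 603]
old_rules_scores = [700, 650, 870, 828, 603]

gini_new = gini(new_model_scores, repaid)
gini_old = gini(old_rules_scores, repaid)
print(totals, round(gini_new, 3), round(gini_old, 3))
```

A higher Gini on unseen customers means the scores rank repaying clients above defaulting ones more often. With samples this small, neither comparison says anything about statistical significance.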