7 tips for selling complex Scoring projects

Every company proclaims digital transformation. However, there is a growing gap between software complexity and top management's hard skills. This gap is most obvious in machine learning projects such as forecasting or credit scoring. Given this gap and the research nature of every ML project, it is no wonder that only 25% of machine learning projects succeed.

Train and test split

The first time I heard about this gap was at university. My teacher was involved in a commercial forecasting project with another data scientist. Each came up with his own forecasting model, but management could not decide which model was better. So everyone gathered in one room to discuss the results, and management picked the model whose author yelled louder.

Today we split data into train and test subsets. The train subset is used for model training and the test subset for model evaluation. We choose the model that demonstrates the best performance on the test subset.
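A minimal sketch of that selection rule. The two candidate models (a "last value" forecast and a "training mean" forecast), the mean absolute error metric, and the sales numbers are all made up for illustration.

```python
from statistics import mean

def split_train_test(series, test_size):
    """Keep the last `test_size` points for evaluation (time-ordered split)."""
    return series[:-test_size], series[-test_size:]

def naive_model(train):
    """Hypothetical model A: always forecast the last observed value."""
    last = train[-1]
    return lambda horizon: [last] * horizon

def mean_model(train):
    """Hypothetical model B: always forecast the training mean."""
    avg = mean(train)
    return lambda horizon: [avg] * horizon

def mae(actual, predicted):
    """Mean absolute error between actual and predicted values."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))

def pick_best(series, test_size, candidates):
    """Fit every candidate on train, score it on test, return the winner."""
    train, test = split_train_test(series, test_size)
    errors = {name: mae(test, fit(train)(len(test)))
              for name, fit in candidates.items()}
    return min(errors, key=errors.get), errors

sales = [10, 12, 13, 15, 16, 18, 19, 21]  # made-up monthly sales
best, errors = pick_best(sales, 2, {"naive": naive_model, "mean": mean_model})
print(best, errors)
```

The decision is made by the number on the test subset, not by whoever argues loudest.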

Why do we build Scoring models?

After graduation, I faced reality myself. There are a lot of graphs and tables that describe scoring model quality. However, this is not what sells the project. Even profit and loss calculations, which most managers understand, are too complex and rest on many assumptions that are not convincing.

1. One of my colleagues was asked: why do we waste resources on scoring projects? The answer that sold the project was: because our competitors do, and we will fall far behind them if we don't.

2. Another colleague replied that we need to find drivers, factors, and client profiles to manage our credit portfolio, and a scorecard is one of the tools for understanding those drivers and factors.

3. One of my managers said that scoring models are too complex and asked me to describe one in as few words as possible. I said that the scoring model compares every new client with the old ones: it finds clusters of old clients who are very similar to the new client in terms of gender, income, and number of closed and active credits, and then calculates the probability that the new credit will be repaid from the average performance of the old clients in that cluster. That was more than enough to convince the manager.
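The explanation in tip 3 can be turned into a toy sketch. It is the simplified story told to the manager, not a real scorecard algorithm: the `profile` bucketing, the income threshold, and the client records are hypothetical, and only closed credits are used to keep it short.

```python
from collections import defaultdict

def profile(client):
    """Hypothetical bucketing of a client into a coarse 'cluster'."""
    income_band = "high" if client["income"] >= 1000 else "low"
    has_closed = client["closed_credits"] > 0
    return (client["gender"], income_band, has_closed)

def repayment_rates(old_clients):
    """Share of repaid credits among old clients, per profile."""
    totals = defaultdict(lambda: [0, 0])  # profile -> [repaid, total]
    for c in old_clients:
        bucket = totals[profile(c)]
        bucket[0] += c["repaid"]
        bucket[1] += 1
    return {p: repaid / total for p, (repaid, total) in totals.items()}

# Made-up history: repaid = 1 means the credit was returned.
old_clients = [
    {"gender": "F", "income": 1500, "closed_credits": 2, "repaid": 1},
    {"gender": "F", "income": 1200, "closed_credits": 1, "repaid": 1},
    {"gender": "F", "income": 1800, "closed_credits": 3, "repaid": 0},
    {"gender": "M", "income": 600, "closed_credits": 0, "repaid": 0},
    {"gender": "M", "income": 700, "closed_credits": 0, "repaid": 1},
]
rates = repayment_rates(old_clients)

new_client = {"gender": "F", "income": 1300, "closed_credits": 1}
probability = rates.get(profile(new_client))  # None if no similar old clients
print(probability)
```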

Engage managers in a shared game

4. Once the model is ready, management likes to play around with it to see how it fits their professional experience. One way to engage management at an early stage of the project is to show them a list of factors ordered by importance based on Information Value. People like to question the list or suggest their own factors, which builds trust in the project.

5. Another engaging method is to calculate the model score for a well-known person. That can be one of the managers or a returning customer who was rejected by the model but had a successful history with the company. The model helps to understand what has changed in the customer's behavior so that he is no longer eligible for credit. The table below visualizes the score calculation and shows which factors can be changed to increase the final score.
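The Information Value ranking from tip 4 can be sketched as follows, using the usual definition IV = sum over categories of (share of goods - share of bads) * ln(share of goods / share of bads). The factor names and counts are invented, and the sketch assumes every category contains at least one good and one bad client, otherwise the logarithm is undefined.

```python
from math import log

def information_value(bins):
    """bins: list of (good_count, bad_count), one pair per factor category."""
    total_good = sum(g for g, _ in bins)
    total_bad = sum(b for _, b in bins)
    iv = 0.0
    for good, bad in bins:
        share_good = good / total_good
        share_bad = bad / total_bad
        iv += (share_good - share_bad) * log(share_good / share_bad)
    return iv

# Made-up counts of good and bad clients per category.
factors = {
    "credit history": [(300, 20), (500, 60), (100, 40)],
    "gender": [(450, 60), (450, 60)],
}

# Factors in order of importance, most informative first.
ranking = sorted(factors, key=lambda name: information_value(factors[name]),
                 reverse=True)
for name in ranking:
    print(f"{name}: {information_value(factors[name]):.3f}")
```

A factor whose categories all have the same mix of good and bad clients, like the made-up "gender" here, gets an Information Value of zero and lands at the bottom of the list.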

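| factor               | value            | score | pop., % |
|----------------------|------------------|-------|---------|
| salary assignments   | >= 200, no data  | 193   | 45      |
|                      | < 200            | 94    | 55      |
| duration in month    | =< 16            | 163   | 43      |
|                      | > 16             | 124   | 40      |
|                      | > 33             | 77    | 17      |
| credit history       | at another bank  | 169   | 30      |
|                      | all paid fully   | 127   | 53      |
|                      | no credits       | 90    | 18      |
| installment / income | <= 0.3           | 151   | 51      |
|                      | > 0.3            | 109   | 49      |
| savings account      | >= 500, unknown  | 165   | 29      |
|                      | < 500            | 119   | 71      |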

This model gives more points to clients who already have credit at another bank. So taking out a loan elsewhere will increase your overall score and your chances of being approved by the model.
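A small sketch of how such a table can be used, assuming the final score is simply the sum of the points for each factor (the usual scorecard convention, but an assumption here). The points are copied from the table above; the example client is made up.

```python
# Points per factor value, copied from the table in this post.
SCORECARD = {
    "salary assignments": {">= 200, no data": 193, "< 200": 94},
    "duration in month": {"=< 16": 163, "> 16": 124, "> 33": 77},
    "credit history": {"at another bank": 169, "all paid fully": 127,
                       "no credits": 90},
    "installment / income": {"<= 0.3": 151, "> 0.3": 109},
    "savings account": {">= 500, unknown": 165, "< 500": 119},
}

def total_score(client):
    """Sum the points for the client's value on every factor."""
    return sum(SCORECARD[factor][value] for factor, value in client.items())

# Hypothetical client who lands in the weakest bucket for several factors.
client = {
    "salary assignments": "< 200",
    "duration in month": "> 16",
    "credit history": "no credits",
    "installment / income": "> 0.3",
    "savings account": "< 500",
}

before = total_score(client)
after = total_score({**client, "credit history": "at another bank"})
print(before, after, after - before)
```

Changing one factor at a time like this shows a manager exactly how many points each change is worth.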

A/B tests, of course

6. The most convincing argument, and the hardest to oppose, is the A/B test. Once the model is ready and approved by management, we can split our clients into groups A and B, say by odd and even ID numbers in the database. One group follows the current rules, and the other is approved or rejected by the new scoring model. Over time, the sum of payments in each group shows which strategy wins.

7. A simplified version of the A/B test above compares Gini statistics on new customers whose data was not used in training. The idea is similar to the train and test split. This test is less strict, but who cares about statistical significance when one has to sell the scoring project with limited resources?
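The split in tip 6 can be sketched in a few lines. Which parity gets which strategy is an arbitrary choice here, and the payment records are invented.

```python
def assign_group(client_id):
    """Odd IDs follow the current rules (A), even IDs the new model (B)."""
    return "A" if client_id % 2 else "B"

def payments_by_group(payments):
    """payments: iterable of (client_id, amount) pairs."""
    totals = {"A": 0.0, "B": 0.0}
    for client_id, amount in payments:
        totals[assign_group(client_id)] += amount
    return totals

# Made-up payments collected during the test period.
payments = [(1, 100.0), (2, 150.0), (3, 80.0), (4, 120.0), (5, 0.0), (6, 90.0)]
totals = payments_by_group(payments)
winner = max(totals, key=totals.get)
print(totals, winner)
```

A real test would also need groups of comparable size and enough time for the credits to mature before the sums are compared.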
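For tip 7, the Gini statistic can be computed from the AUC as Gini = 2 * AUC - 1. The sketch below compares the current rules and the new model on the same hold-out customers; the scores and outcomes are made up.

```python
def auc(scores, labels):
    """Probability that a random good client outscores a random bad one
    (ties count as half a win)."""
    goods = [s for s, y in zip(scores, labels) if y == 1]
    bads = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((g > b) + 0.5 * (g == b) for g in goods for b in bads)
    return wins / (len(goods) * len(bads))

def gini(scores, labels):
    """Gini statistic derived from the AUC."""
    return 2 * auc(scores, labels) - 1

# New customers not used in training: 1 = repaid, 0 = defaulted.
labels = [1, 1, 1, 0, 1, 0]
old_rules = [600, 550, 500, 580, 620, 510]   # scores under the current rules
new_model = [640, 600, 550, 560, 650, 500]   # scores from the new model

print(gini(old_rules, labels), gini(new_model, labels))
```

The pairwise loop is quadratic in the number of customers, which is fine for a quick comparison but not for large portfolios.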

quality control

How to increase code quality

When I was a child, I lived near a statistical research institute. Lots of cardboard cards with printed rows of numbers lay around the grounds, some of the numbers punched out with holes. Later I realized those cards were early computer programs. Even earlier, such cards were used to encode patterns for textiles.

The first computer I ever saw was a 386 PC at school. It used five-inch disks. The first disk I bought was a three-inch disk for computer class at university. My code and coursework all fit on that single disk. Once I pulled it out while the green light was on. That is when I learned to make backups.

I learned how to make backups using CVS (Concurrent Versions System) at my first job. Most projects at that time were done by a single developer, which is why we did not use the concurrent features.

At my second job, I learned about prod, dev, and test environments. Frankly speaking, we used only the first two, because a test environment was considered a waste of resources. All tests were run on dev. I also learned about access rights and roles. Developers had access to the dev environment only. Our project manager was responsible for copying code from dev to prod. Once new functionality was finished, the developer wrote a description and attached the files with the code. The manager checked the description and copied the code. If errors appeared on prod, he reverted the files to the previous version.

Today Git is the most popular system for backup, concurrent work, and access rights management. Integrated with another system, such as a continuous integration server, Git allows unit tests to run every time one pushes code to the repository.

Git supports prod and dev environments, which are usually called the master and develop branches. Only the project manager can update the master branch. Developers can create their own branches, such as feature or bugfix, and merge them into develop. Git allows one to merge branches even if there are conflicting lines of code, e.g. when the same line was updated by different developers; such conflicts are flagged so that they can be resolved.

One can install Git on one's own server, but most developers prefer hosted services. They allow one to compare code before and after updates and to leave comments and likes, creating a social network where developers share code and code reviews.

Our team uses such services extensively. Clients can access the code 24/7 to test it, review it, and leave comments.

data warehouse

How to build a reporting dashboard with Pentaho BI

Many companies nowadays use several databases for CRM, HR, and accounting. Top management receives information from different departments. Mismatches between these sources reduce confidence in the data, and fixing the errors takes time.

A Data Warehouse allows you to get reports in no time. It provides reports in different formats, including HTML, PDF, and Excel files. A Data Warehouse contains all of the company's data, which allows one to analyze the whole organization rather than separate departments. One can research who attracts the most profitable clients and why some of them are still unprofitable. A Data Warehouse allows one to track profits and costs online. Data loss in the original system will not affect reporting, because the data in the Data Warehouse will still be available.

As a rule, data are copied from the original databases to the Data Warehouse at night, when the workload is at its minimum. The Data Warehouse checks new data for errors and builds links between different sources. Additional precalculations speed up the reports that are finally sent to management.

Our team has expertise in building Data Warehouses and reporting with Pentaho BI. It is open-source software, and you may start using Pentaho today. You do not need to negotiate a contract or agree on a budget. Pentaho is sanctions-free and as reliable as any proprietary software. As a demo, we created a website using the latest version of Pentaho BI 9.1.

Pentaho also contains Pentaho DI (Data Integration), which works like a vacuum cleaner. It imports data from any source, cleans it, and saves the result into the Data Warehouse. Pentaho DI imports data in any format, including text files, XML, Excel, and data from relational and OLAP databases. Data processing starts automatically, without an administrator.
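The nightly steps above (check for errors, link sources, precalculate) can be illustrated with a toy Python sketch. This is not how Pentaho does it: the function `nightly_load`, the field names, and the sample rows are all hypothetical.

```python
def nightly_load(crm_rows, accounting_rows):
    """Validate rows, link the two sources by client_id, precompute profit."""
    # Step 1: check new data for errors (here: a missing amount).
    rejected = [r for r in accounting_rows if r["amount"] is None]
    valid = [r for r in accounting_rows if r["amount"] is not None]

    # Step 2: build links between sources (accounting -> CRM client).
    clients = {r["client_id"]: r["name"] for r in crm_rows}
    profit = {}
    for r in valid:
        if r["client_id"] not in clients:
            rejected.append(r)  # no matching CRM record
            continue
        # Step 3: precalculate totals so that reports are fast.
        profit[r["client_id"]] = profit.get(r["client_id"], 0) + r["amount"]

    report = [{"client": clients[cid], "profit": total}
              for cid, total in sorted(profit.items())]
    return report, rejected

# Made-up extracts from two source systems.
crm = [{"client_id": 1, "name": "Acme"}, {"client_id": 2, "name": "Bolt"}]
accounting = [
    {"client_id": 1, "amount": 100},
    {"client_id": 1, "amount": -30},
    {"client_id": 2, "amount": None},   # broken row
    {"client_id": 3, "amount": 50},     # unknown client
]
report, rejected = nightly_load(crm, accounting)
print(report, len(rejected))
```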


Bad clients will leave for competitors

Everyone has heard about financial literacy. It starts with tracking one's spending and earnings and calculating the profit. Over time we learn how to curb spontaneous spending and how to make big purchases.

Companies undergo a similar transformation. They invest in their future by entering new markets and opening offices in new countries. Companies burn money to attract new customers and build internal processes. Experience comes with time, as does customer data: companies learn how to track spending and earnings.

Tracking profit per client allows one to sort clients in order of profitability. Thus you will learn how many of your customers are unprofitable. One can increase profit simply by avoiding bad clients.

A cumulative plot of profit over clients sorted this way gives an idea of the maximum possible profit. The maximum value of the plot is your maximum possible profit, given that you avoid bad clients.

True, we do not know beforehand which customers will be unprofitable. But we have data on previous generations of customers. Once your CRM system collects enough data, you can start predicting profit from that data. Sorting clients in order of predicted profit will give you an idea of the lost profit.
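The cumulative curve can be computed without any plotting library. A minimal sketch with invented profit figures per client:

```python
from itertools import accumulate

def cumulative_profit(profits):
    """Sort clients from most to least profitable and accumulate."""
    ordered = sorted(profits, reverse=True)
    return list(accumulate(ordered))

profits = [120, -40, 60, -90, 30, -10]   # made-up profit per client
curve = cumulative_profit(profits)

max_profit = max(curve)        # profit if all bad clients were avoided
actual_profit = curve[-1]      # profit with every client served
lost_profit = max_profit - actual_profit
print(curve, max_profit, actual_profit, lost_profit)
```

The curve rises while profitable clients are added and falls once unprofitable ones start, so the gap between its peak and its last point is the profit lost to bad clients.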

Artel has developed similar models for Banking, Marketing, and Collection. Shoot us an email to discuss your case: