Choosing an appropriate credit scoring technique

August 1, 2013 by Erko Selde


Credit scoring has been used as a method of credit evaluation for more than 50 years. The first mathematical retail credit scoring model was proposed around 1941 in the US. It scored credit card applications on six parameters, such as the applicant’s job and the number of years spent in the current position. In the 1960s credit scoring started gaining wider acceptance, mainly thanks to improvements in computing power and the growth of the credit card market. Given the advances in technology, techniques and data availability since then, the methods used at that time are not comparable in performance to the ones used today.

Credit scoring statistical techniques

Many ways of developing credit scoring models have evolved over the years. Under the right conditions, most of these linear and non-linear statistical models can be used to build an efficient credit scoring system for predictive purposes. Parametric techniques, such as the weight of evidence measure, correlation analysis, regression analysis, discriminant analysis, probit analysis, logistic regression and linear programming, and non-parametric techniques, such as support vector machines, decision trees, neural networks, k-nearest-neighbour, genetic algorithms and genetic programming, are all used to a greater or lesser extent in building credit scoring models.
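To make one of the listed techniques concrete, here is a minimal sketch of the weight of evidence measure in Python. It assumes the common convention WoE = ln(share of good loans in a bin / share of bad loans in a bin); the age bands and counts are hypothetical and only illustrate the arithmetic.

```python
import math

def weight_of_evidence(bins):
    """Compute WoE per bin; bins maps a label to (good_count, bad_count).

    Assumes every bin has at least one good and one bad loan,
    otherwise the logarithm is undefined.
    """
    total_good = sum(good for good, _ in bins.values())
    total_bad = sum(bad for _, bad in bins.values())
    woe = {}
    for label, (good, bad) in bins.items():
        # Share of all good loans and of all bad loans that fall in this bin.
        good_share = good / total_good
        bad_share = bad / total_bad
        woe[label] = math.log(good_share / bad_share)
    return woe

# Hypothetical counts of repaid (good) and defaulted (bad) loans by age band.
age_bins = {
    "18-25": (100, 50),
    "26-40": (400, 100),
    "41+": (500, 50),
}

woe = weight_of_evidence(age_bins)
for label, value in woe.items():
    print(f"{label}: WoE = {value:+.3f}")
```

Under this convention a positive value means the bin holds relatively more good loans than bad ones, and a negative value means the opposite.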

Choosing an appropriate technique

We have learned from experience that some credit underwriters still use credit scoring methods that were already in use 40 years ago. Even more still use Microsoft Excel as their main modelling tool, with simple correlation analysis or decision trees as their main technique. This means they are not taking full advantage of the possibilities available today. It usually happens because, in the worst case, their credit scoring methods are developed behind the management desk. The environment has changed, especially with big data, and because modern modelling is a science, it should be handled by specialized risk analysis and data processing teams.

Each technique has its own strengths and weaknesses, which often vary according to the circumstances. Some of these modelling considerations are:

  • Development speed – How easy is the method to develop, learn and apply?
  • Adaptability – How easily can the method be adapted to accommodate problems that are specific to a particular development?
  • Output transparency – Is model output easy to understand and explain, and how easy is it to control the amount of complexity?

Best technique

According to many studies, regression analysis methods are the most suitable techniques because they combine ease of implementation, robustness and predictive ability. Regression analysis helps one understand how the typical value of the dependent variable changes when any one of the independent variables is varied while the other independent variables are held fixed. As a result, the model shows the creditor whether a loan is likely to be good or bad by considering many possible input variables, such as the client’s age.
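The "held fixed" idea can be sketched in a few lines of Python. The example assumes a logistic regression, one of the regression methods listed earlier; the intercept, coefficients and variable names are hypothetical and are not taken from any real scoring model.

```python
import math

# Hypothetical coefficients for illustration only.
INTERCEPT = -1.0
COEFFICIENTS = {
    "age_years": -0.03,
    "years_in_job": -0.10,
    "debt_to_income": 2.0,
}

def log_odds_of_default(applicant):
    """Linear part of the model: intercept plus weighted inputs."""
    return INTERCEPT + sum(
        COEFFICIENTS[name] * applicant[name] for name in COEFFICIENTS
    )

def probability_of_default(applicant):
    """Map the log-odds to a probability with the logistic function."""
    return 1.0 / (1.0 + math.exp(-log_odds_of_default(applicant)))

applicant = {"age_years": 35, "years_in_job": 4, "debt_to_income": 0.4}
# Same applicant with one more year in the job, everything else held fixed.
longer_tenure = dict(applicant, years_in_job=5)

change = log_odds_of_default(longer_tenure) - log_odds_of_default(applicant)
print(f"Change in log-odds: {change:+.2f}")
print(f"Probability before: {probability_of_default(applicant):.3f}")
print(f"Probability after:  {probability_of_default(longer_tenure):.3f}")
```

Varying a single input shifts the log-odds by exactly that input's coefficient, which is what makes this kind of model easy to explain to a creditor.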

Non-parametric techniques such as neural networks tend to lack transparency and are prone to overfitting, which means that the model describes random errors and noise rather than the underlying relationship. This is a serious problem: predictive performance drops on new data, and minor fluctuations in the input data can produce exaggerated changes in the results. Among parametric techniques, plain correlation analysis is still surprisingly widely used, as mentioned before, and it has several problems of its own. One is multicollinearity: the goal is to choose predictors that are correlated with the target but uncorrelated with each other. This is usually not possible, so multicollinearity can introduce unacceptable errors into the final model. In addition, assigning valid weights to each parameter is very hard and usually results in errors.
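One simple way to spot predictors that are correlated with each other is to compute pairwise Pearson correlations and flag the strong ones. The Python sketch below does this on hypothetical data; the variable names and the 0.8 threshold are assumptions chosen for illustration, not a recommended standard.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def correlated_pairs(predictors, threshold=0.8):
    """Return predictor pairs whose absolute correlation reaches the threshold."""
    flagged = []
    for (name_a, col_a), (name_b, col_b) in combinations(predictors.items(), 2):
        r = pearson(col_a, col_b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged

# Hypothetical applicant data: age and length of credit history move together.
predictors = {
    "age_years": [22, 30, 41, 55, 63, 37],
    "years_of_credit_history": [1, 8, 18, 30, 40, 12],
    "debt_to_income": [0.5, 0.2, 0.6, 0.3, 0.4, 0.1],
}

flagged = correlated_pairs(predictors)
for name_a, name_b, r in flagged:
    print(f"{name_a} and {name_b} are strongly correlated (r = {r})")
```

A flagged pair suggests that the two predictors carry much of the same information, so a modeller might keep only one of them or combine them before fitting.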

Multiple regression analysis, on the other hand, stands out for its good predictive ability and for avoiding the shortcomings of the other methods. Therefore, Big Data Scoring has chosen regression analysis as the technique for building its credit scoring model, which is transparent and easy to interpret and adopt, in addition to having good predictive ability.

In the next post we are going to talk about the interpretation of regression-based credit scoring models.

For additional information contact us at

This post was written for informational purposes only and is based on the book The Credit Scoring Toolkit by R. Anderson.