      MD: Model development, usually represented by the head of the model development team.

      RM: Risk management, usually the portfolio risk/policy manager or end user on the business side.

      MV: Model validation or vetting, usually those responsible for overseeing the process.

      IT: Information technology or equivalent function responsible for implementing the models.

      The following stages cover post-development work on strategy development, and are usually handled by the business risk management function.

      STAGE 6. SCORECARD IMPLEMENTATION

      ● Scoring strategy

      ● Setting cutoffs

      ● Policy rules

      ● Override policy

      STAGE 7. POST-IMPLEMENTATION

      ● Scorecard and portfolio monitoring reports

      ● Scorecard management reports

      ● Portfolio performance reports

      The preceding stages are not exhaustive – they represent the major stages where key output is produced, discussed, and signed off. The signoff process, which encourages teamwork and the early identification of problems, will be discussed in the next chapter. Involvement by the model vetting/validation unit depends on each bank's model audit policies as well as expectations from regulators.

      Chapter 3

      Designing the Infrastructure for Scorecard Development

      As more banks around the world realize the value of analytics and credit scoring, we see a correspondingly high level of interest in setting up analytics and modeling disciplines in-house. This is where some planning and long-term vision is needed. Many banks hired well-qualified modelers and bought high-powered data mining software, thinking that their staff would soon be churning out models at a regular clip. For many of them, this did not materialize. Models and analytics took just as long to produce as before, or significantly longer than expected. The problem was not that their staff didn't know how to build models, or that the model fitting was taking too long. It was the fact that the actual modeling is the easiest, and sometimes the fastest, part of the entire data mining process. The major problems, which went unaddressed, lay in all the other activities before and after the modeling. Problems with accessing data, data cleansing, getting business buy-in, model validation, documentation, producing audit reports, implementation, and other operational issues made the entire process slow and difficult.

      In this chapter, we look at the most common problems organizations face when setting up infrastructure for analytics and suggest ways to reduce problems through better design.

The discussion in this chapter will be limited to the tasks involved in building, using, and monitoring scorecards. Exhibit 3.1 is a simplified example of the end-to-end tasks that would take place during a scorecard development project. It is not as exhaustive as the tasks covered in the rest of the book, but serves only to illustrate points associated with creating an infrastructure to facilitate the entire process.

Exhibit 3.1 Major Tasks during Scorecard Development

      Based on the most common problems lending institutions face when building scorecards, we suggest considering the following main issues when designing an architecture to enable analytics:

      ● One version of the truth. Two people asking the same question, or repeating the same exercise, should get the same answer. One way to achieve this is by sharing and reusing, for example, data sources, data extraction logic, conditions such as filters and segmentation logic, models, parameters, and variables, including the logic for derived ones (the first sketch following this list shows one way to centralize such logic).

      ● Transparency and audit. Given the low level of regulatory tolerance for black-box models and processes, everything from the creation of data to the analytics, deployment, and reporting should be transparent. Anyone who needs to see details on each phase of the development process should be able to do so easily. For example, how data is transformed to create aggregated and derived variables, the parameters chosen for model fitting, how variables entered the model, validation details, and other settings should preferably be stored in graphical user interface (GUI) format for review. Although all of the above can be done through coding, auditing code is somewhat more complex. In addition, one should also be able to produce an unbroken audit chain across all the tasks shown in Exhibit 3.1 – from the point where data is created in source systems, through all the data transformations and analytics, to scoring and the production of validation reports as well as regulatory capital and other calculations. Model documentation should include details on the methods used and also provide effective challenge around the choice of those methods as well as the final scorecard. That means covering not just the final scorecard but also the scorecards that were tested and rejected, and the methods that competed with the one used to produce the scorecard (the second sketch following this list shows one way to capture such an audit record).

      ● Retention of corporate intellectual property (IP)/knowledge. Practices such as writing unique code for each project and keeping it on individual PCs make it harder to retain IP when key staff leave. Using programming-based modeling tools makes this IP more difficult to retain, as departing staff take their coding knowledge with them. Most modelers/coders also choose to rewrite code rather than sort through partial code previously written by someone else. This results in delays, and often ends with different answers being obtained for the same question. To counter this loss, and to introduce standardization, many banks have shifted to GUI software.

      ● Integration across the model development tasks. Integration across the continuum of activities shown in Exhibit 3.1, from data set creation to validation, means that the output of each phase is used seamlessly in the next. Rewriting Extract-Transform-Load (ETL) code, scoring code, or the code for deriving and creating variables into different languages is not efficient, as it lengthens the production cycle. It also presents model risk, as recoding into a different language may alter the interpretation of the original variable or condition; this applies to parameters and conditions for both data sets and models. An integrated infrastructure for analytics also lowers implementation risk, as all the components across the continuum will likely work together (the third sketch following this list shows one way to reuse, rather than recode, derivation logic). This is in addition to the integration and involvement of the various stakeholders/personas discussed in the previous chapter.

      ● Faster time to results. In many institutions it can take months to build a model and implement it, resulting in the use of inefficient or unstable models for longer than necessary. Efficient infrastructure design can make this process much faster through integrated components, faster learning cycles for users, and the reduction of repetition (such as recoding).
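
      To make the "one version of the truth" point concrete, the following is a minimal sketch of a shared derivation library in Python. It is illustrative only and not from the book; the module name (one_truth.py), the function and registry names, and the variable definitions themselves are all assumptions that a real team would replace with its own standards.

# one_truth.py - illustrative shared derivation library
import numpy as np
import pandas as pd

def derive_utilization(df: pd.DataFrame) -> pd.Series:
    """Revolving utilization = balance / limit, capped at 1.0.
    Defined once, so every project that uses this variable
    computes it identically."""
    limit = df["credit_limit"].replace(0, np.nan)  # avoid divide-by-zero
    return (df["revolving_balance"] / limit).clip(upper=1.0).fillna(0.0)

def filter_booked_accounts(df: pd.DataFrame) -> pd.DataFrame:
    """Shared filter logic: one definition of the development
    population for every analyst who asks the same question."""
    return df[(df["booked_flag"] == 1) & (df["fraud_flag"] == 0)]

# A named registry lets any project pull derivation logic by name
# instead of rewriting it.
VARIABLE_REGISTRY = {"utilization": derive_utilization}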
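
      Similarly, for the transparency and audit point, here is a minimal sketch of capturing a machine-readable audit record for one development run. The field names are illustrative assumptions; each bank's model documentation standards would dictate the actual contents.

# audit_trail.py - illustrative audit record for one development run
import hashlib
import json
from datetime import datetime, timezone

def _sha256(path: str) -> str:
    """Hash an input file so the exact data snapshot is verifiable later."""
    with open(path, "rb") as fh:
        return hashlib.sha256(fh.read()).hexdigest()

def audit_record(source_files, filters, fit_params, candidates, final_model):
    """Bundle what a reviewer needs to retrace the run: the exact input
    data, the population filters, the fit settings, and the candidate
    scorecards that were rejected along the way."""
    record = {
        "run_timestamp": datetime.now(timezone.utc).isoformat(),
        "source_data": {f: _sha256(f) for f in source_files},
        "population_filters": filters,
        "fit_parameters": fit_params,
        "candidates_considered": candidates,  # rejected scorecards, too
        "final_model": final_model,
    }
    return json.dumps(record, indent=2)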
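
      Finally, for the integration point, this sketch reuses the illustrative shared library above at scoring time instead of recoding the derivations in another language, which is where the recoding risk described earlier arises.

# scoring.py - illustrative reuse of development-time derivations
import pandas as pd
from one_truth import VARIABLE_REGISTRY  # the illustrative module above

def build_model_inputs(raw: pd.DataFrame) -> pd.DataFrame:
    """Apply exactly the derivations that produced the development data
    set; because training and scoring share one implementation, the two
    code paths cannot drift apart."""
    out = raw.copy()
    for name, derive in VARIABLE_REGISTRY.items():
        out[name] = derive(out)
    return out

# The same function creates the modeling data set during development and
# prepares applicant records for scoring in production, removing one
# recoding step from the chain of tasks in Exhibit 3.1.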

      In discussing the points to consider when designing an architecture/infrastructure to enable in-house scorecard development and analytics, we will work through the major tasks associated with performing analytics in any organization.

      Data Gathering and Organization

      This critical phase involves collecting and collating data from disparate (original) data sources and organizing them. This includes
