
Update Newsletter: Fall 2002 – Special Conference Issue

Highlights from the Conference on Credit Risk Modeling and Decisioning

By Shannon Kelly, Credit Card Specialist, Supervision, Regulation, and Credit Department; Mark Furletti, Industry Specialist, Payment Cards Center; and Sally Burke, Manager, Publications and Editorial Services, Public Affairs Department

Rather than following the strict order of the conference sessions, this summary presents highlights from the conference in line with the model-development and application processes:

  • Selecting appropriate data sources (Paul Calem and Robert Avery)
  • Determining the appropriateness of the sample (Dennis Ash and Steven Meester; Jonathan Crook; David Hand)
  • Determining the appropriateness of variables (Michael LaCour-Little)
  • Evaluating models from a regulator's perspective (Dennis Glennon)
  • Using scorecards in the decision process (HNC Software; Argus Information & Advisory Services; Fair, Isaac and Company; Austin Logistics; Strategic Analytics)
  • Tracing the history of scorecard development (Allen Jost)
  • Modeling small-business credit risk (Linda Allen and Grigoris Karakoulas)

Selecting Appropriate Data Sources

The first and arguably the most critical decision an analyst makes when creating a model is choosing data sources and data elements.

In a session entitled "Default Probabilities and the Econometric Environment," Paul Calem and Robert Avery, senior economists at the Board of Governors of the Federal Reserve System, examined the problems associated with constructing scorecards based solely on data provided by credit reporting companies. They pointed out that these data are limited in two ways: They lack information about the local economy and about a borrower's personal situation (e.g., layoffs, health problems, divorce). Such considerations may affect an individual's loan-repayment history, or loan-repayment histories in a local area, but they may be unrelated to future patterns of repayment.

Using nationally representative account-level data, Calem and Avery examined the relationship between the economic environment and credit performance and assessment. Their model included economic conditions and trigger events by local geographic area (e.g., borrower's county and associated unemployment rates). Their results indicate that economic conditions (income level and unemployment rates) do matter in predicting consumer behavior; in particular, the effects of environmental shocks add important information about a customer's potential behavior. Including lagged economic information in the model, specifically changes in unemployment rates and housing prices, also proved predictive. Calem and Avery noted several limitations of the credit reporting system resulting from incomplete data provided by lenders. These systems would ideally include information on the timing of delinquency, collection efforts, and situational factors.
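
The summary above does not reproduce Calem and Avery's specification. As a rough, hypothetical sketch of the general idea, the code below fits a logistic default model that augments standard credit-file attributes with a county-level unemployment rate and lagged changes in unemployment and house prices; all column names and data are invented for illustration.

```python
# Hypothetical sketch (not Calem and Avery's actual model): a logistic
# default regression that adds local economic conditions and lagged
# economic changes to standard credit-file attributes. Data are simulated.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 5000
accounts = pd.DataFrame({
    "defaulted_12m": rng.binomial(1, 0.05, n),   # 1 = default within 12 months
    "bureau_score":  rng.normal(680, 60, n),     # generic credit-bureau risk score
    "utilization":   rng.uniform(0, 1, n),       # revolving utilization
    "county_unemp":  rng.uniform(3, 10, n),      # current county unemployment rate (%)
    "d_unemp_4q":    rng.normal(0, 1, n),        # lagged change in unemployment (4 quarters)
    "d_hpi_4q":      rng.normal(2, 3, n),        # lagged change in house-price index (4 quarters)
})

X = sm.add_constant(accounts[["bureau_score", "utilization",
                              "county_unemp", "d_unemp_4q", "d_hpi_4q"]])
model = sm.Logit(accounts["defaulted_12m"], X).fit(disp=False)
print(model.summary())  # coefficients on the economic terms show their incremental contribution
```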

Determining the Appropriateness of the Sample

After selecting and cleaning the data, modelers need to address issues related to the sample itself, specifically whether the sample represents the population to which the scorecard will be applied. For example, the ideal sample for building a scorecard that determines the likelihood of default would include representation from the entire universe of applicants: those who were accepted because their risk of default was considered tolerable and those who would otherwise have been rejected because of intolerable risk. Since forcing lenders to extend credit to otherwise-rejected applicants solely to improve a model's accuracy is cost-prohibitive, modelers have relied on reject inference methods. These methods attempt to mitigate the accept-only bias of the sample.

Reject Inference

Dennis Ash, chief statistician at Experian, and Steven Meester, technology manager at The CIT Group, addressed "Best Practices in Reject Inferencing." They presented five methods of reject inference. These methods aim to incorporate into the modeling process how rejected applicants would have behaved, had they been approved. From least to most sophisticated, these methods are reclassification, reweighting, parceling, bureau match, and Heckman's bivariate probit method.
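
The presentation's details are not reproduced here, but a minimal sketch of one of the simpler methods on their list, parceling, gives the flavor: score the rejected applicants with a model built on accepts only, then assign them good/bad outcomes in proportion to the bad rate observed in each score band before refitting. All data and column names below are hypothetical.

```python
# Minimal sketch of parceling-style reject inference (one of the methods
# Ash and Meester list); data and column names are hypothetical.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical data: accepts have observed outcomes, rejects do not.
accepts = pd.DataFrame({"score_input": rng.normal(1.0, 1.0, 5000)})
accepts["bad"] = rng.binomial(1, 1 / (1 + np.exp(accepts["score_input"])))
rejects = pd.DataFrame({"score_input": rng.normal(-0.5, 1.0, 2000)})

# 1. Fit a model on accepts only (the biased "known good/bad" sample).
kgb = LogisticRegression().fit(accepts[["score_input"]], accepts["bad"])

# 2. Score the rejects and bucket both groups into score bands.
accepts["p_bad"] = kgb.predict_proba(accepts[["score_input"]])[:, 1]
rejects["p_bad"] = kgb.predict_proba(rejects[["score_input"]])[:, 1]
bands = np.quantile(accepts["p_bad"], np.linspace(0, 1, 11))
accepts["band"] = np.digitize(accepts["p_bad"], bands[1:-1])
rejects["band"] = np.digitize(rejects["p_bad"], bands[1:-1])

# 3. Parcel: randomly assign rejects good/bad outcomes at each band's
#    observed bad rate (often inflated in practice to reflect extra risk).
band_bad_rate = accepts.groupby("band")["bad"].mean()
rejects["bad"] = rng.binomial(1, rejects["band"].map(band_bad_rate).values)

# 4. Refit on the combined (accepts + inferred rejects) sample.
combined = pd.concat([accepts, rejects], ignore_index=True)
final_model = LogisticRegression().fit(combined[["score_input"]], combined["bad"])
```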

Ash and Meester concluded that these reject inference procedures correct for less bias than expected. Given the substantial loss of information that results when applicants are rejected because they fail to meet risk thresholds, a reliable model based on reject inference may well be impossible. In addition, they pointed out that since all of the models discussed above extrapolate from accepts, they implicitly assume similar population characteristics for both rejects and a sub-population of the accepted applicants. Ultimately, reject inference methods must be assessed on a case-by-case basis and may need to be used in combination.

Other Sample Selection Issues

Jonathan Crook, director of the Credit Research Centre, University of Edinburgh, discussed "Sample Selection Bias" and offered a general approach to reject inference. Crook introduced a bivariate probit model based on thresholds for default and accept/reject scores using distinct explanatory variables. He used a stratified sample and compared it with a holdout sample that included performance data for applicants who would otherwise have been rejected. He and his co-authors used logistic regressions in constructing their model. He concluded that models built without reject inference are only roughly applicable to the full applicant population, and that including reject inference yields only a small improvement. Crook noted that the level of improvement attainable from reject inference methods depends on the severity of the cutoff score at which applications start being rejected.

These issues of sample bias and sample representativeness lead naturally to experimental design methods for selecting the best possible sample. David Hand, professor of statistics, Imperial College, London, examined several critical sample selection issues that affect credit scoring. One such issue was the "fundamental conflict" that exists in designing customer measurement models: a lender can build a model that makes the best possible accept/reject decisions about individual borrowers today, or one that collects the data necessary to build a better model in the future. While the first approach employs hard accept/reject thresholds to maximize profits in the short run, it yields poor information about how to improve the model in the long run. The second approach, however, is also problematic. By focusing on data collection and accepting applicants who should have been rejected, it enables more accurate models in the future but could easily drive its user out of business before those models are ever employed.

Hand suggested an alternative, a soft accept/reject threshold, in which accounts are accepted with a certain probability. In this way, less desirable accounts are not rejected outright but have a lower chance of being accepted. This alternative provides better data for improving the models. Specifically, one goal of a credit model might be to provide a quantitative measure of the benefits of making one decision over another, often for intervention with current customers. However, once a decision is made, the outcome of the alternative decision is never observed; only the result of the decision actually taken can be measured and modeled. A soft accept/reject threshold gives every applicant a positive probability of being assigned to each class, allowing all combinations of customer characteristics, decisions, and outcomes to be measured and modeled.
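
Hand's talk, as summarized here, does not prescribe a functional form for the soft threshold. One simple illustration, under assumed parameters, is to let the acceptance probability decline smoothly with estimated default risk while keeping a small floor, so that every applicant retains some chance of acceptance.

```python
# Illustrative soft accept/reject rule: instead of a hard cutoff on the
# estimated probability of default, accept each applicant with a
# probability that falls smoothly as risk rises. Parameters are assumed.
import numpy as np

rng = np.random.default_rng(1)

def soft_accept_probability(p_default, cutoff=0.08, steepness=60.0, floor=0.02):
    """Logistic ramp around the nominal cutoff, with a small floor so even
    the riskiest applicants are occasionally accepted (for future data)."""
    ramp = 1.0 / (1.0 + np.exp(steepness * (p_default - cutoff)))
    return floor + (1.0 - floor) * ramp

p_default = rng.uniform(0.0, 0.25, size=10)      # model-estimated default risk
p_accept = soft_accept_probability(p_default)    # chance of being accepted
accepted = rng.random(10) < p_accept             # randomized accept decision
print(np.column_stack([p_default.round(3), p_accept.round(3), accepted]))
```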

Determining the Appropriateness of Variables

The selection of explanatory variables from available data is an iterative part of the model-development process. Considerations for variable selection are, for the most part, purely statistical (to select the most predictive combination of variables). However, regulations limit which information can be used in a model. For example, information on race, gender, or age that would indicate membership in a protected class cannot be used in a scorecard. This protection extends to the concept of disparate impact, which arises when a lending policy is applied uniformly but nonetheless disproportionately disadvantages a protected class.

The paper "Credit Scoring and Disparate Impact," by Michael LaCour-Little, vice president, Wells Fargo Home Mortgage, and Elaine Fortowsky, director, Wells Fargo Home Mortgage, introduced a multivariate test, which incorporates an indicator of protected class into the regression model. If the indicator is a significant predictor and the relationships of other variables to the dependent variable change as a result, the model is deemed to have a disparate impact. The authors proposed a corrective procedure that introduces a protected-class indicator into the initial model and later eliminates it. The objective of this process is to exclude combinations of variables having the effect of disparate impact. While the model is limited in its current stage of development and the conclusions are admittedly preliminary, the work to date suggests a potentially new and intriguing research direction for this important policy question.

Evaluating Models from a Regulator's Perspective

Apart from making sure that the variables modelers use are appropriate, regulators are concerned about the inherent risks associated with employing models. Dennis Glennon, senior financial economist, Office of the Comptroller of the Currency, discussed credit scoring from the perspective of a regulatory agency. Regulators focus on model risk: the validity, reliability, and accuracy of the models used to measure and manage credit risk. The first step in monitoring model risk is to evaluate the soundness of the methods used to build scorecards, for example, a model's statistical validity.

Once these determinations have been made, the regulator must consider whether the scorecard is being used in a manner consistent with its design. Glennon outlined two general categories for a scorecard's purpose: classification and prediction.

A classification scorecard partitions a portfolio into groups and is evaluated by its ability to maximize the divergence between those groups. It is valid for screening out accounts with undesirable characteristics, for example, those with a high likelihood of default. However, a classification model may rank accounts by likelihood of default perfectly well, yet as the population changes, the meaning of a particular score value (for example, the actual default odds it implies) may change.
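
As an aside on the evaluation criterion, one common definition of the divergence a classification scorecard achieves is the squared difference between the mean scores of goods and bads divided by the average of their variances; the sketch below computes it on simulated scores.

```python
# Minimal sketch: one common divergence measure for a classification
# scorecard: squared difference of group means over the average variance.
# Scores here are simulated for illustration.
import numpy as np

rng = np.random.default_rng(3)
good_scores = rng.normal(700, 50, 10000)   # scores of non-defaulting accounts
bad_scores = rng.normal(640, 55, 800)      # scores of defaulting accounts

def divergence(goods, bads):
    return (goods.mean() - bads.mean()) ** 2 / (0.5 * (goods.var() + bads.var()))

print("divergence:", round(divergence(good_scores, bad_scores), 3))
```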

When pricing for risk and profitability, banks need accurate predictors of actual performance, so that a particular outcome carries an accurate value of risk or profitability. Developing accurate prediction models, however, is very complex, typically requiring much more (and often unknown) information about the individual account, the economic and market environments, and competition within the industry.

In concluding, Glennon emphasized the need for model builders and users to constantly ensure that the purpose for which a model is used is consistent with its original development goals.

Using Scorecards in the Decision Process

Nana Banerjee, managing consultant at Argus Information and Advisory Services, explained the increasingly competitive nature of the credit card industry. He asserted that acquisition of new accounts has become less profitable, aggressive price competition has squeezed margins, and customer loyalty has diminished. This relatively hostile business environment, further complicated by changes in the business cycle, demands that issuers focus on coordinated account-level customer-relationship management techniques.

Banerjee and representatives from other firms that specialize in model development talked about products that can help issuers increase account profitability in the current business environment and predict account behavior.

Argus Information and Advisory Services: Lifetime Value Model

Banerjee described a lifetime value (LTV) tool that Argus Information and Advisory Services developed. The LTV framework leverages historical behavior and profitability to deliver account-level projections. The model is constructed by segmenting customers according to similar behavior and profitability characteristics. Using fuzzy vectors, the model calculates the degree to which an account "fits" into a segment. For example, an account might belong, to varying degrees, to a "revolving" segment, a "transacting" segment, and a "credit challenged" segment. Markov transition matrices are then calculated to measure the likelihood that an account will migrate to another segment, given a specific action. This information can be used to calculate the lifetime value of an account under different marketing strategy assumptions.
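
Argus's model is proprietary, so the sketch below only illustrates the mechanics described above: a fuzzy membership vector over behavioral segments, a Markov transition matrix associated with a given action, and a discounted stream of segment-level profits as a rough lifetime value. All segments, probabilities, and profit figures are invented.

```python
# Illustrative mechanics only (not Argus's model): fuzzy segment membership,
# a Markov transition matrix for a given marketing action, and a discounted
# profit stream as a rough lifetime value. All figures are invented.
import numpy as np

segments = ["revolver", "transactor", "credit_challenged"]

# Degree to which one account "fits" each segment (sums to 1).
membership = np.array([0.6, 0.3, 0.1])

# Hypothetical one-period transition matrix under a specific action
# (e.g., a credit-line increase); row i gives probabilities of moving to segment j.
transition = np.array([
    [0.80, 0.15, 0.05],
    [0.20, 0.75, 0.05],
    [0.25, 0.05, 0.70],
])

# Hypothetical per-period profit contributed by an account in each segment.
segment_profit = np.array([180.0, 40.0, -60.0])

def lifetime_value(membership, transition, segment_profit,
                   periods=12, discount=0.97):
    """Discounted expected profit over a fixed horizon."""
    state, value = membership.copy(), 0.0
    for t in range(periods):
        value += (discount ** t) * state @ segment_profit
        state = state @ transition          # expected membership next period
    return value

ltv = lifetime_value(membership, transition, segment_profit)
print(f"projected LTV under this action: ${ltv:,.0f}")
```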

HNC Software Inc.: Profitability Predictor

Vijay Desai, director of marketing, HNC Software, and Terrance Barker, staff scientist, presented a different approach to forecasting profitability. HNC's Profitability Predictor relies on inputs from four models that forecast: an account's expected revenue assuming that it does not close or go delinquent; the charge-off losses that result from an account's failure to make payments; the loss of revenue that results from a sharp and lasting reduction in balance and activity; and the expected operational and funding costs.

Profitability Predictor essentially ties together these four models to arrive at a net revenue forecast adjusted for credit risk and attrition. This forecast can then be used to profile customers for various customer-management decisions.
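
The summary names the four component forecasts but not how they are combined; a minimal arithmetic sketch of one plausible combination, using invented figures, is shown below.

```python
# Minimal sketch (not HNC's actual formula): combine four component
# forecasts into a risk- and attrition-adjusted net revenue figure.
# All inputs are invented for illustration.
expected_revenue = 240.0        # revenue if the account stays open and current
expected_chargeoff_loss = 55.0  # loss from failure to make payments
expected_attrition_loss = 30.0  # revenue lost to a sharp, lasting drop in activity
expected_costs = 70.0           # operational and funding costs

net_revenue_forecast = (expected_revenue
                        - expected_chargeoff_loss
                        - expected_attrition_loss
                        - expected_costs)
print(f"adjusted net revenue forecast: ${net_revenue_forecast:.2f}")
```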

Fair, Isaac and Company: Decision Strategy Science

Larry Rosenberger, vice president, Fair, Isaac, introduced Strategy Science, an account-management approach that can guide card issuers' decision-making. Rosenberger discussed how current decision-making processes are driven by the judgment of experts who rely on many disparate scores and models (for example, profit scores, response models, revenue scores, and risk scores). The Strategy Science product integrates all of these models and scores across the customer's life cycle. In this way, actions can be optimized and their outcomes predicted.

To accomplish this, Fair, Isaac developed a model to produce decision flows that incorporate optimal mathematical relationships between the decisions. Next, the company developed a method of optimizing decision strategies driven by profit but constrained by key metrics such as volume and losses. The tradeoffs between these metrics are also considered. The model adjusts these strategies for client preferences and intentions.

Austin Logistics: CallSelect and CallTech

Mike Howard, director of research, Austin Logistics, presented two different applications of scoring models: decision strategies for collection actions (CallSelect) and call center management for collections (CallTech).

Because of the large volumes seen in consumer portfolios today, even low delinquency rates of 1 to 3 percent can seem unmanageable from a collections standpoint. Resources need to be optimally allocated to take different actions at different levels of delinquency. Modelers develop scorecards to predict the probability of a "cure" (payment within 30 days), given a certain action (none, letter, call, both letter and call, collections agency). The factors used to predict payment probability include data from credit bureaus, accounts' call histories, information on delinquencies, and demographic data.

The scorecard is built by stepwise binary logistic regression, then rebuilt every few months to incorporate new results. To best assign actions to accounts, the scorecard maximizes the expected revenue (minus costs) from the actions, given constraints of volume for each action. Howard discussed a similar model to optimize the probability of dialing a correct number, then receiving a promise to pay, based on call, payment, and delinquency history.
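
Neither the regression nor the optimizer is spelled out in the summary. The sketch below illustrates only the assignment step under assumed inputs: given predicted cure probabilities for each account-action pair, a cost per action, and per-action capacity limits, a small linear program maximizes expected recovered balance minus cost.

```python
# Illustrative assignment of collection actions to delinquent accounts
# (not Austin Logistics' actual optimizer): maximize expected recovered
# balance minus action cost, subject to per-action capacity. Predicted
# cure probabilities, costs, and capacities are invented.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(4)
n_accounts, actions = 50, ["none", "letter", "call", "letter+call"]
n_actions = len(actions)

balance = rng.uniform(500, 5000, n_accounts)                   # delinquent balance
p_cure = np.sort(rng.uniform(0.05, 0.6, (n_accounts, n_actions)), axis=1)  # stronger action, higher cure prob
cost = np.array([0.0, 1.0, 8.0, 9.0])                          # cost per action
capacity = np.array([n_accounts, 30, 15, 10])                  # accounts each action can cover

# Decision variables x[i, a] flattened row-major; maximize => minimize the negative.
expected_value = p_cure * balance[:, None] - cost[None, :]
c = -expected_value.ravel()

# Each account gets exactly one action.
A_eq = np.zeros((n_accounts, n_accounts * n_actions))
for i in range(n_accounts):
    A_eq[i, i * n_actions:(i + 1) * n_actions] = 1.0
b_eq = np.ones(n_accounts)

# Each action is limited by its capacity.
A_ub = np.zeros((n_actions, n_accounts * n_actions))
for a in range(n_actions):
    A_ub[a, a::n_actions] = 1.0
b_ub = capacity.astype(float)

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=(0, 1))
assignment = res.x.reshape(n_accounts, n_actions).argmax(axis=1)
print("accounts per action:", dict(zip(actions, np.bincount(assignment, minlength=n_actions))))
```

Because this has the structure of a transportation problem, the linear-programming relaxation typically returns a whole-number assignment.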

Strategic Analytics: Dual-Time Dynamics

Joseph Breeden, president, chief operating officer, and chief scientist, Strategic Analytics, Inc., proposed a model that takes into account changes in economic conditions and business practices in an effort to more accurately forecast account behaviors.

Breeden introduced a "Dual-Time Dynamics" method of calibrating a score's relationship to the odds of default related to macroeconomic conditions and changes in business practices. His model breaks the portfolio into components of vintage life-cycle behavior, seasonality, management actions, and external factors (competition and economic environment). He proposed an extension of this approach that would include revenue analyses in the calibration to set a scorecard cutoff to maximize profitability, rather than minimize losses.
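
Strategic Analytics' method is proprietary. As a much-simplified sketch of the decomposition idea, the code below separates a simulated vintage-by-month grid of default rates into an average months-on-book life-cycle curve and a calendar-time environment index.

```python
# Rough sketch of a dual-time decomposition (not Strategic Analytics'
# actual method): separate a vintage x calendar-month grid of default
# rates into a months-on-book life-cycle curve and a calendar-time
# environment index. Data are simulated.
import numpy as np

rng = np.random.default_rng(5)
n_vintages, n_months = 24, 36
n_periods = n_vintages + n_months - 1          # calendar months spanned by the grid

# Simulated "true" components: a hump-shaped life cycle and a cyclical environment.
age = np.arange(n_months)
lifecycle = 0.002 + 0.004 * np.exp(-((age - 14) ** 2) / 60.0)
environment = 1.0 + 0.3 * np.sin(2 * np.pi * np.arange(n_periods) / 24.0)

# Observed default rate for vintage v at age a falls in calendar month v + a.
observed = np.empty((n_vintages, n_months))
for v in range(n_vintages):
    observed[v] = lifecycle * environment[v + age] * rng.lognormal(0, 0.05, n_months)

# Naive decomposition: average across vintages at each age for the life cycle,
# then average the unexplained ratio by calendar month for the environment index.
est_lifecycle = observed.mean(axis=0)
ratio = observed / est_lifecycle
est_environment = np.zeros(n_periods)
counts = np.zeros(n_periods)
for v in range(n_vintages):
    est_environment[v + age] += ratio[v]
    counts[v + age] += 1
est_environment /= counts
print("estimated environment index, first 12 calendar months:", est_environment[:12].round(2))
```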

Tracing the History of Modeling

On the second day of the conference, Allen Jost, vice president of business development, HNC Software, reflected on developments in the industry over the past 10 years. He began by explaining the increased use of "generic scores" by both lending and non-lending institutions. Generic scores, like the FICO credit risk score or the Falcon fraud score, are based on data that are widely used or formatted in a standard way, such as data from credit bureaus or credit card transactions. While banks and credit card companies have long employed these scores, Jost pointed out that new clients, such as insurance companies, have taken an interest in them.

Jost indicated that scores developed in-house have also become more prevalent. In an effort to gain a competitive advantage, lenders have hired modeling experts to customize scorecards for specific markets and strategies. Developing in-house models, therefore, has resulted in innovation and the adoption of new technologies. Jost pointed out, however, that companies that build models in-house need to incorporate rigorous controls, just as vendors that build "generic" and custom scores are required to do. He also warned less experienced businesses that build their own scores to stay vigilant: often, these users fail to update score cutoffs or to monitor score performance appropriately.

Jost concluded by describing new and emerging technologies, including neural networks, transaction scoring models, text data inclusion methods, and multiple input scoring models.

Modeling Small-Business Credit Risk

While the conference focused on modeling consumer credit risk, the conference's two concluding speakers discussed modeling the credit risks of small businesses. As noted by Linda Allen, professor of finance, City University of New York, Grigoris Karakoulas, general manager, CIBC, and discussant Joe Mason, professor of finance, Drexel University, statistical modeling techniques in the small-business market are less developed than those in the consumer sector.

As described in the earlier presentations, significant progress has been made in leveraging credit bureau and consumer behavior data to produce sophisticated consumer risk models. Likewise, the abundance of data on large public corporations (available in SEC filings, stock prices, and so forth) has resulted in similar advances in corporate risk modeling. The area in the middle, occupied by small, privately held firms, family businesses, and entrepreneurial ventures, has received much less attention. As such, much of the lending that occurs in this market continues to be driven by judgmental techniques.

Two approaches to addressing the small-business segment were discussed at the conference. The first, detailed in Allen's presentation, is a top-down approach: start with large corporate models and examine how they might be applied to small companies. The second is a bottom-up approach, in which consumer modeling techniques, like those incorporated in the FICO score, are adapted to small firms. Models proposed by companies that have traditionally focused on consumer credit risk, and to some extent the model proposed by Karakoulas, take this approach.

Allen reviewed five corporate risk measurement techniques: expert systems, options theory approaches, reduced-form models, value at risk, and mortality rate models. She concluded that while they have been successfully deployed in the large corporate market, these approaches are far more difficult to apply to small firms for which stock price data or other market variables do not exist.

Karakoulas proposed a model for estimating private-firm default that does not rely on the public market data required by the sophisticated corporate risk models described by Allen. Instead, his model relies on a form of discriminant analysis, augmented with an iterative learning feature that helps reduce estimation error by adding and removing select variables. Karakoulas concluded that the basic model performed better than benchmark models and that future enhancements incorporating industry performance data should further improve performance.
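
Karakoulas's model is not reproduced in this summary. As a loose illustration of pairing discriminant analysis with iterative variable selection, the sketch below wraps scikit-learn's LinearDiscriminantAnalysis in forward and backward sequential feature selection on simulated financial-ratio data.

```python
# Loose illustration only (not Karakoulas's model): discriminant analysis
# for private-firm default, with variables added and removed iteratively.
# Data and feature names are simulated.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.feature_selection import SequentialFeatureSelector

# Hypothetical financial-ratio features for a set of small private firms.
X, defaulted = make_classification(n_samples=1500, n_features=12,
                                   n_informative=5, weights=[0.92],
                                   random_state=0)

lda = LinearDiscriminantAnalysis()

# Forward pass: add variables that improve cross-validated performance...
forward = SequentialFeatureSelector(lda, n_features_to_select=6,
                                    direction="forward", cv=5).fit(X, defaulted)
X_fwd = forward.transform(X)

# ...then a backward pass over the selected set to drop weak ones.
backward = SequentialFeatureSelector(lda, n_features_to_select=4,
                                     direction="backward", cv=5).fit(X_fwd, defaulted)

final_model = lda.fit(backward.transform(X_fwd), defaulted)
print("selected (original) feature indices:",
      np.flatnonzero(forward.get_support())[backward.get_support()])
```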

Conclusion

Despite the range of issues and different approaches presented at the conference, speakers, discussants, and participants agreed that more credit risk modeling research is necessary. As decision-makers increasingly rely on models to guide key risk, acquisition, capital allocation, and marketing actions, innovation and continuous model assessment will become critical to the industry's success. Fierce competition, impending risk-based capital requirements, and a burgeoning small-business credit market will require highly sophisticated models that incorporate national and regional economic data, advanced statistical techniques, and new sources of data.