LuciferF
Calcite | Level 5

Hello everyone!

 

While building a scorecard, I ran into a problem converting probability into a risk score. To build the scorecard, I used logistic regression with param=ref in the CLASS statement. The final model includes 10 variables. In the PROC LOGISTIC output I got estimates for the intercept and for all bins of each variable except the reference category. According to Naeem Siddiqi's book, there are two ways to convert the probability of default to a risk score:

1. Risk score = Offset + Factor*ln(odds)

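2. Risk score = Σ(-(WOEj*Bi + a/n)*Factor + Offset/n), where Bi is the regression coefficient of variable i, a is the intercept, and n is the number of variables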

The second method is preferable for me because it lets me assign a score to each category of each variable separately. The problem is that with reference cell coding there are no estimates for the reference category, so I can't calculate its risk score. Is there a way to calculate a generalized regression coefficient for each variable under reference cell coding, or to calculate an estimate for the reference category?

I hope I managed to describe the problem, thank you in advance, dear colleagues.

2 REPLIES 2
Rick_SAS
SAS Super FREQ

Maybe I am misunderstanding the question, but wouldn't you use 0 as the coefficient for the categorical variable that contains the reference category? That's the definition of the reference category: it gets a zero coefficient (estimate) and the other estimates represent the relative change as compared to the reference level.

 

Here is an example. Notice that, in the scored data set, the linear predictor for each non-reference level differs from the Sedan value by exactly the parameter estimate for that level.

data Cars;
set Sashelp.cars(where=(type^='Hybrid' AND Origin^="Europe"));
run;

proc logistic data=Cars;
class Type(ref='Sedan') / param=ref;
model Origin(event="USA") = mpg_city Type;
store out=LogiModel;
run;

data ScoreMe;
Type = "Sedan"; mpg_city = 25; output;
Type = "Wagon"; output;
Type = "Truck"; output;
run;

proc plm restore=LogiModel noprint;
score data=ScoreMe out=Pred pred; /* linear predictor */
run;

proc print data=Pred; run;
LuciferF
Calcite | Level 5

@Rick_SAS, thank you for your reply! Yes, I used zero as the estimate for the reference category to calculate the predicted value. The problem is that, instead of a probability, I want to get a separate risk score for each variable. Let me try to give an example of what I want.

We construct a logistic regression on one variable, assuming that the probability of default is dependent on the client’s work experience (in months).

We have six categories of clients with different distributions of bad and good loans:

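| BIN | Total Number of Loans | Number of Bad Loans | Number of Good Loans | % Bad Loans | Distribution Bad (DB) | Distribution Good (DG) | WOE |
|---|---|---|---|---|---|---|---|
| (-;12] | 9640 | 1935 | 7705 | 20,1% | 0,199 | 0,129 | -0,432 |
| (12;24] | 9955 | 1840 | 8115 | 18,5% | 0,189 | 0,136 | -0,330 |
| (24;48] | 10976 | 1734 | 9242 | 15,8% | 0,178 | 0,155 | -0,141 |
| (48;72] | 7183 | 1025 | 6158 | 14,3% | 0,105 | 0,103 | -0,021 |
| (72;84] | 21865 | 2452 | 19413 | 11,2% | 0,252 | 0,325 | 0,255 |
| (84; +inf) | 9896 | 757 | 9139 | 7,6% | 0,078 | 0,153 | 0,677 |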

 

In the PROC LOGISTIC output we obtain the following estimates:

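Analysis of Maximum Likelihood Estimates

| Parameter | Level | DF | Estimate | Standard Error | Wald Chi-Square | Pr > ChiSq |
|---|---|---|---|---|---|---|
| Intercept | | 1 | -2,5178 | 0,0415 | 3684,029 | <,0001 |
| Work_exp_BIN | (-;12] | 1 | 1,1389 | 0,0498 | 522,9748 | <,0001 |
| Work_exp_BIN | (12;24] | 1 | 1,0335 | 0,0501 | 425,9453 | <,0001 |
| Work_exp_BIN | (24;48] | 1 | 0,8289 | 0,0503 | 272,0801 | <,0001 |
| Work_exp_BIN | (48;72] | 1 | 0,7398 | 0,0553 | 178,9733 | <,0001 |
| Work_exp_BIN | (72;84] | 1 | 0,4355 | 0,0476 | 83,6209 | <,0001 |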

 

We can score the data set, or add the statement output p = pred_prob, to get probabilities, but I calculated them myself:

1/(1+exp(-1*(intercept+(Work_exp_BIN_estimate*Work_exp_BIN))))

We know that for the reference category the estimate is zero, so only the intercept remains in the exponent. After calculating the probability I can convert it to a risk score using, for example, the 1st formula from my 1st post:

Score = 33,561144 + 20/ln(2)*ln(odds)

| category | prob | Score |
|---|---|---|
| (-;12] | 0,2011857 | 79,829441 |
| (12;24] | 0,1847788 | 82,284014 |
| (24;48] | 0,1559206 | 87,183777 |
| (48;72] | 0,1445503 | 89,368574 |
| (72;84] | 0,1108291 | 97,033263 |
| (84; +inf) | 0,0746197 | 108,44742 |
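
The calculation above can be sketched as a SAS DATA step. This is only an illustration under stated assumptions: the data set name ScoreCheck, the variable names, and the semicolon-free bin labels (le12, 12to24, ...) are made up here, and the estimates are typed in with decimal points. It computes the probability from the posted estimates and then two candidate scores. As far as can be checked from the rounded estimates, the posted Score column is reproduced by Offset + Factor*ln(1/p), whereas Offset + Factor*ln((1-p)/p), which uses the good:bad odds, gives somewhat lower values (about 106.2 instead of 108.4 for the reference bin).

```sas
/* Sketch: recompute probabilities and scores from the posted estimates.   */
/* Bin labels are hypothetical and avoid ';' so they can sit in DATALINES. */
data ScoreCheck;
   length bin $8;
   input bin $ beta;
   intercept = -2.5178;          /* posted intercept (reference bin: 84+)  */
   factor    = 20 / log(2);      /* LOG is the natural logarithm in SAS    */
   offset    = 33.561144;
   eta  = intercept + beta;      /* linear predictor; beta = 0 for the ref */
   prob = 1 / (1 + exp(-eta));   /* predicted probability                  */
   score_inv_p = offset + factor * log(1 / prob);           /* ln(1/p)     */
   score_odds  = offset + factor * log((1 - prob) / prob);  /* ln(odds)    */
   datalines;
le12   1.1389
12to24 1.0335
24to48 0.8289
48to72 0.7398
72to84 0.4355
gt84   0
;
run;

proc print data=ScoreCheck noobs;
   var bin prob score_inv_p score_odds;
run;
```

Since ln((1-p)/p) equals minus the linear predictor, score_odds is linear in the estimates, which is what makes a per-variable split of the score possible.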

 

The result would have completely satisfied me, but with more than one variable it becomes difficult to calculate the risk score for each variable separately (since the intercept is common to the entire model, and the probability is calculated from all factors). The second formula solves this problem, but if you use zero as the beta coefficient, then even with large values of WOE the risk score will be the average, since the WOE term of the formula will be equal to zero. Is it possible to get a standardized logistic regression coefficient for a single variable? I also can't rule out that I interpreted the coefficients in the formula incorrectly (Credit Risk Scorecards: Developing and Implementing Intelligent Credit Scoring, Naeem Siddiqi, p. 116), so I would be very grateful if you corrected me. Thank you in advance!
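
One possible way to adapt the second formula to reference cell coding, offered as a sketch rather than as the book's method: with param=ref the model is fitted on dummy variables, so the dummy estimate of a bin takes the place of WOEj*Bi, and the reference bin simply has estimate 0. Assuming the modeled event is "bad" (which is what the posted estimates suggest), the linear predictor is a plus the sum of the selected bin estimates, so Offset - Factor*(linear predictor) splits into per-variable points of -(estimate + a/n)*Factor + Offset/n. The reference bin then receives -(a/n)*Factor + Offset/n, which is its own score and not an average, because the intercept already carries the log-odds of the reference levels. The macro variable n, the data set name Points, and the bin labels below are hypothetical; n = 1 matches the one-variable example.

```sas
%let n = 1;   /* number of variables in the model (1 in the posted example) */

/* Sketch: points per bin under reference cell coding.                      */
/* Bin labels are hypothetical and avoid ';' so they can sit in DATALINES.  */
data Points;
   length bin $8;
   input bin $ beta;             /* beta = 0 for the reference bin          */
   a      = -2.5178;             /* posted intercept                        */
   factor = 20 / log(2);
   offset = 33.561144;
   /* Each variable carries an equal share of the intercept and the offset. */
   points = -(beta + a / &n) * factor + offset / &n;
   datalines;
le12   1.1389
12to24 1.0335
24to48 0.8289
48to72 0.7398
72to84 0.4355
gt84   0
;
run;

proc print data=Points noobs;
   var bin beta points;
run;
```

With several variables, the same step would be run once per variable with that variable's bin estimates, and a client's total score would be the sum of the points of the bins the client falls into. For n = 1 the reference bin comes out at about 106.2 points, which agrees with Offset + Factor*ln((1-p)/p) for that bin.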

 

 

 
