@Rick_SAS, thank you for your reply! Yes, I used zero as the estimate for the reference category when calculating the predicted value. The problem is that, instead of a probability, I want to get a separate risk score for each variable. Let me give an example of what I want.

We fit a logistic regression on one variable, assuming that the probability of default depends on the client's work experience (in months). We have six categories of clients with different distributions of bad and good clients:

| BIN | Total Number of Loans | Number of Bad Loans | Number of Good Loans | % Bad Loans | Distribution Bad (DB) | Distribution Good (DG) | WOE |
|---|---|---|---|---|---|---|---|
| (-;12] | 9640 | 1935 | 7705 | 20,1% | 0,199 | 0,129 | -0,432 |
| (12;24] | 9955 | 1840 | 8115 | 18,5% | 0,189 | 0,136 | -0,330 |
| (24;48] | 10976 | 1734 | 9242 | 15,8% | 0,178 | 0,155 | -0,141 |
| (48;72] | 7183 | 1025 | 6158 | 14,3% | 0,105 | 0,103 | -0,021 |
| (72;84] | 21865 | 2452 | 19413 | 11,2% | 0,252 | 0,325 | 0,255 |
| (84; +inf) | 9896 | 757 | 9139 | 7,6% | 0,078 | 0,153 | 0,677 |

In the PROC LOGISTIC output we get the following estimates:

Analysis of Maximum Likelihood Estimates

| Parameter | Level | DF | Estimate | Standard Error | Wald Chi-Square | Pr > ChiSq |
|---|---|---|---|---|---|---|
| Intercept | | 1 | -2,5178 | 0,0415 | 3684,029 | <,0001 |
| Work_exp_BIN | (-;12] | 1 | 1,1389 | 0,0498 | 522,9748 | <,0001 |
| Work_exp_BIN | (12;24] | 1 | 1,0335 | 0,0501 | 425,9453 | <,0001 |
| Work_exp_BIN | (24;48] | 1 | 0,8289 | 0,0503 | 272,0801 | <,0001 |
| Work_exp_BIN | (48;72] | 1 | 0,7398 | 0,0553 | 178,9733 | <,0001 |
| Work_exp_BIN | (72;84] | 1 | 0,4355 | 0,0476 | 83,6209 | <,0001 |

We can score the dataset or add the statement output p = pred_prob to get the probabilities, but I calculated them myself:

1/(1+exp(-1*(intercept+(Work_exp_BIN_estimate*Work_exp_BIN))))

We know that for the reference category the estimate is zero, so in fact only the intercept remains in the exponent.
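To make the calculation above easy to check, here is a minimal sketch in Python (not SAS, purely for illustration). It recomputes the WOE column from the bad/good counts and the probability of a bad loan from the estimates, taking zero as the coefficient of the reference bin. The names `woe_by_bin` and `prob_bad` are my own, and the numbers are copied from the tables with decimal commas written as points.

```python
import math

# Estimates copied from the PROC LOGISTIC output (decimal commas -> points).
# The reference bin (84; +inf) has no row in the output, so its
# coefficient is assumed to be zero.
INTERCEPT = -2.5178
ESTIMATES = {
    "(-;12]": 1.1389,
    "(12;24]": 1.0335,
    "(24;48]": 0.8289,
    "(48;72]": 0.7398,
    "(72;84]": 0.4355,
    "(84;+inf)": 0.0,  # reference category
}

# (bad, good) loan counts per bin, copied from the first table.
COUNTS = {
    "(-;12]": (1935, 7705),
    "(12;24]": (1840, 8115),
    "(24;48]": (1734, 9242),
    "(48;72]": (1025, 6158),
    "(72;84]": (2452, 19413),
    "(84;+inf)": (757, 9139),
}


def woe_by_bin(counts):
    """Return WOE = ln(DG / DB) for every bin."""
    total_bad = sum(bad for bad, _ in counts.values())
    total_good = sum(good for _, good in counts.values())
    return {
        label: math.log((good / total_good) / (bad / total_bad))
        for label, (bad, good) in counts.items()
    }


def prob_bad(label):
    """Predicted probability of a bad loan for one bin."""
    linear_predictor = INTERCEPT + ESTIMATES[label]
    return 1.0 / (1.0 + math.exp(-linear_predictor))


woe = woe_by_bin(COUNTS)
for label in ESTIMATES:
    print(f"{label:>10}  WOE={woe[label]:+.3f}  P(bad)={prob_bad(label):.7f}")
```

For the reference bin the linear predictor is just the intercept, which is why its probability depends on the intercept alone.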
After calculating the probability I can convert it to a risk score using, for example, the first formula from my first post:

Score = 33,561144 + 20/ln(2)*ln(odds)

| Category | Prob | Score |
|---|---|---|
| (-;12] | 0,2011857 | 79,829441 |
| (12;24] | 0,1847788 | 82,284014 |
| (24;48] | 0,1559206 | 87,183777 |
| (48;72] | 0,1445503 | 89,368574 |
| (72;84] | 0,1108291 | 97,033263 |
| (84; +inf) | 0,0746197 | 108,44742 |

This result would have completely satisfied me, but if there is more than one variable it becomes difficult to calculate the risk score for each variable separately (since the intercept is common to the entire model, and the probability is calculated from all factors). The second formula solves this problem, but if you use zero as the beta coefficient, then even for large WOE values the risk score will be the average one, since the left side of the equation will be equal to zero.

Is it possible to get a standardized logistic regression coefficient for a single variable? I also do not rule out that I interpreted the coefficients in the formula incorrectly (Credit Risk Scorecards: Developing and Implementing Intelligent Credit Scoring, Naeem Siddiqi, p. 116), so I would be very grateful if you could correct me.

Thank you in advance!
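P.S. For reference, the score conversion can be sketched in Python as below. This is only an illustration under my own assumptions: the function names are hypothetical, and the offset and factor are taken straight from the formula Score = 33,561144 + 20/ln(2)*ln(odds). One caveat: the scores in my table appear to come out (to about two decimals) when 1/p is used in place of the odds, whereas the usual good:bad odds are (1 - p)/p, so the sketch exposes both variants to make the difference visible.

```python
import math

OFFSET = 33.561144          # constant term from the score formula
FACTOR = 20 / math.log(2)   # 20 points for every doubling of the odds


def score_from_odds(odds):
    """Score = OFFSET + FACTOR * ln(odds)."""
    return OFFSET + FACTOR * math.log(odds)


def score_good_bad(p_bad):
    """Score using good:bad odds, (1 - p) / p (assumed definition of odds)."""
    return score_from_odds((1.0 - p_bad) / p_bad)


def score_inverse_prob(p_bad):
    """Score using 1 / p inside the logarithm instead of the odds."""
    return score_from_odds(1.0 / p_bad)


# Probabilities of a bad loan per bin, copied from the table.
PROBS = {
    "(-;12]": 0.2011857,
    "(12;24]": 0.1847788,
    "(24;48]": 0.1559206,
    "(48;72]": 0.1445503,
    "(72;84]": 0.1108291,
    "(84;+inf)": 0.0746197,
}

for label, p in PROBS.items():
    print(f"{label:>10}  good:bad={score_good_bad(p):8.3f}  1/p={score_inverse_prob(p):8.3f}")
```

With good:bad odds the scores are somewhat lower than with 1/p, and the gap shrinks as the probability of a bad loan gets smaller.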