05-03-2012 03:54 AM
Dear SAS Users,
I am planning to build a model on a continuous variable, LGD (loss given default). I would like to use logistic regression and build a scorecard for LGD by using weights. For example, if the LGD for one obligor is 70%, I would create two observations for this obligor: one with Y=0 and weight 30%, and one with Y=1 and weight 70%. This way I hope to convert my observed LGD into a binary variable with suitable weights for event and non-event.
My question: should I use PROC LOGISTIC with a WEIGHT statement, or PROC SURVEYLOGISTIC with the same weights? From what I have read so far, PROC SURVEYLOGISTIC seems more suitable when the data come from a survey with complex strata and cluster structures. Please advise if anyone has come across this problem before. Many thanks!
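For readers following along, the two-row setup described in this post might look like the sketch below. It is only an illustration under assumptions: the data set names `lgd_raw` and `lgd_long`, the variables `obligor_id`, `lgd` (stored as a fraction between 0 and 1) and the single predictor `x1` are hypothetical, and the six data rows are made up purely so that the steps run.

```sas
/* Hypothetical toy input: one row per obligor, lgd as a fraction in [0,1]. */
data lgd_raw;
  input obligor_id lgd x1;
  datalines;
1 0.70 2.1
2 0.10 0.4
3 0.45 1.3
4 0.90 2.8
5 0.25 0.9
6 0.60 1.1
;
run;

/* Split each obligor into an event row (weight = lgd) and a
   non-event row (weight = 1 - lgd). */
data lgd_long;
  set lgd_raw;
  y = 1; w = lgd;     output;
  y = 0; w = 1 - lgd; output;
run;

/* Weighted logistic fit. event='1' makes the model predict P(y = 1),
   which plays the role of the expected LGD. */
proc logistic data=lgd_long;
  weight w;
  model y(event='1') = x1;
run;
```

PROC LOGISTIC does not use rows whose weight is zero, negative, or missing, so an obligor with LGD exactly 0 or 1 contributes only one row, and LGD values outside the 0 to 1 range would need to be capped before this step.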
05-03-2012 02:14 PM
I am not understanding your ultimate goal. Here's how I use PROC LOGISTIC:
I build a logistic model. For a given value of LGD, the model predicts the probability of The Event happening.
To use the model, I do not need to create a new data set with weights equal to probabilities. Instead, I use the OUTMODEL= option to save the model to a data set. Then when I want to score new data, I use the INMODEL= option and the SCORE statement. For each new observation, the model will give the probability of The Event. I might set a cutoff value such as "if the probability of The Event happening is less than 0.8, then don't take action."
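The OUTMODEL=/INMODEL= workflow described above could be sketched roughly as follows. The data set names (`train`, `new_obligors`, `lgd_model`, `scored`, `decisions`) and the variables `y`, `x1`, `x2` are placeholders, not anything from the original post; the 0.8 cutoff is the one mentioned in the text.

```sas
/* Fit once and store the fitted model in a data set.
   train, y, x1, x2 are hypothetical names. */
proc logistic data=train outmodel=lgd_model;
  model y(event='1') = x1 x2;
run;

/* Later: score new observations from the stored model, without refitting. */
proc logistic inmodel=lgd_model;
  score data=new_obligors out=scored;
run;

/* Apply the cutoff: flag an observation only when the predicted
   probability of the event is at least 0.8. P_1 is the predicted
   probability for response level 1 in the SCORE output. */
data decisions;
  set scored;
  take_action = (P_1 >= 0.8);
run;
```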
05-03-2012 08:45 PM
Thanks for the response. Let me clarify a bit more. My observed dependent variable, LGD, is a continuous variable (values approximately between 0% and 100%), not a binary variable. Therefore the concept of event/non-event is not as straightforward as in the case of a binary response.
Since my desired outcome variable is the percentage of the amount that cannot be recovered, and I want to use logistic regression, I am trying to convert the LGD variable into a binary variable. Essentially, no observation in my sample is entirely event 1 or event 0. Each observation is partially good (1-LGD) and partially bad (LGD). That's why I duplicate each observation into two rows and apply weights to the 1 and the 0. We have our own reasons to favor logistic regression over simple linear regression or a decision tree, although the dependent variable is continuous.
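In likelihood terms, the duplication with weights amounts to the following. This is a sketch with notation assumed here rather than taken from the thread: $x_i$ are the predictors for obligor $i$, $\beta$ the coefficients, and $p_i$ the fitted probability of the event.

```latex
% Weighted log-likelihood from the two rows per obligor:
% the Y=1 row carries weight LGD_i, the Y=0 row carries weight 1 - LGD_i.
\ell(\beta) = \sum_{i} \Big[ \mathrm{LGD}_i \,\log p_i
            + (1 - \mathrm{LGD}_i)\,\log(1 - p_i) \Big],
\qquad
p_i = \frac{1}{1 + e^{-x_i^{\top}\beta}}

% Setting the derivative with respect to beta to zero gives
\sum_{i} \big( \mathrm{LGD}_i - p_i \big)\, x_i = 0
```

So the fitted probability $p_i$ is being matched to the observed LGD itself, which is why the predicted "probability of event" can be read as a predicted LGD. The model-based standard errors, however, come from treating the weighted rows as binary outcomes and may not reflect the true variability of a continuous LGD.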
05-04-2012 09:56 AM
I see. Here is a series of four blog articles on ways to deal with your problem in SAS: Modeling Rates and Proportions in SAS
05-04-2012 11:11 AM
Ha! You can also try replacing the logit transformation proposed in Rick's very pertinent reference with the classical variance-stabilizing transformation arcsin(sqrt(LGD)). Compare the residual distributions from each approach to decide which works best.
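A rough sketch of that comparison is below. It assumes a hypothetical data set `lgd_raw` with `lgd` stored as a fraction and two placeholder predictors `x1` and `x2`. The small constant `eps` used to keep the logit finite at 0 and 1 is an arbitrary choice made for this illustration, not something suggested in the thread.

```sas
/* Build both transformed responses from the raw LGD. */
data lgd_t;
  set lgd_raw;                       /* hypothetical input data set */
  eps = 0.001;                       /* assumed clamp; logit is undefined at 0 and 1 */
  lgd_c = min(max(lgd, eps), 1 - eps);
  logit_lgd = log(lgd_c / (1 - lgd_c));
  asin_lgd  = arsin(sqrt(min(max(lgd, 0), 1)));
run;

/* Fit a linear model to each transformed response and keep the residuals. */
proc reg data=lgd_t;
  logit_fit: model logit_lgd = x1 x2;
  output out=res_logit r=resid;
  asin_fit:  model asin_lgd = x1 x2;
  output out=res_asin r=resid;
run;
quit;

/* Inspect the residual distribution from each approach. */
proc univariate data=res_logit normal;
  var resid;
  histogram resid;
run;

proc univariate data=res_asin normal;
  var resid;
  histogram resid;
run;
```

Note that predictions from either model are on the transformed scale and need to be back-transformed before they can be read as LGD.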
05-06-2012 09:17 PM
Thanks for the response. I will explore the options suggested. However, although I am quite eager to try out newer statistical procedures, in a bank it's sometimes easier to sell the idea of logistic-based scorecards to senior management than newer procedures. Let me give it a shot. In the meantime, if anyone has any input on the use of weights in PROC LOGISTIC versus weights in PROC SURVEYLOGISTIC, please share your comments.
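For anyone wanting to compare the two procedures directly, one option is simply to run both on the same expanded data and look at the output side by side. The sketch below assumes a hypothetical data set `lgd_long` with two rows per obligor and the variables `obligor_id`, `y`, `w`, `x1`, `x2`. The CLUSTER statement is an assumption added here so that the two rows belonging to one obligor are treated as a single unit; it is not something proposed in the thread.

```sas
/* Weighted fit with PROC LOGISTIC (model-based standard errors). */
proc logistic data=lgd_long;
  weight w;
  model y(event='1') = x1 x2;
run;

/* Same weights in PROC SURVEYLOGISTIC (design-based standard errors).
   CLUSTER groups the two rows that came from one obligor. */
proc surveylogistic data=lgd_long;
  cluster obligor_id;
  weight w;
  model y(event='1') = x1 x2;
run;
```

As far as I know, both procedures solve the same weighted likelihood equations, so the coefficient estimates should agree. The difference to look for is in the standard errors and tests, because PROC SURVEYLOGISTIC uses a design-based variance estimator rather than the model-based one.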