Hi guys, I have tried to find another topic that could help me out, but with no success so far.

Let me start by describing my dataset. It is a real-world application dataset covering people who have applied to borrow money. I have information about them such as income, age, children, marital status, LTV, etc., almost 200 variables in total. My response variable is default status: whether they defaulted in the first year or not. The dataset has 40,000 observations, of which 220 are defaults (default = 1).

I cleaned the dataset as follows: if more than 5% of a variable's values were missing, I removed the variable; if less than 5% were missing, I removed the rows with missing values. That leaves approx. 50 variables. I then divided the dataset into a training and a test dataset (70% training, 30% test).

To investigate which variables I should work further with, I ran the following:

Proc logistic data=TRAINING_DATA;
class CATEGORY_VAR1(PARAM=REF REF='FIRST') CATEGORY_VAR2(PARAM=REF REF='FIRST');
model default(event='1')= VAR1--VAR50/selection=stepwise;
run;

This gives 6 significant variables, a c-value of 0.701, Somers' D of 0.42 and an AIC of 2340.40. I'm not very happy with the c-value, but I can live with it. My next step is to calculate the probability of default given these 6 variables, using the following:

PROC LOGISTIC DATA = TRAINING_DATA descending;
class CATEGORY_VAR1(PARAM=REF REF='FIRST');
MODEL default(event='1') = VAR1 VAR2 VAR3 VAR4 VAR5 VAR6 / link = probit ctable
pprob=(0.05 to 1 by 0.05);
output out=PREDICTED_PROB predicted=PD_probit;
RUN;

(I also tried link=logit.) I then checked how many of these predictions are correct hits, using the following:

data CHECK;
set PREDICTED_PROB;
where PD_probit > 0.5 and default=1;
run;

I got 0 hits! This suggests that my model cannot predict anything... What am I doing wrong? How should I approach this?

What I would like: check what percentage of cases the model gets right (hopefully a lot), and then try the model on the test dataset.

Sorry if the post is too long, let me know if there is something I should add or remove. 🙂 Best regards.
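The cleaning rule and the 70/30 split described at the start of the post can be sketched as follows. This is a minimal Python illustration, not SAS. It assumes the data is a list of dicts with None marking a missing value. The functions `clean` and `train_test_split`, the column names and the toy rows are all hypothetical. The post does not say which side a column with exactly 5% missing falls on, so this sketch keeps such a column.

```python
import random

# Hypothetical sketch (not SAS) of the cleaning rule and 70/30 split.
# Rows are dicts; None marks a missing value.


def clean(rows, columns, threshold=0.05):
    """Drop columns whose missing share exceeds threshold, then drop rows
    that still have a missing value in any kept column."""
    n = len(rows)
    # Assumption: a column missing in exactly `threshold` of rows is kept.
    kept = [c for c in columns
            if sum(r[c] is None for r in rows) / n <= threshold]
    cleaned = [{c: r[c] for c in kept}
               for r in rows
               if all(r[c] is not None for c in kept)]
    return cleaned, kept


def train_test_split(rows, train_share=0.7, seed=1):
    """Shuffle a copy of rows with a fixed seed and cut it into train/test."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_share)
    return shuffled[:cut], shuffled[cut:]


# Toy data: 40 rows; income is 10% missing, age is 2.5% missing.
rows = [
    {
        "income": None if i < 4 else 1000 + i,
        "age": None if i == 5 else 20 + i % 30,
        "ltv": 0.5,
        "default": 1 if i % 10 == 0 else 0,
    }
    for i in range(40)
]

cleaned, kept = clean(rows, ["income", "age", "ltv", "default"])
train, test = train_test_split(cleaned)
print(kept, len(cleaned), len(train), len(test))
```

With this toy data, `income` is dropped as a variable and the single row with a missing `age` is removed. The remaining 39 rows are split into 27 training rows and 12 test rows.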
... View more
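The hit count in the CHECK step (PD_probit > 0.5 and default = 1), together with the "percentage correct" the post asks for, can be sketched as a confusion matrix at a chosen cutoff. This is similar in spirit to the classification table that the ctable/pprob= options request. It is a minimal Python sketch with made-up probabilities, not output from the model in the post. The functions `confusion_at_cutoff` and `percent_correct` are hypothetical names.

```python
# Hypothetical sketch (not SAS): classification counts at a probability cutoff.
# A row counts as a predicted default when its probability is strictly above
# the cutoff, matching "where PD_probit > 0.5" in the CHECK step.


def confusion_at_cutoff(probs, actual, cutoff):
    """Return (tp, fp, tn, fn) for predicted default = prob > cutoff."""
    tp = fp = tn = fn = 0
    for p, y in zip(probs, actual):
        predicted = 1 if p > cutoff else 0
        if predicted == 1 and y == 1:
            tp += 1  # hit: predicted default, actual default
        elif predicted == 1:
            fp += 1  # false alarm: predicted default, no actual default
        elif y == 0:
            tn += 1  # correctly predicted non-default
        else:
            fn += 1  # missed default
    return tp, fp, tn, fn


def percent_correct(tp, fp, tn, fn):
    """Share of all rows classified correctly, in percent."""
    total = tp + fp + tn + fn
    return 100.0 * (tp + tn) / total if total else 0.0


# Made-up predicted probabilities for 10 applicants, 2 of whom defaulted.
probs = [0.01, 0.02, 0.03, 0.01, 0.04, 0.02, 0.12, 0.01, 0.30, 0.08]
actual = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

for cutoff in (0.05, 0.10, 0.50):
    counts = confusion_at_cutoff(probs, actual, cutoff)
    print(cutoff, counts, percent_correct(*counts))
```

With these toy numbers, the 0.5 cutoff gives 0 hits, yet 80% of rows are still classified correctly, because every non-default is counted as correct. By the same arithmetic, 220 defaults in 40,000 rows means that labeling everyone as a non-default would already be 99.45% correct. So the percentage correct on its own says little about how well the model finds defaults.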