/* Split the data randomly, 50/50, into training and validation sets */
data train valid;
   set twoyears;   /* 2 years of data combined */
   if ranuni(7) <= .5 then output train;
   else output valid;
run;

/* Fit the same model to each half and compare */
proc logistic data = train outest=estimates_train;
   model camp_flag = rit;
run;
quit;

proc logistic data = valid outest=estimates_valid;
   model camp_flag = rit;
run;
quit;
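The second PROC LOGISTIC above refits the model on valid, so its output describes how well a fresh fit matches that half, not how well the training model predicts new data. A minimal sketch of the more usual approach follows: fit on train only, then apply those coefficients to valid. It assumes camp_flag is coded 0/1 with 1 as the event of interest; the data set names train_model, valid_scored and twoyears_scored are hypothetical. PROC LOGISTIC's own SCORE statement returns predicted probabilities directly, which is usually simpler than passing OUTEST= coefficients to PROC SCORE.

```sas
/* Hypothetical sketch. Assumes camp_flag is coded 0/1 and 1 is the event.
   Fit the model on TRAIN only and save it with OUTMODEL=. */
proc logistic data=train outmodel=train_model;
   model camp_flag(event='1') = rit;
   /* Apply the training coefficients to VALID (no refitting).
      FITSTAT prints fit statistics, including AUC, for the scored data. */
   score data=valid out=valid_scored fitstat;
run;

/* Later: reuse the saved model on any data set that contains rit.
   The scored data sets hold predicted probabilities in P_0 and P_1. */
proc logistic inmodel=train_model;
   score data=twoyears out=twoyears_scored;
run;
```

Scoring twoyears is fine for producing predictions, but it includes the training half, so any fit statistics computed on it will look optimistic; the AUC from valid_scored is the fairer estimate of predictive performance.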
Based on what I have studied, I believe this is the next step. Below are the association statistics (including % concordant) for train and valid, respectively. Is PROC SCORE my next step, using "twoyears"? I'm not sure which part of the output to look at to decide whether the model is good for prediction.
Train:
Association of Predicted Probabilities and Observed Responses
Percent Concordant     94.3    Somers' D    0.892
Percent Discordant      5.1    Gamma        0.898
Percent Tied            0.6    Tau-a        0.099
Pairs                 29455    c            0.946
Valid:
Association of Predicted Probabilities and Observed Responses
Percent Concordant     89.0    Somers' D    0.788
Percent Discordant     10.1    Gamma        0.795
Percent Tied            0.9    Tau-a        0.063
Pairs                 23648    c            0.894
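All of these statistics come from the same pair counts. Using the definitions in the PROC LOGISTIC documentation, with $n_c$ concordant pairs, $n_d$ discordant pairs and $t$ total pairs, the c statistic (the area under the ROC curve) and Somers' D can be checked against the printed tables; small differences come from the rounded percentages.

```latex
\begin{aligned}
c &= \frac{n_c + 0.5\,(t - n_c - n_d)}{t}, \qquad D_{xy} = \frac{n_c - n_d}{t} = 2c - 1 \\
c_{\text{train}} &= 0.943 + 0.5(0.006) = 0.946, \qquad D_{\text{train}} = 0.943 - 0.051 = 0.892 \\
c_{\text{valid}} &= 0.890 + 0.5(0.009) = 0.8945, \qquad D_{\text{valid}} = 0.890 - 0.101 = 0.789
\end{aligned}
```

The printed valid values (0.894 and 0.788) agree up to rounding. For judging prediction, c is the figure most often compared; for it to be an out-of-sample measure, it should come from applying the training model to valid rather than from a model refit on valid.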