Hi, I want to output the Gini (Somers' D) statistic for a few hundred candidate models, each on a training data set and a validation data set.
Problem: I can see how to use ODS OUTPUT to get the statistic for the training data set, but there appears to be no way to output the statistic for the scored data set.
This is what I am running:
ods _all_ close;
ods trace on;
proc logistic data=&train outest=temp.outest&seg.&modelid outmodel=temp.model&seg.&modelid;
   model default12mos(EVENT='1') = &vars/lackfit;
   score data=&oos out=temp.oos&seg.&modelid outroc=vroc fitstat;
   roc;
   roccontrast;
   output out=temp.a&seg.&modelid p=pred;
   roc;
   ods output ROCassociation=temp.ROC1;
   ods output ParameterEstimates =temp.PARM1;
run;
ods trace off;
I also tried this:
ods _all_ close;
ods trace on;
proc logistic inmodel=temp.model&seg.&modelid;
   score data=&oos out=temp.oos&seg.&modelid outroc=vroc;
run;
ods trace off;
When I trace the ODS output, there don't appear to be any tables related to the scored data set. What gives? I can't look at the output visually because there are literally 500-600 models to cycle through, and of course it needs to be done ASAP. As it is, I am going to rely on the KS statistic because I can get that with ODS.
As shown in the "Details: Rank Correlation of Observed Responses and Predicted Probabilities" section of the LOGISTIC documentation, Gini=2c-1, where c is the area under the ROC curve (AUC). The FITSTAT option in the SCORE statement produces the AUC statistic. So, you just need to save the ScoreFitStat table using an ODS OUTPUT statement and then compute the Gini statistic from the AUC in that table.
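Because the goal is to do this for hundreds of models without reading any output, one way to apply that advice is to wrap the fit-and-score step in a macro that appends each model's validation AUC and Gini to a single table. This is a minimal sketch, not an official recipe: the macro name `gini_oos`, the data sets `work._fit`, `work._gini` and `work.gini_all`, and the variable `ModelID` are hypothetical. It also assumes that the validation data contain the response variable (FITSTAT needs it) and that the AUC column in the ScoreFitStat output data set is named `AUC`; run PROC CONTENTS on that data set to confirm the name in your release. SASHELP.CLASS stands in for both training and validation data only so that the example runs anywhere.

```sas
/* Hypothetical helper: fit one model, score a validation data set with
   FITSTAT, and append the validation AUC and Gini (2*AUC - 1) to
   WORK.GINI_ALL. Assumes the validation data contain the response and
   that the ScoreFitStat output has a column named AUC. */
%macro gini_oos(train=, oos=, response=, vars=, modelid=);
   /* remove leftovers so a failed step cannot reuse an old table */
   proc datasets lib=work nolist nowarn;
      delete _fit _gini;
   quit;

   proc logistic data=&train;
      model &response = &vars;
      score data=&oos out=work._scored fitstat;
      ods output ScoreFitStat=work._fit;  /* fit statistics for the scored data */
   run;

   data work._gini;
      length ModelID $32;
      set work._fit;
      ModelID = "&modelid";
      Gini = 2*AUC - 1;                   /* Gini = 2c - 1, with c = AUC */
      keep ModelID AUC Gini;
   run;

   proc append base=work.gini_all data=work._gini force;
   run;
%mend gini_oos;

/* start from an empty results table */
proc datasets lib=work nolist nowarn;
   delete gini_all;
quit;

/* illustration only: SASHELP.CLASS plays both roles */
%gini_oos(train=sashelp.class, oos=sashelp.class, response=sex,
          vars=height weight age, modelid=m1)
%gini_oos(train=sashelp.class, oos=sashelp.class, response=sex,
          vars=height weight, modelid=m2)

proc print data=work.gini_all noobs;
run;
```

In practice you would pass your own &train, &oos and variable lists (one call per candidate model), and WORK.GINI_ALL ends up with one row per model that you can sort or filter instead of inspecting output.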
There are also other ways to score a new data set:
http://blogs.sas.com/content/iml/2014/02/19/scoring-a-regression-model-in-sas.html
Calling @Rick_SAS
I'm not sure that I understand your issue, but the "Association" table that shows the concordant/discordant statistics applies only to the original data. The table shows various statistics (including Somers' D) that are based on the number of pairs of observations IN THE DATA that are concordant or discordant. See the doc section about observed responses and predicted probabilities. In general, scoring data sets do not even contain the response variable, so these statistics are meaningless for scoring data.
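To connect the Association-table statistics with the Gini = 2c-1 relationship mentioned in this thread, here is the arithmetic, using the definitions from the LOGISTIC documentation for a binary response: t is the number of pairs of observations with different responses, and n_c and n_d are the numbers of concordant and discordant pairs among them. Substituting the definition of c shows that 2c-1 is exactly Somers' D:

```latex
c = \frac{n_c + \tfrac{1}{2}\,(t - n_c - n_d)}{t},
\qquad
D = \frac{n_c - n_d}{t}

2c - 1 = \frac{2n_c + (t - n_c - n_d) - t}{t} = \frac{n_c - n_d}{t} = D
```

So on any data set that contains the response, Gini, Somers' D and 2·AUC - 1 are the same quantity; the only question is which table reports it for that data set.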
HTH
By the way, it will be easier to get your questions answered if everyone can run the same code. Try using data in the SASHELP libref when you post your questions. For example, here is a sample scoring data set and code for the sashelp.class data. I don't think you need to create 8 output data sets, but I left them in since they were part of your original example.
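/* small scoring data set; unlike most scoring data, it includes the response Sex */
data scoreData;
   do Age = 13, 15;
      do Height = 55, 65;
         do Weight = 75, 125;
            if rand("bern",0.5)=0 then Sex = "F";  /* random response */
            else Sex = "M";
            output;
         end;
      end;
   end;
run;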
%let train =sashelp.class;
%let oos = scoreData;
ods trace on;
proc logistic data=&train outest=outest outmodel=model;
   model sex = height weight age /lackfit;
   score data=&oos out=outscore outroc=vroc fitstat;
   roc;
   roccontrast;
   output out=a p=pred;
   ods output Association=Assoc1;
   ods output ROCassociation=ROC1;
   ods output ParameterEstimates =PARM1;
run;