stratozyck
Calcite | Level 5

Hi, I want to output the Gini (Somers' D) statistic for a few hundred candidate models, each on a training data set and a validation data set.

Problem: I see how I can use ODS OUTPUT to get the statistic for the training data set, but there appears to be no way to output the statistic for the scored data set.

 

This is what I am running:

ods _all_ close;
ods trace on;
proc logistic data=&train outest=temp.outest&seg.&modelid outmodel=temp.model&seg.&modelid;
model default12mos(EVENT='1') = &vars/lackfit;
score data=&oos out=temp.oos&seg.&modelid outroc=vroc fitstat;roc;roccontrast;
output out=temp.a&seg.&modelid p=pred; roc;
ods output ROCassociation=temp.ROC1;
ods output ParameterEstimates =temp.PARM1;

run;
ods trace off;

 

I also tried this:

 

ods _all_ close;
ods trace on;
proc logistic inmodel=temp.model&seg.&modelid;
score data=&oos out=temp.oos&seg.&modelid outroc=vroc;
run;
ods trace off;

 

It appears when I trace the ODS output there aren't any tables related to the scored data set. What gives? I can't visually look at the output as there are literally 500-600 models to cycle through and of course, it needs to be done ASAP. As it is, I am going to rely on the KS statistic because I can get that with ODS.

1 ACCEPTED SOLUTION

Accepted Solutions
StatDave
SAS Super FREQ

As shown in the "Details: Rank Correlation of Observed Responses and Predicted Probabilities" section of the LOGISTIC documentation, Gini=2c-1, where c is the area under the ROC curve (AUC). The FITSTAT option in the SCORE statement produces the AUC statistic. So, you just need to save the ScoreFitStat table using an ODS OUTPUT statement and then compute the Gini statistic from the AUC in that table.
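
A minimal sketch of this approach, using sashelp.class so that anyone can run it. The validation subset work.valid and the output data set names are made up for illustration, and the AUC column name in the ScoreFitStat data set is an assumption based on the displayed table; check it with PROC CONTENTS on your release. Note that the scored data must contain the observed response, otherwise FITSTAT has nothing to compare the predictions against.

```sas
/* Hypothetical validation data: it must contain the observed response (Sex). */
data work.valid;
   set sashelp.class;
   if mod(_n_, 2) = 0;   /* arbitrary subset, for illustration only */
run;

proc logistic data=sashelp.class;
   model sex(event='M') = height weight;
   /* FITSTAT requests fit statistics, including the AUC, for the scored data */
   score data=work.valid out=work.scored fitstat;
   /* Save the fit statistics table for the scored data */
   ods output ScoreFitStat=work.scorefit;
run;

data work.gini;
   set work.scorefit;
   Gini = 2*AUC - 1;   /* Somers' D from the AUC (column name assumed) */
run;

proc print data=work.gini;
run;
```

For example, an AUC of 0.80 gives Gini = 2(0.80) - 1 = 0.60.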


6 REPLIES
Ksharp
Super User

There are also other ways to score a new data set.

 

http://blogs.sas.com/content/iml/2014/02/19/scoring-a-regression-model-in-sas.html

 

Calling @Rick_SAS

Rick_SAS
SAS Super FREQ

I'm not sure that I understand your issue, but the "Association" table that shows the concordant/discordant statistics is only applicable to the original data. The table shows various statistics (including Somers' D) that are based on the number of observations IN THE DATA that are concordant or discordant. See the doc section about observed responses and predicted probabilities. In general, scoring data sets do not even contain the response variable, so these statistics are meaningless for scoring data.

 

HTH

Rick_SAS
SAS Super FREQ

By the way, it will be easier to get your questions answered if everyone can run the same code. Try using data in the SASHELP libref when you post your questions. For example, here is a sample scoring data set and code for the sashelp.class data. I don't think you need to create 8 output data sets, but I left them in since they were part of your original example.

 

data scoreData;
do Age = 13, 15;
   do Height = 55, 65;
      do Weight = 75, 125;
         if rand("bern",0.5)=0 then Sex = "F";
         else Sex = "M";
         output;
      end;
   end;
end;
run;

%let train =sashelp.class;
%let oos = scoreData;

ods trace on;
proc logistic data=&train outest=outest outmodel=model;
   model sex = height weight age / lackfit;
   /* score the hold-out data; FITSTAT requests fit statistics for it */
   score data=&oos out=outscore outroc=vroc fitstat;
   roc; roccontrast;
   output out=a p=pred;
   /* these three tables describe the training data */
   ods output Association=Assoc1;
   ods output ROCassociation=ROC1;
   ods output ParameterEstimates=PARM1;
run;
ods trace off;

stratozyck
Calcite | Level 5
Thanks. What I mean is that when you turn on graphics, it does output the scoring Gini. I just wanted that in non-graphical form.

It got resolved because we decided to go with KS. We score on a known out-of-sample, out-of-time data set that does have the response variable.

Thanks for the help. I was under a ton of pressure to get this done yesterday (even though it was assigned this morning).

stratozyck
Calcite | Level 5
Thanks, this works! Glad I have you guys behind me, because this project involves a lot of money and my superiors are pushing their stress down upon lowly peons like me.
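
For the few hundred stored models mentioned in the original question, the same ScoreFitStat idea could be wrapped in a macro along these lines. This is only a sketch: the macro name score_gini, its parameters, and all work data sets are hypothetical, the AUC column name is assumed, and it presumes each model was saved with OUTMODEL= and that the validation data contain the response.

```sas
/* Illustration only: build validation data and store one model to score. */
data work.valid;
   set sashelp.class;
   if mod(_n_, 2) = 0;   /* arbitrary subset, for illustration only */
run;

proc logistic data=sashelp.class outmodel=work.model1;
   model sex(event='M') = height weight;
run;

/* Hypothetical helper: score one stored model and append its Gini. */
%macro score_gini(modelds=, validds=, id=);
   proc logistic inmodel=&modelds;
      score data=&validds out=work._scored fitstat;
      ods output ScoreFitStat=work._fit;
   run;

   data work._fit;
      length ModelID $32;
      set work._fit;
      ModelID = "&id";
      Gini = 2*AUC - 1;   /* AUC column name assumed */
   run;

   /* work.all_gini is created on the first call and extended afterwards */
   proc append base=work.all_gini data=work._fit force;
   run;
%mend score_gini;

%score_gini(modelds=work.model1, validds=work.valid, id=model1)
```

Calling the macro once per stored model leaves one row per model in work.all_gini. Delete that data set before rerunning the loop, since PROC APPEND would otherwise add duplicate rows.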
