bkq32
Quartz | Level 8

I'm trying to figure out what happens during each step of this macro that runs a logistic regression on 10 bootstrapped samples. How are the tables "bval1" and "bval2" different? The tables have the same number of records, but I'm not sure what part of the code makes them different. I'm hoping once I know, I can figure out what the difference is between AUC1, AUC2, and AUC3.

 

*Make sample dataset;
data bweight ( drop = weight visit momedlevel );
 format subjectid heavy;
 set sashelp.bweight;
 subjectid = _N_;
 if weight > 4500 then heavy = 1;
  else heavy = 0;
run;



******************************************************************;
/*  BVAL macro                                                   */
/*  Author: Mithat Gonen                                         */
/*                                                               */
/*                                                               */
/*  Performs bootstrap validation                                */
/*                                                               */
/*  INPUTS                                                       */
/*                                                               */
/*  dsn:    data set name                                        */
/*  outcome:dependent (response) variable                        */
/*  covars: list of independent variables separated by blanks    */
/*  B:      Number of bootstrap samples                          */
/*  sel:    Selection method (documented; not a parameter here)  */
/*                                                               */
******************************************************************;
%macro bval(dsn=,outcome=,covars=,B=10);
proc sql noprint;
  select n(&outcome) into :_n from &dsn;
quit;
proc surveyselect data=&dsn method=urs outhits rep=&B n=&_n out=bsamples noprint;
run;
%do i=1 %to &B;
  proc logistic data=bsamples(where=(replicate=&i)) outmodel=_mod&i noprint;
    model &outcome=&covars;
  run;
  proc printto file='junk.txt'; run;
  proc logistic inmodel=_mod&i;
    score data=&dsn out=out1&i;
  run;
  proc logistic inmodel=_mod&i;
    score data=bsamples(where=(replicate=&i)) out=out2&i;
  run;
  proc printto;run;  
%end;
  data bval1;
    set %do j=1 %to &B;out1&j(in=in&j) %end;;
	%do j=1 %to &B; if in&j then bsamp=&j; %end;
  run;
  data bval2;
    set %do j=1 %to &B;out2&j(in=in&j) %end;;
	%do j=1 %to &B; if in&j then bsamp=&j; %end;
  run;


proc printto file='junk.txt' new;
proc logistic data=bval1;
  by bsamp;
  model &outcome=p_1;
  ods output association=assoc1;
run;

proc logistic data=bval2;
  by bsamp;
  model &outcome=p_1;
  ods output association=assoc2;
run;
proc logistic data=&dsn;
  model &outcome=&covars;
  ods output association=assoc3;
run;
proc printto; run;

data assoc3;
  set assoc3;
  bsamp=1;
run;

data optim;
  merge assoc1(where=(label2='c') keep=bsamp label2 nvalue2 rename=(nvalue2=auc1))
        assoc2(where=(label2='c') keep=bsamp label2 nvalue2 rename=(nvalue2=auc2))
        assoc3(where=(label2='c') keep=bsamp label2 nvalue2 rename=(nvalue2=auc3));
  by bsamp;
run;

proc sql;
  select mean(auc3) as OptimisticAUC, mean(auc2-auc1) as OptimismCorrection,
		 mean(auc3)-mean(auc2-auc1) as CorrectedAUC from optim;
quit;
%mend;
        
%bval(dsn=bweight,outcome=heavy,covars=black married boy momage cigsperday,B=10);
PaigeMiller
Diamond | Level 26

May I suggest you contact the author for questions about the macro?

 

BVAL1 seems to hold the predictions obtained by applying the i-th bootstrap regression to all observations in the original data set, with the results for all B bootstrap samples stacked together. BVAL2 seems to hold the predictions from the same i-th regression applied only to the observations in the i-th bootstrap sample, again with all samples stacked together.
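
One way to check this reading is to count rows and distinct subjects per bootstrap sample in each table. This is only a sketch: it assumes %bval has already been run in the current session, so that WORK.bval1 and WORK.bval2 exist, and that subjectid from the sample data step is still present in both tables. The column names n_rows and n_subjects are made up for illustration.

```sas
/* Sketch only: assumes %bval has already run, so WORK.bval1 and      */
/* WORK.bval2 exist and both still carry subjectid.                   */
proc sql;
  title 'bval1: original data scored by each bootstrap model';
  select bsamp,
         count(*)                  as n_rows,
         count(distinct subjectid) as n_subjects
    from bval1
    group by bsamp;

  title 'bval2: each bootstrap sample scored by its own model';
  select bsamp,
         count(*)                  as n_rows,
         count(distinct subjectid) as n_subjects
    from bval2
    group by bsamp;
quit;
title;
```

If the reading above is right, n_subjects should equal n_rows for every bsamp in bval1, while in bval2 n_subjects should usually be smaller than n_rows, because PROC SURVEYSELECT with method=urs and outhits samples with replacement and so repeats some subjects and leaves others out.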

--
Paige Miller
Rick_SAS
SAS Super FREQ

To restate what Paige said:

BVAL1 is the result of scoring the original data by using the parameter estimates from each bootstrap sample.

BVAL2 is the result of scoring each bootstrap sample by using the parameter estimates from that bootstrap sample. This is the "best possible" AUC because you are scoring the same data you used to fit the model.
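
Read against the macro's final PROC SQL step, the three AUCs appear to combine as written below. The subscript b and the symbol B are shorthand introduced here for the bootstrap sample index and the number of bootstrap samples; they are not names used in the macro. AUC1 and AUC2 are the per-sample c statistics computed from bval1 and bval2, and AUC3 is the c statistic of the model fitted to, and evaluated on, the full data set.

```latex
\[
\text{optimism} \;=\; \frac{1}{B}\sum_{b=1}^{B}\bigl(\mathrm{AUC2}_b - \mathrm{AUC1}_b\bigr),
\qquad
\text{corrected AUC} \;=\; \mathrm{AUC3} - \text{optimism}
\]
```

Because assoc3 is given bsamp=1 only, auc3 is missing on every other row of optim after the merge, so mean(auc3) in the macro is just the single AUC3 value.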

 

 

bkq32
Quartz | Level 8
Thank you, everyone - that makes sense. Do you know why the OptimismCorrection is needed? Why not just report mean(AUC1)?

Also, if I'm modeling the probability of the event, should I modify the macro such that every PROC LOGISTIC DATA= statement has the descending option?

I can also contact the author like Paige suggested if that's easier.
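
For reference on the descending question, here is a minimal sketch of two ways to make PROC LOGISTIC model the probability that heavy=1, using the sample data set and covariates from the macro call above. By default the procedure models the lower ordered response level, which here is heavy=0. This is an illustration, not a change the macro's author has confirmed.

```sas
/* Option 1: reverse the response ordering for the whole procedure */
proc logistic data=bweight descending;
  model heavy = black married boy momage cigsperday;
run;

/* Option 2: name the event level explicitly in the MODEL statement */
proc logistic data=bweight;
  model heavy(event='1') = black married boy momage cigsperday;
run;
```

Either form changes the sign of the parameter estimates relative to the default. The c statistic in the Association table does not depend on which level is modeled, and the scored variable p_1 is the predicted probability of heavy=1 in both cases, so the AUCs reported by the macro should not change.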
