I'm trying to figure out what happens during each step of this macro that runs a logistic regression on 10 bootstrapped samples. How are the tables "bval1" and "bval2" different? The tables have the same number of records, but I'm not sure what part of the code makes them different. I'm hoping once I know, I can figure out what the difference is between AUC1, AUC2, and AUC3.
*Make sample dataset;
data bweight ( drop = weight visit momedlevel );
 format subjectid heavy;
 set sashelp.bweight;
 subjectid = _N_;
 if weight > 4500 then heavy = 1;
  else heavy = 0;
run;
******************************************************************;
/*  BVAL macro                                                   */
/*  Author: Mithat Gonen                                         */
/*                                                               */
/*                                                               */
/*  Performs bootstrap validation                                */
/*                                                               */
/*  INPUTS                                                       */
/*                                                               */
/*  dsn:    data set name                                        */
/*  outcome:dependent variable                                   */
/*  covars: list of independent variables separated by blanks    */
/*  B:      Number of bootstrap samples                          */
/*  sel:    Selection method for logistic regression             */
/*                                                               */
******************************************************************;
%macro bval(dsn=,outcome=,covars=,B=10);
proc sql noprint;
  select n(&outcome) into:_n from &dsn;
quit;
proc surveyselect data=&dsn method=urs outhits rep=&B n=&_n out=bsamples noprint;
run;
%do i=1 %to &B;
  proc logistic data=bsamples(where=(replicate=&i)) outmodel=_mod&i noprint;
    model &outcome=&covars;
  run;
  proc printto file='junk.txt';
  proc logistic inmodel=_mod&i;
    score data=&dsn out=out1&i;
  run;
  proc logistic inmodel=_mod&i;
    score data=bsamples(where=(replicate=&i)) out=out2&i;
  run;
  proc printto; run;
%end;
  data bval1;
    set %do j=1 %to &B;out1&j(in=in&j) %end;;
    %do j=1 %to &B; if in&j then bsamp=&j; %end;
  run;
  data bval2;
    set %do j=1 %to &B;out2&j(in=in&j) %end;;
    %do j=1 %to &B; if in&j then bsamp=&j; %end;
  run;
proc printto file='junk.txt' new;
proc logistic data=bval1;
  by bsamp;
  model &outcome=p_1;
  ods output association=assoc1;
run;
proc logistic data=bval2;
  by bsamp;
  model &outcome=p_1;
  ods output association=assoc2;
run;
proc logistic data=&dsn;
  model &outcome=&covars;
  ods output association=assoc3;
run;
proc printto;
data assoc3;
  set assoc3;
  bsamp=1;
run;
data optim;
  merge assoc1(where=(label2='c') keep=bsamp label2 nvalue2 rename=(nvalue2=auc1))
        assoc2(where=(label2='c') keep=bsamp label2 nvalue2 rename=(nvalue2=auc2))
        assoc3(where=(label2='c') keep=bsamp label2 nvalue2 rename=(nvalue2=auc3));
  by bsamp;
run;
proc sql;
  select mean(auc3) as OptimisticAUC, mean(auc2-auc1) as OptimisimCorrection,
         mean(auc3)-mean(auc2-auc1) as CorrectedAUC from optim;
quit;
%mend;
        
%bval(dsn=bweight,outcome=heavy,covars=black married boy momage cigsperday,B=10);
May I suggest you contact the author for questions about the macro?
BVAL1 seems to hold the predictions obtained by applying the i-th bootstrap regression to all observations in the original data, stacked across all B bootstrap samples. BVAL2 seems to hold the predictions of the i-th bootstrap regression for only the observations in the i-th bootstrap sample, again stacked across all B samples.
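One way to check this yourself is to count rows and distinct subjects per bootstrap sample. This is only a sketch. It assumes the macro has already run, so that bval1 and bval2 still exist in WORK, and that both still carry subjectid from the sample data. The column aliases (table_name, n_rows, n_distinct_ids) are made up for illustration.

```sas
/* Compare row counts and distinct subjects per bootstrap sample. */
/* Assumes bval1 and bval2 exist in WORK and contain subjectid.   */
proc sql;
  select 'bval1' as table_name, bsamp,
         count(*) as n_rows,
         count(distinct subjectid) as n_distinct_ids
    from bval1
    group by bsamp
  union all
  select 'bval2' as table_name, bsamp,
         count(*) as n_rows,
         count(distinct subjectid) as n_distinct_ids
    from bval2
    group by bsamp;
quit;
```

If the reading above is right, each bsamp in bval1 should show as many distinct subjectid values as rows. In bval2 the row count should be the same, but only about 63% of the subjectid values should be distinct, because sampling with replacement repeats some subjects and leaves others out.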
Calling @Rick_SAS
To restate what Paige said:
BVAL1 is the result of scoring the original data by using the parameter estimates from each bootstrap sample.
BVAL2 is the result of scoring each bootstrap sample by using the parameter estimates from that bootstrap sample. This is the "best possible" AUC because you are scoring the same data you used to fit the model.
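Reading the final PROC SQL step in the macro, the three AUCs appear to combine as follows (an interpretation of the code, not a statement from the macro's author). For bootstrap sample b out of B, AUC1 is the AUC of model b on the original data, AUC2 is the AUC of model b on its own bootstrap sample, and AUC3 is the AUC of the model fit to the full data and evaluated on that same data:

```latex
\[
\text{optimism} = \frac{1}{B}\sum_{b=1}^{B}\bigl(\mathrm{AUC2}_b - \mathrm{AUC1}_b\bigr),
\qquad
\text{CorrectedAUC} = \mathrm{AUC3} - \text{optimism}
\]
```

AUC3 is present only on the bsamp=1 row of optim, so mean(auc3) in the macro is just that single value.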
