10-FOLD CROSS VALIDATION & BOOTSTRAPPING

Contributor
Posts: 27

Hi all,

 

I need to combine 10-fold cross-validation and bootstrapping in the same macro, but I have the impression that I am missing the bootstrapping loop. Could you advise me on what is wrong with my code below? Is there a better way to combine 10-fold cross-validation and bootstrapping in the same macro?

data kyphosis;
 * infile cards dlm='09'x;
  input y1	y2 d;
  cards;
13	100	0
64	437	0
12	334	0
618	285	0
104	150	0
65	136	0
1573	523	0
291	927	0
84	62	0
13	54	0
338	248	1
758	917	1
189	305	1
260	88	1
223	257	1
604	231	1
366	1106	1
1094	658	1
176	65	1
1499	147	1
69	319	1
;
run;

%macro bootstrap (bootnum); *bootnum=12;

 *******************************************************************************
BOOTSTRAPPING
*******************************************************************************;

proc surveyselect data=kyphosis NOPRINT seed=1234
     out=kyphosis1(rename=(Replicate=bootsample))
     method=urs              
     samprate=100  
     outhits            
     reps=&bootnum;       
run;

*******************************************************************************
10-FOLD CROSS VALIDATION
*******************************************************************************;

 data kyphosis1;
     set kyphosis1;
      theRandom = ranuni(0);
	  *by bootsample;
    run;

proc rank data = kyphosis1 out=kRanked groups=10;
*by bootsample;
	var theRandom;
run;

%do x = 0 %to 9;


******************************************************************************
Create Training Data 
*******************************************************************************;
data training&x;
set kRanked;
where theRandom ne &x;
*by bootsample;
run;

******************************************************************************
Generate Binary variable beta for Biomarker 1 
*******************************************************************************;

proc sql;
create table temp&x as
select 
    a.y1 as compare_y1,
    b.*,
    a.y1 < b.y1 as beta
from training&x as a, training&x as b;
*order by compare_y1, y1;
quit;

******************************************************************************
Generate Binary variable beta1 for Biomarker 2
*******************************************************************************;

proc sql;
create table temp2&x as
select 
    a.y2 as compare_y2,
    b.*,
    a.y2 < b.y2 as beta1
from training&x as a, training&x as b;
*order by compare_y2, y2;
quit;


******************************************************************************
Merge Temp and Temp2 
*******************************************************************************;

data want&x;
merge temp&x temp2&x;
run;

******************************************************************************
Generate Combined Binary variable beta1_beta2 and Append 
*******************************************************************************;

data final&x;
set want&x;
if beta=1 & beta1=1 then beta1_beta2=1;
if beta=1 & beta1=0 then beta1_beta2=1;
if beta=0 & beta1=1 then beta1_beta2=1;
if beta=0 & beta1=0 then beta1_beta2=0;
run;

******************************************************************************
Calculate se, spec, etc.
*******************************************************************************;

proc sort data=final&x;
         by  compare_y1 compare_y2;
         run;

		 
proc freq data = final&x order=data noprint ;
by compare_y1 compare_y2;
            tables  beta1_beta2*d / out=freq_results&x OUTPCT sparse ;
			*output out=mnocol;
run;

******************************************************************************
Create Test Data 
*******************************************************************************;

data test&x; * The test dataset, called test0 (when x=0), test1 (when x=1), etc.;
set kRanked;
where theRandom = &x;
run;

******************************************************************************
 APPEND THE RESULTS OF THE SELECTED CUTOFFS IN THE TRAINING SET 
*******************************************************************************;
proc append base = Base force data = freq_results&x; 
run;

%end; *end 10-FOLD CROSS VALIDATION loop;
%mend;
%bootstrap(12);

Trusted Advisor
Posts: 1,289

Re: 10-FOLD CROSS VALIDATION & BOOTSTRAPPING

Why not do a 2-fold, generate a log, and see what is missing from the log, if anything (using OPTIONS MPRINT; prior to invoking the macro)?  If you find something missing, show us what you expected in the log, vs what you got.  It'll be a long log, so you might want to mask out the parts remote from any apparent problem area. 

 

Or if the resulting data is different than what you expect, then again provide what you expected vs what you got.

 

Diagnosis benefits from observation of results, as well as inference from the program logic.

 

Help us help you.
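
As a rough illustration of the suggestion above, a debugging run might look like the sketch below. MPRINT is the option named in this reply; MLOGIC and SYMBOLGEN are further standard macro-debugging options, included here only as an assumption about what would be useful. Note that the posted macro hard-codes ten folds (%do x = 0 %to 9), so this call only reduces the number of bootstrap replicates. A true 2-fold run would also need groups=2 in PROC RANK and %to 1 in the loop.

```sas
/* Write the SAS statements generated by the macro to the log */
options mprint mlogic symbolgen;

/* Small run: 2 bootstrap replicates instead of 12 */
%bootstrap(2);

/* Turn the extra logging off again */
options nomprint nomlogic nosymbolgen;
```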

Respected Advisor
Posts: 2,660

Re: 10-FOLD CROSS VALIDATION & BOOTSTRAPPING

@Rick_SAS has written some good posts on this subject:

 

https://blogs.sas.com/content/iml/tag/bootstrap-and-resampling

--
Paige Miller
Contributor
Posts: 27

Re: 10-FOLD CROSS VALIDATION & BOOTSTRAPPING

Posted in reply to PaigeMiller

I was missing the bootstrapping loop shown below. The program is working now. Thanks for making me think.

 

%do i = 1 %to &bootnum; 

%end; *end bootstrapping loop;
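
The snippet above shows only the loop shell. A minimal sketch of how the bootstrap loop could wrap the cross-validation loop is given below. It is not the poster's final program: the macro name bootstrap_cv, its parameters, and the data set names boot, folds, training and test are hypothetical, and it assigns folds with the GROUPS= option of PROC SURVEYSELECT rather than the RANUNI and PROC RANK steps in the original post. It assumes the kyphosis data set from the first post and a SAS/STAT release that supports GROUPS=.

```sas
%macro bootstrap_cv(bootnum=12, k=10);   /* hypothetical name and parameters */
  %local i x;

  /* Draw all bootstrap replicates once, as in the original post */
  proc surveyselect data=kyphosis noprint seed=1234
       out=boot(rename=(Replicate=bootsample))
       method=urs samprate=100 outhits reps=&bootnum;
  run;

  %do i = 1 %to &bootnum;                /* bootstrap loop */

    /* Randomly assign the rows of replicate &i to &k folds (GroupID = 1..&k) */
    proc surveyselect data=boot(where=(bootsample = &i)) noprint
         seed=%eval(1234 + &i) groups=&k out=folds;
    run;

    %do x = 1 %to &k;                    /* cross-validation loop */

      /* Fold &x is the test set; the remaining folds form the training set */
      data training test;
        set folds;
        if GroupID = &x then output test;
        else output training;
      run;

      /* ... per-fold analysis of TRAINING and TEST goes here ... */

    %end;
  %end;
%mend bootstrap_cv;

%bootstrap_cv(bootnum=12, k=10)
```

Each pass of the outer loop restricts the work to one replicate with where=(bootsample = &i), which is the step the original macro lacked: it ran the ten folds once over all replicates pooled together.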

Contributor
Posts: 27

Re: 10-FOLD CROSS VALIDATION & BOOTSTRAPPING

Posted in reply to PaigeMiller
Many thanks
Super User
Posts: 10,621

Re: 10-FOLD CROSS VALIDATION & BOOTSTRAPPING

Here is code I wrote previously for 10-fold cross-validation of a logistic regression.

 

/****** K-Fold CV ****/

%macro k_fold_cv(k=10);
ods select none;

proc surveyselect data=sashelp.heart group=&k out=have;
run;

%do i=1 %to &k ;
data training;
 set have(where=(groupid ne &i)) ;
run;
data test;
 set have(where=(groupid eq &i));
run;

ods output 
Association=native(keep=label2 nvalue2 rename=(nvalue2=native) where=(label2='c'))
ScoreFitStat=true(keep=dataset freq auc rename=(auc=true));
proc logistic data=training
 outest=est(keep=_status_ _name_) ;
 class sex;
 model status(event='Alive')=sex height weight;
 score data=test fitstat; 
run;

data score&i;
 merge true native est;
 retain id &i ;
 optimism=native-true;
run;
%end;
data k_fold_cv;
 set score1-score&k;
run;

ods select all;
%mend;

%k_fold_cv(k=10)








/*************************************/


%macro k_fold_cv_rep(r=1,k=10);
ods select none;
%do r=1 %to &r;
proc surveyselect data=sashelp.heart group=&k out=have;
run;

%do i=1 %to &k ;
data training;
 set have(where=(groupid ne &i)) ;
run;
data test;
 set have(where=(groupid eq &i));
run;

ods output 
Association=native(keep=label2 nvalue2 rename=(nvalue2=native) where=(label2='c'))
ScoreFitStat=true(keep=dataset freq auc rename=(auc=true));
proc logistic data=training
 outest=est(keep=_status_ _name_) ;
 class sex;
 model status(event='Alive')=sex height weight;
 score data=test fitstat; 
run;

data score_r&r._&i;
 merge true native est;
 retain rep &r id &i;
 optimism=native-true;
run;
%end;
%end;
data k_fold_cv_rep;
 set score_r:;
run;

ods select all;
%mend;

%k_fold_cv_rep(r=20,k=10);

/********************/
data all;
 set k_fold_cv k_fold_cv_rep indsname=indsn;
 length indsname $ 32;
 indsname=indsn;
run;
proc summary data=all nway;
 class indsname;
 var optimism;
 output out=want mean=mean lclm=lclm uclm=uclm;
run;




Contributor
Posts: 27

Re: 10-FOLD CROSS VALIDATION & BOOTSTRAPPING

Many thanks Ksharp. I am exploring it now
Trusted Advisor
Posts: 1,289

Re: 10-FOLD CROSS VALIDATION & BOOTSTRAPPING

For large datasets, where efficiency becomes an issue, instead of:

 

data training;
 set have(where=(groupid ne &i)) ;
run;
data test;
 set have(where=(groupid eq &i));
run;

consider using

 

data training test / view=training;
  set have;
  if groupid=&i then output test;
  else output training;
run;

This creates both data sets in a single pass through HAVE. In addition, because training is a view (while test is a data file), the data are not processed until training is submitted to proc logistic.
