jeka1212
Obsidian | Level 7

Hi all,

I need to combine 10-fold cross-validation and bootstrapping in the same macro, but I have the impression that I am missing the bootstrapping loop. Could you advise me on what is wrong with my code below? Is there a better way to combine 10-fold cross-validation and bootstrapping in the same macro?

data kyphosis;
  infile cards dlm='0920'x;  * values may be separated by tabs or blanks;
  input y1	y2 d;
  cards;
13	100	0
64	437	0
12	334	0
618	285	0
104	150	0
65	136	0
1573	523	0
291	927	0
84	62	0
13	54	0
338	248	1
758	917	1
189	305	1
260	88	1
223	257	1
604	231	1
366	1106	1
1094	658	1
176	65	1
1499	147	1
69	319	1
;
run;

%macro bootstrap(bootnum); /* bootnum = number of bootstrap replicates, e.g. 12 */

 *******************************************************************************
BOOTSTRAPPING
*******************************************************************************;

proc surveyselect data=kyphosis NOPRINT seed=1234
     out=kyphosis1(rename=(Replicate=bootsample))
     method=urs              
     samprate=1              /* sampling rate 1 = same size as the input data */
     outhits            
     reps=&bootnum;       
run;

*******************************************************************************
10-FOLD CROSS VALIDATION
*******************************************************************************;

 data kyphosis1;
     set kyphosis1;
      theRandom = ranuni(0);
	  *by bootsample;
    run;

proc rank data = kyphosis1 out=kRanked groups=10; * replaces theRandom with its fold number 0-9;
*by bootsample;
	var theRandom;
run;

%do x = 0 %to 9;


******************************************************************************
Create Training Data 
*******************************************************************************;
data training&x;
set kRanked;
where theRandom ne &x;
*by bootsample;
run;

******************************************************************************
Generate Binary variable beta for Biomarker 1 
*******************************************************************************;

proc sql;
create table temp&x as
select 
    a.y1 as compare_y1,
    b.*,
    a.y1 < b.y1 as beta
from training&x as a, training&x as b;
*order by compare_y1, y1;
quit;

******************************************************************************
Generate Binary variable beta1 for Biomarker 2
*******************************************************************************;

proc sql;
create table temp2&x as
select 
    a.y2 as compare_y2,
    b.*,
    a.y2 < b.y2 as beta1
from training&x as a, training&x as b;
*order by compare_y2, y2;
quit;


******************************************************************************
Merge Temp and Temp2 (one-to-one by row position: there is no BY statement)
*******************************************************************************;

data want&x;
merge temp&x temp2&x;
run;

******************************************************************************
Generate Combined Binary variable beta1_beta2 (1 if either biomarker is positive)
*******************************************************************************;

data final&x;
set want&x;
if beta=1 & beta1=1 then beta1_beta2=1;
if beta=1 & beta1=0 then beta1_beta2=1;
if beta=0 & beta1=1 then beta1_beta2=1;
if beta=0 & beta1=0 then beta1_beta2=0;
run;

******************************************************************************
Calculate sensitivity, specificity, etc.
*******************************************************************************;

proc sort data=final&x;
         by  compare_y1 compare_y2;
         run;

		 
proc freq data = final&x order=data noprint ;
by compare_y1 compare_y2;
            tables  beta1_beta2*d / out=freq_results&x OUTPCT sparse ;
			*output out=mnocol;
run;

******************************************************************************
Create Test Data 
*******************************************************************************;

data test&x; * The test dataset, called test0 (when x=0), test1 (when x=1), etc.;
set kRanked;
where theRandom = &x;
run;

******************************************************************************
 APPEND THE RESULTS OF THE SELECTED CUTOFFS IN THE TRAINING SET 
*******************************************************************************;
proc append base = Base force data = freq_results&x; 
run;

%end; *end 10-FOLD CROSS VALIDATION loop;
%mend;
%bootstrap(12);
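A side note on the temp&x / temp2&x / want&x steps: the MERGE has no BY statement, so it pairs rows purely by position and relies on both Cartesian joins returning their rows in the same order, which PROC SQL does not guarantee. A minimal sketch, assuming the kyphosis data from the question, that computes both comparisons and the combined "either positive" rule in a single join. The macro name %pairwise_rule and the output name final_demo are hypothetical.

```sas
/* Hypothetical helper: one Cartesian join instead of two joins plus a
   positional MERGE, so compare_y1 and compare_y2 always come from the
   same row a. */
%macro pairwise_rule(in=, out=);
  proc sql;
    create table &out as
    select a.y1 as compare_y1,
           a.y2 as compare_y2,
           b.*,
           (a.y1 < b.y1) as beta,    /* biomarker 1 above cutoff a.y1 */
           (a.y2 < b.y2) as beta1,   /* biomarker 2 above cutoff a.y2 */
           (calculated beta = 1 or calculated beta1 = 1) as beta1_beta2
                                     /* same result as the four IF statements */
    from &in as a, &in as b
    order by compare_y1, compare_y2;
  quit;
%mend pairwise_rule;

/* Illustration on the full data set */
%pairwise_rule(in=kyphosis, out=final_demo);
```

Inside %bootstrap the call would be %pairwise_rule(in=training&x, out=final&x); the ORDER BY clause also makes the separate PROC SORT before PROC FREQ unnecessary.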

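For the "Calculate sensitivity, specificity" step: OUTPCT adds PCT_ROW and PCT_COL to the OUT= data set, so sensitivity and specificity can be read from the column percentages of d. A sketch under these assumptions: beta1_beta2=1 means a positive test, d=1 means diseased, and freq_results0 was produced by a run of %bootstrap. The macro %sens_spec and its output name are hypothetical. If PROC FREQ does not output a cell for some cutoff pair (for example, no positive tests at all), the corresponding value stays missing rather than 0.

```sas
/* Hypothetical helper: one row of sensitivity/specificity per cutoff pair
   from a PROC FREQ OUT= data set created with OUTPCT. */
%macro sens_spec(in=, out=);
  data &out;
    set &in;
    by compare_y1 compare_y2;
    retain sensitivity specificity;
    if first.compare_y2 then call missing(sensitivity, specificity);
    /* PCT_COL is the percentage within each level of d */
    if beta1_beta2 = 1 and d = 1 then sensitivity = pct_col / 100;  /* P(test+ | d=1) */
    if beta1_beta2 = 0 and d = 0 then specificity = pct_col / 100;  /* P(test- | d=0) */
    if last.compare_y2 then output;
    keep compare_y1 compare_y2 sensitivity specificity;
  run;
%mend sens_spec;

%sens_spec(in=freq_results0, out=sens_spec0);
```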
mkeintz
PROC Star

Why not run a 2-fold version, generate a log, and see what, if anything, is missing from it (submit OPTIONS MPRINT; before invoking the macro)? If you find something missing, show us what you expected in the log versus what you got. It will be a long log, so you might want to mask out the parts remote from any apparent problem area.
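A sketch of that debugging setup. MPRINT writes the code the macro generates to the log; MLOGIC and SYMBOLGEN (standard SAS system options, not mentioned in the reply) additionally trace %DO/%IF decisions and macro variable resolution. In the macro as posted, the argument sets the number of bootstrap replicates, while the 10 folds are hard-coded in %do x = 0 %to 9 and groups=10, so a true 2-fold run would also need those two values changed.

```sas
/* Echo generated code, macro logic and macro variable values to the log */
options mprint mlogic symbolgen;

%bootstrap(2);   /* 2 bootstrap replicates keeps the log shorter */

/* Switch the tracing options off again */
options nomprint nomlogic nosymbolgen;
```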

 

Or if the resulting data is different than what you expect, then again provide what you expected vs what you got.

 

Diagnosis benefits from observation of results, as well as inference from the program logic.

 

Help us help you.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for "Add a HASH object method which would append a hash object to an existing SAS data set".

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for "Allow PROC SORT to output multiple datasets".

--------------------------
PaigeMiller
Diamond | Level 26

@Rick_SAS has written some good posts on this subject:

 

https://blogs.sas.com/content/iml/tag/bootstrap-and-resampling

--
Paige Miller
jeka1212
Obsidian | Level 7

I was missing the bootstrapping loop below. The program is working now. Thanks for making me think :)

 

%do i = 1 %to &bootnum; 

%end; *end Bootstrapping;
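Where the new %do i loop goes matters. If it simply wraps the existing steps, each pass still works on kRanked, which holds every replicate at once (the *by bootsample; lines are commented out), so the folds mix observations from different replicates. One way to keep the replicates apart, sketched under these assumptions: it uses the kyphosis data from the question, assigns folds within each replicate with BY bootsample, keeps only bootsample=&i inside the bootstrap loop, and tags every appended row with its replicate and fold. The macro name %boot_cv, the data set names boot, fold_result and base, and the row-count placeholder are hypothetical stand-ins for the pairwise-cutoff and PROC FREQ steps. RAND('uniform') with CALL STREAMINIT is swapped in for RANUNI(0) so the fold split is reproducible.

```sas
/* Hypothetical sketch: bootstrap loop outside, k-fold CV loop inside. */
%macro boot_cv(bootnum=12, k=10, seed=1234);
  %local i x;

  /* Bootstrap replicates, each the same size as the input data */
  proc surveyselect data=kyphosis noprint seed=&seed
       out=boot(rename=(Replicate=bootsample))
       method=urs samprate=1 outhits reps=&bootnum;
  run;

  /* Reproducible random number per row (instead of RANUNI(0)) */
  data boot;
    if _n_ = 1 then call streaminit(&seed);
    set boot;
    theRandom = rand('uniform');
  run;

  proc sort data=boot;
    by bootsample;
  run;

  /* theRandom becomes a fold number 0..k-1, assigned within each replicate */
  proc rank data=boot out=kRanked groups=&k;
    by bootsample;
    var theRandom;
  run;

  /* Start from an empty results table */
  proc datasets lib=work nolist nowarn;
    delete base;
  quit;

  %do i = 1 %to &bootnum;               /* bootstrap loop */
    %do x = 0 %to %eval(&k - 1);        /* cross-validation loop */
      data training test;
        set kRanked(where=(bootsample = &i));
        if theRandom = &x then output test;
        else output training;
      run;

      /* Placeholder for the real work (pairwise cutoffs, PROC FREQ,
         evaluation on TEST): here only the training size is recorded */
      proc sql noprint;
        create table fold_result as
        select &i as bootsample, &x as fold, count(*) as n_train
        from training;
      quit;

      proc append base=base data=fold_result force;
      run;
    %end;
  %end;
%mend boot_cv;

%boot_cv(bootnum=12, k=10);
```

Because every row in base carries bootsample and fold, the per-fold results can later be averaged within each replicate and then summarized across replicates.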

jeka1212
Obsidian | Level 7
Many thanks
Ksharp
Super User

Here is code I wrote previously for 10-fold cross-validation of a logistic regression.

 

/****** K-Fold CV ****/

%macro k_fold_cv(k=10);
ods select none;

proc surveyselect data=sashelp.heart groups=&k out=have;
run;

%do i=1 %to &k ;
data training;
 set have(where=(groupid ne &i)) ;
run;
data test;
 set have(where=(groupid eq &i));
run;

ods output 
Association=native(keep=label2 nvalue2 rename=(nvalue2=native) where=(label2='c'))
ScoreFitStat=true(keep=dataset freq auc rename=(auc=true));
proc logistic data=training
 outest=est(keep=_status_ _name_) ;
 class sex;
 model status(event='Alive')=sex height weight;
 score data=test fitstat; 
run;

data score&i;
 merge true native est;
 retain id &i ;
 optimism=native-true;
run;
%end;
data k_fold_cv;
 set score1-score&k;
run;

ods select all;
%mend;

%k_fold_cv(k=10)








/*************************************/


%macro k_fold_cv_rep(r=1,k=10);
ods select none;
%do rep=1 %to &r;
proc surveyselect data=sashelp.heart groups=&k out=have;
run;

%do i=1 %to &k ;
data training;
 set have(where=(groupid ne &i)) ;
run;
data test;
 set have(where=(groupid eq &i));
run;

ods output 
Association=native(keep=label2 nvalue2 rename=(nvalue2=native) where=(label2='c'))
ScoreFitStat=true(keep=dataset freq auc rename=(auc=true));
proc logistic data=training
 outest=est(keep=_status_ _name_) ;
 class sex;
 model status(event='Alive')=sex height weight;
 score data=test fitstat; 
run;

data score_r&rep._&i;
 merge true native est;
 retain rep &rep id &i;
 optimism=native-true;
run;
%end;
%end;
data k_fold_cv_rep;
 set score_r:;
run;

ods select all;
%mend;

%k_fold_cv_rep(r=20,k=10);

/********************/
data all;
 set k_fold_cv k_fold_cv_rep indsname=indsn;
 length indsname $ 41;  /* libref.memname can be up to 41 characters */
 indsname=indsn;
run;
proc summary data=all nway;
 class indsname;
 var optimism;
 output out=want mean=mean lclm=lclm uclm=uclm;
run;




jeka1212
Obsidian | Level 7
Many thanks, Ksharp. I am exploring it now.
mkeintz
PROC Star

In the case of large datasets, where efficiency becomes an issue, instead of:

 

data training;
 set have(where=(groupid ne &i)) ;
run;
data test;
 set have(where=(groupid eq &i));
run;

consider using

 

data training test / view=training;
  set have;
  if groupid=&i then output test;
  else output training;
run;

It creates both data sets simultaneously, so there is only one pass through HAVE. In addition, since training is a view (while test is a file), the data are not processed until training is submitted to PROC LOGISTIC.
