Hi all,
I need to combine 10-fold cross-validation and bootstrapping in the same macro, but I have the impression that I am missing the bootstrapping loop. Could you advise me on what is wrong with my code below? Is there a better way to combine 10-fold cross-validation and bootstrapping in the same macro?
data kyphosis;
* infile cards dlm='09'x;
input y1 y2 d;
cards;
13 100 0
64 437 0
12 334 0
618 285 0
104 150 0
65 136 0
1573 523 0
291 927 0
84 62 0
13 54 0
338 248 1
758 917 1
189 305 1
260 88 1
223 257 1
604 231 1
366 1106 1
1094 658 1
176 65 1
1499 147 1
69 319 1
;
run;
%macro bootstrap (bootnum); *bootnum=12;
*******************************************************************************
BOOTSTRAPPING
*******************************************************************************;
proc surveyselect data=kyphosis NOPRINT seed=1234
out=kyphosis1(rename=(Replicate=bootsample))
method=urs
samprate=100
outhits
reps=&bootnum;
run;
*******************************************************************************
10-FOLD CROSS VALIDATION
*******************************************************************************;
data kyphosis1;
set kyphosis1;
theRandom = ranuni(0);
*by bootsample;
run;
proc rank data = kyphosis1 out=kRanked groups=10;
*by bootsample;
var theRandom;
run;
%do x = 0 %to 9;
******************************************************************************
Create Training Data
*******************************************************************************;
data training&x;
set kRanked;
where theRandom ne &x;
*by bootsample;
run;
******************************************************************************
Generate Binary variable beta1 for Biomarker 1
*******************************************************************************;
proc sql;
create table temp&x as
select
a.y1 as compare_y1,
b.*,
a.y1 < b.y1 as beta
from training&x as a, training&x as b;
*order by compare_y1, y1;
quit;
******************************************************************************
Generate Binary variable beta1 for Biomarker 2
*******************************************************************************;
proc sql;
create table temp2&x as
select
a.y2 as compare_y2,
b.*,
a.y2 < b.y2 as beta1
from training&x as a, training&x as b;
*order by compare_y2, y2;
quit;
******************************************************************************
Merge Temp and Temp2
*******************************************************************************;
data want&x;
merge temp&x temp2&x;
run;
******************************************************************************
Generate Combined Binary variable beta1_beta2 and Append
*******************************************************************************;
data final&x;
set want&x;
if beta=1 & beta1=1 then beta1_beta2=1;
if beta=1 & beta1=0 then beta1_beta2=1;
if beta=0 & beta1=1 then beta1_beta2=1;
if beta=0 & beta1=0 then beta1_beta2=0;
run;
******************************************************************************
Calculate se, spec, etc.
*******************************************************************************;
proc sort data=final&x;
by compare_y1 compare_y2;
run;
proc freq data = final&x order=data noprint ;
by compare_y1 compare_y2;
tables beta1_beta2*d / out=freq_results&x OUTPCT sparse ;
*output out=mnocol;
run;
******************************************************************************
Create Test Data
*******************************************************************************;
data test&x; * The test dataset, called test0 (when x=0), test1 (when x=1), etc.;
set kRanked;
where theRandom = &x;
run;
******************************************************************************
APPEND THE RESULTS OF THE SELECTED CUTOFFS IN THE TRAINING SET
*******************************************************************************;
proc append base = Base force data = freq_results&x;
run;
%end; *end 10-FOLD CROSS VALIDATION loop;
%mend;
%bootstrap(12);
Why not do a 2-fold, generate a log, and see what is missing from the log, if anything (using OPTIONS MPRINT; prior to invoking the macro)? If you find something missing, show us what you expected in the log, vs what you got. It'll be a long log, so you might want to mask out the parts remote from any apparent problem area.
Or if the resulting data is different than what you expect, then again provide what you expected vs what you got.
Diagnosis benefits from observation of results, as well as inference from the program logic.
Help us help you.
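A minimal sketch of that log check, assuming the %bootstrap macro posted above. The fold count is hard-coded in that macro, so trying 2 folds would mean temporarily editing groups=10 and the %do x = 0 %to 9 range by hand. The call below only reduces the number of bootstrap replicates to keep the log short.

```sas
/* Write the SAS statements generated by the macro to the log */
options mprint;

/* Fewer bootstrap replicates than the original call, to shorten the log */
%bootstrap(2);

/* Switch the option back off once the log has been inspected */
options nomprint;
```

With MPRINT on, each generated step appears in the log on lines prefixed MPRINT(BOOTSTRAP), so a step that never runs is easy to spot.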
@Rick_SAS has written some good posts on this subject:
https://blogs.sas.com/content/iml/tag/bootstrap-and-resampling
I was missing the bootstrapping loop shown below. The program is working now. Thanks for getting me to think :)
%do i = 1 %to &bootnum;
%end; *end bootstrapping;
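The final program was not posted, so the following is only a sketch of how such an outer loop might wrap the fold loop. It assumes kyphosis1 is the PROC SURVEYSELECT output from the original macro, with its bootsample variable. The macro name bootstrap_cv and the data set names boot&i are hypothetical.

```sas
%macro bootstrap_cv(bootnum);
  %do i = 1 %to &bootnum;

    /* Keep one bootstrap replicate and attach a random number to each row */
    data boot&i;
      set kyphosis1;
      where bootsample = &i;
      theRandom = ranuni(0);
    run;

    /* No RANKS statement, so theRandom is replaced by a fold number 0-9 */
    proc rank data=boot&i out=kRanked groups=10;
      var theRandom;
    run;

    %do x = 0 %to 9;
      /* Training data for fold &x: every row not assigned to fold &x */
      data training&x;
        set kRanked;
        where theRandom ne &x;
      run;

      /* The remaining per-fold steps from the original macro would go here */
    %end; /* end 10-fold cross-validation loop */

  %end; /* end bootstrap loop */
%mend bootstrap_cv;
```

Subsetting on bootsample inside the outer loop means each replicate is split into folds on its own, instead of all replicates being ranked together.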
Here is code I wrote earlier for 10-fold cross-validation of a logistic regression.
/****** K-Fold CV ****/
%macro k_fold_cv(k=10);
ods select none;
proc surveyselect data=sashelp.heart group=&k out=have;
run;
%do i=1 %to &k ;
data training;
set have(where=(groupid ne &i)) ;
run;
data test;
set have(where=(groupid eq &i));
run;
ods output
Association=native(keep=label2 nvalue2 rename=(nvalue2=native) where=(label2='c'))
ScoreFitStat=true(keep=dataset freq auc rename=(auc=true));
proc logistic data=training
outest=est(keep=_status_ _name_) ;
class sex;
model status(event='Alive')=sex height weight;
score data=test fitstat;
run;
data score&i;
merge true native est;
retain id &i ;
optimism=native-true;
run;
%end;
data k_fold_cv;
set score1-score&k;
run;
ods select all;
%mend;
%k_fold_cv(k=10)
/*************************************/
%macro k_fold_cv_rep(r=1,k=10);
ods select none;
%do r=1 %to &r;
proc surveyselect data=sashelp.heart group=&k out=have;
run;
%do i=1 %to &k ;
data training;
set have(where=(groupid ne &i)) ;
run;
data test;
set have(where=(groupid eq &i));
run;
ods output
Association=native(keep=label2 nvalue2 rename=(nvalue2=native) where=(label2='c'))
ScoreFitStat=true(keep=dataset freq auc rename=(auc=true));
proc logistic data=training
outest=est(keep=_status_ _name_) ;
class sex;
model status(event='Alive')=sex height weight;
score data=test fitstat;
run;
data score_r&r._&i;
merge true native est;
retain rep &r id &i;
optimism=native-true;
run;
%end;
%end;
data k_fold_cv_rep;
set score_r:;
run;
ods select all;
%mend;
%k_fold_cv_rep(r=20,k=10);
/********************/
data all;
set k_fold_cv k_fold_cv_rep indsname=indsn;
length indsname $ 32;
indsname=indsn;
run;
proc summary data=all nway;
class indsname;
var optimism;
output out=want mean=mean lclm=lclm uclm=uclm;
run;
In the case of large datasets, where efficiency becomes an issue, instead of:
data training;
set have(where=(groupid ne &i)) ;
run;
data test;
set have(where=(groupid eq &i));
run;
consider using
data training test / view=training;
set have;
if groupid=&i then output test;
else output training;
run;
It creates both data sets simultaneously, so there's one pass through HAVE. In addition, since training is a view (while test is a file), the data are not processed until training is submitted to proc logistic.