Statistical programming, matrix languages, and more

How to do k-fold CV with replacements/replication

Contributor
Posts: 22
Hi, how can I do, say, 10-fold cross-validation with replication using PROC LOGISTIC? Apparently it performs about as well as the bootstrap (resampling with replacement). From the article: "...We also carried out cross-validation with replication. Here the cross-validation was replicated r times, with a different random split into k groups each time..." (http://m.aje.oxfordjournals.org/content/early/2014/06/24/aje.kwu140.full.pdf). Any help is most appreciated!

P.S. Sorry if I have the terminology jumbled up. Also, I have no access to the HP procedures.

 

Thanks,

Saiful.


Accepted Solutions
Solution
‎10-17-2016 08:08 AM
Super User
Posts: 10,686

Re: How to do k-fold CV with replacements/replication

Posted in reply to Dcicantab5

This looks fairly easy.

 

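%macro k_fold_cv(k=10);
%local i;
/* Suppress printed output; ODS OUTPUT below still captures the tables */
ods select none;

/* Randomly assign each observation to one of k folds (adds variable GroupID) */
proc surveyselect data=sashelp.heart groups=&k out=have;
run;

%do i=1 %to &k ;
/* Fold i is held out for testing; the other folds form the training set */
data training;
 set have(where=(groupid ne &i)) ;
run;
data test;
 set have(where=(groupid eq &i));
run;

/* native = c statistic on the training data; true = AUC on the held-out fold */
ods output 
Association=native(keep=label2 nvalue2 rename=(nvalue2=native) where=(label2='c'))
ScoreFitStat=true(keep=dataset freq auc rename=(auc=true));
proc logistic data=training
 outest=est(keep=_status_ _name_) ;
 class sex;
 model status(event='Alive')=sex height weight;
 score data=test fitstat; 
run;

/* One row per fold: optimism = training c statistic minus test AUC */
data score&i;
 merge true native est;
 retain id &i ;
 optimism=native-true;
run;
%end;
/* Stack the per-fold results */
data k_fold_cv;
 set score1-score&k;
run;

ods select all;
%mend;

%k_fold_cv(k=10)
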
/*************************************/


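%macro k_fold_cv_rep(r=1,k=10);
%local rep i;
ods select none;
/* Use a separate loop index so the parameter r is not overwritten */
%do rep=1 %to &r;
/* A fresh random split into k folds for every replicate */
proc surveyselect data=sashelp.heart groups=&k out=have;
run;

%do i=1 %to &k ;
data training;
 set have(where=(groupid ne &i)) ;
run;
data test;
 set have(where=(groupid eq &i));
run;

/* native = c statistic on the training data; true = AUC on the held-out fold */
ods output 
Association=native(keep=label2 nvalue2 rename=(nvalue2=native) where=(label2='c'))
ScoreFitStat=true(keep=dataset freq auc rename=(auc=true));
proc logistic data=training
 outest=est(keep=_status_ _name_) ;
 class sex;
 model status(event='Alive')=sex height weight;
 score data=test fitstat; 
run;

/* One row per replicate and fold */
data score_r&rep._&i;
 merge true native est;
 retain rep &rep id &i;
 optimism=native-true;
run;
%end;
%end;
/* Stack every data set whose name starts with score_r;
   delete leftovers from earlier runs before calling the macro again */
data k_fold_cv_rep;
 set score_r:;
run;

ods select all;
%mend;

%k_fold_cv_rep(r=20,k=10);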


/********************/
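/* Stack both result sets; INDSNAME= records which data set each row came from */
data all;
 set k_fold_cv k_fold_cv_rep indsname=indsn;
 length indsname $ 32;
 indsname=indsn;
run;
/* Mean optimism with 95% confidence limits, separately for each method */
proc summary data=all nway;
 class indsname;
 var optimism;
 output out=want mean=mean lclm=lclm uclm=uclm;
run;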
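
As a possible follow-up, and assuming the macros above have already been run so that k_fold_cv_rep exists with the variables rep, id and true, the sketch below averages the held-out AUC over the folds within each replicate and then looks at how much those averages vary from one random split to the next. The names per_rep and mean_auc are made up for illustration.

```sas
/* Average the test-fold AUC within each replicate (one row per rep) */
proc summary data=k_fold_cv_rep nway;
 class rep;
 var true;
 output out=per_rep mean=mean_auc;
run;

/* Spread of the replicate-level averages across the random splits */
proc means data=per_rep n mean std min max;
 var mean_auc;
run;
```

A small standard deviation here would suggest that the result does not depend much on any one particular split.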



All Replies
Super User
Posts: 23,262

Re: How to do k-fold CV with replacements/replication

[ Edited ]
Posted in reply to Dcicantab5

How big is your data? 

 

The methods in this paper are what you're looking for. Essentially, use PROC SURVEYSELECT to generate the random samples, run PROC LOGISTIC on the samples using a BY group, and then summarize the results using PROC SURVEYMEANS or PROC MEANS.

 

 

http://www2.sas.com/proceedings/forum2007/183-2007.pdf
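
A minimal sketch of that pattern, for what it's worth. It assumes sashelp.heart as a stand-in data set and the same model used elsewhere in this thread. The seed, the number of replicates, and the names boot, assoc and c_stat are arbitrary choices for illustration, not taken from the paper. Note that this draws bootstrap samples (with replacement) rather than k-fold splits, and it only summarizes the c statistic of each refitted model on its own sample.

```sas
/* 1. Draw 200 bootstrap samples: same size as the data, with replacement */
proc surveyselect data=sashelp.heart out=boot seed=12345
     method=urs samprate=1 outhits reps=200;
run;

/* 2. Fit the model once per sample using BY-group processing */
ods select none;
proc logistic data=boot;
 by replicate;
 class sex;
 model status(event='Alive')=sex height weight;
 ods output Association=assoc(keep=replicate label2 nvalue2
            rename=(nvalue2=c_stat) where=(label2='c'));
run;
ods select all;

/* 3. Summarize the c statistic across the samples */
proc means data=assoc n mean std;
 var c_stat;
run;
```

The point of the BY statement is that one PROC LOGISTIC call handles every sample, so no macro loop is needed.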

 

Edit: I just realized this is IML, so feel free to disregard this message if it's irrelevant, but this would be a perfectly valid way to approach your problem. P.S. I would find a worked example and work through it to verify that you thoroughly understand your calculations. I once spent 3 days debugging a bootstrap because I didn't realize the denominator was n-1 rather than n.
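
To illustrate the denominator point: PROC MEANS divides by n-1 by default when computing a variance or standard deviation, and the VARDEF= option switches the divisor. A small sketch, using sashelp.heart and the variable weight purely as an example:

```sas
/* Default divisor: n-1 (VARDEF=DF) */
proc means data=sashelp.heart n mean std vardef=df;
 var weight;
run;

/* Divisor n instead (VARDEF=N) */
proc means data=sashelp.heart n mean std vardef=n;
 var weight;
run;
```

The two standard deviations differ only slightly on a large data set, but the gap grows as n gets small.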

 

Contributor
Posts: 22

Re: How to do k-fold CV with replacements/replication

You? 3 days? No...
OK, anyway, the question is not how big but rather how small: N=199 with 20 events. I will look into the paper you referred to, thanks.
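
With so few events, a purely random split can leave some folds with hardly any events. One option, sketched below under assumptions, is to assign folds separately within each outcome level so the events are spread evenly. sashelp.heart and its status variable stand in for the real data; the seed, the macro variable k, and the names tagged, u and seq are made up for illustration. The fold variable is called groupid so that it matches what PROC SURVEYSELECT produces.

```sas
%let k = 10;   /* number of folds (assumed) */

/* Attach a random sort key to every observation */
data tagged;
 if _n_ = 1 then call streaminit(2016);   /* arbitrary seed */
 set sashelp.heart;
 u = rand('uniform');
run;

/* Shuffle the observations within each outcome level */
proc sort data=tagged;
 by status u;
run;

/* Deal the shuffled observations into folds 1..k, restarting per level */
data have;
 set tagged;
 by status;
 if first.status then seq = 0;
 seq + 1;
 groupid = mod(seq, &k) + 1;
 drop u seq;
run;

/* Check how many events and non-events landed in each fold */
proc freq data=have;
 tables groupid*status / norow nocol nopercent;
run;
```

Within each outcome level the fold sizes differ by at most one, so 20 events dealt into 10 folds would give two events per fold.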