Hi, how can I do, say, 10-fold cross-validation with replication using PROC LOGISTIC? Apparently it is as good as the bootstrap with replacement: "...We also carried out cross-validation with replication. Here the cross-validation was replicated r times, with a different random split into k groups each time..." (this is the article: http://m.aje.oxfordjournals.org/content/early/2014/06/24/aje.kwu140.full.pdf ). Help most appreciated!
P.S. Sorry if I have the terminology jumbled up. Also, I have no access to the HP procedures.
Thanks,
Saiful.
This looks fairly easy to do.
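/* Single k-fold cross-validation */
%macro k_fold_cv(k=10);
ods select none;
/* Randomly assign every observation to one of &k groups (variable GroupID) */
proc surveyselect data=sashelp.heart group=&k out=have;
run;
%do i=1 %to &k;
  /* Train on all folds except fold &i; hold out fold &i for testing */
  data training;
    set have(where=(groupid ne &i));
  run;
  data test;
    set have(where=(groupid eq &i));
  run;
  /* native = c statistic on the training data; true = AUC on the held-out fold */
  ods output
    Association=native(keep=label2 nvalue2 rename=(nvalue2=native) where=(label2='c'))
    ScoreFitStat=true(keep=dataset freq auc rename=(auc=true));
  proc logistic data=training
    outest=est(keep=_status_ _name_);
    class sex;
    model status(event='Alive')=sex height weight;
    score data=test fitstat;
  run;
  /* One row per fold: optimism = training AUC minus test AUC */
  data score&i;
    merge true native est;
    retain id &i;
    optimism=native-true;
  run;
%end;
/* Stack the per-fold results */
data k_fold_cv;
  set score1-score&k;
run;
ods select all;
%mend;
%k_fold_cv(k=10)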
/*************************************/
/* k-fold cross-validation replicated r times, with a new random split each time */
%macro k_fold_cv_rep(r=1,k=10);
ods select none;
/* The loop index reuses the parameter name r; the stop value &r is resolved once, before the first iteration */
%do r=1 %to &r;
  /* New random split into &k groups for this replicate */
  proc surveyselect data=sashelp.heart group=&k out=have;
  run;
  %do i=1 %to &k;
    data training;
      set have(where=(groupid ne &i));
    run;
    data test;
      set have(where=(groupid eq &i));
    run;
    ods output
      Association=native(keep=label2 nvalue2 rename=(nvalue2=native) where=(label2='c'))
      ScoreFitStat=true(keep=dataset freq auc rename=(auc=true));
    proc logistic data=training
      outest=est(keep=_status_ _name_);
      class sex;
      model status(event='Alive')=sex height weight;
      score data=test fitstat;
    run;
    /* One row per replicate and fold */
    data score_r&r._&i;
      merge true native est;
      retain rep &r id &i;
      optimism=native-true;
    run;
  %end;
%end;
/* Stack every data set whose name starts with score_r */
data k_fold_cv_rep;
  set score_r:;
run;
ods select all;
%mend;
%k_fold_cv_rep(r=20,k=10);
/********************/
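/* Combine both result sets and record which data set each row came from */
data all;
  set k_fold_cv k_fold_cv_rep indsname=indsn;
  length indsname $ 32;
  indsname=indsn;
run;
/* Mean optimism with confidence limits, separately for each method */
proc summary data=all nway;
  class indsname;
  var optimism;
  output out=want mean=mean lclm=lclm uclm=uclm;
run;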
How big is your data?
The methods in this paper are what you're looking for. Essentially, use PROC SURVEYSELECT to generate the random samples, run PROC LOGISTIC on the samples with a BY statement, and then summarize the results using PROC SURVEYMEANS or PROC MEANS.
http://www2.sas.com/proceedings/forum2007/183-2007.pdf
Edit: I realized this is IML, so feel free to disregard this message if it's irrelevant, but this would be a perfectly valid way to approach your problem. P.S. I would find a worked example and work through it to verify that you understand your calculations thoroughly. I once spent 3 days debugging a bootstrap because I didn't realize the denominator was n-1 rather than n.
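For what it's worth, the BY-group approach described above could look roughly like the sketch below. It is only an illustration under some assumptions: it reuses sashelp.heart and the model from the earlier reply, the seed, the 200 replicates, and the data set names boot and assoc are arbitrary choices of mine, and it draws bootstrap samples (with replacement) rather than k-fold splits.

```sas
/* Assumed setup: 200 bootstrap samples, each the same size as the input, */
/* drawn with replacement in a single pass. SEED= is an arbitrary value.  */
proc surveyselect data=sashelp.heart out=boot seed=12345
     method=urs samprate=1 outhits reps=200;
run;

/* Fit the same model once per sample; BY replaces the macro loop. */
ods select none;
proc logistic data=boot;
   by replicate;
   class sex;
   model status(event='Alive')=sex height weight;
   /* Keep only the c statistic (AUC) from each replicate's Association table */
   ods output Association=assoc(keep=replicate label2 nvalue2 where=(label2='c'));
run;
ods select all;

/* Summarize the c statistic across the bootstrap replicates */
proc means data=assoc n mean std;
   var nvalue2;
run;
```

This gives the spread of the apparent c statistic across resamples in one PROC LOGISTIC call. To estimate optimism the way the macros above do, each fitted model would still need to be scored on data it was not trained on.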