Split sample and 3 fold cross validation logistic regression

Contributor
Posts: 34

Hi,

I am not an advanced SAS user and I need your help. I am working on a dataset with 6 predictors (3 continuous, 3 categorical) and a binary outcome. I am trying to do the following:

1. Split the data into 50% training and 50% validation datasets and then compare their ROC curves. SAS code:

proc glmselect data=roc;
   partition fraction(validate=0.5);
   model ca=wt cons age race smoking BP / selection=forward;
   output out=outDataForward;
run;

proc glmselect data=outDataForward;
   model comp=wgtdx consult agedx race smk hyp / selection=backward;
run;
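A minimal sketch of step 1 with PROC LOGISTIC, which (unlike GLMSELECT) fits a binary outcome. So that it runs as-is, it uses SASHELP.HEART, with AgeAtStart, Height and Weight standing in for the 3 continuous predictors and Sex, Smoking_Status and BP_Status for the 3 categorical ones; the data set names and the seed are arbitrary. For the real data you would use data=roc and the model comp(event='1') = wgtdx consult agedx race smk hyp, assuming comp is coded 0/1 with 1 as the event and race, smk, hyp are the categorical predictors listed in the CLASS statement.

```sas
/* Sketch: 50/50 split-sample validation of a logistic model.
   SASHELP.HEART stands in for the asker's data set ROC; seed is arbitrary. */

/* OUTALL keeps every observation and adds Selected=1 for the sampled half */
proc surveyselect data=sashelp.heart out=heart_split
                  samprate=0.5 outall seed=12345 noprint;
run;

/* Selected=1 -> training half, Selected=0 -> validation half */
data train valid;
   set heart_split;
   if selected = 1 then output train;
   else output valid;
run;

/* Fit on TRAIN only. PLOTS(ONLY)=ROC draws the training ROC curve; each
   SCORE statement with FITSTAT prints fit statistics (including the AUC)
   for the data set it scores and writes predicted probabilities to OUT=. */
proc logistic data=train plots(only)=roc;
   class sex smoking_status bp_status / param=ref;
   model status(event='Dead') = ageatstart height weight
                                sex smoking_status bp_status;
   score data=train out=train_scored fitstat;
   score data=valid out=valid_scored fitstat;
run;
```

Comparing the AUC printed for TRAIN with the one printed for VALID shows how much the fit degrades on data the model has not seen.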

 

2. Conduct a 3-way split of the sample and then compare the ROC curves and AUC values for the 3 datasets (training, validation, testing). SAS code:

proc glmselect data=roc;
   partition fraction(test=0.25 validate=0.25);
   model comp=wgtdx consult agedx race smk hyp / selection=forward;
   output out=outDataForward;
run;
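For step 2, a DATA step with RAND('UNIFORM') is one simple way to get a roughly 50/25/25 training/validation/test split; the model is fit on the training part only and then scored on all three parts. As a sketch, it again uses SASHELP.HEART and its variables in place of the asker's data set ROC, and the seed is arbitrary. Note that this is a three-way split-sample design rather than 3-fold cross-validation: each observation is used in only one role.

```sas
/* Sketch: random ~50/25/25 train/validate/test split.
   SASHELP.HEART stands in for the asker's data set ROC; seed is arbitrary. */
data train valid test;
   set sashelp.heart;
   if _n_ = 1 then call streaminit(12345);   /* reproducible random stream */
   u = rand('uniform');                      /* one Uniform(0,1) draw per row */
   if u < 0.50 then output train;            /* about 50% */
   else if u < 0.75 then output valid;       /* about 25% */
   else output test;                         /* about 25% */
   drop u;
run;

/* Fit once on TRAIN, then score all three parts; each SCORE statement with
   FITSTAT prints a fit-statistics table that includes the AUC. */
proc logistic data=train plots(only)=roc;
   class sex smoking_status bp_status / param=ref;
   model status(event='Dead') = ageatstart height weight
                                sex smoking_status bp_status;
   score data=train fitstat;
   score data=valid fitstat;
   score data=test  fitstat;
run;
```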

 

I am sure my code is not correct or complete, and I would appreciate it if someone could explain to me, step by step and in detail, how to split my dataset into two equal parts and into three parts, then compare these models and obtain the ROC curves and AUC values.

Thanks in advance.

SU



Contributor
Posts: 34

Re: Split sample and 3 fold cross validation logistic regression

Posted in reply to sas_user4

Please explain the code to me, especially if it is a macro, so that I can apply it to my dataset.

Thank you so much!

Super User
Posts: 10,020

Re: Split sample and 3 fold cross validation logistic regression

Posted in reply to sas_user4

PROC GLMSELECT fits only general linear models (normally distributed response), so it can't be applied to logistic regression (binomial distribution).

Try PROC HPGENSELECT.
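A rough sketch of what the HPGENSELECT route might look like: DISTRIBUTION=BINARY gives a logit model for a binary response, and the PARTITION statement plays the same role as in GLMSELECT (here about 50% held out for validation; FRACTION(TEST=0.25 VALIDATE=0.25) would give the three-way split). SASHELP.HEART stands in for the asker's data, and the EVENT= response option is assumed to behave as in PROC LOGISTIC, so check the HPGENSELECT documentation for your SAS/STAT release. To get ROC curves and AUC values for the selected model, it can be refit in PROC LOGISTIC and scored with SCORE ... FITSTAT.

```sas
/* Sketch: forward selection for a binary outcome with a ~50% validation
   holdout. SASHELP.HEART stands in for the asker's data set ROC.
   EVENT= is assumed to work here as it does in PROC LOGISTIC. */
proc hpgenselect data=sashelp.heart;
   partition fraction(validate=0.5);  /* ~50% of rows become validation data */
   class sex smoking_status bp_status;
   model status(event='Dead') = ageatstart height weight
                                sex smoking_status bp_status
         / distribution=binary;       /* binary response */
   selection method=forward;
run;
```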

 

The following macro runs 10-fold cross-validation, repeated 20 times, with PROC LOGISTIC. You can start from it.

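%macro k_fold_cv_rep(r=1,k=10);
/* r = number of repetitions, k = number of folds */
%local rep i;
ods select none;   /* suppress printed output while looping */
%do rep=1 %to &r;
/* randomly assign every observation to one of &k folds (variable GroupID) */
proc surveyselect data=sashelp.heart groups=&k out=have;
run;

%do i=1 %to &k ;
/* fold &i is held out as the test set; the other folds are the training set */
data training;
 set have(where=(groupid ne &i)) ;
run;
data test;
 set have(where=(groupid eq &i));
run;

/* native = c statistic (AUC) of the model on its own training data
   true   = AUC of the same model on the held-out fold */
ods output 
Association=native(keep=label2 nvalue2 rename=(nvalue2=native) where=(label2='c'))
ScoreFitStat=true(keep=dataset freq auc rename=(auc=true));
proc logistic data=training
 outest=est(keep=_status_ _name_) ;
 class sex;
 model status(event='Alive')=sex height weight;
 score data=test fitstat; 
run;

/* one row per repetition and fold; optimism = training AUC - held-out AUC */
data score_r&rep._&i;
 merge true native est;
 retain rep &rep id &i;
 optimism=native-true;
run;
%end;
%end;
/* stack the results of all repetitions and folds */
data k_fold_cv_rep;
 set score_r:;
run;

ods select all;
%mend;

%k_fold_cv_rep(r=20,k=10);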
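Once the macro has run, K_FOLD_CV_REP holds one row per repetition (rep) and fold (id), with the training c statistic (native), the held-out AUC (true) and their difference (optimism). A small usage sketch that averages these within each repetition and then looks at their spread across the 20 repetitions; the BY_REP data set name is arbitrary.

```sas
/* Average native, true and optimism over the folds of each repetition */
proc means data=k_fold_cv_rep noprint nway;
   class rep;
   var native true optimism;
   output out=by_rep mean=;   /* MEAN= with no names keeps the variable names */
run;

/* Spread of the per-repetition averages across repetitions */
proc means data=by_rep n mean std min max;
   var native true optimism;
run;
```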


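/********************/
/* Compare mean optimism (with 95% confidence limits) between result sets.
   K_FOLD_CV is assumed to come from a separate, non-repeated k-fold run
   that is not shown here; remove it from the SET statement if you only
   ran the macro above. */
data all;
 length indsname $ 41;
 set k_fold_cv k_fold_cv_rep indsname=indsn;
 indsname=indsn;   /* source data set, e.g. WORK.K_FOLD_CV_REP */
run;
proc summary data=all nway;
 class indsname;
 var optimism;
 output out=want mean=mean lclm=lclm uclm=uclm;
run;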
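Since the thread title asks about 3-fold cross-validation, here is a hedged adaptation of the same idea with k=3 and a single repetition: each fold is held out once, the model is fit on the other two folds, and the held-out AUC is collected from the ScoreFitStat table. The macro name CV_AUC and its parameters are hypothetical, and SASHELP.HEART again stands in for the asker's data set ROC; for the real data, swap in data=roc and the asker's CLASS and MODEL statements.

```sas
/* Hypothetical macro: k-fold cross-validated AUC for one logistic model.
   SASHELP.HEART and its variables stand in for the asker's data set ROC. */
%macro cv_auc(data=sashelp.heart, k=3, seed=2024);
   %local i;

   /* Randomly assign each observation to one of &k folds (variable GroupID) */
   proc surveyselect data=&data groups=&k seed=&seed out=folds noprint;
   run;

   %do i=1 %to &k;
      /* Fold &i is held out; the remaining folds are used for fitting */
      data cv_train cv_test;
         set folds;
         if groupid = &i then output cv_test;
         else output cv_train;
      run;

      /* Held-out AUC for fold &i from the SCORE ... FITSTAT table */
      ods output ScoreFitStat=fold_&i(keep=freq auc);
      proc logistic data=cv_train;
         class sex smoking_status bp_status / param=ref;
         model status(event='Dead') = ageatstart height weight
                                      sex smoking_status bp_status;
         score data=cv_test fitstat;
      run;

      /* Tag the one-row result with its fold number */
      data fold_&i;
         set fold_&i;
         fold = &i;
      run;
   %end;

   /* One row per fold */
   data cv_results;
      set fold_1-fold_&k;
   run;

   /* Cross-validated AUC: mean (and spread) of the held-out AUCs */
   proc means data=cv_results n mean std min max;
      var auc;
   run;
%mend cv_auc;

%cv_auc(k=3);
```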

