sas_user4
Obsidian | Level 7

Hi,

I am not an advanced SAS user and need your help. I am working on a dataset with 6 predictors (3 continuous, 3 categorical) and a binary outcome. I am trying to do the following:

 

1. Split the data into 50% training and 50% validation datasets and then compare their ROC curves. SAS code:

proc glmselect data=roc;
   partition fraction(validate=0.5);
   model ca=wt cons age race smoking BP / selection=forward;
   output out=outDataForward;
run;

proc glmselect data=outDataForward;
 model comp=wgtdx consult agedx race smk hyp/selection=backward;
 run;

 

2. Conduct a 3-way split of the sample and then compare the ROC curves and AUC values for the 3 datasets (training, validation, testing). SAS code:

 

proc glmselect data=roc;
 partition fraction(test=0.25 validate=0.25);
 model comp=wgtdx consult agedx race smk hyp/selection=forward;
 output out=outDataForward;
 run;

 

I am sure my code is not correct or complete, and I would appreciate it if someone could explain to me (step by step, with details) how to split my dataset into two equal parts and into 3 parts, and then compare these models and obtain ROC curves and AUC values.

Thanks in advance.

SU



2 REPLIES
sas_user4
Obsidian | Level 7

Please explain the code to me, especially if it is a macro, so I can apply it to my dataset.

Thank you so much!

Ksharp
Super User

PROC GLMSELECT fits ordinary linear models, so it can't be applied to logistic regression (binomial distribution).

Try PROC HPGENSELECT.
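
For part 1 of the question, a single 50/50 split can also be done with PROC SURVEYSELECT and PROC LOGISTIC instead of PROC HPGENSELECT. This is only a sketch under assumptions: it uses the data set and variable names from the question's second code block, treats comp as coded 0/1 with 1 as the event, and treats race, smk, and hyp as the three categorical predictors. The seed and the output data set names (roc_split, train_assoc, valid_fit, valid_scored, valid_roc) are made up for illustration.

```sas
ods graphics on;

/* Flag a random 50% of ROC with Selected=1; OUTALL keeps every row. */
proc surveyselect data=roc out=roc_split samprate=0.5 outall seed=12345;
run;

/* Fit on the training half (Selected=1) and score the other half. */
ods output Association=train_assoc ScoreFitStat=valid_fit;
proc logistic data=roc_split(where=(selected=1)) plots(only)=roc;
   class race smk hyp / param=ref;   /* assumed categorical predictors */
   model comp(event='1') = wgtdx consult agedx race smk hyp;
   score data=roc_split(where=(selected=0))
         out=valid_scored outroc=valid_roc fitstat;
run;

/* ROC curve for the validation half, drawn from the OUTROC= data set. */
proc sgplot data=valid_roc;
   series x=_1mspec_ y=_sensit_;
run;
```

The training AUC is the c statistic in the Association table (train_assoc), and the validation AUC is the AUC column of the ScoreFitStat table (valid_fit).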

 

The following macro runs 10-fold cross-validation, repeated 20 times, with PROC LOGISTIC. You can start with it.

 



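%macro k_fold_cv_rep(r=1,k=10);
%local rep i;
ods select none;   /* suppress printed output while the macro loops */
%do rep=1 %to &r;
/* Randomly assign each observation to one of K folds (variable GroupID). */
proc surveyselect data=sashelp.heart groups=&k out=have;
run;

%do i=1 %to &k;
/* Fold i is held out for testing; the other folds form the training set. */
data training;
 set have(where=(groupid ne &i));
run;
data test;
 set have(where=(groupid eq &i));
run;

/* NATIVE = c statistic on the training data; TRUE = AUC on the held-out fold. */
ods output
Association=native(keep=label2 nvalue2 rename=(nvalue2=native) where=(label2='c'))
ScoreFitStat=true(keep=dataset freq auc rename=(auc=true));
proc logistic data=training
 outest=est(keep=_status_ _name_);
 class sex;
 model status(event='Alive')=sex height weight;
 score data=test fitstat;
run;

/* One row per repetition and fold; OPTIMISM is training AUC minus test AUC. */
data score_r&rep._&i;
 merge true native est;
 retain rep &rep id &i;
 optimism=native-true;
run;
%end;
%end;
/* Stack all SCORE_R data sets into one. */
data k_fold_cv_rep;
 set score_r:;
run;

ods select all;
%mend;

%k_fold_cv_rep(r=20,k=10);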


/********************/
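/* Stack the results and summarize the optimism per source data set. */
/* K_FOLD_CV is not created by the macro above; it must already exist from a separate run. */
data all;
 set k_fold_cv k_fold_cv_rep indsname=indsn;
 length indsname $ 32;
 indsname=indsn;
run;
proc summary data=all nway;
 class indsname;
 var optimism;
 output out=want mean=mean lclm=lclm uclm=uclm;
run;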
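
For part 2 of the question (training, validation, and test sets), one possible sketch splits the data in a DATA step and lets PROC LOGISTIC score each part. It assumes the data set and variable names from the question, that comp is coded 0/1 with 1 as the event, and that race, smk, and hyp are the categorical predictors. The seed and all output data set names are hypothetical. Because the assignment is random per row, the parts are only approximately 50%, 25%, and 25% of the data.

```sas
/* Randomly route each row to TRAIN (about 50%), VALID (25%), or TEST (25%). */
data train valid test;
   set roc;
   if _n_ = 1 then call streaminit(12345);
   u = rand('uniform');
   if u < 0.50 then output train;
   else if u < 0.75 then output valid;
   else output test;
   drop u;
run;

/* Fit on TRAIN only, then score all three parts with the same model. */
ods output ScoreFitStat=auc_by_part;
proc logistic data=train;
   class race smk hyp / param=ref;   /* assumed categorical predictors */
   model comp(event='1') = wgtdx consult agedx race smk hyp;
   score data=train fitstat outroc=roc_train;
   score data=valid fitstat outroc=roc_valid;
   score data=test  fitstat outroc=roc_test;
run;

/* AUC for each scored data set. */
proc print data=auc_by_part;
   var dataset freq auc;
run;

/* Overlay the three ROC curves for a visual comparison. */
data roc_all;
   set roc_train roc_valid roc_test indsname=src;
   length sample $41;
   sample = src;
run;
proc sgplot data=roc_all;
   series x=_1mspec_ y=_sensit_ / group=sample;
run;
```

A training AUC that is much higher than the validation and test AUCs suggests the model is overfitting the training part.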

