viollete
Calcite | Level 5

Hello,

I have longitudinal data of this form (more than 130 hospitals in the data set):

Hospital_id  price  ...
1            56
1            75
1            45
1            74
2            52
2            57
2            49
2            75
3            34
3            45
3            56
...

I want to do leave-one-out cross-validation, holding out one hospital at a time:

1. Split the data into train (hospitals 2 and 3) and test (hospital 1).
2. Do the analysis on train.
3. Then split the data again into train (hospitals 1 and 3) and test (hospital 2).
...and so on.

How can I do this data splitting automatically?

Thanks

2 REPLIES
Ksharp
Super User

@Rick_SAS wrote a blog post about this just a couple of days ago.

I would use PROC SURVEYSELECT ...

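/* One row per distinct NAME (the key to split on) */
proc freq data=sashelp.class noprint;
  table name / out=key;
run;

/* Assign each key to _TRAIN with probability 0.7, otherwise to _TEST */
data _train _test;
  if _n_ = 1 then call streaminit(12345); /* arbitrary seed, only for reproducibility */
  set key;
  if rand('bern', 0.7) then output _train;
  else output _test;
run;

/* Pull every row that belongs to the selected keys */
proc sql;
  create table train as
    select * from sashelp.class where name in (select name from _train);
  create table test as
    select * from sashelp.class where name in (select name from _test);
quit;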
Leave-one-out CV would be something like this (use the KEY table above and generate the following code with CALL EXECUTE, once per key value):
proc sql;
  /* train: every row except the held-out key */
  create table train as
    select * from sashelp.class where name ne 'xxxxxxxx';
  /* test: only the held-out key */
  create table test as
    select * from sashelp.class where name = 'xxxxxxxx';
quit;
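A possible sketch of the CALL EXECUTE driver described above, assuming SASHELP.CLASS as the data and NAME as the key. The macro name %loo is hypothetical; it wraps the PROC SQL split so that one call is generated per row of KEY.

```sas
/* Distinct keys: one leave-one-out fold per NAME */
proc freq data=sashelp.class noprint;
  table name / out=key;
run;

/* Hypothetical macro LOO: builds one train/test split for a held-out key */
%macro loo(name);
  proc sql;
    /* Training set: every row except the held-out key */
    create table train as
      select * from sashelp.class where name ne "&name";
    /* Test set: only the held-out key */
    create table test as
      select * from sashelp.class where name = "&name";
  quit;

  /* model fitting on TRAIN and scoring of TEST would go here */
%mend loo;

/* Generate one %loo call per key; %nrstr delays each call
   until this DATA step has finished */
data _null_;
  set key;
  call execute(cats('%nrstr(%loo(', name, '));'));
run;
```

TRAIN and TEST are overwritten on every call, so anything you want to keep from a fold has to be saved inside the macro before the next call runs.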
Kurt_Bremser
Super User

Something like this?

/* create the base data */
data have;
input hosp_id price;
cards;
1 56
1 75
1 45
1 74
2 52
2 57
2 49
2 75
3 34
3 45
3 56
;
run;

/* extract distinct id's */
proc sort
  data=have (keep=hosp_id)
  out=exclusions
  nodupkey
;
by hosp_id;
run;

/* a macro to wrap all the analysis code in, and the split */
%macro analysis(hosp_id);

data
  train
  validate
;
set have;
if hosp_id = &hosp_id
then output validate;
else output train;
run;

/* training and check against validate goes here */

%mend;

/* call the macro repeatedly from the distinct id's */
data _null_;
set exclusions;
call execute('%nrstr(%analysis(' !! put(hosp_id,best.) !! '));');
run;

When you run the code, you can see in the log that three different sets of train/validate data are created.

Make sure that each individual call of the macro creates a separate set of result datasets, or you will only get the result of the last iteration.
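One way to keep the results of every iteration, sketched under assumptions: the "analysis" below is only a placeholder (each held-out hospital's prices are predicted by the mean price of the training hospitals), and the dataset names MODEL, SCORED, FOLD_RESULT and ALL_RESULTS are hypothetical. Each fold's mean squared error is tagged with the held-out hospital and added to ALL_RESULTS with PROC APPEND, so later iterations do not overwrite earlier ones.

```sas
/* The example rows from the question */
data have;
  input hosp_id price @@;
  datalines;
1 56 1 75 1 45 1 74 2 52 2 57 2 49 2 75 3 34 3 45 3 56
;
run;

/* Distinct hospital ids: one fold per id */
proc sort data=have(keep=hosp_id) out=exclusions nodupkey;
  by hosp_id;
run;

/* Start from an empty results table (NOWARN: no error if it does not exist yet) */
proc datasets lib=work nolist nowarn;
  delete all_results;
quit;

%macro analysis(hosp_id);
  /* Split: the held-out hospital goes to VALIDATE */
  data train validate;
    set have;
    if hosp_id = &hosp_id then output validate;
    else output train;
  run;

  /* Placeholder model: mean price of the training hospitals */
  proc means data=train noprint;
    var price;
    output out=model mean=pred_price;
  run;

  /* Score the held-out hospital */
  data scored;
    if _n_ = 1 then set model(keep=pred_price);
    set validate;
    sq_err = (price - pred_price)**2;
  run;

  /* Mean squared error for this fold */
  proc means data=scored noprint;
    var sq_err;
    output out=fold_result(drop=_type_ _freq_) mean=mse;
  run;

  /* Tag the fold with the held-out hospital */
  data fold_result;
    set fold_result;
    held_out = &hosp_id;
  run;

  /* Append instead of overwriting, so every fold is kept */
  proc append base=all_results data=fold_result;
  run;
%mend analysis;

/* One macro call per hospital; %nrstr delays each call until this step ends */
data _null_;
  set exclusions;
  call execute(cats('%nrstr(%analysis(', hosp_id, '));'));
run;

proc print data=all_results;
run;
```

With the three example hospitals, ALL_RESULTS should end up with one row per held-out hospital. Replace the PROC MEANS "model" with your real training and scoring steps; the split, tagging and PROC APPEND pattern stays the same.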

