viollete
Calcite | Level 5

Hello,

I have longitudinal data of this form (there are over 130 hospitals in my data set):

Hospital_id  price  ...
1            56
1            75
1            45
1            74
2            52
2            57
2            49
2            75
3            34
3            45
3            56
...

I want to do leave-one-out cross-validation, leaving out one hospital at a time, something like this:

1. Split the data into train (hospitals 2 and 3) and test (hospital 1).
2. Do the analysis on train.
3. Then split the data again into train (hospitals 1 and 3) and test (hospital 2).

And so on.

How can I do this data splitting automatically?

Thanks

Ksharp
Super User

@Rick_SAS wrote a blog about it just a couple of days ago.

I would use PROC SURVEYSELECT, or something like this (split the distinct keys at random, then pull all rows that belong to each key):

proc freq data=sashelp.class noprint;
  table name / out=key;
run;

data _train _test;
  set key;
  if rand('bern', 0.7) then output _train;
  else output _test;
run;

proc sql;
  create table train as
    select * from sashelp.class where name in (select name from _train);
  create table test as
    select * from sashelp.class where name in (select name from _test);
quit;
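Since PROC SURVEYSELECT is mentioned but not shown, here is a minimal sketch of how it could replace the RAND('BERN') step, assuming the same KEY table of distinct names. The seed 12345 and the table name KEY_SPLIT are arbitrary choices for illustration. One difference from the Bernoulli draw: METHOD=SRS with SAMPRATE=0.7 selects a fixed number of keys (about 70% of them) instead of a random count.

```sas
/* Distinct keys, as in the PROC FREQ step above */
proc freq data=sashelp.class noprint;
  table name / out=key;
run;

/* Simple random sample of about 70% of the keys.
   OUTALL keeps every key and adds the 0/1 variable SELECTED. */
proc surveyselect data=key out=key_split
                  method=srs samprate=0.7 seed=12345
                  outall noprint;
run;

/* Selected keys go to training, the rest to test */
data _train _test;
  set key_split;
  if selected then output _train;
  else output _test;
run;

/* Pull the full rows for each side, same as before */
proc sql;
  create table train as
    select * from sashelp.class where name in (select name from _train);
  create table test as
    select * from sashelp.class where name in (select name from _test);
quit;
```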
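Leave-one-out CV would be something like this (using the KEY table above plus CALL EXECUTE; TRAIN gets every name except the held-out one):
%macro loo(name);
proc sql;
  create table train as
    select * from sashelp.class where name ne "&name";
  create table test as
    select * from sashelp.class where name = "&name";
quit;
/* fit on TRAIN and evaluate on TEST here */
%mend loo;

data _null_;
  set key;
  call execute(cats('%nrstr(%loo(', name, '));'));
run;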
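The same leave-one-hospital-out split can also be sketched without a macro loop: replicate the data once per held-out hospital, flag each row as train or test, and let BY-group processing handle all folds in one pass. This is a minimal sketch on the sample rows from the question; the names FOLDS, STACKED, FOLD, ROLE, TRAIN_FIT and CV_ERRORS are hypothetical, and the "model" (the mean training price) is only a placeholder for the real analysis. Note that the stacked table has (number of hospitals) × (number of rows) observations, so with 130+ hospitals it can get large.

```sas
/* Sample rows from the question (the real data has 130+ hospitals) */
data have;
  input hosp_id price;
  datalines;
1 56
1 75
1 45
1 74
2 52
2 57
2 49
2 75
3 34
3 45
3 56
;
run;

proc sql;
  /* One fold per distinct hospital: FOLD is the hospital held out */
  create table folds as
    select distinct hosp_id as fold from have;

  /* Replicate the data once per fold and flag each row's role */
  create table stacked as
    select f.fold, h.hosp_id, h.price,
           case when h.hosp_id = f.fold then 'test' else 'train' end
             as role length=5
    from folds as f, have as h
    order by f.fold, h.hosp_id;
quit;

/* Placeholder model: mean training price within each fold */
proc means data=stacked noprint;
  where role = 'train';
  by fold;
  var price;
  output out=train_fit(keep=fold pred) mean=pred;
run;

/* Score each held-out hospital with the fit from its own fold */
data cv_errors;
  merge stacked(where=(role = 'test')) train_fit;
  by fold;
  sq_err = (price - pred)**2;
run;

/* Cross-validated mean squared error over all held-out rows */
proc means data=cv_errors mean;
  var sq_err;
run;
```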
Kurt_Bremser
Super User

Something like this?

/* create the base data */
data have;
input hosp_id price;
cards;
1 56
1 75
1 45
1 74
2 52
2 57
2 49
2 75
3 34
3 45
3 56
;
run;

/* extract distinct id's */
proc sort
  data=have (keep=hosp_id)
  out=exclusions
  nodupkey
;
by hosp_id;
run;

/* a macro to wrap all the analysis code in, and the split */
%macro analysis(hosp_id);

data
  train
  validate
;
set have;
if hosp_id = &hosp_id
then output validate;
else output train;
run;

/* training and check against validate goes here */

%mend;

/* call the macro repeatedly from the distinct id's */
data _null_;
set exclusions;
call execute('%nrstr(%analysis(' !! put(hosp_id,best.) !! '));');
run;

When you run the code, you can see in the log that three different sets of train/validate data are created.

Make sure that each individual call of the macro creates separately named result datasets (or appends its results to a cumulative dataset); otherwise each iteration overwrites the previous one and you only get the result of the last iteration.
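To illustrate the last point, here is a minimal sketch of one way to keep every iteration's results: each macro call writes a one-row summary and appends it to a cumulative table with PROC APPEND. The "model" is only a placeholder (the mean price of the training hospitals), and the names FIT, FOLD_RESULT, CV_RESULTS, HELD_OUT, N_OBS and MSE are hypothetical; the middle steps would be replaced by the real analysis.

```sas
/* Base data and distinct ids, as above */
data have;
  input hosp_id price;
  datalines;
1 56
1 75
1 45
1 74
2 52
2 57
2 49
2 75
3 34
3 45
3 56
;
run;

proc sort data=have(keep=hosp_id) out=exclusions nodupkey;
  by hosp_id;
run;

/* Start from an empty result table (NOWARN: no message if it is absent) */
proc datasets lib=work nolist nowarn;
  delete cv_results;
quit;

%macro analysis(hosp_id);
  data train validate;
    set have;
    if hosp_id = &hosp_id then output validate;
    else output train;
  run;

  /* Placeholder model: mean price of the training hospitals */
  proc means data=train noprint;
    var price;
    output out=fit(keep=pred) mean=pred;
  run;

  /* Score the held-out hospital; keep one summary row for this fold */
  data fold_result(keep=held_out n_obs mse);
    if _n_ = 1 then set fit;
    set validate end=last;
    held_out = &hosp_id;
    sse + (price - pred)**2;
    if last then do;
      n_obs = _n_;
      mse = sse / n_obs;
      output;
    end;
  run;

  /* Accumulate; PROC APPEND creates CV_RESULTS on the first call */
  proc append base=cv_results data=fold_result;
  run;
%mend analysis;

data _null_;
  set exclusions;
  call execute('%nrstr(%analysis(' !! put(hosp_id,best.) !! '));');
run;
```

Because TRAIN, VALIDATE, FIT and FOLD_RESULT are overwritten on every call, only CV_RESULTS carries information across iterations; it ends up with one row per held-out hospital.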
