Hello,
I have longitudinal data of this type (there are over 130 hospitals in my data set):
Hospital_id price ...
1 56
1 75
1 45
1 74
2 52
2 57
2 49
2 75
3 34
3 45
3 56
.........
I want to do leave-one-out cross-validation, where one hospital at a time is held out. Something like this:
1. Split the data into train (hospitals 2 and 3) and test (hospital 1).
2. Do the analysis on train.
3. Then split the data again into train (hospitals 1 and 3) and test (hospital 2).
and so on...
How can I do the data splitting automatically?
Thanks
@Rick_SAS wrote a blog post about this just a couple of days ago.
I would use proc surveyselect .........
proc freq data=sashelp.class noprint;
table name/out=key;
run;
data _train _test;
set key;
if rand('bern',0.7) then output _train;
else output _test;
run;
proc sql;
create table train as
select * from sashelp.class where name in (select name from _train);
create table test as
select * from sashelp.class where name in (select name from _test);
quit;
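For comparison, the same 70/30 split by name could be sketched with PROC SURVEYSELECT instead of rand('bern'). This is only a sketch under assumptions: the output table name key_split and the seed value are made up for illustration, and simple random sampling (method=srs) is my choice, not something stated above. Unlike the Bernoulli draw, SAMPRATE= selects a fixed fraction of the names rather than a random number of them.

```sas
/* distinct names, as in the PROC FREQ step above */
proc freq data=sashelp.class noprint;
table name/out=key;
run;

/* OUTALL keeps every name and adds Selected = 1 (sampled) or 0 (not sampled) */
/* key_split and seed=12345 are illustrative choices */
proc surveyselect data=key out=key_split
method=srs samprate=0.7 seed=12345 outall noprint;
run;

proc sql;
/* sampled names form the training set */
create table train as
select * from sashelp.class
where name in (select name from key_split where selected = 1);
/* the remaining names form the test set */
create table test as
select * from sashelp.class
where name in (select name from key_split where selected = 0);
quit;
```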
Leave-one-out CV would look something like this (use the KEY table above and CALL EXECUTE to generate the following code once per name, with 'xxxxxxxx' replaced by the held-out name):
proc sql;
/* train on everything except the held-out name */
create table train as
select * from sashelp.class where name ne 'xxxxxxxx';
/* test on the held-out name only */
create table test as
select * from sashelp.class where name = 'xxxxxxxx';
quit;
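The CALL EXECUTE step mentioned above could be sketched roughly as follows. The macro name %loo_split is hypothetical, and the sketch assumes the KEY table from the PROC FREQ step (one row per name) and names that contain no commas, quotes, or parentheses.

```sas
/* Hypothetical macro: hold out one name as TEST, train on all others */
%macro loo_split(name);
proc sql;
create table train as
select * from sashelp.class where name ne "&name";
create table test as
select * from sashelp.class where name = "&name";
quit;
/* model fitting on TRAIN and scoring on TEST would go here */
%mend loo_split;

/* generate one macro call per distinct name in KEY */
data _null_;
set key;
/* %nrstr delays macro execution until the data step has finished */
call execute(cats('%nrstr(%loo_split(', name, '))'));
run;
```

Each call overwrites TRAIN and TEST, so anything that has to survive the loop must be saved inside the macro.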
Something like this?
/* create the base data */
data have;
input hosp_id price;
cards;
1 56
1 75
1 45
1 74
2 52
2 57
2 49
2 75
3 34
3 45
3 56
;
run;
/* extract distinct id's */
proc sort
data=have (keep=hosp_id)
out=exclusions
nodupkey
;
by hosp_id;
run;
/* a macro to wrap all the analysis code in, and the split */
%macro analysis(hosp_id);
data
train
validate
;
set have;
if hosp_id = &hosp_id
then output validate;
else output train;
run;
/* training and check against validate goes here */
%mend;
/* call the macro repeatedly from the distinct id's */
data _null_;
set exclusions;
call execute('%nrstr(%analysis(' !! put(hosp_id,best.) !! '));');
run;
When you run the code, you can see in the log that three different sets of train/validate data are created.
Make sure that each individual call of the macro creates a separate set of result datasets, or you will only get the result of the last iteration.
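One way to keep the results of every iteration is to tag each fold's result with the held-out hospital and append it to a cumulative table. The following is a minimal sketch, assuming the HAVE and EXCLUSIONS datasets created above; the names %analysis_keep, fold_result, all_results, held_out and train_mean_price are made up for illustration, and the PROC MEANS step is only a placeholder for the real analysis.

```sas
/* start clean so results from earlier runs do not accumulate */
proc datasets lib=work nolist nowarn;
delete all_results;
quit;

/* Hypothetical variant of %analysis that keeps one result row per fold */
%macro analysis_keep(hosp_id);
data train validate;
set have;
if hosp_id = &hosp_id then output validate;
else output train;
run;

/* placeholder analysis: mean price over the training hospitals */
proc means data=train noprint;
var price;
output out=fold_result mean=train_mean_price;
run;

/* tag the result with the hospital that was held out */
data fold_result;
set fold_result (keep=train_mean_price);
held_out = &hosp_id;
run;

/* PROC APPEND creates ALL_RESULTS on the first call, then adds to it */
proc append base=all_results data=fold_result;
run;
%mend analysis_keep;

/* one call per distinct hospital id */
data _null_;
set exclusions;
call execute(cats('%nrstr(%analysis_keep(', hosp_id, '))'));
run;
```

After the loop, all_results holds one row per held-out hospital. An alternative would be to write a separately named dataset per call (for example, with the hospital id as a suffix) and combine them afterwards.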