Hi,
Can anyone suggest a way to randomly split longitudinal data into training (60%) and validation (40%) sets?
In my case, each individual has more than one observation, and I'd like to split the data so that if an individual is in one of the training/validation sets, all of their observations are in that same set.
Example data below. I want to split the BMILONG data set generated in the second step.
DATA BMI;
    CALL STREAMINIT(12345);
    DO ID = 1 TO 100;
        GENDER=(MOD(ID,2)=0);
        TREAT=( ID>50);
        BASELINE = ROUND(RAND('NORMAL',35,2),.1);
        IF GENDER=1 AND TREAT=0 THEN DO;
            GROUP = 'FEMALE - PLACEBO';
            MONTH3 = ROUND(BASELINE - .25 + RAND('NORMAL',0,1),.1);
            MONTH6 = ROUND(MONTH3 + .25 + RAND('NORMAL',0,1),.1);
            MONTH9 = ROUND(MONTH6 - .25 + RAND('NORMAL',0,1),.1);
            MONTH12= ROUND(MONTH9 + .25 + RAND('NORMAL',0,1),.1);
        END;
        IF GENDER=0 AND TREAT=0 THEN DO;
            GROUP = 'MALE - PLACEBO';
            MONTH3 = ROUND(BASELINE - 1 + RAND('NORMAL',0,1),.1);
            MONTH6 = ROUND(MONTH3 - 1 + RAND('NORMAL',0,1),.1);
            MONTH9 = ROUND(MONTH6 + 1 + RAND('NORMAL',0,1),.1);
            MONTH12= ROUND(MONTH9 + 1 + RAND('NORMAL',0,1),.1);
        END;
        IF GENDER=0 AND TREAT=1 THEN DO;
            GROUP = 'MALE - TREAT';
            MONTH3 = ROUND(BASELINE - 1.5 + RAND('NORMAL',0,1),.1);
            MONTH6 = ROUND(MONTH3 - 1.5 + RAND('NORMAL',0,1),.1);
            MONTH9 = ROUND(MONTH6 - 1.5 + RAND('NORMAL',0,1),.1);
            MONTH12= ROUND(MONTH9 - 1.5 + RAND('NORMAL',0,1),.1);
        END;
        IF GENDER=1 AND TREAT=1 THEN DO;
            GROUP = 'FEMALE - TREAT';
            MONTH3 = ROUND(BASELINE - 1 + RAND('NORMAL',0,1),.1);
            MONTH6 = ROUND(MONTH3 - 1 + RAND('NORMAL',0,1),.1);
            MONTH9 = ROUND(MONTH6 - 1 + RAND('NORMAL',0,1),.1);
            MONTH12= ROUND(MONTH9 - 1 + RAND('NORMAL',0,1),.1);
        END;
        OUTPUT;
    END;
RUN;
DATA BMILONG;
    SET BMI;
    TIMEPT=0; BMI=BASELINE; OUTPUT;
    TIMEPT=3; BMI=MONTH3; OUTPUT;
    TIMEPT=6; BMI=MONTH6; OUTPUT;
    TIMEPT=9; BMI=MONTH9; OUTPUT;
    TIMEPT=12; BMI=MONTH12; OUTPUT;
    DROP BASELINE MONTH:;
RUN;
Thanks for a well-presented problem. Here is a solution using PROC SURVEYSELECT:
proc surveyselect data=bmiLong seed=8588 samprate=40 outall
out=bmiLongGroups(rename=selected=validation) ;
cluster id;
run;
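A possible follow-up, in case separate data sets are needed: the CLUSTER statement samples whole IDs, and OUTALL keeps every row with a 0/1 flag (renamed to validation above), so the flag can be used to route rows. This is only a sketch; the output names TRAINING and VALIDATION are my own choice and not part of the answer above.

```sas
/* Route each observation by the 0/1 flag created by PROC SURVEYSELECT. */
/* TRAINING and VALIDATION are illustrative data set names.             */
data training validation;
    set bmiLongGroups;
    if validation = 1 then output validation;
    else output training;
run;

/* Sanity check: count distinct individuals and rows in each group.     */
/* Every ID should fall in exactly one group.                           */
proc sql;
    select validation,
           count(distinct id) as n_ids,
           count(*) as n_obs
    from bmiLongGroups
    group by validation;
quit;
```

With 100 individuals and five time points each, roughly 40 IDs should land in the validation group and the rest in training.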
This works perfectly! I appreciate your help.