Hi,
Can anyone please suggest a way to randomly split longitudinal data into training (60%) and validation (40%) sets?
In my case, I'd like to split a data set where each individual has more than one observation, in such a way that if an individual is in one of the training/validation sets, then all of their observations are in that same set.
Example data are below.
I want to split the BMILONG data set, which is generated in the second step.
DATA BMI;
  CALL STREAMINIT(12345);
  DO ID = 1 TO 100;
    GENDER = (MOD(ID,2)=0);
    TREAT = (ID>50);
    BASELINE = ROUND(RAND('NORMAL',35,2),.1);
    IF GENDER=1 AND TREAT=0 THEN DO;
      GROUP = 'FEMALE - PLACEBO';
      MONTH3  = ROUND(BASELINE - .25 + RAND('NORMAL',0,1),.1);
      MONTH6  = ROUND(MONTH3 + .25 + RAND('NORMAL',0,1),.1);
      MONTH9  = ROUND(MONTH6 - .25 + RAND('NORMAL',0,1),.1);
      MONTH12 = ROUND(MONTH9 + .25 + RAND('NORMAL',0,1),.1);
    END;
    IF GENDER=0 AND TREAT=0 THEN DO;
      GROUP = 'MALE - PLACEBO';
      MONTH3  = ROUND(BASELINE - 1 + RAND('NORMAL',0,1),.1);
      MONTH6  = ROUND(MONTH3 - 1 + RAND('NORMAL',0,1),.1);
      MONTH9  = ROUND(MONTH6 + 1 + RAND('NORMAL',0,1),.1);
      MONTH12 = ROUND(MONTH9 + 1 + RAND('NORMAL',0,1),.1);
    END;
    IF GENDER=0 AND TREAT=1 THEN DO;
      GROUP = 'MALE - TREAT';
      MONTH3  = ROUND(BASELINE - 1.5 + RAND('NORMAL',0,1),.1);
      MONTH6  = ROUND(MONTH3 - 1.5 + RAND('NORMAL',0,1),.1);
      MONTH9  = ROUND(MONTH6 - 1.5 + RAND('NORMAL',0,1),.1);
      MONTH12 = ROUND(MONTH9 - 1.5 + RAND('NORMAL',0,1),.1);
    END;
    IF GENDER=1 AND TREAT=1 THEN DO;
      GROUP = 'FEMALE - TREAT';
      MONTH3  = ROUND(BASELINE - 1 + RAND('NORMAL',0,1),.1);
      MONTH6  = ROUND(MONTH3 - 1 + RAND('NORMAL',0,1),.1);
      MONTH9  = ROUND(MONTH6 - 1 + RAND('NORMAL',0,1),.1);
      MONTH12 = ROUND(MONTH9 - 1 + RAND('NORMAL',0,1),.1);
    END;
    OUTPUT;
  END;
RUN;
DATA BMILONG;
  SET BMI;
  TIMEPT=0;  BMI=BASELINE; OUTPUT;
  TIMEPT=3;  BMI=MONTH3;   OUTPUT;
  TIMEPT=6;  BMI=MONTH6;   OUTPUT;
  TIMEPT=9;  BMI=MONTH9;   OUTPUT;
  TIMEPT=12; BMI=MONTH12;  OUTPUT;
  DROP BASELINE MONTH:;
RUN;
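For reference, here is a minimal sketch of one way the subject-level split described above could be done. It is not the only approach: it samples 60% of the distinct ID values with PROC SURVEYSELECT and then merges the selection flag back onto BMILONG, so all rows for an ID land in the same set. The data set names IDS, IDSPLIT, TRAIN and VALID, and the reuse of seed 12345, are assumptions made for illustration.

```sas
/* Sketch only: IDS, IDSPLIT, TRAIN and VALID are hypothetical names. */

/* Step 1: one row per subject, so sampling happens at the subject level */
PROC SORT DATA=BMILONG(KEEP=ID) OUT=IDS NODUPKEY;
  BY ID;
RUN;

/* Step 2: simple random sample of 60% of the subjects.
   OUTALL keeps every ID and adds SELECTED = 1 (sampled) or 0 (not sampled).
   SEED= makes the split reproducible. */
PROC SURVEYSELECT DATA=IDS OUT=IDSPLIT METHOD=SRS SAMPRATE=0.6
                  SEED=12345 OUTALL NOPRINT;
RUN;

/* Step 3: make sure both data sets are ordered by ID before merging */
PROC SORT DATA=BMILONG; BY ID; RUN;
PROC SORT DATA=IDSPLIT; BY ID; RUN;

/* Step 4: attach the flag to every observation and route whole subjects */
DATA TRAIN VALID;
  MERGE BMILONG IDSPLIT(KEEP=ID SELECTED);
  BY ID;
  IF SELECTED = 1 THEN OUTPUT TRAIN;
  ELSE OUTPUT VALID;
  DROP SELECTED;
RUN;
```

Note that the 60/40 ratio here applies to subjects, not rows. In BMILONG every ID has five observations, so the row proportions should match as well, but with unbalanced data the row-level split would only be approximately 60/40.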