Please help me. I don’t know how I should set up my data.
This problem involves a longitudinal dataset with up to 20 measurements per subject.
My outcome variable is the difference between “variable Y” at time 20 and “variable Y” at time 1. My exposure of interest does not vary with time (e.g., sex), but some of the covariates to be included in the model do change over time (e.g., stress and time).
I have used a dataset with multiple lines per subject (one line per measurement period). In this dataset, I have created an outcome variable (change_in_Y) that represents the difference between “variable Y” at time 20 and “variable Y” at time 1. Therefore, for a given subject, the value of this variable does not change from one line to another.
This doesn’t seem right to me. How should I rearrange my dataset or outcome variable?
Here is an example of the syntax I am using:
PROC GENMOD data=A;
  CLASS id;
  MODEL change_in_Y = gender stress time / dist=normal;
  /* GEE with an exchangeable (compound-symmetry) working correlation */
  REPEATED subject=id / type=CS corrw;
RUN;
Offhand (and I'd need to know much more about your research problem before I'd strongly recommend this approach), I suspect that your data set might be fine. I wonder, however, whether you'd be better off modeling Y itself rather than the gross aggregate change in Y from T1 through T20.
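To make the idea of modeling Y directly concrete, here is a rough sketch rather than a recommendation. It assumes the long data set A also contains the raw measurement in a column I am calling Y (a hypothetical name), and that gender is coded numerically, as your original syntax implies. The gender*time term is my own addition: it is one way to ask whether the change in Y over time differs by gender, and it may or may not match your research question.

```sas
/* Sketch only: Y is a hypothetical name for the raw measurement,
   stored one row per subject per measurement period in data set A. */
PROC GENMOD data=A;
  CLASS id;
  /* gender*time (an assumption, not from the original model) tests
     whether the trajectory of Y over time differs by gender */
  MODEL Y = gender stress time gender*time / dist=normal;
  REPEATED subject=id / type=CS corrw;
RUN;
```

With this setup each row carries its own outcome value, so the time-varying covariates (stress, time) line up with the measurement they belong to, which is not the case when change_in_Y is repeated on every row.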
Also, how severe are your missing data problems? Could you formulate this question any differently if you restricted the analysis to those with complete data on all 20 periods of measurement? That would give you the opportunity to look at this as a Markov process.
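One possible way to gauge the missing data and build a complete-case data set is sketched below. It again assumes the raw measurement is stored as Y (a hypothetical name) in data set A; the table names n_obs and A_complete are made up for illustration. It only checks Y, so missing values in stress or other covariates would need a similar check.

```sas
/* Sketch only: Y, n_obs and A_complete are hypothetical names. */
PROC SQL;
  /* COUNT(Y) counts the non-missing values of Y for each subject */
  CREATE TABLE n_obs AS
    SELECT id, COUNT(Y) AS n_measured
    FROM A
    GROUP BY id;

  /* Keep only the rows of subjects measured at all 20 periods */
  CREATE TABLE A_complete AS
    SELECT d.*
    FROM A AS d INNER JOIN n_obs AS n
      ON d.id = n.id
    WHERE n.n_measured = 20;
QUIT;

/* How many subjects have 20, 19, 18, ... measurements */
PROC FREQ data=n_obs;
  TABLES n_measured;
RUN;
```

The PROC FREQ table shows at a glance how many subjects would be lost by restricting to complete data.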