Hello all,

I hope this message finds you well. Apologies for the long question. I have never analyzed a longitudinal dataset before. I am going through materials from classes I took years ago (and materials online), but I am still a bit confused and would like some help to make sure I am approaching this problem from the right perspective. Any help would be greatly appreciated.

I have a large dataset that tracks size change over time, calculated from images. Below is fictitious data for 5 patients that reflects the structure of the overall dataset.

ID  sex  Size  race  image_occasion  time  timesq
1   0    20    2     1               0     0
1   0    23    2     2               13    169
1   0    12    2     3               15    225
1   0    22    2     4               18    324
1   0    25    2     5               29    841
1   0    24    2     6               104   10816
1   0    25    2     7               112   12544
1   0    28    2     8               117   13689
1   0    33    2     9               118   13924
2   1    20    1     1               0     0
2   1    26    1     2               3     9
2   1    26    1     3               9     81
2   1    28    1     4               15    225
2   1    33    1     5               21    441
2   1    29    1     6               27    729
2   1    31    1     7               37    1369
2   1    35    1     8               43    1849
2   1    27    1     9               57    3249
3   1    20    1     1               0     0
3   1    15    1     2               29    841
3   1    12    1     3               62    3844
4   0    22    2     1               0     0
4   0    23    2     2               1     1
4   0    40    2     3               4     16
4   0    18    2     4               6     36
5   1    17    2     2               0     0
5   1    23    2     4               35    1225

The meaning of the variables (except the self-explanatory ones) is:

image_occasion: denotes the different occasions on which images were taken of the same individual. Each image yields one size measurement.
time: the time of each image (and thus of each size measurement) from baseline, in months. Time=0 is the baseline (first) image.
timesq: simply time*time.

I would like to model the change in size over time with repeated measurements, adjusted for other baseline variables, and then plot a graph to show this. This dataset is clearly unbalanced, because each patient was measured at different times from baseline and each patient has a different number of images/measurements. For this reason, my understanding is that the best approach is to use a Random Effects Linear Mixed Effects Model with PROC MIXED.
I have a few questions, if you can help me:

QUESTION 1: Should this be a “RANDOM intercept” or a “RANDOM intercept time” model? That is, should I have only random intercepts, or random intercepts and slopes? Should it be:

    proc mixed data=mydata;
      class id image_occasion sex race;
      model size = time / s chisq;
      random intercept / type=un subject=id;
    run;

or

    proc mixed data=mydata;
      class id image_occasion sex race;
      model size = time / s chisq;
      random intercept time / type=un subject=id;
    run;

I think I should use “RANDOM intercept.” With this model I am assuming that even though each patient starts at a different “intercept” (a different size), their growth over time is roughly similar. Is this correct? Of course, when I add other variables to the model (for example sex and race) and create a multivariable model, the interpretation becomes a bit more complex, but broadly speaking that is the meaning, right? In this case, I can have a summary result for the population (fixed effects). Is this right?

If instead I build the “RANDOM intercept time” model, then I am assuming that the slope over time also differs between individuals. In that case, it would be more difficult to have a summary result for the population (fixed effects). Is this right?

I am not asking about the covariance structure for now. I was planning to choose between the different options based on the AIC value once I have chosen the correct model for the mean from above.

QUESTION 2: How can I plot the results above, namely the change of size over time from the PROC MIXED regression? First, I would like to plot only the mean change of size over time for the whole dataset (crude and multivariable). Then, I will do subgroup analyses in which I stratify, for example, by sex or other variables.

I am using the option “outpm=output_results;” and then “proc sgplot”, but I am not sure about the validity of the results (it gives me a very straight line, which I am not sure reflects the data).
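For reference, the two candidate specifications from Question 1 can be written out as follows. This is only a sketch in notation introduced here (the symbols $\beta_0$, $\beta_1$, $b_{0i}$, $b_{1i}$, and $\varepsilon_{ij}$ do not come from any SAS output), with $i$ indexing patients and $j$ indexing image occasions, and with time as the only predictor:

```latex
\begin{align*}
\text{Random intercept:}\quad
  \text{size}_{ij} &= (\beta_0 + b_{0i}) + \beta_1\,\text{time}_{ij} + \varepsilon_{ij} \\
\text{Random intercept and slope:}\quad
  \text{size}_{ij} &= (\beta_0 + b_{0i}) + (\beta_1 + b_{1i})\,\text{time}_{ij} + \varepsilon_{ij} \\
\text{Population mean (both models):}\quad
  E[\text{size}_{ij}] &= \beta_0 + \beta_1\,\text{time}_{ij}
\end{align*}
```

Under the usual assumption that the random effects $b_{0i}$ and $b_{1i}$ have mean zero, the fixed effects $\beta_0$ and $\beta_1$ describe the population-average trajectory in both models. With time as the only predictor, that mean is a straight line in time by construction, which would be consistent with the straight line obtained from the OUTPM= predictions.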
In addition, when I fit a multivariable model, this method does not work: it gives me results for each individual patient, or at least many different lines, and I do not understand what exactly they represent.

I have also tried adding “store output_results2;” at the end of the PROC MIXED step and then using the following:

    proc plm restore=output_results2;
      effectplot fit(x=time);
    run;

This seems to work better, but I am still not convinced it is the right approach. Can you please help me determine which would be the best approach in this case? I have spent two weeks trying to figure this out.

QUESTION 3: What if, when I add the “timesq” variable to the model, it is significant (p<0.05)? That would suggest that size change over time is not linear but quadratic (unless my model above is misspecified), right? In that case, does it change anything with regard to the coding in the two questions above? In particular, how do I plot a graph that shows this?

Below is an image that I took from another paper that looked at a similar outcome. They do not mention that growth was quadratic in time. Rather, they simply say that they used a linear mixed model and that this graph comes from a “line plot of overall estimated marginal mean of maximum diameter across time.” I am not sure how they produced this image, but my guess is that my data should look something like this. I have not been able to produce this image, or anything similar to it.

Any input you might have would be enormously appreciated! Thank you very much.
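To make Questions 2 and 3 concrete, here is a minimal, untested sketch of one way the quadratic model and a population-average plot could be set up. It assumes the dataset is called mydata and uses the variable names from the sample data; the item store name size_fit is arbitrary. Writing the quadratic term as time*time instead of the precomputed timesq is a choice made for this sketch, so that PROC PLM can vary the linear and quadratic terms together when it draws the curve. The random intercept and slope are also just one of the two options under discussion.

```sas
/* Sketch only: dataset name, item store name, and covariate list are assumptions. */
ods graphics on;

proc mixed data=mydata;
  class id sex race;
  /* time*time replaces timesq so the plot treats both terms as functions of time */
  model size = time time*time sex race / solution;
  /* remove "time" below for a random-intercept-only model */
  random intercept time / type=un subject=id;
  /* save the fitted model for post-fitting analysis in PROC PLM */
  store work.size_fit;
run;

proc plm restore=work.size_fit;
  /* estimated mean size as a function of time */
  effectplot fit(x=time);
  /* the same curve drawn separately for each level of sex */
  effectplot slicefit(x=time sliceby=sex);
run;
```

As far as I understand the EFFECTPLOT defaults, covariates that are not on the plot axes are held fixed (continuous ones at their mean, classification ones at a reference level), and the AT option can set other values. The many lines seen when plotting OUTPM= output from a multivariable model would be consistent with this: OUTPM= returns one predicted mean per observation, so each distinct covariate pattern traces its own line.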