Hi all,
I am trying to figure out whether it makes sense to use a PROC MIXED model for my analysis: The evolution of LVMI in healthy children.
Data background:
-two time points (repeated measurement among the same subjects)
-time point 1 was not on the same date for each child (dates ranged from March 2017 to July 2017).
-time point 2 was also not on the same date for each child (dates ranged from April 2018 to June 2018).
-therefore, there are unequally spaced longitudinal measurements
My PI suggested a mixed model due to the varying months (times) that the measurements were taken among the children.
The idea was to have time in years (defined as time point 2 minus time point 1) in the REPEATED statement. However, my issue is that the two observations for each ID then have identical values for the “timeinyrs” variable (see the attachment). As a result, SAS issues the message "NOTE: An infinite likelihood is assumed in iteration 0 because of a nonpositive definite estimated R matrix for Subject x" and no estimates are produced.
Therefore, I am not really sure where to go from here. If I have the "timeinyrs" variable account for only one observation of the two for each ID, it greatly reduces my sample.
Here is the code that was used:
*Looking at the evolution of LVMI. Predictors: timeinyrs, age, sex, height_cm. Outcome: LVMI_chinali;
PROC MIXED DATA=Echo_FINAL2 METHOD=REML order=Internal noclprint;
TITLE "Looking at the evolution of lvmi in the healthy kids dataset";
CLASS cat_timeinyrs sex;
MODEL LVMI_chinali= timeinyrs age sex height_cm / SOLUTION;
REPEATED cat_timeinyrs / SUBJECT=ID TYPE=SP(pow)(cat_timeinyrs);
RANDOM /*timeinyrs*/ int / sub=ID ;
RUN;
Any help would really be appreciated!
Thanks for your reply!
I ended up changing how my time variable was calculated (I subtracted each visit date from one baseline date). I then re-ran the model. However, there is a new note message (see below).
"NOTE: Convergence criteria met but final Hessian is not positive definite."
@Ksharp , do you have a recommendation or advice on what might be the issue with my model now? Any suggestions would be greatly appreciated!
PROC MIXED DATA=Echo_FINAL2 METHOD=REML order=Internal noclprint;
TITLE "Looking at the evolution of lvmi in the healthy kids dataset - ORIGINAL";
CLASS cat_timeindays sex;
MODEL LVMI_chinali= timeindays age sex height_cm / SOLUTION CL;
REPEATED cat_timeindays / SUBJECT=ID TYPE=SP(pow)(cat_timeindays);
RANDOM /*timeindays*/ int / sub=ID;
RUN;
You mentioned that this research is about progression of data in healthy children. The data sound observational, with no treatment or experimental condition applied at baseline. Given this, is the appropriate time variable actually the child's age at each visit?
If so, instead of subtracting the baseline date from each visit date, you would subtract the child's birth date from each visit date.
I recommend using a random coefficients model using a RANDOM statement instead of the REPEATED/RANDOM statements you have.
I see these possible issues with spatial power on the REPEATED statement as you have it.
1) Using spatial power forces both time points to have the same variance.
2) The reason you use spatial power is to allow the correlation between time points to be different based on how far apart they are. Using a categorical variable to represent time, like your code, probably isn't representing this distance correctly.
3) I'm also not sure of the benefit of adding the additional RANDOM statement with just a separate intercept. This is estimating a separate variance parameter.
This is the general code I would use:
PROC MIXED DATA=Echo_FINAL2 METHOD=REML;
MODEL LVMI_chinali= timeinyrs / SOLUTION;
RANDOM intercept timeinyrs / subject=id type=UN G V;
RUN;
This estimates a separate line for each subject and ultimately fits a separate variance at each time point, plus a covariance between the two time points.
Thanks for your message back! I applied your adjustment in the code. However then I get a new message:
"NOTE: Estimated G matrix is not positive definite."
Maybe the issue is stemming from the RANDOM statement?
PROC MIXED DATA=Echo_FINAL2 METHOD=REML order=Internal noclprint;
TITLE "Looking at the evolution of lvmi in the healthy kids dataset - REFINED";
CLASS sex;
MODEL LVMI_chinali= timeindays age sex height_cm / SOLUTION CL;
RANDOM intercept timeindays / subject=id type=UN G V;
RUN;
An alternative way is using GEE method:
https://support.sas.com/kb/46/997.html
proc genmod data=Echo_FINAL2;
class id timeindays;
model LVMI_chinali = timeindays age sex height_cm;
repeated subject=id;
lsmeans timeindays / ilink cl e;
run;
Hi again,
I think the issue has to do with your original question: what to enter as the time values. It is not correct to use the same time value for both time points. I suggest you set time to 0 for the first visit and keep the time value as it is for the second visit. Another alternative is to create a date variable that equals the value of ECHO_DATE_1 for the first visit and equals ECHO_DATE_2 for the second visit. You would then use that date variable in the model in place of your time variable. Doing so would give you a slightly different interpretation, since the model is now taking into account the actual dates and their spacing, instead of just their spacing.
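As a rough sketch of these two options, a DATA step along the following lines could build the time variables. It assumes the data are in long format (two rows per ID) with ECHO_DATE_1 and ECHO_DATE_2 stored as SAS dates on both rows. The visit indicator `visit`, the output dataset `Echo_time`, and the new variables `echo_date` and `time_yrs` are hypothetical names, so adjust them to match your data.

```sas
data Echo_time;
  set Echo_FINAL2;

  /* visit is a hypothetical 1/2 indicator for first vs. second echo */
  if visit = 1 then echo_date = ECHO_DATE_1;
  else if visit = 2 then echo_date = ECHO_DATE_2;
  format echo_date date9.;

  /* Option 1: time since the child's own first echo, in years.      */
  /* This is 0 on the first-visit row and the actual gap on the      */
  /* second-visit row, so the two rows no longer share a time value. */
  time_yrs = (echo_date - ECHO_DATE_1) / 365.25;

  /* Option 2: use echo_date itself as the time variable in the model */
run;
```

Either `time_yrs` or `echo_date` would then replace the time variable on the MODEL statement.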
An alternative that I think is better than either of the above, though (which I mentioned in another post), is to use AGE as your time variable. Age would need to be calculated by subtracting the birth date from each visit date, so that age is a decimal value, not rounded to a whole year. Age would also need to be calculated at both visits. This would lead to a really nice interpretation of your results: you can talk about how your outcome changes across children of different ages while simultaneously talking about the outcome over time.
Note: using age to measure time would not be the correct choice if this were an experimental study where an experimental treatment is applied at the first visit. In such a case, your primary interest is time from baseline. Your study sounds like it is purely observational, though.
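If it helps, here is a minimal sketch of the age-based version. The names `birth_date`, `echo_date` (the visit date on each row), `age_yrs`, and `Echo_age` are assumptions for illustration, not variables I know to be in your dataset.

```sas
data Echo_age;
  set Echo_FINAL2;
  /* SAS dates are day counts, so the difference is age in days; */
  /* dividing by 365.25 gives decimal age in years at each visit */
  age_yrs = (echo_date - birth_date) / 365.25;
run;

proc mixed data=Echo_age method=REML;
  class sex;
  /* age_yrs takes the place of both the time variable and age */
  model LVMI_chinali = age_yrs sex height_cm / solution cl;
  random intercept age_yrs / subject=id type=UN G V;
run;
```

With only two visits per child, the random slope variance may be hard to estimate. If the note about the G matrix persists, one thing to try is dropping `age_yrs` from the RANDOM statement so that only a random intercept is fitted.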