Solved: Re: PROC GLIMMIX for longitudinal data

Misaki60 · Posted 06-15-2021 04:49 PM

Hello!

I want to analyze longitudinal data and use PROC GLIMMIX to compare values at time=0 with those at time=1 to 4. My goal is to estimate the mean at each time point while accounting for intra-subject correlation. Here's what I came up with so far:

PROC GLIMMIX data=data.mixed;
CLASS id group time (ref="0") age_cat (ref="1") smoker_cat(ref="3");
MODEL eq5d5l = time age_cat smoker_cat /dist=beta alpha=0.05 solution cl;
LSMEANS time / cl ilink;
RANDOM intercept / subject=id;
store data.eq5d5l_mixed;
run; quit;

Is the code correct? In addition, I get the message "ERROR: Invalid or missing data."

       ods noproctitle;
 75         ods graphics / imagemap=on;
 76
 77         PROC GLIMMIX data=epra;
 78         CLASS erh_nr Gruppe time (ref="0") alter_cat (ref="1") raucher_cat(ref="3");
 79         MODEL aqlq= time alter_cat raucher_cat /dist=beta alpha=0.05 solution cl;
 80         LSMEANS time / cl ilink;
 81         RANDOM intercept / subject=erh_nr;
 82         store daten.aqlq_mixed;
 83         /* WEIGHT sw type=vc; */
 84         run;

 84       !      quit;

 ERROR: Invalid or missing data.
 NOTE: The GLIMMIX procedure deleted the model item store DATEN.AQLQ_MIXED because of incomplete information for a subsequent analysis.
 NOTE: The SAS System stopped processing this step because of errors.
 NOTE: PROCEDURE GLIMMIX used (Total process time):
       real time           0.10 seconds
       cpu time            0.09 seconds

Does anyone have an idea why I get this error? In case you're wondering, it's my first time using SAS, so please be considerate. Thanks!

ballardw · Posted 06-15-2021 06:11 PM

When you get an error, copy the log of the submitted code and all of the messages, notes, warnings or errors.

Paste the copied text into a text box opened on the forum with the </> icon.

Quite often those errors include diagnostic characters, or the notes provide additional information about why the error occurred.

Misaki60 · Posted 06-15-2021 06:36 PM

Thank you for your tips; I have already added the log to my post above.

SteveDenham · Posted 06-16-2021 09:00 AM

The first code block looks fine. You may want to explore interactions later, but for now the only change I would consider is to your LSMEANS statement:

LSMEANS time / cl ilink diff=control;

That will enable testing the various levels of time against the first level (time=0). If your subjects are measured at multiple times, you will probably want to include this RANDOM statement:

random time/subject=id type=ar(1) residual;

to model the correlation of residuals (R side).
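Put together, the first model with both changes might look like the sketch below (names as in the first post). One detail I'd treat as an assumption to check in the GLIMMIX CLASS statement documentation: REF= can move the reference level to the end of the level order, in which case a bare diff=control may pick a control other than time=0. Naming the control level explicitly with diff=control('0') sidesteps that.

```sas
proc glimmix data=data.mixed;
   class id group time (ref="0") age_cat (ref="1") smoker_cat (ref="3");
   model eq5d5l = time age_cat smoker_cat / dist=beta solution cl;
   /* Compare every time point with time=0; ILINK adds data-scale means */
   lsmeans time / cl ilink diff=control('0');
   /* R side: AR(1) correlation of repeated measurements within each id */
   random time / subject=id type=ar(1) residual;
run;
```

Note that TYPE=AR(1) assumes the time points are equally spaced.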

On to the second block of code. Here PROC GLIMMIX is not finding your dataset. Be sure all the spelling matches up and the dataset exists. That seems really obvious, but sometimes the obvious is what trips us up. Try running PROC FREQ on the dataset to get cross-tabulations of your variables.
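A few quick checks along those lines, using the names from the log. PROC CONTENTS confirms that the data set exists and shows the variable names and types. The PROC MEANS step is my addition: DIST=BETA is only defined for responses strictly between 0 and 1, so aqlq values of exactly 0 or 1, or an aqlq that is still on its original questionnaire scale rather than rescaled to (0,1), are worth ruling out.

```sas
/* Does the data set exist, and what are the variable names and types? */
proc contents data=epra;
run;

/* Cross-tabulations of the CLASS variables, including missing values */
proc freq data=epra;
   tables Gruppe*time alter_cat raucher_cat / missing;
run;

/* DIST=BETA needs 0 < aqlq < 1: check the range and count missing values */
proc means data=epra n nmiss min max;
   var aqlq;
run;
```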

Once the dataset issue is settled, there are going to be some other issues in running the code. The variable 'time' should probably be fit with a RANDOM statement analogous to the second box above. However, in this case you will need to make sure that each subject has only a single measurement at each time point. I don't know where the variable 'Gruppe' fits in, but I would consider changing subject=erh_nr to subject=erh_nr(Gruppe) in both RANDOM statements.
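A sketch of both points, again with the names from the log: the PROC SQL query lists any subject with more than one record at the same time point (it should return no rows), and the model then uses subject=erh_nr(Gruppe) in both RANDOM statements, with the R-side AR(1) structure for time. Whether nesting in Gruppe is appropriate depends on the study design, so treat that part as an assumption.

```sas
/* List any erh_nr with more than one record at the same time point */
proc sql;
   select erh_nr, time, count(*) as n_records
      from epra
      group by erh_nr, time
      having count(*) > 1;
quit;

proc glimmix data=epra;
   class erh_nr Gruppe time (ref="0") alter_cat (ref="1") raucher_cat (ref="3");
   model aqlq = time alter_cat raucher_cat / dist=beta solution cl;
   /* Compare each time point with time=0 */
   lsmeans time / cl ilink diff=control('0');
   /* G side: random intercept for each subject within Gruppe */
   random intercept / subject=erh_nr(Gruppe);
   /* R side: AR(1) correlation over time within the same subject */
   random time / subject=erh_nr(Gruppe) type=ar(1) residual;
run;
```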

Finally, note that the differences between the LS-means and their significance tests are calculated on the logit scale (the default link for the beta distribution). The ILINK option gives you LS-means on the original scale, but the difference on the original scale is not produced; what comes out is the back-transformed difference from the logit scale. You can get the difference easily using a data step, but the standard error of the difference on the original scale will take more work.
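The data step part might look something like this. It assumes that time is numeric and that, with the ILINK option, the LS-means table saved by ODS OUTPUT carries the back-transformed means in a column named Mu; running PROC CONTENTS on lsm is a quick way to confirm the column names before relying on them. The data set names lsm, ref0 and lsm_diff are arbitrary.

```sas
proc glimmix data=data.mixed;
   class id time (ref="0") age_cat (ref="1") smoker_cat (ref="3");
   model eq5d5l = time age_cat smoker_cat / dist=beta solution cl;
   lsmeans time / cl ilink;
   random intercept / subject=id;
   /* Save the LS-means table, including the inverse-linked column Mu */
   ods output LSMeans=lsm;
run;

/* Keep the data-scale LS-mean at the reference time point (time=0) */
data ref0;
   set lsm(keep=time Mu rename=(Mu=Mu0));
   where time = 0;
   drop time;
run;

/* Attach Mu0 to every row and take the difference on the data scale */
data lsm_diff;
   if _n_ = 1 then set ref0;
   set lsm(keep=time Mu);
   diff_data_scale = Mu - Mu0;
run;
```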

For that you'll need the %NLMeans macro. See the various posts in the Analytics > Statistical Procedures forum that address this, particularly those from @StatDave.
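For the standard error, the usual first-order (delta-method) approximation is shown below; I'm presenting it as the general technique, not as a description of how %NLMeans is implemented. Here $\hat\eta_t$ is the LS-mean for time $t$ on the logit scale, $\hat\mu_t = 1/(1+e^{-\hat\eta_t})$ is its back-transformed value, and the variances and covariance come from the covariance matrix of the LS-means (the COV option on the LSMEANS statement is where I'd look for it).

$$
\hat D_t = \hat\mu_t - \hat\mu_0, \qquad a_t = \left.\frac{d\mu}{d\eta}\right|_{\hat\eta_t} = \hat\mu_t\,(1-\hat\mu_t)
$$

$$
\widehat{\operatorname{Var}}(\hat D_t) \approx a_t^2\,\widehat{\operatorname{Var}}(\hat\eta_t) + a_0^2\,\widehat{\operatorname{Var}}(\hat\eta_0) - 2\,a_t\,a_0\,\widehat{\operatorname{Cov}}(\hat\eta_t,\hat\eta_0), \qquad \widehat{\operatorname{SE}}(\hat D_t) = \sqrt{\widehat{\operatorname{Var}}(\hat D_t)}
$$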

SteveDenham

Misaki60 · Posted 06-16-2021 10:43 AM

Dear SteveDenham,

First of all, thank you for your detailed answer and your insight. Instead of "group", I put time in the parentheses, subject=id(time), to compare the means between all time points.

After adding your suggestions and a few specifications, my code looks like this:

*generic QoL
***************;	
PROC GLIMMIX data=daten.mixed;
CLASS id group time (ref="0") sex (ref="1") age_cat (ref="1") smoker_cat(ref="3") ACT_ges (ref="5");
MODEL eq5d5l = time group sex age_cat smoker_cat ACT_ges /dist=beta link=logit alpha=0.05 solution cl;
LSMEANS time / cl ilink diff=control;
RANDOM intercept / subject=id (time) type=ar(1) residual;
store daten.eq5d5l_mixed;
run; quit;
/*  */
/* %NLMeans(instore=daten.eq5d5l_mixed, coef=coeffs, link=logit, title=Difference of HRQoL); */

*total costs
***************;
*Assign a small amount to consider in the gamma model;
data daten.mixed_costs; set daten.mixed;
	if costs=0	and cc=1 then do; costs	=1; end;
	run;

*Mixed Model;
PROC GLIMMIX data=daten.mixed_costs;
CLASS id group time (ref="0") sex (ref="1") age_cat (ref="1") smoker_cat(ref="3") employment (ref="1") ACT_ges (ref="5");
MODEL total costs = group time sex age_cat smoker_cat FIM4_t0 ACT_ges /dist=gamma link=log alpha=0.05 solution cl;
LSMEANS time / cl ilink diff=control;
RANDOM intercept / subject=id(time) type=ar(1) residual;
store daten.cost_mixed;
run; quit;

Model EQ5D: After making these changes, the log says:

 NOTE: Did not converge.

Do you have an idea what's wrong?

Model Costs: Unfortunately, the model is still not working; maybe it has something to do with the estimation method. How do I know which method is appropriate?

Thanks in advance and have a great day!

Maria

SteveDenham · Posted 06-21-2021 11:22 AM

One or two things. First, this statement

RANDOM intercept / subject=id (time) type=ar(1) residual;

doesn't quite make sense to me, as I don't see how the intercepts are correlated. I would suggest a combined R side and G side representation, where the G side models a random intercept model by subject and the R side models a correlation of residuals over time:

RANDOM intercept / subject=id;
RANDOM time/subject=id type=ar(1) residual;

Regarding convergence, I believe that you might be up against the default limit of 20 iterations. To get around that limit, use an NLOPTIONS statement:

NLOPTIONS maxiter=1000;

If there are still convergence issues, see the paper by Kiernan et al. https://support.sas.com/resources/papers/proceedings12/332-2012.pdf for additional tips.
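Applied to the EQ5D model from the previous post, the two RANDOM statements and the NLOPTIONS statement could be combined as follows. The MAXOPT= option is my addition and an assumption to verify in the PROC GLIMMIX statement documentation: as I understand it, it caps the number of outer pseudo-likelihood updates (default 20), whereas NLOPTIONS MAXITER= caps the iterations within each optimization, so either limit could be behind the "Did not converge" note.

```sas
proc glimmix data=daten.mixed maxopt=100;
   class id group time (ref="0") sex (ref="1") age_cat (ref="1")
         smoker_cat (ref="3") ACT_ges (ref="5");
   model eq5d5l = time group sex age_cat smoker_cat ACT_ges
         / dist=beta link=logit solution cl;
   lsmeans time / cl ilink diff=control;
   /* G side: random intercept per subject */
   random intercept / subject=id;
   /* R side: AR(1) correlation of residuals over time within subject */
   random time / subject=id type=ar(1) residual;
   /* Allow more iterations within each optimization */
   nloptions maxiter=1000;
   store daten.eq5d5l_mixed;
run;
```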

On the model costs, what do you mean by "not working"? What messages are in the log or in the output? These should give you an idea of where to start. My suspicion falls first on the specification of the random effects; if not there, then on the updating of costs in the data set, and on the name total costs with a space in it, which means the dependent variable is not correctly specified. Also, it would not hurt to add an OUTPUT statement to this code, to get:

data daten.mixed_costs; set daten.mixed;
	if costs=0 and cc=1 then costs=1;
	/* OUTPUT placed after the IF, so every observation is written, not only the recoded ones */
	output;
	run;
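With the dependent variable fixed and the same random-effects structure, the cost model might look like the sketch below. It assumes the intended response is costs, the variable the data step recodes; the name total costs cannot be used as written because of the space. DIST=GAMMA needs strictly positive values, and zeros with cc not equal to 1 are left unchanged by the data step, so the PROC MEANS check is worth running first.

```sas
/* DIST=GAMMA needs costs > 0: check for remaining zeros or missing values */
proc means data=daten.mixed_costs n nmiss min max;
   var costs;
run;

proc glimmix data=daten.mixed_costs;
   class id group time (ref="0") sex (ref="1") age_cat (ref="1")
         smoker_cat (ref="3") employment (ref="1") ACT_ges (ref="5");
   model costs = group time sex age_cat smoker_cat FIM4_t0 ACT_ges
         / dist=gamma link=log solution cl;
   lsmeans time / cl ilink diff=control;
   /* G side: random intercept per subject */
   random intercept / subject=id;
   /* R side: AR(1) correlation of residuals over time within subject */
   random time / subject=id type=ar(1) residual;
   nloptions maxiter=1000;
   store daten.cost_mixed;
run;
```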

Good luck.

SteveDenham

Misaki60 · Posted 07-03-2021 09:39 AM

Dear SteveDenham,
thank you for your detailed reply!