ROLuke91
Obsidian | Level 7

Hi all,

 

I'm attempting to perform a random effects logistic regression with auto-correlated data using PROC GLIMMIX. I have a sample of 702 individuals that were measured 11 times sequentially to examine if they are yes/no (1/0) for an attribute at each time point. I'm trying to regress this dichotomous outcome on a single independent variable of time point measurement, essentially comparing the OR of individuals having the attribute vs. not having the attribute based on the specific time point (represented by a categorical/class variable with 10 distinct strata and 1 baseline reference point).

 

I would also like to incorporate the effects of random intercept and random growth trajectory slope for each individual subject. Therefore, I've chosen PROC GLIMMIX to run this analysis. My current code is as follows:

 

proc glimmix data=initglm;
class id /ref = first;
model DichoOutcome= Timepoint / dist=bin link=logit solution oddsratio;
random intercept id / subject = id type=un solution;
NLOPTIONS TECH = NRRIDG ;
run;

 

As far as I see it, this programs the logistic regression analysis to have a random intercept and a random slope for the id. Is this a correct interpretation of what I've programmed? I find it difficult to navigate PROC GLIMMIX resources and examples...

 

When I try to run this, I get an error that SAS has "insufficient memory to run the procedure." I'm guessing this means what I'm doing is incorrect?

 

Could anybody please advise as to whether my code is sufficient for me to meet my goals, and/or give guidance on how I can incorporate a random slope, if above is not correct? I can get the model to converge without the "id" in the random statement, but my understanding is that this will just give a random intercept and not random growth trajectory as well....

 

Thanks and Best,

Luke

 

 


7 REPLIES 7
PaigeMiller
Diamond | Level 26

> When I try to run this, I get an error that SAS has "insufficient memory to run the procedure." I'm guessing this means what I'm doing is incorrect?


It means that your computer doesn't have enough memory to run the task, most likely because you have 702 individuals and this creates a very large matrix to work with.

 

You may have to randomly sample the individuals to create a smaller matrix to work with.
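A minimal sketch of that subsampling idea, assuming the goal is to keep whole subjects so that each sampled id retains all 11 measurements. The dataset names datasetname, ids, sampled_ids, and subsample are placeholders, and the sample size of 200 and the seed are arbitrary choices for illustration, not recommendations:

```sas
/* One row per subject, so that sampling is done on ids, not on rows */
proc sort data=datasetname(keep=id) out=ids nodupkey;
  by id;
run;

/* Simple random sample of subjects; n=200 is an arbitrary illustrative value */
proc surveyselect data=ids out=sampled_ids method=srs n=200 seed=20240101;
run;

/* Keep every repeated measurement belonging to the sampled subjects */
proc sql;
  create table subsample as
  select a.*
  from datasetname as a
       inner join sampled_ids as b
       on a.id = b.id
  order by a.id, a.Timepoint;
quit;
```

Sampling subjects rather than individual rows keeps each subject's sequence of time points intact, which matters for any random-effects or autocorrelation structure fitted afterwards.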

--
Paige Miller
sld
Rhodochrosite | Level 12

Try a different RANDOM specification:

 

proc glimmix data=datasetname;
class id;
model DichoOutcome= Timepoint / dist=bin link=logit solution oddsratio;
random intercept / subject = id;
random timepoint / subject=id type=ar(1);
NLOPTIONS TECH = NRRIDG ;
run;

initglm is probably not the name of your dataset. It's always good to post code that runs (to some extent), rather than code with possible typos.

 

id is a random effects factor and so specifying a reference level is unnecessary.

 

random intercept id / subject=id is probably a main source of trouble for you. It's not a correct specification.

 

I am not sure that this combo of RANDOM statements will work, but take it for a spin and let me know.

 

ROLuke91
Obsidian | Level 7

Thanks very much, both.

 

@sld I appreciate your re-specification; the model converged and ran! Just so I am interpreting correctly: the first random statement applies a random intercept effect, representing a random baseline for each individual subject?

 

Is the second random statement then essentially indicating that the observations of each ID, over all the time points, have individualized random slopes? This is the goal of my programming, in any case.

 

If the two random statements in your code satisfy the above, then this is everything I need!

 

Best,

Luke

sld
Rhodochrosite | Level 12

Oops.

 

I overlooked that Timepoint was not in the CLASS statement. My apologies, I didn't read carefully enough.

 

If you are regressing DichoOutcome on Timepoint, and you want random intercepts and random slopes among IDs with a covariance between intercepts and slopes, and you want to accommodate the potential for temporal autocorrelation among repeated measures on the same subject, then you could consider something like this:

 

 

proc glimmix data=datasetname method=laplace;
class id Timepoint;
Timepoint_continuous = Timepoint; /* makes a continuous version of Timepoint to use in regression */
model DichoOutcome= Timepoint_continuous / dist=bin link=logit solution oddsratio;
random intercept Timepoint_continuous / subject = id type=un;
random Timepoint / subject=id type=ar(1); /* uses the categorical version of Timepoint for autocorrelation */
NLOPTIONS TECH = NRRIDG ;
run;

 

This model assumes that the relationship with Timepoint is linear on the logit scale, which you would want to check. AR(1) may or may not be the best choice; you can try others but be thoughtful about your choices. You probably should use Laplace or quadrature estimation.
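One informal way to check the linearity assumption is to plot the empirical logit of the outcome at each time point and see whether the points fall roughly on a straight line. The sketch below assumes Timepoint is stored as a numeric variable and DichoOutcome is coded 0/1; the names props, p, n, and elogit are introduced here for illustration, and the 0.5 continuity correction is a common convention rather than anything taken from this thread:

```sas
/* Proportion with the attribute and number of observations at each time point */
proc means data=datasetname noprint nway;
  class Timepoint;
  var DichoOutcome;
  output out=props mean=p n=n;
run;

/* Empirical logit; the 0.5 correction avoids log(0) when a proportion is 0 or 1 */
data props;
  set props;
  elogit = log((p*n + 0.5) / (n - p*n + 0.5));
run;

/* A roughly straight line supports treating Timepoint as continuous */
proc sgplot data=props;
  series x=Timepoint y=elogit / markers;
  xaxis label="Timepoint";
  yaxis label="Empirical logit of DichoOutcome";
run;
```

This is a marginal, descriptive check that ignores the random effects, so it is a screening tool only. Comparing the regression fit against a model with Timepoint as a CLASS effect is a more formal alternative.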

 

I would definitely examine model results carefully. The model can be wrong in some way even if it converges, as evidenced by my previous code suggestion 🙂

 

Edit: Remember that AR(1) assumes that Timepoint levels are evenly spaced.
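To see why the spacing matters, here is the usual form of the AR(1) structure, written with notation assumed for illustration: u_{ij} is the random effect for subject i at the j-th Timepoint level, sigma^2 its variance, and rho the autocorrelation parameter.

```latex
\[
\operatorname{Cov}(u_{ij}, u_{ik}) = \sigma^{2}\,\rho^{\,|j-k|},
\qquad j, k = 1, \dots, 11 .
\]
```

The correlation depends only on how many positions apart two levels are in the ordered list (rho for adjacent levels, rho^2 for levels two apart, and so on), not on the actual elapsed time between measurements. If the measurement occasions are unevenly spaced, the same lag would stand for different time gaps, which is the concern raised in the edit above.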

 

 

 

 

ROLuke91
Obsidian | Level 7

Sincerely huge thank you for all of the help and guidance you've given throughout this thread! This has been very helpful for the specific analysis and for my own personal understanding of random effects regression as well :).

 

I tried your code suggestions and it converged and ran properly - and as far as I see in conjunction with your explanation, everything seems to be properly specified. 

 

I do have a question regarding obtaining odds ratios from the model: So my ultimate research goal is to have ORs that compare each individual time point to the first time point (which is a baseline reference), as well as pairwise comparisons that compare the ORs between each (non-baseline) time point. Right now, the code only generates a single OR comparing time point 7 and 6, and I'm not sure why 1.) It chose that time point to spit out an Odds Ratio for and 2.) How I can specify and generate my desired multiple comparisons. I'm trying to code in (Diff = all) in the statement, but I still get the single Odds Ratio.

 

The way I understand it is that Timepoint_continuous is identical to the Timepoint class variable, and therefore I shouldn't have any issues getting the pairwise comparisons if I specify with (Diff = all). Do you have any thoughts or see where I'm going wrong on this?


Best,

Luke

sld
Rhodochrosite | Level 12

If you are using a linear regression of DichoOutcome (with a logit link) on Timepoint, then there is only one odds ratio, which is the odds ratio for a one-unit change in Timepoint, regardless of whether the change is from 0 to 1 or from 9 to 10, or as reported from 6 to 7. (I think 6 is used because it is the mean Timepoint value but I would not swear to that.) 
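A short worked equation may make the "only one odds ratio" point concrete. The notation is assumed for illustration: t is the continuous time point, beta_0 and beta_1 are the fixed intercept and slope, and b_{0i}, b_{1i} are subject i's random intercept and slope.

```latex
\[
\operatorname{logit}\Pr(Y_{ij}=1 \mid b_{0i}, b_{1i})
  = \beta_0 + b_{0i} + (\beta_1 + b_{1i})\, t_{ij}
\]
\[
\frac{\operatorname{odds}(t+1)}{\operatorname{odds}(t)}
  = \exp\bigl\{(\beta_1 + b_{1i})(t+1) - (\beta_1 + b_{1i})\,t\bigr\}
  = \exp(\beta_1 + b_{1i})
\]
```

The intercept terms cancel and t drops out, so for a subject with b_{1i} = 0 the odds ratio is exp(beta_1) whether the step is from 0 to 1, 6 to 7, or 9 to 10. A separate odds ratio for each time point therefore cannot come out of this parameterization.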

 

The most straightforward way to get the odds ratios that you want is to use Timepoint as a classification variable in the MODEL statement (an ANOVA-like model, rather than a regression) in combination with the LSMEANS statement:

 

proc glimmix data=datasetname method=laplace;
class id Timepoint;
model DichoOutcome= Timepoint / dist=bin link=logit solution;
random intercept / subject = id;
random Timepoint / subject=id type=ar(1); /* You could try omitting this statement, too */
lsmeans Timepoint / diff oddsratio;
run;

But, of course, now you are no longer regressing on Timepoint, and you'll have to decide which approach is more appropriate for your research questions and your data. For example, do your data meet the linearity assumption? If so, then regression is nicely parsimonious. If not, then a linear regression is the wrong model.
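For the two sets of comparisons asked about earlier in the thread (each time point versus the baseline, and all pairwise comparisons), a sketch along the following lines may help. It assumes the baseline level of Timepoint is labeled '1', which is a guess about the coding, and it omits the AR(1) RANDOM statement for brevity. The multiplicity adjustments are illustrative choices, not part of the original suggestion:

```sas
proc glimmix data=datasetname method=laplace;
  class id Timepoint;
  model DichoOutcome = Timepoint / dist=bin link=logit solution;
  random intercept / subject=id;

  /* Each time point vs. baseline; '1' is an assumed label for the baseline level */
  lsmeans Timepoint / diff=control('1') oddsratio cl adjust=dunnett;

  /* All pairwise comparisons among time points, reported as odds ratios */
  lsmeans Timepoint / diff=all oddsratio cl adjust=tukey;
run;
```

The second LSMEANS statement also repeats the comparisons against baseline, so the non-baseline pairs would need to be picked out of its table. With 11 levels there are 55 pairwise comparisons, which is why some multiplicity adjustment is worth considering.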

 

The text Applied Logistic Regression by Hosmer and Lemeshow has an excellent chapter on interpretation of the fitted model, distinguishing nicely between dichotomous, polychotomous, and continuous predictors. You would find it helpful, I think. Paul Allison's text is also quite good.

 

ROLuke91
Obsidian | Level 7
Thanks again very much!
