HeatherA
Calcite | Level 5

I am new to using SAS 9.4 full time for analyses. I am working on my dissertation data, and I know that the best way to analyze my data is a generalized linear mixed model. No one else in my lab uses SAS; they prefer SPSS, which in my opinion does not do as good a job with very complex statistics.

My data are a secondary analysis of a year-long dataset on women who were undergoing an intervention to regain menstrual function. My outcome Y (resume) is whether or not they resumed menses during the study.

I am interested in whether body composition (for which I have 12 variables) changed the odds of menstrual recovery, and I am not interested in the intervention effect (rnd) for this analysis. My main concern is fitting the model so that it includes a subject-specific random intercept and accounts for the uneven repeated-measures structure of the data (including the differing number of measurements per subject).

After an extensive review of the GLIMMIX literature I came up with the code below, but I am not sure it reflects the analysis I am interested in.

Proc Glimmix data=resumption;
Class period id rnd;
Model resume = wt bmi perbf fm fmi lmi lpct tpct tlpctr apct gpct agpctr/ link=logit dist=binomial or solution;
Random int/subject=id type=ar residual;
Run;

My data are set up like this:

Options nocenter pageno=1;

Data resumption;
Input period $ id rnd wt bmi perbf fm fmi lmi lpct tpct tlpctr apct gpct agpctr resume;
Cards;
SBI1 1 3 45.90 16.07 12.30 5.99 2.10 14.37 12.58 9.81 0.78 6.40 9.57 0.67 0
I5 1 3 50.90 17.82 14.90 7.74 2.71 15.81 16.20 12.57 0.78 10.37 21.34 0.49 0
SBI1 14 2 52.16 19.88 17.60 9.29 3.54 16.66 20.74 14.42 0.69 13.01 25.17 0.52 0
I5 14 2 53.90 20.54 18.60 9.97 3.80 16.72 22.38 14.70 0.66 13.96 27.56 0.51 0
I9 14 2 53.00 20.20 17.70 9.48 3.61 16.93 21.74 13.78 0.63 12.73 25.93 0.49 0
SBI1 126 2 50.23 17.87 24.30 12.22 4.35 12.71 31.10 19.19 0.62 . . . 0
SBI1 132 3 56.45 20.09 23.76 13.25 4.71 14.21 29.03 20.95 0.72 . . . 0
I5 132 3 56.41 20.07 . . . . . . . . . . 0
I9 132 3 56.50 20.21 26.25 14.74 5.24 13.88 30.40 24.86 0.82 . . . 0
SBI1 155 2 40.95 16.81 10.10 4.05 1.66 14.04 12.50 6.90 0.55 5.00 17.00 0.29 0
I5 155 2 40.45 16.60 12.20 4.89 2.01 13.67 14.70 9.40 0.64 7.20 20.00 0.36 0
I9 155 2 41.20 16.91 12.10 4.90 2.01 13.82 15.20 9.00 0.59 7.30 20.80 0.35 0
I21 155 2 40.20 16.50 11.00 4.41 1.81 13.83 13.40 8.20 0.61 5.40 17.60 0.31 0
I33 155 2 39.00 16.01 7.50 2.92 1.20 13.98 8.60 4.80 0.56 3.90 11.00 0.35 0
I49 155 2 42.10 17.28 9.80 4.12 1.69 14.81 11.40 7.40 0.65 5.80 16.40 0.35 0
SBI1 170 2 56.20 23.77 32.40 18.02 7.62 14.94 30.70 35.40 1.15 45.10 40.20 1.12 0
SBI1 175 3 54.35 20.33 19.60 10.65 3.98 15.50 24.00 16.80 0.70 18.20 31.30 0.58 0
I5 175 3 55.85 20.89 20.10 11.43 4.28 16.21 24.40 17.40 0.71 18.60 31.30 0.59 0
I9 175 3 56.60 21.17 20.40 11.55 4.32 16.06 24.70 17.70 0.72 19.70 32.60 0.60 0
I21 175 3 56.85 21.27 21.40 12.11 4.53 15.86 25.90 19.00 0.73 21.80 33.10 0.66 1
SBI1 217 3 51.30 18.64 23.00 11.66 4.24 13.45 28.90 18.10 0.63 18.20 34.90 0.52 0
I5 217 3 52.70 19.15 25.20 13.16 4.78 13.48 30.10 21.40 0.71 22.60 37.70 0.60 0
I9 217 3 54.25 19.71 26.80 14.17 5.15 13.31 31.00 23.40 0.75 26.70 39.60 0.67 0
I21 217 3 54.55 19.82 29.10 16.34 5.94 13.76 35.10 25.10 0.72 28.40 44.00 0.65 0
I33 217 3 54.55 19.82 28.40 15.57 5.66 13.55 34.10 24.80 0.73 27.00 43.10 0.63 0
I49 217 3 55.20 20.06 27.30 14.70 5.34 13.46 32.80 23.40 0.71 24.60 41.40 0.59 1
;
Run;

The time periods at which I have body composition variables are not evenly spaced (screening and intervention weeks 5, 9, 21, 33, and 49). Time to resumption is not of interest in my analysis. Participants who resumed menstrual function did so at various time points, and many did not resume. Not all participants made it through the study to intervention week 49.

Not all women have the apct, gpct, and agpctr variables, because of a change in the machine used to evaluate body composition. apct and gpct have been shown to differ significantly between resumers (at the time of resumption) and non-resumers when analyzed with Hotelling's T² test, so I want to keep these variables in the analysis.

I understand that most of my variables are correlated; however, only 7 of the variables have correlations above rho = 0.95, and none reach 0.99.
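A minimal sketch of how the pairwise correlations among the 12 body composition variables could be printed, assuming the resumption data set from the post. Because each woman contributes several rows, these correlations mix within-woman and between-woman variation, and by default PROC CORR computes each pair from whatever rows have both values present.

```sas
/* Pearson correlation matrix of the 12 body composition variables. */
/* NOPROB suppresses the p-values and NOSIMPLE suppresses the        */
/* descriptive statistics, leaving mainly the correlation matrix.   */
/* Default is pairwise deletion; add NOMISS to use complete rows.   */
proc corr data=resumption pearson noprob nosimple;
  var wt bmi perbf fm fmi lmi lpct tpct tlpctr apct gpct agpctr;
run;
```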

Thanks for any support!

5 REPLIES
sld
Rhodochrosite | Level 12

The dataset included in your code has 8 subjects. Is this the full dataset or just a subset? If it's a subset, how many subjects do you have?

HeatherA
Calcite | Level 5

I have a total of 30 participants in the analysis I am completing (an approximately even split between those who resumed and those who did not). I chose these specific participants to show the variability in when participants withdrew or resumed menses in the study. The data provided (a representative excerpt of the actual data) show how randomly values are missing for the various variables within any one participant.

sld
Rhodochrosite | Level 12

I'll offer some thoughts. Hopefully other people will weigh in as well.

1. Consider a "time to event" analysis, where event is resumption of menses. Given the extent of censored observations (i.e., women who withdrew, and women who did not resume menses before the end of the study), I doubt that a GLMM with a binary response will work well. The text by Singer and Willett (Applied Longitudinal Data Analysis: Modeling Change and Event Occurrence) and the code here http://www.ats.ucla.edu/stat/r/examples/alda/ could be a good place to start.

2. 30 subjects, about half of whom exhibited the event, is a small sample. Keep in mind your effective sample size and the risk of over-fitting.

3. With correlations that high among the body condition covariates, you will surely have issues with (multi)collinearity. Consider a more complete assessment of multicollinearity, and then some form of dimension reduction, e.g., dropping predictor variables, forming an index, principal components analysis, factor analysis. Again, keep in mind your effective sample size and the risk of over-fitting.

4. You say you are not interested in the "intervention" effect. Are different women exposed to different interventions in your dataset? If so, I would not think that you could ignore it unless intervention absolutely does not matter.

5. The missing covariate values will pose problems. Consider imputation, or drop covariates with missing values if they are highly correlated with (and thus redundant to) other covariates.

6. These thoughts are all independent of the software that you use, whether SPSS or SAS (or anything else).

7. I suspect this analysis will be a challenge, quite probably more than can be dealt with adequately in this forum. Consider finding someone at your institution with statistical expertise to guide you.
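A minimal sketch of the "more complete assessment of multicollinearity" and one dimension-reduction option from point 3, assuming the resumption data set from the original post. Since multicollinearity concerns only the predictors, PROC REG is used purely for its diagnostics: resume is a placeholder response and the fitted coefficients should be ignored. VIF values above roughly 10 and condition indices above roughly 30 are common rules of thumb, not hard cutoffs. Keeping three principal components is an arbitrary illustration, and both steps drop any row with a missing predictor.

```sas
/* Step 1: collinearity diagnostics. The response is only a         */
/* placeholder; VIF, TOL and COLLINOINT depend on the predictors.   */
proc reg data=resumption;
  model resume = wt bmi perbf fm fmi lmi lpct tpct tlpctr apct gpct agpctr
        / vif tol collinoint;
run;
quit;

/* Step 2 (illustrative): principal components of the same variables. */
/* PRINCOMP analyzes the correlation matrix by default; N=3 keeps the  */
/* first three components, saved as Prin1-Prin3 in PC_SCORES.          */
proc princomp data=resumption out=pc_scores n=3;
  var wt bmi perbf fm fmi lmi lpct tpct tlpctr apct gpct agpctr;
run;
```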

HeatherA
Calcite | Level 5

Thanks for the input.

1. We are trying to understand whether or not time to event matters. In the current analysis we want to proceed under the assumption that time to event doesn't matter; what matters is the actual body composition, no matter how long it took the participant to reach that value. In another analysis we are going to evaluate the time-to-event aspect of the study.

2. Yes, we are considering overfitting. We are selecting predictors for the GLMM based on the results of correlation analyses with resumption as well as Hotelling's T² analyses. This has reduced the model to 7 predictors.

3. My knowledge of ways to evaluate multicollinearity is limited. I know to check correlations, but the value of rho at which to be concerned varies. I know I can also look at VIF values. Do you have suggestions on the best or most appropriate way to evaluate multicollinearity?

4. The women went through a free-living refeeding intervention, but women in both arms resumed. My focus is the body composition changes, which are in some respects independent of the intervention. This is a secondary analysis; I can add the randomized group as a predictor, but then I again risk overfitting.

5. The predictor of interest is missing for a subset of the sample because of a change in the machine being used. These data cannot be imputed.

6/7. I am working with an individual at my institution on the best analysis to complete; however, he does not use SAS and suggested using the SAS forum for guidance on the structure of the SAS code. As a student, I am trying to explore the statistics independently as well as with his guidance.
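Because apct, gpct, and agpctr are missing for part of the sample and most procedures drop any row with a missing predictor, it may help to count how many rows, and which women, would survive a complete-case fit. A minimal sketch, assuming the variable order of the INPUT statement in the original post; the data set CHECK and the flag COMPLETE are hypothetical names.

```sas
/* Flag rows with all 12 predictors present.                        */
/* wt--agpctr is a name range list: every variable from wt through  */
/* agpctr in data set order, i.e. the 12 body composition measures. */
data check;
  set resumption;
  complete = (nmiss(of wt--agpctr) = 0);   /* 1 = complete, 0 = not */
run;

/* Overall count of complete rows, then complete rows per woman. */
proc freq data=check;
  tables complete;
  tables id*complete / norow nocol nopercent;
run;
```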

Thanks again

sld
Rhodochrosite | Level 12

Thank you for the informative responses. My turn!

1. Because of the censoring and because the body condition covariates are time-varying, I'm not seeing how a good binary GLMM can be constructed. The resume values don't switch back and forth across periods between N and Y depending on body conditions; a resume value is N until it (possibly) becomes Y. Looks like event data to me.

2 and 3. Your approach to variable selection is arguably treacherous. But I concur that you have too many predictor variables for your sample size. Even with this process of down-sizing to 7 variables, you still need to assess multicollinearity among them. A quick Google of "multicollinearity detection" generates a lot of useful links. Multicollinearity is a property of the predictor variables, so you can read about multicollinearity in regression texts that deal with normal-distribution response and use multicollinearity statistics generated by regression software (like REG).

4. If body conditions are largely a consequence of intervention, then it would be appropriate to drop intervention. 

5. By default, if an observation contains a missing value for one or more predictor variables, the software will drop that observation in the model fitting; this is true for most (if not all) software packages. So, you need to deal with those missing values in some fashion, or you'll lose those observations.

6 and 7. I commend you for not blindly using stat software! This forum is usually good at helping with code when you provide enough detail to identify an appropriate model for a corresponding dataset. In this case, as you can tell, I don't believe a binary GLMM is appropriate, so I cannot help with syntax for that model (or should not, but see below). If you switch to a time-to-event model, the website I linked to earlier, http://www.ats.ucla.edu/stat/r/examples/alda/, has SAS code examples, and the forum could help with code if you hit snags.
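As one concrete starting point for the time-to-event route, a minimal sketch of a discrete-time hazard model of the kind Singer and Willett describe, fit here with PROC LOGISTIC rather than taken from the linked site. It assumes the data are already in person-period form, as the sample appears to be: one row per woman per visit, resume = 1 only on the visit at which she resumed, and no rows after that. Women who withdrew or never resumed simply contribute rows up to their last visit, which is how censoring enters. The covariates wt, bmi, and apct are placeholders for whichever predictors are retained.

```sas
/* Discrete-time hazard model on person-period data.                 */
/* PERIOD indicators give a separate baseline hazard for each visit; */
/* the body composition terms enter as time-varying covariates.      */
/* ASSUMPTION: no rows exist for a woman after her resumption visit. */
proc logistic data=resumption;
  class period / param=ref;
  model resume(event='1') = period wt bmi apct;   /* placeholder covariates */
run;
```

Exponentiated covariate coefficients are then odds ratios for the conditional probability of resuming in a given period among women who had not yet resumed. With about 30 women, periods with no resumptions can trigger (quasi-)complete separation warnings, in which case some periods may need to be pooled.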

In the interest of your ongoing education, I will add that if I liked your approach, which I don't, I would start with

proc glimmix data=resumption method=laplace;
  class period id;
  model resume = wt bmi perbf fm fmi lmi lpct tpct tlpctr apct gpct agpctr period /   
    link=logit dist=binomial or solution;
  random int / subject=id;
  random period / subject=id type=ar(1);
run;

This sort of model almost always requires some twiddling, even beyond that needed to identify a good covariance structure type. Including "type=ar(1)" and "residual" on the first random statement is wrong. Including both random statements as here identifies an AR(1)+RE covariance structure. Omitting the first random statement and retaining only the second identifies an AR(1) covariance structure. See Littell et al. http://onlinelibrary.wiley.com/doi/10.1002/1097-0258(20000715)19:13%3C1793::AID-SIM482%3E3.0.CO;2-Q/...  or Stroup (2013) Generalized Linear Mixed Models for details about the distinction between AR(1)+RE and AR(1). If you had enough subjects and enough periods and more complete longitudinal data, you could consider a model with random slopes, specified as

random wt bmi perbf fm fmi lmi lpct tpct tlpctr apct gpct agpctr / subject=id type=vc;

but this is way too complex for what you have to work with.
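One practical detail if the GLIMMIX route is pursued anyway: period is a character variable, so its CLASS levels sort alphabetically by default (I21, I33, I49, I5, I9, SBI1) rather than chronologically, and an AR(1) structure on period would then treat the wrong visits as neighbors. A minimal sketch that adds a numeric week variable (numeric CLASS levels without a format sort by value) and refits the AR(1)+RE model above with week in place of period; coding the screening visit SBI1 as week 0 and the name resumption2 are assumptions. AR(1) also assumes equally spaced visits, which weeks 0 to 49 are not, so a spatial power structure (TYPE=SP(POW)) may be worth comparing.

```sas
/* Add a numeric week so CLASS levels sort in time order.       */
/* ASSUMPTION: the screening visit (SBI1) is treated as week 0. */
data resumption2;
  set resumption;
  select (period);
    when ('SBI1') week = 0;
    when ('I5')   week = 5;
    when ('I9')   week = 9;
    when ('I21')  week = 21;
    when ('I33')  week = 33;
    when ('I49')  week = 49;
    otherwise     week = .;   /* unexpected code: leave missing */
  end;
run;

/* Same AR(1)+RE structure as above, now ordered by week. */
proc glimmix data=resumption2 method=laplace;
  class week id;
  model resume = wt bmi perbf fm fmi lmi lpct tpct tlpctr apct gpct agpctr week /
    link=logit dist=binomial oddsratio solution;
  random int / subject=id;
  random week / subject=id type=ar(1);
run;
```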

Discussion stats
  • 5 replies
  • 1552 views
  • 1 like
  • 2 in conversation