Hi everyone,
I'm running into an issue with my data. I'm using PROC GLIMMIX to run a growth curve analysis on longitudinal data. My code is below:
proc glimmix data=diss method=LAPLACE;
   title 'Total Support Conditional Growth Model';
   class carnegie(Ref='0') barrons(Ref='0') flagship(Ref='0');
   model totalsupport =
         time time*time carnegie barrons flagship stateGMC unemploymentGMC disposableGMC
         time*carnegie time*barrons time*flagship time*stateGMC time*unemploymentGMC time*disposableGMC
         time*time*carnegie time*time*barrons time*time*flagship time*time*stateGMC time*time*unemploymentGMC time*time*disposableGMC
         time*carnegie*stateGMC time*barrons*stateGMC time*flagship*stateGMC time*unemploymentGMC*stateGMC time*disposableGMC*stateGMC
         time*time*carnegie*stateGMC time*time*barrons*stateGMC time*time*flagship*stateGMC time*time*unemploymentGMC*stateGMC time*time*disposableGMC*stateGMC
         / dist=Gamma link=log solution;
   random intercept time / type=AR(1) subject=id;
   covtest / Wald;
run;
A sample of my output looks like this:
Solutions for Fixed Effects
Effect                Level   Estimate  Standard Error    DF  t Value  Pr > |t|
Intercept                       7.7140         0.09218   373    83.68    <.0001
time                           0.05337         0.06763   367     0.79    0.4305
time*time                     -0.00134        0.001119  3904    -1.19    0.2326
carnegie              1        -0.6266         0.06365  3904    -9.84    <.0001
carnegie              2        -0.6237          0.1218  3904    -5.12    <.0001
barrons               1         0.4344         0.07189  3904     6.04    <.0001
barrons               2        -0.2856         0.08835  3904    -3.23    0.0012
flagship              1        -0.6660         0.08558  3904    -7.78    <.0001
stateGMC                      0.000048        0.000021  3904     2.33    0.0197
unemploymentGMC                -0.1080         0.02829  3904    -3.82    0.0001
disposableGMC                 -0.00003               0  3904   -Infty    <.0001
time*time*disposable          -1.62E-7               0  3904   -Infty    <.0001
time*stateGM*carnegi  1       1.028E-6               0  3904    Infty    <.0001
time*stateGM*carnegi  2       -6.91E-8        0.000016  3904    -0.00    0.9966
time*stateGM*barrons  1       -2.99E-6               0  3904   -Infty    <.0001
time*stateGM*barrons  2       -1.17E-6               0  3904   -Infty    <.0001
time*stateGM*flagshi  1        4.32E-6               0  3904    Infty    <.0001
time*stateGM*unemplo          -1.08E-8               0  3904   -Infty    <.0001
time*stateGM*disposa          -565E-13               0  3904   -Infty    <.0001
time*time*stat*carne  1       -1.56E-7               0  3904   -Infty    <.0001
time*time*stat*carne  2       -1.06E-7               0  3904   -Infty    <.0001
time*time*stat*barro  1       2.987E-7               0  3904    Infty    <.0001
time*time*stat*barro  2       2.956E-7               0  3904    Infty    <.0001
time*time*stat*flags  1          -2E-7               0  3904   -Infty    <.0001
time*time*stat*unemp           2.46E-8               0  3904    Infty    <.0001
time*time*stat*dispo          4.23E-13               .     .        .         .
Basically, almost every fixed effect involving stateGMC or disposableGMC (both grand-mean-centered continuous variables) has a standard error of 0 or missing (.), and its t value is reported as infinite. Obviously this is wrong and doesn't make sense.
I checked the multicollinearity of these variables (also checked the correlation matrix) but they look fine.
Variable          DF  Parameter Estimate  Standard Error  t Value  Pr > |t|  Tolerance  Variance Inflation
Intercept          1          3965.76565        85.20213    46.55    <.0001          .                   0
time               1            57.24893         5.92625     9.66    <.0001    0.85547             1.16895
carnegie           1          -661.36628        39.82289   -16.61    <.0001    0.85578             1.16853
barrons            1           124.55068        30.89715     4.03    <.0001    0.96042             1.04121
flagship           1         -2832.86619        78.96505   -35.87    <.0001    0.80615             1.24047
stateGMC           1             0.10810         0.00542    19.93    <.0001    0.89349             1.11921
unemploymentGMC    1           -45.76003        12.24164    -3.74    0.0002    0.91471             1.09324
disposableGMC      1            -0.05768         0.00469   -12.30    <.0001    0.70785             1.41272
gdpGMC             1         3.61928E-10     4.01861E-11     9.01    <.0001    0.77164             1.29594
Can anyone point me to some other issues it could be?
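One quick check worth running first (a suggestion, not something from the thread) is to compare the scale of each continuous covariate and of the product terms that enter the model. The estimates above run from about 1E-13 up to about 7.7, which hints that the predictors and their interaction products sit on very different scales. A minimal sketch, assuming the data set diss and the variable names from the code above; time_state and time2_state_disp are hypothetical helper names used only for this check:

```sas
/* Build two of the product terms that appear in the MODEL statement. */
/* time_state and time2_state_disp are hypothetical helper columns.   */
data scale_check;
   set diss;
   time_state       = time*stateGMC;
   time2_state_disp = time*time*stateGMC*disposableGMC;
run;

/* Compare spread and range. Standard deviations that differ by many  */
/* orders of magnitude can make the optimization, and therefore the   */
/* Hessian-based standard errors, numerically unreliable.             */
proc means data=scale_check n mean std min max;
   var time stateGMC unemploymentGMC disposableGMC
       time_state time2_state_disp;
run;
```

If the standard deviations differ by several orders of magnitude, rescaling the covariates is a natural next step.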
In addition to all that @sld and @PaigeMiller offered, I will suggest some other ideas to consider - your subject is id. Be sure this is numeric for all subjects, and it wouldn't hurt to sort the dataset by id, as it is not in the CLASS statement. Next, your model cries out for the use of the EFFECT statement. If you added this:
EFFECT poly = polynomial(time / degree=2 standardize);
and changed the MODEL statement to:
model totalsupport =
poly carnegie barrons flagship stateGMC unemploymentGMC disposableGMC
poly*carnegie poly*barrons poly*flagship poly*stateGMC poly*unemploymentGMC poly*disposableGMC
poly*carnegie*stateGMC poly*barrons*stateGMC poly*flagship*stateGMC poly*unemploymentGMC*stateGMC poly*disposableGMC*stateGMC
/dist=Gamma link=log solution;
With the STANDARDIZE option, the polynomial effect centers and scales time before forming time*time, so you would not need to do that by hand. I would also seriously consider dropping the last line of effects in the MODEL statement (although that is a preference from biology, not economics). Multi-dimensional response surfaces are often indistinguishable from noise and are hard to interpret. For instance, suppose time*time*carnegie*stateGMC is significant. What does that mean?
Oh, and how many levels do carnegie, barrons and flagship have? If each is binary, consider creating a catch-all variable (call it source for now) with 3 levels: 'Carnegie', 'Barrons' and 'Flagship'. In that case your MODEL statement would become:
model totalsupport =
poly source stateGMC unemploymentGMC disposableGMC
poly*source poly*stateGMC poly*unemploymentGMC poly*disposableGMC
poly*source*stateGMC poly*unemploymentGMC*stateGMC poly*disposableGMC*stateGMC
/dist=Gamma link=log solution;
Just some thoughts.
SteveDenham
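Assembled into a complete step, Steve's suggestion might look like the sketch below. It assumes the data set diss and the variable names from the original post, keeps the original RANDOM statement, and leaves out the three-way stateGMC line as suggested above. The STANDARDIZE option is included because, to my knowledge, the POLYNOMIAL effect does not center or scale time unless it is requested; please confirm against the EFFECT statement documentation for your SAS release.

```sas
/* id is not in the CLASS statement, so GLIMMIX needs the data sorted by id */
proc sort data=diss;
   by id;
run;

proc glimmix data=diss method=laplace;
   title 'Total Support Conditional Growth Model (EFFECT statement)';
   class carnegie(ref='0') barrons(ref='0') flagship(ref='0');

   /* Quadratic growth term. STANDARDIZE centers and scales time before */
   /* the square is formed, so poly supplies standardized time and its  */
   /* square instead of raw time and time*time.                         */
   effect poly = polynomial(time / degree=2 standardize);

   model totalsupport =
         poly carnegie barrons flagship stateGMC unemploymentGMC disposableGMC
         poly*carnegie poly*barrons poly*flagship poly*stateGMC
         poly*unemploymentGMC poly*disposableGMC
         / dist=gamma link=log solution;

   /* Unchanged from the original post */
   random intercept time / type=AR(1) subject=id;
run;
```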
My general rule here is that when SAS says one thing and the user says it is wrong, I believe SAS.
PLEASE, FROM NOW ON, DO NOT SKIP THESE STEPS.
Thank you for your response and for showing me how to format my output.
A potpourri of ideas:
You don't say which variables you checked for multicollinearity, and it's not clear that you centered all continuous covariates (i.e., time and unemploymentGMC). Quadratic terms can easily be collinear, e.g., time and time*time. So I would try centering all continuous covariates.
In fact, I would rescale the predictor variables: while you sort this out, use standardized (not just centered) continuous covariates.
I'd start small and build up. No one wants to try to interpret 4-way interactions anyway 🙂
You have random linear slopes for time, but not for time*time or any other covariate. I'd start with just a random intercept.
Are your covariates measured at the id-level or at the id-time-level? Does your model reflect the appropriate design structure?
Your Parameter Estimate table reports gdpGMC, but it's not in your model statement. It's difficult for the Community to sort out problems when the evidence is inconsistent and incomplete.
I hope this helps move you forward.
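The first few ideas above can be combined into a start-small workflow. A minimal sketch, assuming the data set diss and the variable names from the original post; diss_std and per_id are hypothetical names. It first checks whether each covariate varies within id (the design-structure question), then standardizes the continuous covariates with PROC STDIZE (METHOD=STD subtracts the mean and divides by the standard deviation), and finally fits a main-effects model with only a random intercept.

```sas
/* 1. Design structure: largest number of distinct values per id.       */
/*    A maximum of 1 suggests the covariate is measured at the id level  */
/*    rather than at the id-time level.                                  */
proc sql;
   select max(n_state) as max_state_per_id,
          max(n_unemp) as max_unemp_per_id,
          max(n_disp)  as max_disp_per_id
   from (select id,
                count(distinct stateGMC)        as n_state,
                count(distinct unemploymentGMC) as n_unemp,
                count(distinct disposableGMC)   as n_disp
         from diss
         group by id) as per_id;
quit;

/* 2. Standardize the continuous covariates (mean 0, SD 1). Variables */
/*    not listed in VAR are copied to the OUT= data set unchanged.    */
proc stdize data=diss out=diss_std method=std;
   var time stateGMC unemploymentGMC disposableGMC;
run;

/* 3. Start small: main effects, quadratic time, random intercept only. */
/*    Sorting is needed because id is not in the CLASS statement.       */
proc sort data=diss_std;
   by id;
run;

proc glimmix data=diss_std method=laplace;
   class carnegie(ref='0') barrons(ref='0') flagship(ref='0');
   model totalsupport = time time*time carnegie barrons flagship
                        stateGMC unemploymentGMC disposableGMC
                        / dist=gamma link=log solution;
   random intercept / subject=id;
run;
```

Interactions can then be added back one block at a time, checking at each step that the standard errors stay finite.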
You don't say which variables you checked for multicollinearity
It's not enough to check variables for multicollinearity. You also have to check linear combinations of variables against other linear combinations of variables.
I also suspect centering might be a problem. I recall a case a long time ago where the centering was not exact and the slight roundoff error caused by the centering produced similar results.
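Both points can be seen on a small simulated example. This is a self-contained sketch; the data set sim and all of its variables are hypothetical and are not the poster's data. The VIF option of PROC REG reports variance inflation factors, and the COLLIN option adds condition indices and variance-decomposition proportions, which flag near-dependencies among linear combinations of several predictors rather than just pairs. With time running from 1 to 10, time and its square are strongly correlated, while the exactly centered timeC and its square are uncorrelated.

```sas
/* Hypothetical simulated data: 400 ids, each observed at times 1..10. */
data sim;
   call streaminit(2024);
   do id = 1 to 400;
      do time = 1 to 10;
         stateGMC = rand('normal', 0, 1000);  /* large-scale covariate          */
         y        = rand('normal');           /* placeholder response           */
         time2    = time*time;                /* uncentered quadratic           */
         timeC    = time - 5.5;               /* exact center: mean of 1..10    */
         timeC2   = timeC*timeC;              /* centered quadratic             */
         output;
      end;
   end;
run;

/* Uncentered: time and time2 are strongly correlated, so their VIFs  */
/* are well above 1 and COLLIN shows a large condition index.         */
/* Centered: timeC and timeC2 are uncorrelated, so VIFs are near 1.   */
proc reg data=sim plots=none;
   Uncentered: model y = time  time2  stateGMC / vif collin;
   Centered:   model y = timeC timeC2 stateGMC / vif collin;
run;
quit;
```

Note that the center is exact here (5.5); a rounded or approximate center would leave some of the correlation in place.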
Thank you. I will try centering time which is the only variable I haven't centered.
Hi, after centering time it still didn't work, but when I used the log of the predictors it worked fine. Does this mean it was not a linear relationship (logged outcome and logged predictor)?
Thank you. I will try centering time.
@SteveDenham Oh, good spotting about ID not being in the CLASS statement! I like the EFFECT suggestion as well.
As Steve notes, ID can be omitted from the CLASS statement as long as the dataset is sorted by ID. Alternatively, since I'm guessing there are only about 400 subjects, ID could go into the CLASS statement; you'd probably want to turn off the Class Level Information table in the output. I think this would work:
ods exclude classlevels;
The DO Loop: What is the best way to suppress ODS output in SAS?
The GLIMMIX Procedure ODS Table Names
In addition to @sld 's method for suppressing huge class level lists, there is a NOCLPRINT option for the PROC GLIMMIX statement.
SteveDenham
Even easier!
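Combining the two suggestions, a minimal sketch (assuming the variable names from the original post, with a reduced fixed-effects list for brevity) puts id in the CLASS statement, which removes the need to sort by id, and suppresses the class-level listing. NOCLPRINT with no value hides the whole Class Level Information table; my understanding is that NOCLPRINT=number instead lists only the CLASS variables with fewer than number levels, which would keep carnegie, barrons and flagship visible while hiding the roughly 400 levels of id. The ODS EXCLUDE line from the earlier post is an equivalent alternative.

```sas
/* id is in CLASS, so GLIMMIX identifies subjects by level rather than */
/* by contiguous blocks of sorted data.                                */
/* Alternative to NOCLPRINT: submit  ods exclude ClassLevels;  first.  */
proc glimmix data=diss method=laplace noclprint=10;
   class id carnegie(ref='0') barrons(ref='0') flagship(ref='0');
   model totalsupport = time time*time carnegie barrons flagship
                        stateGMC unemploymentGMC disposableGMC
                        / dist=gamma link=log solution;
   random intercept / subject=id;
run;
```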
Thank you! I will use effect poly.
Hi, as an update, after centering time it still didn't work, but when I used the log of the predictors it worked fine. Does this mean it was not a linear relationship (logged outcome and logged predictor)?
Nonlinear relationships are certainly a possibility. At a minimum, when you have a model that runs, you would do the usual regression diagnostics and residual analyses.
It could also be that the log transformation rescaled the variables, reducing their variances or making their variances more similar. Have you tried standardizing?
Did you address the issue with ID and either sorting or including it in the CLASS statement?
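For the residual analysis, PROC GLIMMIX can save residuals with an OUTPUT statement and plot them with ODS Graphics. A minimal sketch, assuming the data set diss and the variable names from the original post (substitute whichever transformed or standardized data set your model now runs on); glmm_out, p_mean and r_pearson are hypothetical names. Plotting Pearson residuals against a covariate with a loess smooth is one informal way to look for the kind of nonlinearity raised above.

```sas
ods graphics on;

proc sort data=diss;
   by id;
run;

proc glimmix data=diss method=laplace plots=studentpanel;
   class carnegie(ref='0') barrons(ref='0') flagship(ref='0');
   model totalsupport = time time*time carnegie barrons flagship
                        stateGMC unemploymentGMC disposableGMC
                        / dist=gamma link=log solution;
   random intercept / subject=id;
   /* Predicted mean on the data scale (ILINK) and Pearson residuals */
   output out=glmm_out pred(ilink)=p_mean pearson=r_pearson;
run;

/* Residuals against fitted means: look for curvature or fanning. */
proc sgplot data=glmm_out;
   scatter x=p_mean y=r_pearson;
   refline 0 / axis=y;
run;

/* Residuals against one covariate: a curved loess fit may point to */
/* a nonlinear relationship with stateGMC.                          */
proc sgplot data=glmm_out;
   loess x=stateGMC y=r_pearson;
   refline 0 / axis=y;
run;
```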