Nerdcy
Calcite | Level 5

Hi everyone,

I'm running into an issue with my data. I'm using PROC GLIMMIX to fit a growth curve model to longitudinal data. My code is below:

proc glimmix data=diss method=laplace;
  title 'Total Support Conditional Growth Model';
  class carnegie(ref='0') barrons(ref='0') flagship(ref='0');
  model totalsupport =
        /* main effects, including the quadratic time term */
        time time*time carnegie barrons flagship stateGMC unemploymentGMC disposableGMC
        /* each covariate by linear time */
        time*carnegie time*barrons time*flagship time*stateGMC time*unemploymentGMC time*disposableGMC
        /* each covariate by quadratic time */
        time*time*carnegie time*time*barrons time*time*flagship time*time*stateGMC time*time*unemploymentGMC time*time*disposableGMC
        /* linear time by each other covariate by stateGMC */
        time*carnegie*stateGMC time*barrons*stateGMC time*flagship*stateGMC time*unemploymentGMC*stateGMC time*disposableGMC*stateGMC
        /* quadratic time by each other covariate by stateGMC */
        time*time*carnegie*stateGMC time*time*barrons*stateGMC time*time*flagship*stateGMC time*time*unemploymentGMC*stateGMC time*time*disposableGMC*stateGMC
        / dist=gamma link=log solution;
  random intercept time / type=ar(1) subject=id;  /* random intercept and linear time slope per id */
  covtest / wald;
run;

A sample of my output looks like this:

Solutions for Fixed Effects

Effect                Level  Estimate   Standard Error  DF    t Value  Pr > |t|
Intercept                    7.7140     0.09218         373   83.68    <.0001
time                         0.05337    0.06763         367   0.79     0.4305
time*time                    -0.00134   0.001119        3904  -1.19    0.2326
carnegie              1      -0.6266    0.06365         3904  -9.84    <.0001
carnegie              2      -0.6237    0.1218          3904  -5.12    <.0001
barrons               1      0.4344     0.07189         3904  6.04     <.0001
barrons               2      -0.2856    0.08835         3904  -3.23    0.0012
flagship              1      -0.6660    0.08558         3904  -7.78    <.0001
stateGMC                     0.000048   0.000021        3904  2.33     0.0197
unemploymentGMC              -0.1080    0.02829         3904  -3.82    0.0001
disposableGMC                -0.00003   0               3904  -Infty   <.0001
time*time*disposable         -1.62E-7   0               3904  -Infty   <.0001
time*stateGM*carnegi  1      1.028E-6   0               3904  Infty    <.0001
time*stateGM*carnegi  2      -6.91E-8   0.000016        3904  -0.00    0.9966
time*stateGM*barrons  1      -2.99E-6   0               3904  -Infty   <.0001
time*stateGM*barrons  2      -1.17E-6   0               3904  -Infty   <.0001
time*stateGM*flagshi  1      4.32E-6    0               3904  Infty    <.0001
time*stateGM*unemplo         -1.08E-8   0               3904  -Infty   <.0001
time*stateGM*disposa         -565E-13   0               3904  -Infty   <.0001
time*time*stat*carne  1      -1.56E-7   0               3904  -Infty   <.0001
time*time*stat*carne  2      -1.06E-7   0               3904  -Infty   <.0001
time*time*stat*barro  1      2.987E-7   0               3904  Infty    <.0001
time*time*stat*barro  2      2.956E-7   0               3904  Infty    <.0001
time*time*stat*flags  1      -2E-7      0               3904  -Infty   <.0001
time*time*stat*unemp         2.46E-8    0               3904  Infty    <.0001
time*time*stat*dispo         4.23E-13   .               .     .        .

Basically, any fixed effect involving either stateGMC (a grand-mean-centered continuous variable) or disposableGMC (also a grand-mean-centered continuous variable) gets a standard error of 0 or missing (.), and a t value of ±Infinity. Obviously this is wrong and doesn't make sense.

I checked these variables for multicollinearity (and also looked at the correlation matrix), and they look fine:

Variable         DF  Parameter Estimate  Standard Error  t Value  Pr > |t|  Tolerance  Variance Inflation
Intercept        1   3965.76565          85.20213        46.55    <.0001    .          0
time             1   57.24893            5.92625         9.66     <.0001    0.85547    1.16895
carnegie         1   -661.36628          39.82289        -16.61   <.0001    0.85578    1.16853
barrons          1   124.55068           30.89715        4.03     <.0001    0.96042    1.04121
flagship         1   -2832.86619         78.96505        -35.87   <.0001    0.80615    1.24047
stateGMC         1   0.10810             0.00542         19.93    <.0001    0.89349    1.11921
unemploymentGMC  1   -45.76003           12.24164        -3.74    0.0002    0.91471    1.09324
disposableGMC    1   -0.05768            0.00469         -12.30   <.0001    0.70785    1.41272
gdpGMC           1   3.61928E-10         4.01861E-11     9.01     <.0001    0.77164    1.29594

Can anyone suggest what else might be causing this?

Accepted Solution
SteveDenham
Jade | Level 19

In addition to all that @sld and @PaigeMiller offered, I will suggest some other ideas to consider. Your subject is id. Be sure this is numeric for all subjects, and it wouldn't hurt to sort the dataset by id, as it is not in the CLASS statement. Next, your model cries out for the use of the EFFECT statement. If you added this:

EFFECT poly = polynomial(time/degree=2);

and changed the MODEL statement to:

model totalsupport =
poly carnegie barrons flagship stateGMC unemploymentGMC disposableGMC
poly*carnegie poly*barrons poly*flagship poly*stateGMC poly*unemploymentGMC poly*disposableGMC
poly*carnegie*stateGMC poly*barrons*stateGMC poly*flagship*stateGMC poly*unemploymentGMC*stateGMC poly*disposableGMC*stateGMC
/dist=Gamma link=log solution;

you would automatically center and scale time and time*time. I would really consider dropping the last line of effects in the MODEL statement (although that is a preference from biology, not economics). Multi-dimensional response surfaces are often indistinguishable from noise, and they are hard to interpret. For instance, suppose the time*time*carnegie*stateGMC term is significant. What does that mean? Oh, and how many levels do carnegie, barrons and flagship have? If each is binary, consider creating a catch-all variable (call it source for now) with three levels: 'Carnegie', 'Barrons' and 'Flagship'. If that is the case, your model statement would become:

model totalsupport =
poly source stateGMC unemploymentGMC disposableGMC
poly*source poly*stateGMC poly*unemploymentGMC poly*disposableGMC
poly*source*stateGMC poly*unemploymentGMC*stateGMC poly*disposableGMC*stateGMC
/dist=Gamma link=log solution;

Just some thoughts.
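
A minimal sketch of how the pieces above could fit together, assuming the data set diss and the variable names from the original post. Here time is centered explicitly with PROC STDIZE (METHOD=MEAN) before the EFFECT statement is used, so the sketch does not rely on the POLYNOMIAL effect itself to do the centering. The output name diss_c is hypothetical, and the stateGMC three-way line is left out in the spirit of the advice to drop the highest-order terms.

```sas
/* Center time at its grand mean; diss_c is a hypothetical output name. */
proc stdize data=diss out=diss_c method=mean;
  var time;
run;

/* id is not in the CLASS statement below, so sort by subject first. */
proc sort data=diss_c;
  by id;
run;

proc glimmix data=diss_c method=laplace;
  title 'Total Support Conditional Growth Model';
  class carnegie(ref='0') barrons(ref='0') flagship(ref='0');
  /* poly stands for the linear and quadratic terms in (centered) time */
  effect poly = polynomial(time / degree=2);
  model totalsupport =
        poly carnegie barrons flagship stateGMC unemploymentGMC disposableGMC
        poly*carnegie poly*barrons poly*flagship
        poly*stateGMC poly*unemploymentGMC poly*disposableGMC
        / dist=gamma link=log solution;
  random intercept time / type=ar(1) subject=id;
  covtest / wald;
run;
```

The higher-order stateGMC interactions can be added back one line at a time once this smaller model gives sensible standard errors.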

SteveDenham

21 Replies
PaigeMiller
Diamond | Level 26
  1. Some of these variables with 0 standard error are (within roundoff error) completely correlated with (linear combinations of) other variables;
  2. Or there is no variability in the response variable after accounting for the effects of other variables; the terms in the model with 0 variability have no explanatory power; all of the variability is explained by the terms with the standard error > 0.

You wrote: "Basically, any fixed effects with either stateGMC (grand mean-centered continuous variable) or disposableGMC (grand mean-centered continuous variable) gives me a standard error of 0 or . and Infinity for the t value. Obviously this is wrong and doesn't make sense."

My general rule here is that when SAS says one thing and the user says it is wrong, I believe SAS.

PLEASE FROM NOW ON

  • Format your text properly.
  • Code should be pasted into the window that appears when you click on the running man icon. DO NOT SKIP THIS STEP
  • Output should be included into your message as a screen capture by clicking on the "Insert Photos" icon. DO NOT SKIP THIS STEP
  • Output as text (which you have in your message above) should be pasted into the window that appears when you click on the </> icon. DO NOT SKIP THIS STEP
  • All of this makes your message much more readable and more people will contribute to the solution

PLEASE DO NOT SKIP THESE STEPS in the future.

--
Paige Miller
Nerdcy
Calcite | Level 5

Thank you for your response, and for explaining how to format my output.

sld
Rhodochrosite | Level 12

A potpourri of ideas:

You don't say which variables you checked for multicollinearity, and it's not clear that you centered all continuous covariates (i.e., time and unemploymentGMC). Quadratic terms can easily be collinear with their linear terms, e.g., time and time*time. So I would try centering all continuous covariates.
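
To see why centering helps with a quadratic term, here is a small self-contained sketch using hypothetical, equally spaced occasions time = 0 to 9 (not the poster's data). The raw time and time*time are strongly correlated (about 0.96 here), while the centered versions are uncorrelated because the centered values are symmetric around zero.

```sas
/* Hypothetical occasions 0..9; 4.5 is their mean. */
data centering_demo;
  do time = 0 to 9;
    time_sq   = time*time;       /* raw quadratic term                 */
    time_c    = time - 4.5;      /* time centered at its mean          */
    time_c_sq = time_c*time_c;   /* quadratic term of the centered time */
    output;
  end;
run;

/* Correlations of linear vs. quadratic term, before and after centering. */
proc corr data=centering_demo outp=corr_out noprint;
  var time time_sq time_c time_c_sq;
run;

proc print data=corr_out;
  where _TYPE_ = 'CORR';
run;
```

With unequally spaced or unbalanced occasions the centered correlation will not be exactly zero, but it is usually much smaller than the raw one.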

In fact, I would rescale predictor variables; as you try to sort this out, use standardized continuous covariates (not just centered).
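
A sketch of the standardization step, assuming the data set diss from the original post; diss_std is a hypothetical output name. PROC STDIZE with METHOD=STD subtracts the mean and divides by the standard deviation, and it writes the standardized values under the original variable names, so the MODEL statement can stay as it is (time*time is then formed from standardized time).

```sas
/* Standardize the continuous covariates; other variables are copied as-is. */
proc stdize data=diss out=diss_std method=std;
  var time stateGMC unemploymentGMC disposableGMC;
run;

/* Quick look: means should be ~0 and standard deviations ~1. */
proc means data=diss_std mean std;
  var time stateGMC unemploymentGMC disposableGMC;
run;
```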

I'd start small and build up. No one wants to try to interpret 4-way interactions anyway 🙂

You have random linear slopes with time, but not time*time, or any other covariate. I'd start with just random intercept.
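
A sketch of the random-intercept-only starting point, again assuming diss and the original variable names. The fixed part is cut back to main effects so there is something small to build up from; the ODS OUTPUT data set name pe_ri is hypothetical.

```sas
/* Keep the fixed-effect solutions for later comparison. */
ods output ParameterEstimates=pe_ri;

/* Start small: main effects only, random intercept only.
   id is placed in CLASS, and NOCLPRINT hides its long level table. */
proc glimmix data=diss method=laplace noclprint;
  class id carnegie(ref='0') barrons(ref='0') flagship(ref='0');
  model totalsupport = time time*time carnegie barrons flagship
                       stateGMC unemploymentGMC disposableGMC
                       / dist=gamma link=log solution;
  random intercept / subject=id;   /* no random slope for time yet */
run;
```

If the standard errors look reasonable here, interactions and a random slope can be added one at a time to see which step breaks the fit.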

Are your covariates measured at the id-level or at the id-time-level? Does your model reflect the appropriate design structure?

Your Parameter Estimate table reports gdpGMC, but it's not in your model statement. It's difficult for the Community to sort out problems when the evidence is inconsistent and incomplete.

I hope this helps move you forward.

PaigeMiller
Diamond | Level 26

Quoting @sld: "You don't say which variables you checked for multicollinearity"

It's not enough to check individual variables for multicollinearity. You also have to check linear combinations of variables with other linear combinations of variables.
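
One way to look at linear combinations rather than pairs is the collinearity diagnostics in PROC REG (condition indices with variance-decomposition proportions). This sketch assumes diss and the original variable names; the derived variable names are hypothetical, and carnegie, barrons and flagship are left out because they are CLASS variables in the GLIMMIX model and would need dummy variables here.

```sas
/* PROC REG has no interaction syntax, so build the polynomial and
   interaction terms as ordinary variables first (names are hypothetical). */
data diss_terms;
  set diss;
  time_sq         = time*time;
  time_state      = time*stateGMC;
  time_disp       = time*disposableGMC;
  time_state_disp = time*stateGMC*disposableGMC;
run;

/* VIF/TOL look at one predictor at a time; COLLINOINT adds condition
   indices and variance proportions with the intercept adjusted out,
   which is where near-dependencies among several predictors show up. */
proc reg data=diss_terms;
  model totalsupport = time time_sq stateGMC unemploymentGMC disposableGMC
                       time_state time_disp time_state_disp
                       / vif tol collinoint;
run;
quit;
```

A large condition index with two or more high variance proportions in the same row points to the set of terms that are nearly linearly dependent.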

I also suspect centering might be a problem. I recall a case a long time ago where the centering was not exact and the slight roundoff error caused by the centering produced similar results.
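
A quick sketch for checking whether the existing grand-mean centering is exact, assuming diss and the GMC variables from the original post (gmc_means and the m_ names are hypothetical). If the centering was computed on this same data set, the means should be zero up to floating-point roundoff; the E format makes any small offset visible.

```sas
/* Means of the centered covariates over all rows in diss. */
proc means data=diss noprint;
  var stateGMC unemploymentGMC disposableGMC;
  output out=gmc_means mean=m_state m_unemp m_disp;
run;

/* Print in scientific notation so tiny residual offsets are not rounded away. */
proc print data=gmc_means noobs;
  var m_state m_unemp m_disp;
  format m_state m_unemp m_disp e12.;
run;
```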

--
Paige Miller
Nerdcy
Calcite | Level 5

Thank you. I will try centering time which is the only variable I haven't centered.

Nerdcy
Calcite | Level 5

Hi, after centering time it still didn't work, but when I used the log of the predictors it worked fine. Does this mean it was not a linear relationship (logged outcome and logged predictor)?

Nerdcy
Calcite | Level 5

Thank you. I will try centering time. 

sld
Rhodochrosite | Level 12

@SteveDenham Oh, good spotting about ID not being in the CLASS statement! I like the EFFECT suggestion as well.

As Steve notes, ID can be omitted from the CLASS statement as long as the dataset is sorted by ID. Alternatively, since I'm guessing there are about 400 subjects, ID could go into CLASS; you'd probably want to turn off the class-level table in the output. I think this would work:

ods exclude classlevels;

The DO Loop: What is the best way to suppress ODS output in SAS? 

The GLIMMIX Procedure ODS Table Names 

SteveDenham
Jade | Level 19

In addition to @sld's method for suppressing huge class-level lists, there is a NOCLPRINT option on the PROC GLIMMIX statement.
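
Both options from this exchange, sketched under the same assumptions (data set diss, variable names from the original post). The fixed part is trimmed to main effects purely for illustration.

```sas
/* Option 1: leave id out of CLASS, but sort the data by subject first. */
proc sort data=diss;
  by id;
run;

/* Option 2: put id in CLASS and suppress its long class-level table.
   NOCLPRINT is a PROC GLIMMIX statement option; an
   ODS EXCLUDE ClassLevels; statement before the step is the alternative. */
proc glimmix data=diss method=laplace noclprint;
  class id carnegie(ref='0') barrons(ref='0') flagship(ref='0');
  model totalsupport = time time*time carnegie barrons flagship
                       / dist=gamma link=log solution;
  random intercept time / type=ar(1) subject=id;
run;
```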

SteveDenham

sld
Rhodochrosite | Level 12

Even easier!

Nerdcy
Calcite | Level 5

Thank you! I will use effect poly. 

Nerdcy
Calcite | Level 5

Hi, as an update, after centering time it still didn't work, but when I used the log of the predictors it worked fine. Does this mean it was not a linear relationship (logged outcome and logged predictor)?

sld
Rhodochrosite | Level 12

Nonlinear relationships are certainly a possibility. At a minimum, when you have a model that runs, you would do the usual regression diagnostics and residual analyses.
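
A sketch of a basic residual check, assuming diss, the original variable names and a model that converges (main effects only here). The OUTPUT statement saves the predicted mean and Pearson residuals (output names are hypothetical); a LOESS smooth of the residuals against a covariate that bends away from zero is one informal hint of a nonlinear relationship.

```sas
proc glimmix data=diss method=laplace noclprint;
  class id carnegie(ref='0') barrons(ref='0') flagship(ref='0');
  model totalsupport = time time*time carnegie barrons flagship
                       stateGMC unemploymentGMC disposableGMC
                       / dist=gamma link=log solution;
  random intercept / subject=id;
  /* predicted mean on the data scale and Pearson residual for each row */
  output out=glmx_resid pred(ilink)=pred_mean pearson=pearson_resid;
run;

/* Residuals against the fitted mean: look for fanning or curvature. */
proc sgplot data=glmx_resid;
  scatter x=pred_mean y=pearson_resid;
  refline 0 / axis=y;
run;

/* Residuals against one covariate, with a LOESS smooth. */
proc sgplot data=glmx_resid;
  loess x=stateGMC y=pearson_resid;
  refline 0 / axis=y;
run;
```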

It could also be that the log transformation rescaled the variables, reducing their variances or making their variances more similar. Have you tried standardizing?

Did you address the issue with ID and either sorting or including it in the CLASS statement?

Discussion stats
  • 21 replies
  • 1995 views
  • 15 likes
  • 4 in conversation