Solved: Re: GLIMMIX 0 Standard Error Infinity t value

Nerdcy · Posted 07-04-2020 06:29 PM

Hi everyone,

I'm running into an issue with my model. I'm using PROC GLIMMIX to fit a growth curve model to longitudinal data. My code is below:

proc glimmix data=diss method=laplace;
  title 'Total Support Conditional Growth Model';
  class carnegie(ref='0') barrons(ref='0') flagship(ref='0');
  model totalsupport =
        time time*time carnegie barrons flagship stateGMC unemploymentGMC disposableGMC
        time*carnegie time*barrons time*flagship time*stateGMC time*unemploymentGMC time*disposableGMC
        time*time*carnegie time*time*barrons time*time*flagship time*time*stateGMC time*time*unemploymentGMC time*time*disposableGMC
        time*carnegie*stateGMC time*barrons*stateGMC time*flagship*stateGMC time*unemploymentGMC*stateGMC time*disposableGMC*stateGMC
        time*time*carnegie*stateGMC time*time*barrons*stateGMC time*time*flagship*stateGMC time*time*unemploymentGMC*stateGMC time*time*disposableGMC*stateGMC
        / dist=gamma link=log solution;
  random intercept time / type=ar(1) subject=id;
  covtest / wald;
run;

A sample of my output looks like this:

Solutions for Fixed Effects

Effect                Level  Estimate   Standard Error  DF    t Value  Pr > |t|
Intercept                    7.7140     0.09218         373   83.68    <.0001
time                         0.05337    0.06763         367   0.79     0.4305
time*time                    -0.00134   0.001119        3904  -1.19    0.2326
carnegie              1      -0.6266    0.06365         3904  -9.84    <.0001
carnegie              2      -0.6237    0.1218          3904  -5.12    <.0001
barrons               1      0.4344     0.07189         3904  6.04     <.0001
barrons               2      -0.2856    0.08835         3904  -3.23    0.0012
flagship              1      -0.6660    0.08558         3904  -7.78    <.0001
stateGMC                     0.000048   0.000021        3904  2.33     0.0197
unemploymentGMC              -0.1080    0.02829         3904  -3.82    0.0001
disposableGMC                -0.00003   0               3904  -Infty   <.0001
time*time*disposable         -1.62E-7   0               3904  -Infty   <.0001
time*stateGM*carnegi  1      1.028E-6   0               3904  Infty    <.0001
time*stateGM*carnegi  2      -6.91E-8   0.000016        3904  -0.00    0.9966
time*stateGM*barrons  1      -2.99E-6   0               3904  -Infty   <.0001
time*stateGM*barrons  2      -1.17E-6   0               3904  -Infty   <.0001
time*stateGM*flagshi  1      4.32E-6    0               3904  Infty    <.0001
time*stateGM*unemplo         -1.08E-8   0               3904  -Infty   <.0001
time*stateGM*disposa         -565E-13   0               3904  -Infty   <.0001
time*time*stat*carne  1      -1.56E-7   0               3904  -Infty   <.0001
time*time*stat*carne  2      -1.06E-7   0               3904  -Infty   <.0001
time*time*stat*barro  1      2.987E-7   0               3904  Infty    <.0001
time*time*stat*barro  2      2.956E-7   0               3904  Infty    <.0001
time*time*stat*flags  1      -2E-7      0               3904  -Infty   <.0001
time*time*stat*unemp         2.46E-8    0               3904  Infty    <.0001
time*time*stat*dispo         4.23E-13   .               .     .        .

Basically, any fixed effects involving either stateGMC (grand mean-centered continuous variable) or disposableGMC (grand mean-centered continuous variable) give me a standard error of 0 or . and Infinity for the t value. Obviously this is wrong and doesn't make sense.

I checked the multicollinearity of these variables (also checked the correlation matrix) but they look fine.

Variable         DF  Parameter Estimate  Standard Error  t Value  Pr > |t|  Tolerance  Variance Inflation
Intercept        1   3965.76565          85.20213        46.55    <.0001    .          0
time             1   57.24893            5.92625         9.66     <.0001    0.85547    1.16895
carnegie         1   -661.36628          39.82289        -16.61   <.0001    0.85578    1.16853
barrons          1   124.55068           30.89715        4.03     <.0001    0.96042    1.04121
flagship         1   -2832.86619         78.96505        -35.87   <.0001    0.80615    1.24047
stateGMC         1   0.10810             0.00542         19.93    <.0001    0.89349    1.11921
unemploymentGMC  1   -45.76003           12.24164        -3.74    0.0002    0.91471    1.09324
disposableGMC    1   -0.05768            0.00469         -12.30   <.0001    0.70785    1.41272
gdpGMC           1   3.61928E-10         4.01861E-11     9.01     <.0001    0.77164    1.29594

Can anyone point me to other possible causes?

PaigeMiller · Posted 07-04-2020 06:58 PM

There are two possible explanations. Either some of the variables with 0 standard error are (within roundoff error) completely correlated with other variables or with linear combinations of other variables, or there is no variability left in the response variable after accounting for the effects of the other variables. In the second case, the terms with 0 standard error have no explanatory power, and all of the variability is explained by the terms with standard error > 0.

Nerdcy wrote: "Basically, any fixed effects involving either stateGMC (grand mean-centered continuous variable) or disposableGMC (grand mean-centered continuous variable) give me a standard error of 0 or . and Infinity for the t value. Obviously this is wrong and doesn't make sense."

My general rule here is that when SAS says one thing and the user says it is wrong, I believe SAS.

PLEASE FROM NOW ON

Format your text properly.
Code should be pasted into the window that appears when you click on the running man icon. DO NOT SKIP THIS STEP
Output should be included into your message as a screen capture by clicking on the "Insert Photos" icon. DO NOT SKIP THIS STEP
Output as text (which you have in your message above) should be pasted into the window that appears when you click on the </> icon. DO NOT SKIP THIS STEP
All of this makes your message much more readable and more people will contribute to the solution

PLEASE DO NOT SKIP THESE STEPS in the future.

--
Paige Miller

Nerdcy · Posted 07-06-2020 10:33 AM

Thank you for your response and informing me how to format my output.

sld · Posted 07-05-2020 07:51 PM

A potpourri of ideas:

You don't say which variables you checked for multicollinearity, and it's not clear that you centered all continuous covariates (i.e., time and unemploymentGMC). Quadratic terms can easily be collinear, e.g., time and time*time. So I would try centering all continuous covariates.

In fact, I would rescale predictor variables; as you try to sort this out, use standardized continuous covariates (not just centered).

I'd start small and build up. No one wants to try to interpret 4-way interactions anyway 🙂

You have random linear slopes with time, but not time*time, or any other covariate. I'd start with just random intercept.

Are your covariates measured at the id-level or at the id-time-level? Does your model reflect the appropriate design structure?

Your Parameter Estimate table reports gdpGMC, but it's not in your model statement. It's difficult for the Community to sort out problems when the evidence is inconsistent and incomplete.
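The standardizing and "start small" suggestions above could be tried together along these lines. This is only a sketch: diss_std is a hypothetical output data set name, listing id in CLASS is an assumption, and the reduced model is one possible starting point rather than a recommended final model.

```sas
/* Sketch only: diss_std is a hypothetical output data set name. */
proc stdize data=diss out=diss_std method=std;
  /* METHOD=STD replaces each variable with (value - mean) / standard deviation */
  var time stateGMC unemploymentGMC disposableGMC;
run;

/* Pared-down starting model: main effects and a random intercept only. */
/* Listing id in CLASS is an assumption; it avoids relying on sort order. */
proc glimmix data=diss_std method=laplace;
  class id carnegie(ref='0') barrons(ref='0') flagship(ref='0');
  model totalsupport = time time*time carnegie barrons flagship
                       stateGMC unemploymentGMC disposableGMC
        / dist=gamma link=log solution;
  random intercept / subject=id;
run;
```

If this version produces sensible standard errors, interactions and the random slope can be added back one group at a time to see which addition brings the zeros back.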

I hope this helps move you forward.

PaigeMiller · Posted 07-06-2020 06:27 AM

sld wrote: "You don't say which variables you checked for multicollinearity"

It's not enough to check individual variables for multicollinearity. You also have to check linear combinations of variables against other linear combinations of variables.
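One way to look at collinearity among the actual model columns, interactions included, is to export the design matrix and run collinearity diagnostics on it. The sketch below is an assumption about how that might be done, not something posted in the thread: design is a hypothetical data set name, only a subset of the interactions is listed to keep it short, and PROC REG is used purely for its diagnostics (collinearity is a property of the predictors, so the gamma response does not matter here).

```sas
/* Build the design columns for the fixed effects without doing any selection. */
/* design is a hypothetical output data set name.                              */
proc glmselect data=diss outdesign(addinputvars)=design;
  class carnegie(ref='0') barrons(ref='0') flagship(ref='0') / param=ref;
  model totalsupport = time time*time carnegie barrons flagship
                       stateGMC unemploymentGMC disposableGMC
                       time*stateGMC time*disposableGMC
                       time*time*stateGMC time*time*disposableGMC
        / selection=none;
run;

/* PROC GLMSELECT stores the design column names in the macro variable _GLSMOD. */
proc reg data=design;
  model totalsupport = &_GLSMOD / vif collin;
run;
quit;
```

Very large variance inflation factors or condition indices for the interaction columns would point to the near-exact linear dependence described above.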

I also suspect centering might be a problem. I recall a case a long time ago where the centering was not exact and the slight roundoff error caused by the centering produced similar results.

--
Paige Miller

Nerdcy · Posted 07-06-2020 10:35 AM

Thank you. I will try centering time which is the only variable I haven't centered.

Nerdcy · Posted 07-07-2020 04:56 PM

Hi, after centering time it still didn't work, but when I used the log of the predictors it worked fine. Does this mean it was not a linear relationship (logged outcome and logged predictor)?

Nerdcy · Posted 07-06-2020 10:34 AM

Thank you. I will try centering time.

SteveDenham · Posted 07-06-2020 01:24 PM

In addition to all that @sld and @PaigeMiller offered, I will suggest some other ideas to consider - your subject is id. Be sure this is numeric for all subjects, and it wouldn't hurt to sort the dataset by id, as it is not in the CLASS statement. Next, your model cries out for the use of the EFFECT statement. If you added this:

EFFECT poly = polynomial(time/degree=2);

and changed the MODEL statement to:

model totalsupport =
poly carnegie barrons flagship stateGMC unemploymentGMC disposableGMC
poly*carnegie poly*barrons poly*flagship poly*stateGMC poly*unemploymentGMC poly*disposableGMC
poly*carnegie*stateGMC poly*barrons*stateGMC poly*flagship*stateGMC poly*unemploymentGMC*stateGMC poly*disposableGMC*stateGMC
/dist=Gamma link=log solution;

the polynomial effect can center and scale time and time*time for you (through its STANDARDIZE option).

I would really consider dropping the last line of effects in the MODEL statement (although that is a preference from biology, not economics). Multi-dimensional response surfaces are often indistinguishable from noise, and suffer from poor interpretability. For instance, suppose time*time*carnegie*stateGMC is significant. What does that mean?

Oh, and how many levels do carnegie, barrons and flagship have? If each is binary, consider creating a catch-all variable (call it source for now), such that source has 3 levels: 'Carnegie', 'Barrons' and 'Flagship'. If that is the case, your model statement would become:

model totalsupport =
poly source stateGMC unemploymentGMC disposableGMC
poly*source poly*stateGMC poly*unemploymentGMC poly*disposableGMC
poly*source*stateGMC poly*unemploymentGMC*stateGMC poly*disposableGMC*stateGMC
/dist=Gamma link=log solution;
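For reference, a complete call using the EFFECT statement might look like the sketch below. It rests on several assumptions: that standardization of the polynomial variable has to be requested with the STANDARDIZE option rather than happening by default, that METHOD=MOMENTS (mean and standard deviation) is the centering and scaling wanted, that id is listed in CLASS, and that a random intercept alone is an acceptable starting point.

```sas
proc glimmix data=diss method=laplace;
  class id carnegie(ref='0') barrons(ref='0') flagship(ref='0');
  /* Quadratic polynomial in time; STANDARDIZE centers and scales time first. */
  /* The STANDARDIZE settings here are an assumption, not from the thread.    */
  effect poly = polynomial(time / degree=2
                           standardize(method=moments)=centerscale);
  model totalsupport =
        poly carnegie barrons flagship stateGMC unemploymentGMC disposableGMC
        poly*carnegie poly*barrons poly*flagship
        poly*stateGMC poly*unemploymentGMC poly*disposableGMC
        / dist=gamma link=log solution;
  random intercept / subject=id;
run;
```

The three-way terms are left out here on purpose so that the effect of the polynomial parameterization can be judged on a smaller model first.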

Just some thoughts.

SteveDenham

sld · Posted 07-06-2020 02:06 PM

@SteveDenham Oh, good spotting about ID not being in the CLASS statement! I like the EFFECT suggestion as well.

As Steve notes, ID can be omitted from the CLASS statement as long as the dataset is sorted by ID. Or I'm guessing that there are about 400 subjects, so ID could go into CLASS; you'd probably want to turn off the classification table in the output. I think this would work:

ods exclude classlevels;

The DO Loop: What is the best way to suppress ODS output in SAS?

The GLIMMIX Procedure ODS Table Names

SteveDenham · Posted 07-06-2020 02:15 PM

In addition to @sld 's method for suppressing huge class level lists, there is a NOCLPRINT option for the PROC GLIMMIX statement.
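Putting the ID advice together, a sketch might look like this. The fixed-effects part is abbreviated for illustration and is not the full model from the original post. Either the sort or the CLASS entry should be enough on its own; both are shown.

```sas
/* Sort so that each subject's records are contiguous. */
proc sort data=diss;
  by id;
run;

/* NOCLPRINT suppresses the class level table, which would otherwise */
/* list every value of id.                                           */
proc glimmix data=diss method=laplace noclprint;
  class id carnegie(ref='0') barrons(ref='0') flagship(ref='0');
  /* Abbreviated model for illustration only */
  model totalsupport = time time*time carnegie barrons flagship
        / dist=gamma link=log solution;
  random intercept time / type=ar(1) subject=id;
run;
```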

SteveDenham

sld · Posted 07-06-2020 02:17 PM

Even easier!

Nerdcy · Posted 07-07-2020 04:53 PM

Thank you! I will use effect poly.

Nerdcy · Posted 07-07-2020 04:57 PM

Hi, as an update, after centering time it still didn't work, but when I used the log of the predictors it worked fine. Does this mean it was not a linear relationship (logged outcome and logged predictor)?

sld · Posted 07-07-2020 05:06 PM

Nonlinear relationships are certainly a possibility. At a minimum, when you have a model that runs, you would do the usual regression diagnostics and residual analyses.
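A minimal sketch of such a residual check, assuming a model that converges: the data set name diag and the variable names mu_hat, r_pearson and r_student are hypothetical, and the model shown is abbreviated.

```sas
proc glimmix data=diss method=laplace noclprint;
  class id carnegie(ref='0') barrons(ref='0') flagship(ref='0');
  /* Abbreviated model for illustration only */
  model totalsupport = time time*time carnegie barrons flagship
                       stateGMC unemploymentGMC disposableGMC
        / dist=gamma link=log solution;
  random intercept / subject=id;
  /* Save predicted means (data scale) and two kinds of residuals */
  output out=diag pred(ilink)=mu_hat pearson=r_pearson student=r_student;
run;

/* Residuals against fitted values: look for curvature or a funnel shape */
proc sgplot data=diag;
  scatter x=mu_hat y=r_pearson;
  refline 0 / axis=y;
run;
```

Plotting the same residuals against each continuous predictor in turn is one way to judge whether a log scale for that predictor fits better than the original scale.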

It could also be that the log transformation rescaled the variables, reducing their variances or making their variances more similar. Have you tried standardizing?
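To see whether scale is the issue, it may help to compare the spread of the predictors before transforming anything. In the sketch below, diss_rescaled and the _k suffix are hypothetical names, and dividing by 1000 is an arbitrary change of units chosen for illustration.

```sas
/* Compare the scales of the continuous predictors */
proc means data=diss n mean std min max;
  var time stateGMC unemploymentGMC disposableGMC;
run;

/* One alternative to a log transform: express large-valued predictors */
/* in thousands so their coefficients are not on the order of 1E-5.    */
data diss_rescaled;
  set diss;
  stateGMC_k      = stateGMC / 1000;
  disposableGMC_k = disposableGMC / 1000;
run;
```

If the model behaves after a pure change of units, the earlier zeros were most likely a numerical scaling problem rather than evidence of a nonlinear relationship.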

Did you address the issue with ID and either sorting or including it in the CLASS statement?
