Programming the statistical procedures from SAS

Indicator Variable in PROC REG

Contributor
Posts: 33

Indicator Variable in PROC REG

I created an indicator variable X1, coded 0 for group A and 1 for group B, and then ran PROC REG on X1 along with other continuous X variables. Both the intercept and X1's coefficient are significant. But if I code X1 as 1 for group A and 0 for group B, the intercept is no longer significant. What is happening here, and how do I explain it?

Trusted Advisor
Posts: 1,659

Re: Indicator Variable in PROC REG

When parameterization of your model changes, the meaning of your parameters changes, and thus the statistical significance can change as well.

So in the first case, the intercept means "the value of Y for group A when all your continuous variables are 0", and in the second case it means "the value of Y for group B when all your continuous variables are 0".
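As a rough illustration of this point (not the original poster's data), the Python sketch below fits the same least-squares model under both codings. The data values, the single continuous predictor `x`, and the helper names `ols`, `design`, and `predict` are all made up for the example.

```python
# Hypothetical data: two groups and one continuous predictor (all values made up).
group = ["A", "A", "A", "A", "B", "B", "B", "B"]
x = [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0]
y = [3.6, 3.9, 4.55, 4.95, 0.4, 1.1, 1.45, 2.05]


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    n, p = len(X), len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))]
         for r in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[r][p] / A[r][r] for r in range(p)]


def design(one_for):
    """Columns: intercept, indicator (1 for the group named in one_for), x."""
    return [[1.0, 1.0 if g == one_for else 0.0, xi] for g, xi in zip(group, x)]


def predict(X, beta):
    """Fitted values for each row of the design matrix."""
    return [sum(b * v for b, v in zip(beta, row)) for row in X]


X_b = design("B")  # first coding: A = 0, B = 1
X_c = design("A")  # second coding: A = 1, B = 0
b = ols(X_b, y)    # [b0, b1, slope]
c = ols(X_c, y)    # [c0, c1, slope]
pred_b = predict(X_b, b)
pred_c = predict(X_c, c)

print("first coding :", b)
print("second coding:", c)
print("largest difference in fitted values:",
      max(abs(p - q) for p, q in zip(pred_b, pred_c)))
```

The intercept and the sign of the indicator coefficient depend on which group is coded 0, while the slope on `x` and the fitted values agree up to rounding error.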

Contributor
Posts: 33

Re: Indicator Variable in PROC REG

Let me use regression equations to explain my question.

First case: Y = b0 + b1 * X1 + ...

Second case: Y = c0 + c1 * X1 + ...

b0 and b1 are significant.

c0 is not significant; c1 is significant.

So which one should I use? Which one is correct?

Trusted Advisor
Posts: 1,659

Re: Indicator Variable in PROC REG

They are both correct! As I already explained, the intercepts b0 and c0 are not measuring the same thing.

Your equations leave out the term that accounts for the main effect of changing from group A to group B or vice versa.

So the first case is really Y = b0 + b1*(group=B) + b2*x1 + b3*x2 + ...

and c0 = b0 + b1*(group=B) and b0 = c0 + c1*(group=A). Note that c0 is not equal to b0; they measure different things and are to be interpreted differently.
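Spelled out as a short derivation, writing D_B and D_A for the indicators of group B and group A (this notation is introduced here for illustration, with D_A = 1 - D_B):

```latex
\begin{aligned}
Y &= b_0 + b_1 D_B + b_2 x_1 + b_3 x_2 + \cdots \\
  &= b_0 + b_1 (1 - D_A) + b_2 x_1 + b_3 x_2 + \cdots \\
  &= (b_0 + b_1) + (-b_1)\, D_A + b_2 x_1 + b_3 x_2 + \cdots
\end{aligned}
\qquad\Longrightarrow\qquad
c_0 = b_0 + b_1, \quad c_1 = -b_1 .
```

So a test of c0 = 0 is really a test of b0 + b1 = 0, which is a different hypothesis from b0 = 0.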

Contributor
Posts: 33

Re: Indicator Variable in PROC REG

I understand that c0 is not equal to b0 and that the two equations are algebraically equivalent. But since c0 is not significant, how can we adopt the second equation?

Contributor
Posts: 33

Re: Indicator Variable in PROC REG

Don't you need to report significance information when presenting a regression equation?

Trusted Advisor
Posts: 1,659

Re: Indicator Variable in PROC REG

Yes, the equations are equivalent. The parts of the equation are not equivalent.

But since c0 is not significant, how can we adopt the second equation?

It's just as valid as the first equation. You continue to confuse the validity of the equation with the meaning of individual terms.

It is up to you to understand how to interpret it properly. Perhaps instead of reporting intercepts, which is causing this confusion, you should be reporting the value, and the statistical significance, of the delta between group A and group B, which I think is simply c0-b0. That seems like a better quantity to report.

Contributor
Posts: 33

Re: Indicator Variable in PROC REG

Sir, we are creating a regression equation for prediction, not for comparison of two groups.

Respected Advisor
Posts: 2,655

Re: Indicator Variable in PROC REG

The way you have stated your problem, you are very much comparing two groups.  I believe a clearer statement of your objectives is needed, as it is very obvious you are missing PaigeMiller's point, which seems perfectly obvious to me.  Your parameterization of the indicator variables means that the two groups will have different intercept-like terms (overall intercept plus intercept due to group).  Consequently, it is not at all surprising that the results are significant in one case, and not in the other. See PaigeMiller's response:

They are both correct! As I already explained, the intercepts b0 and c0 are not measuring the same thing.

Your equations leave out the term that accounts for the main effect of changing from group A to group B or vice versa.

So the first case is really Y = b0 + b1*(group=B) + b2*x1 + b3*x2 + ...

and c0 = b0 + b1*(group=B) and b0 = c0 + c1*(group=A). Note that c0 is not equal to b0; they measure different things and are to be interpreted differently.

So you need to think along the following lines: Are the responses in the two groups parallel, so that the equations for the two groups differ only in the intercept? Or is there an interaction between group and the other predictor variables? In this case, I would strongly recommend using one of the SAS procedures that has a CLASS statement for your regression, such as GLM, MIXED, GENMOD, or GLIMMIX, rather than hand-coded indicator variables.

Steve Denham

Contributor
Posts: 33

Re: Indicator Variable in PROC REG

GLM also produces the same result as REG.

Contributor
Posts: 33

Re: Indicator Variable in PROC REG

Ok, let's focus on predictive modeling for this question.

When we create a regression model for prediction, don't all coefficients included in the model need to be significant?

Trusted Advisor
Posts: 1,659

Re: Indicator Variable in PROC REG

With regard to the intercept(s), I would say "No". Leave them in the model, even if they are not statistically significant. (I expect others to disagree with this, but that is my position on the matter.)

You might want to read "Analysis of Messy Data, Volume 1, Designed Experiments" by Milliken and Johnson. Even though yours is not a designed experiment, they talk about relevant issues in Chapter 9. In fact, they speak of the "Means Model", which is a distinctly different parameterization than the model you get through SAS. In the "Means Model", all these issues go away. There is a distinct coefficient for the intercept of Group A, and a distinct coefficient for the intercept of Group B. And then, it doesn't matter whether you set A to be 0 and B to be 1, or the other way around.
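A sketch of the idea as described here, with one intercept column per group and no overall intercept. This is not taken from Milliken and Johnson, and the data and the helper name `ols` are made up for illustration.

```python
# Hypothetical data: two groups and one continuous predictor (all values made up).
group = ["A", "A", "A", "A", "B", "B", "B", "B"]
x = [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0]
y = [3.6, 3.9, 4.55, 4.95, 0.4, 1.1, 1.45, 2.05]


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    n, p = len(X), len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))]
         for r in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[r][p] / A[r][r] for r in range(p)]


# Means-style design: one column per group, no overall intercept column.
X_means = [[1.0 if g == "A" else 0.0, 1.0 if g == "B" else 0.0, xi]
           for g, xi in zip(group, x)]
# The two 0/1 codings from the thread, each with an overall intercept.
X_b = [[1.0, 1.0 if g == "B" else 0.0, xi] for g, xi in zip(group, x)]
X_c = [[1.0, 1.0 if g == "A" else 0.0, xi] for g, xi in zip(group, x)]

m = ols(X_means, y)  # [intercept for A, intercept for B, slope]
b = ols(X_b, y)      # [b0, b1, slope]
c = ols(X_c, y)      # [c0, c1, slope]

print("means-style intercepts (A, B):", m[0], m[1])
print("b0 and b0 + b1               :", b[0], b[0] + b[1])
print("c0 + c1 and c0               :", c[0] + c[1], c[0])
```

Each group's intercept is estimated directly, so there is no reference group to choose, and the values match what either 0/1 coding implies for that group.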

Contributor
Posts: 33

Re: Indicator Variable in PROC REG

Thanks for the great answer, PaigeMiller.

Trusted Advisor
Posts: 1,659

Re: Indicator Variable in PROC REG

Sir, we are creating a regression equation for prediction, not for comparison of two groups.

Okay, then why the concern about the different intercepts? As you said, the models are equivalent. Either will give you the same predicted values.

As I have pointed out, and now as Steve seems to be pointing out, you can create models for prediction, or you can create models for understanding the individual terms (or both). Do NOT confuse the two. If you want a predictive model, then you choose either, and you report the Overall F as its level of significance. If you want to understand the individual terms, you report the tests of the individual model coefficients, with appropriate interpretation. (and of course you can do both)

You keep wandering back and forth between obtaining a predictive model and obtaining an understanding of the individual terms.
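To make the distinction concrete, here is a hand-rolled Python sketch (not SAS output) that computes the overall F statistic and the per-coefficient t statistics under both codings. The data and the helper name `fit` are made up, and the formulas are the usual ordinary least squares ones for a model with an intercept.

```python
import math

# Hypothetical data: two groups and one continuous predictor (all values made up).
group = ["A", "A", "A", "A", "B", "B", "B", "B"]
x = [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0]
y = [3.6, 3.9, 4.55, 4.95, 0.4, 1.1, 1.45, 2.05]


def fit(X, y):
    """Return (coefficients, overall F, t statistics) for an OLS fit."""
    n, p = len(X), len(X[0])
    # Augmented matrix [X'X | I]; Gauss-Jordan turns it into [I | inverse of X'X].
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [1.0 if r == c else 0.0 for c in range(p)]
         for r in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        d = A[col][col]
        A[col] = [a / d for a in A[col]]
        for r in range(p):
            if r != col:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    inv = [row[p:] for row in A]
    xty = [sum(X[i][r] * y[i] for i in range(n)) for r in range(p)]
    beta = [sum(inv[r][c] * xty[c] for c in range(p)) for r in range(p)]
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
    ybar = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # error sum of squares
    ssr = sum((fi - ybar) ** 2 for fi in fitted)            # model sum of squares
    mse = sse / (n - p)
    f_stat = (ssr / (p - 1)) / mse
    t_stats = [beta[r] / math.sqrt(mse * inv[r][r]) for r in range(p)]
    return beta, f_stat, t_stats


X_b = [[1.0, 1.0 if g == "B" else 0.0, xi] for g, xi in zip(group, x)]  # A = 0, B = 1
X_c = [[1.0, 1.0 if g == "A" else 0.0, xi] for g, xi in zip(group, x)]  # A = 1, B = 0

beta_b, f_b, t_b = fit(X_b, y)
beta_c, f_c, t_c = fit(X_c, y)

print("overall F      :", f_b, f_c)
print("t for intercept:", t_b[0], t_c[0])
print("t for indicator:", t_b[1], t_c[1])
```

With this made-up data the overall F agrees under both codings, the indicator's t statistic only changes sign, and the intercept's t statistic is large under one coding and close to zero under the other.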
