Indicator Variable in PROC REG

10-26-2012 02:53 AM

I created an indicator variable X1, set to 0 for group A and 1 for group B, and then ran PROC REG on X1 along with other continuous X variables. Both the intercept and X1's coefficient are significant. But if I code it as 1 for group A and 0 for group B, the intercept is no longer significant. What is happening here, and how do I explain it?

10-26-2012 09:26 AM

When parameterization of your model changes, the meaning of your parameters changes, and thus the statistical significance can change as well.

So in the first case, the intercept has the meaning "what is the Y when all your continuous variables are 0 at group A" and in the second case, the intercept has the meaning "what is the Y when all your continuous variables are zero at group B".
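A minimal SAS sketch of the two codings, for anyone who wants to see them side by side. The data set `demo`, the indicator names `x1_b` and `x1_a`, the single continuous predictor `x2`, and the data values are all invented for illustration; none of them come from the original post.

```sas
/* Toy data, invented for illustration only */
data demo;
  input group $ x2 y;
  x1_b = (group = 'B');  /* first coding: 0 for group A, 1 for group B */
  x1_a = (group = 'A');  /* second coding: 1 for group A, 0 for group B */
  datalines;
A 1.0 3.1
A 2.0 4.9
A 3.0 7.2
A 4.0 8.8
B 1.0 0.2
B 2.0 2.1
B 3.0 3.9
B 4.0 6.1
;
run;

proc reg data=demo;
  /* Intercept refers to group A at x2 = 0 */
  CodingB: model y = x1_b x2;
  /* Intercept refers to group B at x2 = 0 */
  CodingA: model y = x1_a x2;
run;
quit;
```

Both fits should report the same overall F, R-square, and residual sum of squares. Only the Intercept rows differ, because each one tests a different group's intercept against zero.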

10-26-2012 12:32 PM

Let me use regression equations to explain my question.

First case: Y = b0 + b1 * X1 + ...

Second case: Y = c0 + c1 * X1 + ...

b0 and b1 are significant.

c0 is not significant; c1 is significant.

So which one should I use? Which one is correct?

10-26-2012 12:41 PM

They are both correct! As I already explained, the intercepts b0 and c0 are not measuring the same thing.

Your equations leave out the term that accounts for the main effect of changing from group A to group B or vice versa.

So first case is really Y = b0 + b1*(group=B) + b2*x1 + b3*x2 + ...

and c0 = b0 + b1*(group=B) and b0 = c0 + c1 * (group=A) <=== c0 is not equal to b0, they are to be interpreted differently, they measure different things
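The relationship between the two parameterizations can be written out explicitly. This is a sketch in which $G_B$ denotes the indicator that is 1 for group B (the first coding) and $G_A = 1 - G_B$ denotes the indicator for the second coding; both symbols are introduced here for illustration, while $b_2, b_3$ follow the expanded equation above.

```latex
\begin{align*}
Y &= b_0 + b_1 G_B + b_2 x_1 + b_3 x_2 + \dots \\
  &= b_0 + b_1 (1 - G_A) + b_2 x_1 + b_3 x_2 + \dots \\
  &= \underbrace{(b_0 + b_1)}_{c_0} + \underbrace{(-b_1)}_{c_1}\, G_A + b_2 x_1 + b_3 x_2 + \dots
\end{align*}
```

So $c_0 = b_0 + b_1$ is the intercept for group B, $b_0 = c_0 + c_1$ is the intercept for group A, and $c_1 = -b_1$. The t-test on $b_0$ asks whether group A's intercept is zero, and the t-test on $c_0$ asks whether group B's intercept is zero. Those are different hypotheses, so one can be rejected while the other is not. The tests on $b_1$ and $c_1$ are the same test with the sign flipped.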

10-26-2012 12:48 PM

I understand that c0 is not equal to b0 and that the two equations are algebraically equivalent. But since c0 is not significant, how can we adopt the second equation?

10-26-2012 01:03 PM

Don't you need to report significance information when presenting a regression equation?

10-26-2012 01:09 PM

Yes, the equations are equivalent. The parts of the equation are not equivalent.

**But since c0 is not significant, how can we adopt the second equation?**

It's just as valid as the first equation. You continue to confuse the validity of the equation, with the meaning of individual terms.

It is up to you to understand how to interpret it properly. Perhaps instead of reporting intercepts, which is causing this confusion, you should be reporting the value, and the statistical significance, of the delta between group A and group B, which I think is simply c0-b0. That seems like a better quantity to report.

10-26-2012 01:12 PM

Sir, we are creating a regression equation for prediction, not for comparison of two groups.

10-26-2012 01:39 PM

The way you have stated your problem, you are very much comparing two groups. I believe a clearer statement of your objectives is needed, as it is clear you are missing PaigeMiller's point, which seems perfectly obvious to me. Your parameterization of the indicator variables means that the two groups will have different intercept-like terms (overall intercept plus intercept due to group). Consequently, it is not at all surprising that the intercept is significant in one case and not in the other. See PaigeMiller's response:

**They are both correct! As I already explained. The intercepts b0 and c0 are not measuring the same thing**

**Your equations leave out the term that accounts for the main effect of changing from group a to group b or vice versa**

**So first case is really Y = b0 + b1*(group=B) + b2*x1 + b3*x2 + ...**

**and c0 = b0 + b1*(group=B) and b0 = c0 + c1 * (group=A) <=== c0 is not equal to b0, they are to be interpreted differently, they measure different things**

So, you need to think along the following lines: Are the responses in the two groups parallel, so that the equations for the two groups differ only in the intercept? Or is there an interaction between group and the other predictor variables? In this case, I would strongly recommend using one of the SAS procedures that has a CLASS statement for your regression, such as GLM, MIXED, GENMOD, or GLIMMIX, rather than hand-coded indicator variables.

Steve Denham
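A hedged sketch of what the CLASS-based approach might look like in PROC GLM, first with parallel lines and then with a group-by-covariate interaction. The data set `demo`, the variable names `group`, `x2`, and `y`, and the data values are invented for illustration only.

```sas
/* Toy data, invented for illustration only */
data demo;
  input group $ x2 y;
  datalines;
A 1.0 3.1
A 2.0 4.9
A 3.0 7.2
A 4.0 8.8
B 1.0 0.2
B 2.0 2.1
B 3.0 3.9
B 4.0 6.1
;
run;

/* Parallel-lines model: common slope, group-specific intercepts */
proc glm data=demo;
  class group;
  model y = group x2 / solution;
run;
quit;

/* Adds group*x2 so that the slope is allowed to differ by group */
proc glm data=demo;
  class group;
  model y = group x2 group*x2 / solution;
run;
quit;
```

With the default GLM parameterization, the last CLASS level (here B) is the reference level, so the Intercept estimate in the SOLUTION output refers to group B. The test on `group*x2` in the second model addresses whether the two groups' lines are parallel.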

10-26-2012 01:45 PM

GLM also produces the same result as REG.

10-26-2012 02:01 PM

Ok, let's focus on predictive modeling for this question.

When we create a regression model for prediction, don't all coefficients included in the model need to be significant?

10-26-2012 02:27 PM

With regard to the intercept(s), I would say "No". Leave them in the model, even if they are not statistically significant. (I expect others to disagree with this, but that is my position on the matter.)

You might want to read "Analysis of Messy Data, Volume 1, Designed Experiments" by Milliken and Johnson. Even though yours is not a designed experiment, they talk about relevant issues in Chapter 9. In fact, they speak of the "Means Model", which is a distinctly different parameterization than the model you get through SAS. In the "Means Model", all these issues go away. There is a distinct coefficient for the intercept of Group A, and a distinct coefficient for the intercept of Group B. And then, it doesn't matter whether you set A to be 0 and B to be 1, or the other way around.
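One way to get a separate intercept coefficient for each group, in the spirit of the parameterization described above, is to drop the overall intercept. This is only a sketch under that assumption, not necessarily the exact formulation used in the book. The data set `demo`, the indicator names `x1_a` and `x1_b`, the predictor `x2`, and the data values are invented for illustration.

```sas
/* Toy data, invented for illustration only */
data demo;
  input group $ x2 y;
  x1_a = (group = 'A');  /* 1 for group A, 0 otherwise */
  x1_b = (group = 'B');  /* 1 for group B, 0 otherwise */
  datalines;
A 1.0 3.1
A 2.0 4.9
A 3.0 7.2
A 4.0 8.8
B 1.0 0.2
B 2.0 2.1
B 3.0 3.9
B 4.0 6.1
;
run;

/* CLASS version: NOINT gives one intercept estimate per group */
proc glm data=demo;
  class group;
  model y = group x2 / noint solution;
run;
quit;

/* Same idea in PROC REG, using both indicators and no intercept */
proc reg data=demo;
  model y = x1_a x1_b x2 / noint;
run;
quit;
```

In both fits the group coefficients are the two group intercepts directly, so swapping the 0/1 labels only relabels them. Note that PROC REG redefines R-square when NOINT is used, so that statistic is not comparable with the intercept models.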

10-26-2012 02:33 PM

Thanks for the great answer, PaigeMiller.

10-26-2012 01:44 PM

**Sir, we are creating a regression equation for prediction, not for comparison of two groups.**

Okay, then why the concern about the different intercepts? As you said, the models are equivalent. Either will give you the same predicted values.

As I have pointed out, and now as Steve seems to be pointing out, you can create models for prediction, or you can create models for understanding the individual terms (or both). Do NOT confuse the two. If you want a predictive model, then you choose either, and you report the Overall F as its level of significance. If you want to understand the individual terms, you report the tests of the individual model coefficients, with appropriate interpretation. (and of course you can do both)
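The claim that either coding gives the same predictions can be checked directly. A minimal sketch, assuming a toy data set `demo` with invented values and hypothetical indicator names `x1_b` and `x1_a`:

```sas
/* Toy data, invented for illustration only */
data demo;
  input group $ x2 y;
  x1_b = (group = 'B');  /* 0 for group A, 1 for group B */
  x1_a = (group = 'A');  /* 1 for group A, 0 for group B */
  datalines;
A 1.0 3.1
A 2.0 4.9
A 3.0 7.2
A 4.0 8.8
B 1.0 0.2
B 2.0 2.1
B 3.0 3.9
B 4.0 6.1
;
run;

/* Fit each coding and save its predicted values */
proc reg data=demo noprint;
  model y = x1_b x2;
  output out=pred_b p=yhat_b;
run;
quit;

proc reg data=demo noprint;
  model y = x1_a x2;
  output out=pred_a p=yhat_a;
run;
quit;

/* One-to-one merge: both outputs keep the input row order */
data both;
  merge pred_b pred_a;
  diff = yhat_b - yhat_a;
run;

proc means data=both min max;
  var diff;
run;
```

The minimum and maximum of `diff` should be zero up to floating-point rounding, which is the sense in which the two models are the same for prediction.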

You keep wandering back and forth between obtaining a predictive model and obtaining an understanding of the individual terms.