ANOVA and OLS regression disagreement

08-30-2016 01:53 PM

Okay, so we'll skip the long drawn out story about how I miscoded something and had all sorts of discussions about these odd findings and just get to the point that there was indeed a coding error, I discovered it, and now I'm going back through my results and conducting (new) interpretation.

Long story short, once I found the error, I ran it through some tests and now I need some help because I can't make sense of what I'm seeing.

First, ANOVA; then OLS regression; then we look to the comparison.

*Number of times incarcerated;

if 0<numincar<6;

if numincar=1 then once=1; else once=0;

if numincar=2 then two=1; else two=0;

if numincar=3 then three=1; else three=0;

if numincar=4 then four=1; else four=0;

if numincar=5 then fiveplus=1; else fiveplus=0;

proc anova;

class numincar;

model finknow=numincar;

means numincar/scheffe;

run;

Which resulted in: (good news...)

The ANOVA Procedure

Dependent Variable: FINKNOW

Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
---|---|---|---|---|---|
Model | 4 | 43.924225 | 10.981056 | 2.80 | 0.0265 |
Error | 264 | 1036.016295 | 3.924304 | | |
Corrected Total | 268 | 1079.940520 | | | |

R-Square | Coeff Var | Root MSE | FINKNOW Mean |
---|---|---|---|
0.040673 | 28.23981 | 1.980986 | 7.014870 |

Source | DF | Anova SS | Mean Square | F Value | Pr > F |
---|---|---|---|---|---|
NumIncar | 4 | 43.92422522 | 10.98105631 | 2.80 | 0.0265 |

Alpha | 0.05 |
---|---|
Error Degrees of Freedom | 264 |
Error Mean Square | 3.924304 |
Critical Value of F | 2.40584 |

Comparisons significant at the 0.05 level are indicated by ***.

NumIncar Comparison | Difference Between Means | Simultaneous 95% Lower Confidence Limit | Simultaneous 95% Upper Confidence Limit | |
---|---|---|---|---|
2 - 1 | 0.4419 | -0.5936 | 1.4773 | |
2 - 3 | 0.7942 | -0.4753 | 2.0638 | |
2 - 4 | 1.0323 | -0.4034 | 2.4681 | |
2 - 5 | 1.2135 | -0.0646 | 2.4917 | |
1 - 2 | -0.4419 | -1.4773 | 0.5936 | |
1 - 3 | 0.3524 | -0.7696 | 1.4744 | |
1 - 4 | 0.5905 | -0.7166 | 1.8975 | |
1 - 5 | 0.7717 | -0.3600 | 1.9034 | |
3 - 2 | -0.7942 | -2.0638 | 0.4753 | |
3 - 1 | -0.3524 | -1.4744 | 0.7696 | |
3 - 4 | 0.2381 | -1.2612 | 1.7374 | |
3 - 5 | 0.4193 | -0.9299 | 1.7685 | |
4 - 2 | -1.0323 | -2.4681 | 0.4034 | |
4 - 1 | -0.5905 | -1.8975 | 0.7166 | |
4 - 3 | -0.2381 | -1.7374 | 1.2612 | |
4 - 5 | 0.1812 | -1.3254 | 1.6878 | |
5 - 2 | -1.2135 | -2.4917 | 0.0646 | |
5 - 1 | -0.7717 | -1.9034 | 0.3600 | |
5 - 3 | -0.4193 | -1.7685 | 0.9299 | |
5 - 4 | -0.1812 | -1.6878 | 1.3254 | |

So, the model is significant. Good. When I get to the 1-5, 1-2, 1-3, 1-4, etc. etc. no significance. Not what I was hoping for, but just knowing that "numincar" should be included in my OLS regression is worthwhile.

Okay, so then I'm checking some results in OLS. (one incarceration is the comparison group)

proc reg;

model finknow= two three four fiveplus/tol vif;

run;

And I get the following:

The REG Procedure

Model: MODEL1

Dependent Variable: FINKNOW

Number of Observations Read | 269 |
---|---|
Number of Observations Used | 269 |

Analysis of Variance

Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
---|---|---|---|---|---|
Model | 4 | 43.92423 | 10.98106 | 2.80 | 0.0265 |
Error | 264 | 1036.01630 | 3.92430 | | |
Corrected Total | 268 | 1079.94052 | | | |

Root MSE | 1.98099 | R-Square | 0.0407 |
---|---|---|---|
Dependent Mean | 7.01487 | Adj R-Sq | 0.0261 |
Coeff Var | 28.23981 | | |

Parameter Estimates

Variable | DF | Parameter Estimate | Standard Error | t Value | Pr > \|t\| | Tolerance | Variance Inflation |
---|---|---|---|---|---|---|---|
Intercept | 1 | 7.16190 | 0.19332 | 37.05 | <.0001 | . | 0 |
two | 1 | 0.44187 | 0.33379 | 1.32 | 0.1867 | 0.82762 | 1.20828 |
three | 1 | -0.35238 | 0.36168 | -0.97 | 0.3308 | 0.84644 | 1.18141 |
four | 1 | -0.59048 | 0.42134 | -1.40 | 0.1623 | 0.88120 | 1.13482 |
fiveplus | 1 | -0.77166 | 0.36481 | -2.12 | 0.0353 | 0.84850 | 1.17854 |

The model is still significant at p<.05 (same value as in the ANOVA, p=.0265; yay), but now when we move down to fiveplus (which would have come up in the ANOVA as 1-5 and/or 5-1), the OLS shows significance at p<.05.

QUESTION: Why is there significance when I run the OLS, but not when I run the ANOVA. Same thing right? Only one predictor in the OLS statement should match up with the ANOVA output, correct?

Please, someone help me figure out where I'm going wrong.

Thanks in advance!!

Kate

Accepted Solutions

Solution

08-30-2016 06:04 PM

Posted in reply to ksmielitz

08-30-2016 06:03 PM

All Replies

Posted in reply to ksmielitz

08-30-2016 02:58 PM

Class *once* is missing from **proc reg** model?

PG

Posted in reply to PGStats

08-30-2016 04:33 PM

Class *once* is the comparison group. So, in the proc reg results, those incarcerated five times or more are significantly less financially knowledgeable than those who have only been incarcerated once.

In the ANOVA, it's comparing once to twice, once to three times, once to four times, once to five times, etc., then five times to once, five times to twice, and so on... so shouldn't the 5-1 and 1-5 comparisons show significance in the ANOVA?
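One way to line the two outputs up, as a sketch: with *once* as the reference group, each dummy coefficient in proc reg is the difference between that group's mean and the once-incarcerated mean, so the regression estimates are the same numbers that appear in the ANOVA comparison table. The symbols $\bar{y}_k$ (mean FINKNOW for group $k$) and $b_k$ are notation introduced here for illustration, not something from the original posts.

```latex
\begin{align*}
\widehat{\text{finknow}} &= b_0 + b_2\,\text{two} + b_3\,\text{three} + b_4\,\text{four} + b_5\,\text{fiveplus} \\
b_0 &= \bar{y}_1 = 7.16190 \\
b_2 &= \bar{y}_2 - \bar{y}_1 = 0.44187  && \text{(ANOVA row ``2 - 1'': } 0.4419\text{)} \\
b_3 &= \bar{y}_3 - \bar{y}_1 = -0.35238 && \text{(ANOVA row ``3 - 1'': } -0.3524\text{)} \\
b_4 &= \bar{y}_4 - \bar{y}_1 = -0.59048 && \text{(ANOVA row ``4 - 1'': } -0.5905\text{)} \\
b_5 &= \bar{y}_5 - \bar{y}_1 = -0.77166 && \text{(ANOVA row ``5 - 1'': } -0.7717\text{)}
\end{align*}
```

So the two procedures agree on the size of every difference from group 1. What can differ is the threshold each one uses to call a difference significant.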

Posted in reply to ksmielitz

08-30-2016 06:22 PM

So, yes I believe we've got the right solution to this, but then if I were to use a dichotomous predictor variable the relationship *couldn't* be linear...which is the reasoning for a t-test instead of just running one (dichotomous) predictor variable in an OLS regression...right?

Gotta talk this out either to myself or with someone else, LOLOL. So, if you disagree, please tell me why. If you agree please let me know that too so that I'm not out here hanging with baseless hope.

But on that same token (going back to the original reason for the post) even WITHOUT OLS looking for a linear relationship, why would the linear relationship (OLS) be significant and the "Who cares if it's linear is there any relationship at all" (ANOVA) not be significant? In the long run, why isn't the difference significant on both REGARDLESS of linear relationship?
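On the dichotomous-predictor point, a minimal sketch that may help: a regression on a single 0/1 predictor is still a valid linear model. With only two x values, the fitted line simply connects the two group means, and the t test on the slope matches the pooled-variance two-sample t test. The data set name mydata and the 0/1 variable male are illustrative assumptions, not names confirmed for this analysis.

```sas
/* mydata and male are placeholder names used only for illustration */

/* Two-sample t test comparing finknow between the two groups */
proc ttest data=mydata;
   class male;
   var finknow;
run;

/* Same comparison written as a regression on the 0/1 dummy */
proc reg data=mydata;
   model finknow = male;
run;
quit;
```

If the equivalence holds as expected, the "Pooled" row of the proc ttest output and the t value for male in proc reg should have the same magnitude and the same p-value; the sign can flip depending on which group is subtracted from which.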

Posted in reply to ksmielitz

08-30-2016 10:18 PM

I think it has to do with the Scheffe option instead.

You're correcting for multiple comparisons, whereas the regression does not.

What happens if you remove it from your means statement?
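A worked check of this, using only numbers from the two outputs earlier in the thread. The symbol $k$ (number of groups, here 5) is notation introduced for illustration, and the unadjusted t critical value is an approximation rather than something printed in the output.

```latex
\begin{align*}
\text{Scheffe critical value} &= \sqrt{(k-1)\,F_{0.05;\,4,\,264}} = \sqrt{4 \times 2.40584} \approx 3.10 \\
\text{Unadjusted critical value} &= t_{0.975;\,264} \approx 1.97 \\
|t_{\text{fiveplus}}| &= \frac{0.77166}{0.36481} \approx 2.12 \\
1.97 &< 2.12 < 3.10 \\
\text{Scheffe half-width} &\approx 3.10 \times 0.36481 \approx 1.13 \\
\text{Limits for ``1 - 5''} &\approx 0.7717 \pm 1.13 = (-0.36,\ 1.90)
\end{align*}
```

The last line reproduces the Scheffe limits printed for the 1 - 5 comparison (-0.3600, 1.9034), which suggests both procedures use the same estimate and the same standard error. The difference of 0.77 clears the unadjusted threshold used by proc reg but not the wider Scheffe threshold.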

PS I would STRONGLY recommend you add Data= statements to your procs to clearly identify your input data set. One day that will save you hours of debugging time. And hopefully prevent mistakes from proceeding through.

Proc reg data=mydata;

....

proc anova data=mydata;

Posted in reply to Reeza

08-30-2016 10:41 PM

Reeza,

I see what you are saying about the multiple comparisons in the ANOVA when the regression doesn't do that.

I will check and see what happens when I remove the Scheffe from the means statement. Thanks for the suggestion.

I have only ever used the data= statements once and it never made any sense to me (my training has been between two professors who coded and used data in completely different ways)... After I get through prelims I will make a note to investigate other ways to do this. Does it matter that I generally use primary data and I've only got 1 data set currently in use? Or, if I use another, I have a libname statement that directs me only to that specific data?

K8

Posted in reply to ksmielitz

08-30-2016 11:01 PM

No. If you ran a regression that created an output dataset which happened to have all the variables but not all the observations (due to a where clause) and your next proc used that dataset instead of the original would you catch it?

It's pretty straightforward as well. Not understanding how the proc is using a data set, with or without a data= statement, is dangerous.

PROC NAME Data = <name of input dataset > (other options);
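To make the pattern concrete, here is a minimal sketch of the two procs from the original question with an explicit data= on each. The name mydata is a placeholder, not the poster's actual data set. When data= is omitted, a proc reads whichever data set was created most recently in the session, which is how a subset or an output data set can slip in unnoticed.

```sas
/* mydata is a placeholder; substitute the data set you intend to analyze */

/* One-way ANOVA with Scheffe-adjusted pairwise comparisons */
proc anova data=mydata;
   class numincar;
   model finknow = numincar;
   means numincar / scheffe;
run;
quit;

/* Same model as a dummy-variable regression (once = reference group) */
proc reg data=mydata;
   model finknow = two three four fiveplus / tol vif;
run;
quit;
```

With both procs pointed at the same named data set, the "Number of Observations Used" lines in the two outputs give a quick check that they really analyzed the same rows.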

Posted in reply to Reeza

08-30-2016 11:36 PM

Reeza,

You are giving me entirely too much credit. I have no idea how to use a where clause in a regression statement or even how to run a regression that would create an output data set. All my regressions do is spit out the results as I've listed above (though generally with more variables).

I was taught to do this:

```
libname TCFinEd "C:\Users\Kate\Desktop\Imported SAS\Statewide TC Data";
proc import datafile="C:\Users\Kate\Desktop\Imported SAS\Statewide TC Data\TCFinEd.csv"
out=TCFinEd.state dbms=dlm replace;
delimiter=",";
getnames=yes;
guessingrows=400;
run;
data TCFinEd;
set TCFinEd.state;
*gender;
if 0<=gender<99;
if gender=1 then male=1; else male=0;
*age;
if 0<age<80;
if age in (19:25) then youngadult=1; else youngadult=0;
if age in (26:32) then adult=1; else adult=0;
if age in (33:39) then olderadult=1; else olderadult=0;
if age in (40:46) then middleage=1; else middleage=0;
if age in (47:71) then olderfolk=1; else olderfolk=0;
```

There was one 2-week session where we used the PROC NAME data=<name of dataset> but then we never saw that professor for anything stats again...and the 2 weeks was simply not enough time for me to understand the value/become comfortable with it. I'm not saying I'm not willing to learn to do what you're talking about, but the professor we had for an entire semester and the other prof I've seen for anything SAS related doesn't use it...or at least he's never made any comments about me not using it and what coding he has shown me didn't have it. So, I do have it written down as something I need to learn to do after prelims, but I just don't have the background to understand the necessity of it (at this time...though I'm willing to take your word for it!!)

Posted in reply to ksmielitz

08-31-2016 01:46 PM

@Reeza I removed the Scheffe and ran it:

proc anova;

class numincar;

model finknow=numincar;

means numincar;

run;

This helped for clarity. Then I added Tukey and found some significance...the difference in the means between what is and is not significant is very, very small, but alas, not everything can be significant.

That said, I DO have unequal groups. I read that Tukey can be used for unequal groups via Tukey-Kramer, but the only thing I can find for that is the means numincar/tukey...and what I could find in the SAS Support pages is that it's the same code. Am I missing something or should the code remain means numincar/tukey; ?

Thanks,

Kate
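On the Tukey-Kramer question: as far as the SAS documentation indicates, there is no separate keyword. The tukey option on the means statement switches to the Tukey-Kramer adjustment on its own when the group sizes are unequal, so the code stays the same. A minimal sketch, with mydata as a placeholder data set name and cldiff added only to request the pairwise differences as confidence intervals:

```sas
/* mydata is a placeholder data set name */
proc anova data=mydata;
   class numincar;
   model finknow = numincar;
   /* tukey: studentized range comparisons (Tukey-Kramer when unbalanced) */
   /* cldiff: report each pairwise difference with confidence limits      */
   means numincar / tukey cldiff;
run;
quit;
```

The resulting table has the same layout as the Scheffe table, with generally narrower intervals, which would be consistent with some comparisons becoming significant under Tukey that were not under Scheffe.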

Posted in reply to ksmielitz

09-01-2016 10:41 AM

Hi Kate,

If you have unequal group sizes, you should not use PROC ANOVA. Instead try PROC GLM. Something like:

```
proc glm data=yourdata;
class numincar;
model finknow=numincar;
lsmeans numincar/pdiff stderr adjust=scheffe;
/* Or Tukey, or even better, adjust=simulate */
run;
```

This should give the least squares means which are more acceptable for comparing group means when the data are unbalanced.

Steve Denham

Posted in reply to SteveDenham

09-01-2016 11:17 AM

Thank you for the guidance! I have not been trained on proc glm and have only played with it a little bit. I appreciate your help and I will give that a try!! What does the "simulate" do?

Thank you!!

Kate

Posted in reply to ksmielitz

09-02-2016 02:00 PM

adjust=simulate applies the methods of Edwards and Berry to account for any observed correlation of means, and is probably the most appealing adjustment for multiple comparisons available as an option to the LSMEANS statement. To get an idea of what it does, read the description of the ADJUST=SIMULATE option in the LSMEANS statement documentation.

Steve Denham
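A minimal sketch of what that might look like for this model. The data set name yourdata is a placeholder and the seed value is arbitrary. Because the adjustment is simulation-based, the adjusted p-values can shift slightly between runs unless seed= is fixed.

```sas
/* yourdata is a placeholder; the seed value is arbitrary */
proc glm data=yourdata;
   class numincar;
   model finknow = numincar;
   /* pdiff: p-values for all pairwise differences          */
   /* cl: confidence limits for the means and differences   */
   /* adjust=simulate: simulation-based multiplicity adjust */
   lsmeans numincar / pdiff cl adjust=simulate(seed=12345);
run;
quit;
```

Swapping adjust=simulate(seed=12345) for adjust=tukey or adjust=scheffe in the same statement gives a side-by-side sense of how much each adjustment widens the intervals.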

Posted in reply to SteveDenham

09-02-2016 07:41 PM

@SteveDenham Thank you for the information and for the resource! I will definitely check it out!!

Have a great weekend!

Kate