Solved
Contributor
Posts: 52

# ANOVA and OLS regression disagreement

Okay, so we'll skip the long, drawn-out story about how I miscoded something and had all sorts of discussions about these odd findings, and just get to the point: there was indeed a coding error, I discovered it, and now I'm going back through my results and reinterpreting them.

Long story short, once I found the error, I ran it through some tests and now I need some help because I can't make sense of what I'm seeing.

First, ANOVA; then OLS regression; then we look to the comparison.

```
*Number of times incarcerated;
/* These statements run inside the DATA step. */
if 0<numincar<6;   /* subsetting IF: keep only numincar values 1-5 */
if numincar=1 then once=1; else once=0;
if numincar=2 then two=1; else two=0;
if numincar=3 then three=1; else three=0;
if numincar=4 then four=1; else four=0;
if numincar=5 then fiveplus=1; else fiveplus=0;

proc anova;
class numincar;
model finknow=numincar;
means numincar/scheffe;   /* Scheffe-adjusted pairwise comparisons */
run;
```

Which resulted in: (good news...)

The ANOVA Procedure

Dependent Variable: FINKNOW

| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| Model | 4 | 43.924225 | 10.981056 | 2.80 | 0.0265 |
| Error | 264 | 1036.016295 | 3.924304 | | |
| Corrected Total | 268 | 1079.940520 | | | |

| R-Square | Coeff Var | Root MSE | FINKNOW Mean |
|---|---|---|---|
| 0.040673 | 28.23981 | 1.980986 | 7.014870 |

| Source | DF | Anova SS | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| NumIncar | 4 | 43.92422522 | 10.98105631 | 2.80 | 0.0265 |

| Alpha | Error Degrees of Freedom | Error Mean Square | Critical Value of F |
|---|---|---|---|
| 0.05 | 264 | 3.9243 | 2.40584 |

Comparisons significant at the 0.05 level are indicated by ***.

| NumIncar Comparison | Difference Between Means | Simultaneous 95% Confidence Limits (lower) | Simultaneous 95% Confidence Limits (upper) |
|---|---|---|---|
| 2 - 1 | 0.4419 | -0.5936 | 1.4773 |
| 2 - 3 | 0.7942 | -0.4753 | 2.0638 |
| 2 - 4 | 1.0323 | -0.4034 | 2.4681 |
| 2 - 5 | 1.2135 | -0.0646 | 2.4917 |
| 1 - 2 | -0.4419 | -1.4773 | 0.5936 |
| 1 - 3 | 0.3524 | -0.7696 | 1.4744 |
| 1 - 4 | 0.5905 | -0.7166 | 1.8975 |
| 1 - 5 | 0.7717 | -0.3600 | 1.9034 |
| 3 - 2 | -0.7942 | -2.0638 | 0.4753 |
| 3 - 1 | -0.3524 | -1.4744 | 0.7696 |
| 3 - 4 | 0.2381 | -1.2612 | 1.7374 |
| 3 - 5 | 0.4193 | -0.9299 | 1.7685 |
| 4 - 2 | -1.0323 | -2.4681 | 0.4034 |
| 4 - 1 | -0.5905 | -1.8975 | 0.7166 |
| 4 - 3 | -0.2381 | -1.7374 | 1.2612 |
| 4 - 5 | 0.1812 | -1.3254 | 1.6878 |
| 5 - 2 | -1.2135 | -2.4917 | 0.0646 |
| 5 - 1 | -0.7717 | -1.9034 | 0.3600 |
| 5 - 3 | -0.4193 | -1.7685 | 0.9299 |
| 5 - 4 | -0.1812 | -1.6878 | 1.3254 |

So, the model is significant. Good. When I get to the 1-5, 1-2, 1-3, 1-4, etc. etc. no significance. Not what I was hoping for, but just knowing that "numincar" should be included in my OLS regression is worthwhile.

Okay, so then I'm checking some results in OLS. (one incarceration is the comparison group)

```
proc reg;
model finknow= two three four fiveplus/tol vif;
run;
```

And I get the following:

The REG Procedure
Model: MODEL1
Dependent Variable: FINKNOW

| Number of Observations Read | Number of Observations Used |
|---|---|
| 269 | 269 |

Analysis of Variance

| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| Model | 4 | 43.92423 | 10.98106 | 2.80 | 0.0265 |
| Error | 264 | 1036.01630 | 3.92430 | | |
| Corrected Total | 268 | 1079.94052 | | | |

| Root MSE | Dependent Mean | Coeff Var | R-Square | Adj R-Sq |
|---|---|---|---|---|
| 1.98099 | 7.01487 | 28.2398 | 0.0407 | 0.0261 |

Parameter Estimates

| Variable | DF | Parameter Estimate | Standard Error | t Value | Pr > \|t\| | Tolerance | Variance Inflation |
|---|---|---|---|---|---|---|---|
| Intercept | 1 | 7.16190 | 0.19332 | 37.05 | <.0001 | . | 0 |
| two | 1 | 0.44187 | 0.33379 | 1.32 | 0.1867 | 0.82762 | 1.20828 |
| three | 1 | -0.35238 | 0.36168 | -0.97 | 0.3308 | 0.84644 | 1.18141 |
| four | 1 | -0.59048 | 0.42134 | -1.40 | 0.1623 | 0.88120 | 1.13482 |
| fiveplus | 1 | -0.77166 | 0.36481 | -2.12 | 0.0353 | 0.84850 | 1.17854 |

The model is still significant (p < .05, and the same p = .0265 as in the ANOVA, yay!), but now when we move down to fiveplus (which would have come up in the ANOVA as 1-5 and/or 5-1), the OLS shows significance at p < .05.
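With `once` as the reference group, the dummy-coded regression and the one-way ANOVA fit the same five group means, so the `fiveplus` estimate (-0.77166) is the Scheffé table's `5 - 1` row (-0.7717), and the `two` estimate is its `2 - 1` row. The short Python sketch below (Python only because this is plain arithmetic; nothing in it is SAS) backs out each group's size and mean from the posted PROC REG estimates, assuming the standard one-way dummy-coding identities given in its docstring. The constants are copied from the output above; the function name is just illustrative.

```python
# Constants copied from the PROC REG output above (once = reference group).
MSE = 3.92430            # error mean square
INTERCEPT = 7.16190      # estimate for the reference group (once)
INTERCEPT_SE = 0.19332
# dummy variable -> (parameter estimate, standard error)
DUMMIES = {
    "two": (0.44187, 0.33379),
    "three": (-0.35238, 0.36168),
    "four": (-0.59048, 0.42134),
    "fiveplus": (-0.77166, 0.36481),
}


def recover_groups(mse, intercept, intercept_se, dummies):
    """Back out each group's size and mean from a one-way dummy-coded fit.

    Uses SE(intercept)^2 = MSE / n_ref, SE(dummy_k)^2 = MSE * (1/n_ref + 1/n_k),
    and mean_k = intercept + estimate_k. Sizes are rounded because the SEs
    are only printed to five decimals.
    """
    n_ref = mse / intercept_se ** 2
    groups = {"once": (round(n_ref), intercept)}
    for name, (estimate, se) in dummies.items():
        n_k = 1.0 / (se ** 2 / mse - 1.0 / n_ref)
        groups[name] = (round(n_k), intercept + estimate)
    return groups


groups = recover_groups(MSE, INTERCEPT, INTERCEPT_SE, DUMMIES)
for name, (n, mean) in groups.items():
    print(f"{name:<9} n = {n:3d}  mean = {mean:.4f}")

# The fiveplus estimate is the Scheffe table's "5 - 1" difference.
print(f"5 - 1: {groups['fiveplus'][1] - groups['once'][1]:.4f}")
```

The recovered sizes (105, 53, 42, 28, 41) add up to 269, and their weighted mean reproduces the ANOVA's FINKNOW Mean of 7.014870, so both procedures are working from identical group means; whatever disagreement there is comes from how each one tests the differences, not from the estimates themselves.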

QUESTION: Why is there significance when I run the OLS, but not when I run the ANOVA? Same thing, right? With only one predictor (numincar) in the OLS model, the output should match up with the ANOVA output, correct?

Please, someone help me figure out where I'm going wrong.

Kate

Posts: 5,046

## Re: ANOVA and OLS regression disagreement

Class once is missing from proc reg model?

PG
Contributor
Posts: 52

## Re: ANOVA and OLS regression disagreement

Class once is the comparison group. So, for the proc reg results, those incarcerated five times or more are significantly less financially knowledgeable than those who have only been incarcerated once.

In the ANOVA, it's comparing once to twice, once to three times, once to four times, once to five times, and so on, then five times to once, five times to twice, etc. So shouldn't the 5-1 and 1-5 comparisons show significance in the ANOVA?

Solution
08-30-2016 06:04 PM
Contributor
Posts: 52

## Re: ANOVA and OLS regression disagreement

Got ahold of a classmate. It's significant in the OLS and not the ANOVA because OLS is looking for a linear relationship, whereas ANOVA is just checking for significant differences, not a linear relationship between numincar and finknow. Phew!

Contributor
Posts: 52

## Re: ANOVA and OLS regression disagreement

So, yes, I believe we've got the right solution to this. But then, if I were to use a dichotomous predictor variable, the relationship *couldn't* be linear... which is the reasoning for a t-test instead of just running one (dichotomous) predictor variable in an OLS regression... right?
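One way to see how a single dummy behaves in OLS: with only two x values (0 and 1), a straight line can always pass through both group means, so the slope is just the difference in means and its t statistic equals the pooled (equal-variance) two-sample t. The Python sketch below checks that numerically on a few made-up scores; the data and function names are hypothetical and only for illustration.

```python
import math


def ols_slope_t(x, y):
    """t statistic for the slope in a simple OLS regression of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    s2 = sse / (n - 2)                 # residual mean square
    return b1 / math.sqrt(s2 / sxx)


def pooled_t(g0, g1):
    """Equal-variance two-sample t statistic for mean(g1) - mean(g0)."""
    n0, n1 = len(g0), len(g1)
    m0, m1 = sum(g0) / n0, sum(g1) / n1
    ss = sum((v - m0) ** 2 for v in g0) + sum((v - m1) ** 2 for v in g1)
    sp2 = ss / (n0 + n1 - 2)           # pooled variance
    return (m1 - m0) / math.sqrt(sp2 * (1 / n0 + 1 / n1))


# Hypothetical scores, for illustration only (not the thread's data).
group0 = [7, 8, 6, 9, 7, 8]            # dummy = 0
group1 = [5, 6, 7, 5, 6]               # dummy = 1
x = [0] * len(group0) + [1] * len(group1)
y = group0 + group1

print(ols_slope_t(x, y), pooled_t(group0, group1))
```

The two printed t statistics are identical, so, at least under the equal-variance assumption, OLS with a single 0/1 predictor and the pooled t-test are the same test.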

Gotta talk this out either to myself or with someone else, LOLOL. So, if you disagree, please tell me why. If you agree please let me know that too so that I'm not out here hanging with baseless hope.

But by the same token (going back to the original reason for the post), even WITHOUT OLS looking for a linear relationship, why would the linear relationship (OLS) be significant and the "Who cares if it's linear, is there any relationship at all?" test (ANOVA) not be significant? In the long run, why isn't the difference significant in both, REGARDLESS of linear relationship?

Super User
Posts: 20,730

## Re: ANOVA and OLS regression disagreement

I think it has to do with the Scheffé option instead.

You're correcting for multiple comparisons, whereas the regression does not.

What happens if you remove it from your MEANS statement?
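This point can be checked directly from the posted numbers. PROC REG tests `fiveplus` against an ordinary t cutoff, while Scheffé's method requires |t| to exceed sqrt((k - 1) * F_crit), using the "Critical Value of F" printed in the ANOVA output. A minimal Python sketch, assuming only those printed values:

```python
import math

# Values copied from the posted output.
F_CRIT = 2.40584        # "Critical Value of F" from the Scheffe settings, 4 and 264 df
K_GROUPS = 5            # numincar levels 1-5
EST_5_VS_1 = -0.77166   # fiveplus estimate (group 5 minus group 1)
SE_5_VS_1 = 0.36481     # its standard error

t_stat = EST_5_VS_1 / SE_5_VS_1                 # what PROC REG tests
scheffe_t = math.sqrt((K_GROUPS - 1) * F_CRIT)  # Scheffe critical |t|

# Rebuild the Scheffe simultaneous interval for the "1 - 5" row.
half_width = scheffe_t * SE_5_VS_1
lower, upper = -EST_5_VS_1 - half_width, -EST_5_VS_1 + half_width

print(f"|t| for 5 vs 1:           {abs(t_stat):.3f}")
print(f"Scheffe critical |t|:     {scheffe_t:.3f}")
print(f"Scheffe 95% CI for 1 - 5: ({lower:.4f}, {upper:.4f})")
```

Here |t| ≈ 2.115 clears the unadjusted two-sided 5% cutoff (roughly 1.97 on 264 df), which is why PROC REG reports p = .0353, but it falls short of the Scheffé cutoff of about 3.102. The rebuilt interval, (-0.3600, 1.9034), matches the `1 - 5` row of the Scheffé table, so the two procedures agree on the estimate and differ only in how strict the test is.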

PS I would STRONGLY recommend you add DATA= options to your procs to clearly identify your input data set. One day that will save you hours of debugging time, and hopefully prevent mistakes from slipping through.

```
proc reg data=mydata;
....
proc anova data=mydata;
```

Contributor
Posts: 52

## Re: ANOVA and OLS regression disagreement

Reeza,

I see what you are saying about the multiple comparisons in the ANOVA when the regression doesn't do that.

I will check and see what happens when I remove the Scheffé option from the MEANS statement. Thanks for the suggestion.

I have only ever used DATA= once, and it never made any sense to me (my training has been split between two professors who coded and used data in completely different ways)... after I get through prelims I will make a note to investigate other ways to do this. Does it matter that I generally use primary data and I've only got one data set currently in use? Or that, if I use another, I have a libname statement that directs me only to that specific data?

K8

Super User
Posts: 20,730

## Re: ANOVA and OLS regression disagreement

No. If you ran a regression that created an output dataset which happened to have all the variables but not all the observations (due to a where clause) and your next proc used that dataset instead of the original would you catch it?

It's pretty straightforward as well. Not understanding how the proc picks its input data set, with or without a DATA= option, is dangerous.

`PROC NAME DATA=<name of input dataset> (other options);`

Contributor
Posts: 52

## Re: ANOVA and OLS regression disagreement

Reeza,

You are giving me entirely too much credit. I have no idea how to use a WHERE clause in a regression or even how to run a regression that creates an output data set. All my regressions do is spit out the results as I've listed above (though generally with more variables).

I was taught to do this:

```
libname TCFinEd "C:\Users\Kate\Desktop\Imported SAS\Statewide TC Data";

proc import datafile="C:\Users\Kate\Desktop\Imported SAS\Statewide TC Data\TCFinEd.csv"
out=TCFinEd.state dbms=dlm replace;
delimiter=",";
getnames=yes;
guessingrows=400;
run;

data TCFinEd;
set TCFinEd.state;

*gender;
if 0<=gender<99;
if gender=1 then male=1; else male=0;

*age;
if 0<age<80;
if age in (40:46) then middleage=1; else middleage=0;
if age in (47:71) then olderfolk=1; else olderfolk=0;
run;
```

There was one 2-week session where we used PROC NAME DATA=<name of dataset>, but then we never saw that professor for anything stats-related again... and two weeks was simply not enough time for me to understand the value or become comfortable with it. I'm not saying I'm not willing to learn to do what you're talking about, but the professor we had for an entire semester and the other prof I've seen for anything SAS-related don't use it... or at least he's never made any comments about my not using it, and the code he has shown me didn't have it. So, I do have it written down as something I need to learn after prelims, but I just don't have the background to understand why it's necessary (at this time... though I'm willing to take your word for it!!)

Contributor
Posts: 52

## Re: ANOVA and OLS regression disagreement

@Reeza I removed the Scheffe and ran it:

```
proc anova;
class numincar;
model finknow=numincar;
means numincar;
run;
```

This helped for clarity. Then I added Tukey and found some significance...the difference in the means between what is and is not significant is very, very small, but alas, not everything can be significant.

That said, I DO have unequal groups. I read that Tukey can be used for unequal groups via Tukey-Kramer, but the only syntax I can find for that is `means numincar/tukey;`, and from what I could find in the SAS Support pages, it's the same code. Am I missing something, or should the code remain `means numincar/tukey;`?

Thanks,

Kate

Posts: 2,655

## Re: ANOVA and OLS regression disagreement

Hi Kate,

If you have unequal group sizes, you should not use PROC ANOVA.  Instead try PROC GLM.  Something like:

```
proc glm data=yourdata;
class numincar;
model finknow=numincar;
lsmeans numincar / pdiff adjust=scheffe; /* Or Tukey, or even better, adjust=simulate */
run;
quit;
```

This should give the least squares means which are more acceptable for comparing group means when the data are unbalanced.

Steve Denham

Contributor
Posts: 52

## Re: ANOVA and OLS regression disagreement

Thank you for the guidance! I have not been trained on proc glm and have only played with it a little bit. I appreciate your help and I will give that a try!! What does the "simulate" do?

Thank you!!

Kate

Posts: 2,655

## Re: ANOVA and OLS regression disagreement

adjust=simulate applies the methods of Edwards and Berry to account for any observed correlation of means, and is probably the most appealing adjustment for multiple comparisons available as an option to the LSMEANS statement.  To get an idea of what it does, read this section in the documentation:

http://support.sas.com/documentation/cdl/en/statug/68162/HTML/default/viewer.htm#statug_glm_syntax10...
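To make the Edwards and Berry idea concrete: under the null hypothesis of equal means, simulate the largest pairwise |t| among the groups many times and use its 95th percentile as the critical value, which automatically reflects the unequal group sizes. The Python below is a bare-bones sketch of that idea, not a reimplementation of what SAS does internally. It assumes the group sizes 105, 53, 42, 28, and 41 (once through fiveplus), which can be backed out of the PROC REG standard errors, plus the 264 error degrees of freedom; the function name and seed are arbitrary.

```python
import math
import random
from itertools import combinations


def simulated_critical_t(sizes, df_error, alpha=0.05, nsamp=20000, seed=2016):
    """Monte Carlo critical |t| for all pairwise comparisons of group means.

    Under the null (equal true means, sigma = 1), draw each group mean and an
    independent MSE with df_error degrees of freedom, record the largest
    pairwise |t|, and return the (1 - alpha) quantile of those maxima.
    """
    rng = random.Random(seed)
    pairs = list(combinations(range(len(sizes)), 2))
    maxima = []
    for _ in range(nsamp):
        means = [rng.gauss(0.0, 1.0 / math.sqrt(n)) for n in sizes]
        # MSE ~ chi-square(df_error) / df_error, via a gamma(df/2, scale=2) draw
        mse = rng.gammavariate(df_error / 2.0, 2.0) / df_error
        maxima.append(max(
            abs(means[i] - means[j]) / math.sqrt(mse * (1.0 / sizes[i] + 1.0 / sizes[j]))
            for i, j in pairs))
    maxima.sort()
    return maxima[math.ceil((1.0 - alpha) * nsamp) - 1]


sizes = [105, 53, 42, 28, 41]   # once, two, three, four, fiveplus
crit = simulated_critical_t(sizes, df_error=264)

# Observed |t| for two comparisons, from the posted estimates and MSE.
MSE_OBSERVED = 3.92430
t_1_vs_5 = 0.77166 / 0.36481
t_2_vs_5 = (0.44187 + 0.77166) / math.sqrt(MSE_OBSERVED * (1 / 53 + 1 / 41))

print(f"simulated critical |t|: {crit:.3f}")
print(f"|t| 1 vs 5: {t_1_vs_5:.3f}   |t| 2 vs 5: {t_2_vs_5:.3f}")
```

With these sizes the simulated cutoff should land around 2.7, between the unadjusted cutoff (about 1.97) and Scheffé's (about 3.10). By that yardstick the 1 vs 5 comparison (|t| ≈ 2.12) still does not reach significance, while 2 vs 5 (|t| ≈ 2.95) does.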

Steve Denham

Contributor
Posts: 52

## Re: ANOVA and OLS regression disagreement

@SteveDenham Thank you for the information and for the resource! I will definitely check it out!!

Have a great weekend!

Kate
