ksmielitz
Quartz | Level 8

Okay, so we'll skip the long drawn out story about how I miscoded something and had all sorts of discussions about these odd findings and just get to the point that there was indeed a coding error, I discovered it, and now I'm going back through my results and conducting (new) interpretation.

 

Long story short, once I found the error, I ran it through some tests and now I need some help because I can't make sense of what I'm seeing.

 

First, ANOVA; then OLS regression; then we look to the comparison.

*Number of times incarcerated;
if 0<numincar<6;
if numincar=1 then once=1; else once=0;
if numincar=2 then two=1; else two=0;
if numincar=3 then three=1; else three=0;
if numincar=4 then four=1; else four=0;
if numincar=5 then fiveplus=1; else fiveplus=0;
 
proc anova;
class numincar;
model finknow=numincar;
means numincar/scheffe;
run;

 

 
Which resulted in: (good news...)
 
 
The ANOVA Procedure
 
Dependent Variable: FINKNOW
 
Source DF Sum of Squares Mean Square F Value Pr > F
Model 4 43.924225 10.981056 2.80 0.0265
Error 264 1036.016295 3.924304    
Corrected Total 268 1079.940520      


R-Square Coeff Var Root MSE FINKNOW Mean
0.040673 28.23981 1.980986 7.014870


Source DF Anova SS Mean Square F Value Pr > F
NumIncar 4 43.92422522 10.98105631 2.80 0.0265

 

 
 
Alpha 0.05
Error Degrees of Freedom 264
Error Mean Square 3.924304
Critical Value of F 2.40584

Comparisons significant at the 0.05 level are indicated by ***.

NumIncar Comparison    Difference Between Means    Simultaneous 95% Confidence Limits
2 - 1 0.4419 -0.5936 1.4773  
2 - 3 0.7942 -0.4753 2.0638  
2 - 4 1.0323 -0.4034 2.4681  
2 - 5 1.2135 -0.0646 2.4917  
1 - 2 -0.4419 -1.4773 0.5936  
1 - 3 0.3524 -0.7696 1.4744  
1 - 4 0.5905 -0.7166 1.8975  
1 - 5 0.7717 -0.3600 1.9034  
3 - 2 -0.7942 -2.0638 0.4753  
3 - 1 -0.3524 -1.4744 0.7696  
3 - 4 0.2381 -1.2612 1.7374  
3 - 5 0.4193 -0.9299 1.7685  
4 - 2 -1.0323 -2.4681 0.4034  
4 - 1 -0.5905 -1.8975 0.7166  
4 - 3 -0.2381 -1.7374 1.2612  
4 - 5 0.1812 -1.3254 1.6878  
5 - 2 -1.2135 -2.4917 0.0646  
5 - 1 -0.7717 -1.9034 0.3600  
5 - 3 -0.4193 -1.7685 0.9299  
5 - 4 -0.1812 -1.6878 1.3254  
 
 
So, the model is significant. Good. But when I get to the pairwise comparisons (1-5, 1-2, 1-3, 1-4, etc.), there is no significance. Not what I was hoping for, but just knowing that "numincar" should be included in my OLS regression is worthwhile.
 
Okay, so then I'm checking some results in OLS. (one incarceration is the comparison group)
 

proc reg;

model finknow= two three four fiveplus/tol vif;

run;

 

And I get the following:

 

SAS Output

The REG Procedure
Model: MODEL1
Dependent Variable: FINKNOW

 

Number of Observations Read 269
Number of Observations Used 269

Analysis of Variance
Source DF Sum of Squares Mean Square F Value Pr > F
Model 4 43.92423 10.98106 2.80 0.0265
Error 264 1036.01630 3.92430    
Corrected Total 268 1079.94052      

Root MSE 1.98099 R-Square 0.0407
Dependent Mean 7.01487 Adj R-Sq 0.0261
Coeff Var 28.23981    

Parameter Estimates
Variable DF Parameter Estimate Standard Error t Value Pr > |t| Tolerance Variance Inflation
Intercept 1 7.16190 0.19332 37.05 <.0001 . 0
two 1 0.44187 0.33379 1.32 0.1867 0.82762 1.20828
three 1 -0.35238 0.36168 -0.97 0.3308 0.84644 1.18141
four 1 -0.59048 0.42134 -1.40 0.1623 0.88120 1.13482
fiveplus 1 -0.77166 0.36481 -2.12 0.0353 0.84850 1.17854

The model is still significant (p<.05, and the same p=.0265 as in the ANOVA; yay), but now when we move down to fiveplus (which would have come up in the ANOVA as 1-5 and/or 5-1), the OLS shows significance at p<.05.

 

QUESTION: Why is there significance when I run the OLS, but not when I run the ANOVA? Same thing, right? With only one predictor, the OLS output should match up with the ANOVA output, correct?

 

Please, someone help me figure out where I'm going wrong.

 

Thanks in advance!!

Kate

 
1 ACCEPTED SOLUTION

Accepted Solutions
ksmielitz
Quartz | Level 8

Got ahold of a classmate--it's significant in the OLS and not the ANOVA because OLS is looking for a linear relationship, whereas ANOVA is just checking for significant differences, not a linear relationship between numincar and finknow. Phew! 🙂


13 REPLIES
PGStats
Opal | Level 21

Class once is missing from proc reg model?

PG
ksmielitz
Quartz | Level 8

Class once is the comparison group. So, for the proc reg results, those incarcerated five times or more are significantly less financially knowledgeable compared to those who have only been incarcerated once.

 

In the ANOVA, it's comparing once to twice, once to three times, once to four times, once to five times, etc., and five times to once, five times to twice, etc. So shouldn't the 5-1 and 1-5 comparisons show significance in the ANOVA?

ksmielitz
Quartz | Level 8

So, yes I believe we've got the right solution to this, but then if I were to use a dichotomous predictor variable the relationship *couldn't* be linear...which is the reasoning for a t-test instead of just running one (dichotomous) predictor variable in an OLS regression...right?

 

Gotta talk this out either to myself or with someone else, LOLOL. So, if you disagree, please tell me why. If you agree please let me know that too so that I'm not out here hanging with baseless hope. 😉

 

But by that same token (going back to the original reason for the post), even WITHOUT OLS looking for a linear relationship, why would the linear relationship (OLS) be significant and the "who cares if it's linear, is there any relationship at all" test (ANOVA) not be significant? In the long run, why isn't the difference significant in both, REGARDLESS of linear relationship?

Reeza
Super User

I think it has to do with the SCHEFFE option instead.

You're correcting for multiple comparisons where the regression does not. 

 

 

What happens if you remove it from your means statement?
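
To put rough numbers on the multiple-comparison point: PROC REG judges each coefficient against an unadjusted t critical value, while the SCHEFFE option judges every pairwise difference against the larger cutoff sqrt((k-1) x F critical). The data step below is only a sketch. It plugs in values printed in the output above (5 groups, 264 error DF, the fiveplus estimate and standard error); the variable names and the WORK.CRIT data set name are made up for illustration.

```sas
/* Sketch: unadjusted vs. Scheffe critical values for |t|.
   Inputs are read off the posted output; names are illustrative only. */
data work.crit;
   alpha    = 0.05;
   k        = 5;      /* number of NumIncar groups */
   df_error = 264;    /* error degrees of freedom */

   /* Unadjusted two-sided t critical value, as used for PROC REG t tests */
   t_crit = tinv(1 - alpha/2, df_error);

   /* Scheffe cutoff for |t|: sqrt((k-1) * critical value of F) */
   f_crit       = finv(1 - alpha, k - 1, df_error);
   scheffe_crit = sqrt((k - 1) * f_crit);

   /* Observed |t| for fiveplus (group 5 vs. group 1) from PROC REG */
   t_obs = abs(-0.77166 / 0.36481);

   /* 1 = significant, 0 = not significant under each rule */
   sig_unadjusted = (t_obs > t_crit);
   sig_scheffe    = (t_obs > scheffe_crit);
run;

proc print data=work.crit noobs;
run;
```

With these inputs the observed |t| of about 2.12 clears the unadjusted cutoff (about 1.97) but not the Scheffe cutoff (about 3.10), which is consistent with the 5 - 1 interval above including zero.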

 

PS I would STRONGLY recommend you add Data= statements to your procs to clearly identify your input data set. One day that will save you hours of debugging time. And hopefully prevent mistakes from proceeding through. 

 

Proc reg data=mydata;

....

proc anova data=mydata;

 

ksmielitz
Quartz | Level 8

Reeza, 

 

I see what you are saying about the multiple comparisons in the ANOVA when the regression doesn't do that.

I will check and see what happens when I remove the Scheffe from the means statement. Thanks for the suggestion.
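
One way to check this side by side, as a sketch: ask PROC ANOVA for unadjusted pairwise t tests (the T option, also known as LSD) next to the Scheffe ones. Here mydata is a placeholder for whatever data set holds finknow and numincar. The unadjusted 5 - 1 comparison should then agree with the fiveplus t test from PROC REG, since both rest on the same pooled error term.

```sas
/* Sketch only: mydata is a placeholder data set name */
proc anova data=mydata;
   class numincar;
   model finknow = numincar;
   means numincar / t cldiff;        /* unadjusted pairwise t tests (LSD) */
   means numincar / scheffe cldiff;  /* Scheffe-adjusted, for comparison  */
run;
quit;
```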

 

I have only ever used the data= statements once and it never made any sense to me (my training has been split between two professors who coded and used data in completely different ways)...after I get through prelims I will make a note to investigate other ways to do this. Does it matter that I generally use primary data and I've only got 1 data set currently in use? Or, if I use another, that I have a libname statement that directs me only to that specific data?

 

K8

Reeza
Super User

No. If you ran a regression that created an output dataset, which happened to have all the variables but not all the observations (due to a where clause), and your next proc used that dataset instead of the original, would you catch it?

 

 

It's pretty straightforward as well. Not understanding how the proc is using a data set with or without a data= statement is dangerous.

 

PROC NAME Data = <name of input dataset > (other options);
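
To make that scenario concrete, here is a small sketch using SASHELP.CLASS (a sample data set that ships with SAS) rather than your data; WORK.EST is just an illustrative name. A proc with no DATA= option reads the most recently created data set, so once PROC REG writes an OUTEST= data set, the next proc quietly picks that up instead of the original.

```sas
/* Sketch: PROC REG creates WORK.EST, which becomes the "last" data set */
proc reg data=sashelp.class outest=work.est noprint;
   model weight = height;
run;
quit;

/* No DATA= here: this summarizes WORK.EST (one row of estimates),
   not SASHELP.CLASS, and SAS does not warn about the switch */
proc means;
run;

/* With DATA= there is no ambiguity about the input */
proc means data=sashelp.class;
run;
```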

 

 

ksmielitz
Quartz | Level 8

Reeza, 

 

You are giving me entirely too much credit. I have no idea how to use a where clause in a regression statement or even how to run a regression that would create an output data set. All my regressions do is spit out the results as I've listed above (though generally with more variables ;))

 

I was taught to do this: 

 

libname TCFinEd "C:\Users\Kate\Desktop\Imported SAS\Statewide TC Data";

proc import datafile="C:\Users\Kate\Desktop\Imported SAS\Statewide TC Data\TCFinEd.csv"
out=TCFinEd.state dbms=dlm replace;
delimiter=",";
getnames=yes;
guessingrows=400;
run;

data TCFinEd;
set TCFinEd.state;

*gender;
if 0<=gender<99;
if gender=1 then male=1; else male=0;

*age;
if 0<age<80;
if age in (19:25) then youngadult=1; else youngadult=0;
if age in (26:32) then adult=1; else adult=0;
if age in (33:39) then olderadult=1; else olderadult=0;
if age in (40:46) then middleage=1; else middleage=0;
if age in (47:71) then olderfolk=1; else olderfolk=0;

There was one 2-week session where we used the PROC NAME data=<name of dataset> but then we never saw that professor for anything stats again...and the 2 weeks was simply not enough time for me to understand the value/become comfortable with it. I'm not saying I'm not willing to learn to do what you're talking about, but the professor we had for an entire semester and the other prof I've seen for anything SAS related don't use it...or at least he's never made any comments about me not using it, and what coding he has shown me didn't have it. So, I do have it written down as something I need to learn to do after prelims, but I just don't have the background to understand the necessity of it (at this time...though I'm willing to take your word for it!!)

ksmielitz
Quartz | Level 8

@Reeza I removed the Scheffe and ran it:

proc anova;

class numincar;

model finknow=numincar;

means numincar;

run;

 

This helped for clarity. Then I added Tukey and found some significance...the difference in the means between what is and is not significant is very, very small, but alas, not everything can be significant. 😉

 

That said, I DO have unequal groups. I read that Tukey can be used for unequal groups via Tukey-Kramer, but the only thing I can find for that is the means numincar/tukey...and what I could find in the SAS Support pages is that it's the same code. Am I missing something or should the code remain means numincar/tukey; ?

 

Thanks, 

Kate

SteveDenham
Jade | Level 19

Hi Kate,

 

If you have unequal group sizes, you should not use PROC ANOVA.  Instead try PROC GLM.  Something like:

 

proc glm data=yourdata;
class numincar;
model finknow=numincar;
lsmeans numincar/pdiff stderr adjust=scheffe;
/* Or Tukey, or even better, adjust=simulate */
run;

This should give the least squares means which are more acceptable for comparing group means when the data are unbalanced.
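
On the Tukey-Kramer question: as far as I know there is no separate keyword for it. With unequal group sizes, SAS applies the Kramer approximation automatically when you ask for Tukey, in both the MEANS and LSMEANS statements. A sketch of the same model with ADJUST=TUKEY (yourdata is still a placeholder):

```sas
/* Sketch only: yourdata is a placeholder data set name */
proc glm data=yourdata;
   class numincar;
   model finknow = numincar;
   /* With unbalanced groups the adjustment is reported as Tukey-Kramer */
   lsmeans numincar / pdiff cl adjust=tukey;
run;
quit;
```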

 

Steve Denham

ksmielitz
Quartz | Level 8

@SteveDenham

 

Thank you for the guidance! I have not been trained on proc glm and have only played with it a little bit. I appreciate your help and I will give that a try!! What does the "simulate" do?

 

Thank you!!

Kate

SteveDenham
Jade | Level 19

adjust=simulate applies the methods of Edwards and Berry to account for any observed correlation of means, and is probably the most appealing adjustment for multiple comparisons available as an option to the LSMEANS statement.  To get an idea of what it does, read this section in the documentation:

 

http://support.sas.com/documentation/cdl/en/statug/68162/HTML/default/viewer.htm#statug_glm_syntax10...
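
Because the adjustment is simulation based, the adjusted p-values can shift slightly from run to run unless the seed is fixed. A sketch, assuming the same placeholder data set name used earlier in this thread; the seed value is arbitrary:

```sas
/* Sketch only: yourdata is a placeholder; 12345 is an arbitrary seed */
proc glm data=yourdata;
   class numincar;
   model finknow = numincar;
   /* seed= makes the simulated adjustment repeatable;
      report prints details about the simulation */
   lsmeans numincar / pdiff cl adjust=simulate(seed=12345 report);
run;
quit;
```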

 

Steve Denham

ksmielitz
Quartz | Level 8

@SteveDenham Thank you for the information and for the resource! I will definitely check it out!!

Have a great weekend!

Kate
