plf515
Lapis Lazuli | Level 10
I ran into this oddity at work.... I can't show the real data, but I made up some (see below).
In this example, the differences are quite small, but in the data at work they were not so small.

Suppose you have a data set with a dependent variable and some categorical variables. Some of these
are coded 0-1, some have several levels. I was under the impression that with 0-1 variables, it did
not matter whether you included them on the CLASS statement, and, indeed, the parameter estimates are
identical for the two versions below. But LSMEANS are not identical, even for the variable with more
than 2 levels, which is always on the CLASS statement.

So ...

proc format;
value racfmt 1 = 'Black'
2 = 'White'
3 = 'Latino';
run;

data today;
input catv1 catv2 catv3 @@;
dv = catv1 * 3 + catv2 * 5 + catv3 + rannor(123);
format catv3 racfmt.;
datalines;
0 1 2 0 1 1 0 1 3 1 0 1 0 0 3 0 1 3 0 1 2 1 0 3 1 0 1 0 1 2 0 0 3
1 1 2 1 1 1 0 1 3 1 1 1 1 0 3 0 1 3 1 1 2 1 0 3 1 0 1 0 1 2 1 1 3
0 0 2 1 0 1 1 1 3 0 0 1 0 0 3 0 0 3 0 0 2 1 0 3 1 0 1 0 1 2 1 1 3
;
run;


title 'Version with all on CLASS statement';
proc glm data = today;
class catv1 catv2 catv3;
model dv = catv1 catv2 catv3;
lsmeans catv1 catv2 catv3;
run;


title 'Version with only race on CLASS statement';
proc glm data = today;
class catv3;
model dv = catv1 catv2 catv3;
lsmeans catv3;
run;

and there are differences ....

I understand the models are parameterized a bit differently, with different intercepts, but
shouldn't LSMEANS be the same? And which are 'correct'?
SteveDenham
Jade | Level 19
Hey Peter,

Look at this (code added to your original):

proc format;
value racfmt 1 = 'Black'
2 = 'White'
3 = 'Latino';
run;

data today;
input catv1 catv2 catv3 @@;
dv = catv1 * 3 + catv2 * 5 + catv3 + rannor(123);
format catv3 racfmt.;
datalines;
0 1 2 0 1 1 0 1 3 1 0 1 0 0 3 0 1 3 0 1 2 1 0 3 1 0 1 0 1 2 0 0 3
1 1 2 1 1 1 0 1 3 1 1 1 1 0 3 0 1 3 1 1 2 1 0 3 1 0 1 0 1 2 1 1 3
0 0 2 1 0 1 1 1 3 0 0 1 0 0 3 0 0 3 0 0 2 1 0 3 1 0 1 0 1 2 1 1 3
;
run;


title 'Version with all on CLASS statement';
proc glm data = today;
class catv1 catv2 catv3;
model dv = catv1 catv2 catv3;
lsmeans catv1 catv2 catv3;
run;


title 'Version with only race on CLASS statement';
proc glm data = today;
class catv3;
model dv = catv1 catv2 catv3;
lsmeans catv3;
run;

proc means data=today;
var dv catv1 catv2 catv3;
run;

title 'Version with only race on CLASS statement, with at=0.5';
title2 'Results same as all on CLASS statement';
proc glm data = today;
class catv3;
model dv = catv1 catv2 catv3;
lsmeans catv3/at (catv1 catv2)=(0.5 0.5);
run;

title 'Version with only race on CLASS statement, with at=sample means';
title2 'Results same as only race on CLASS statement';
proc glm data = today;
class catv3;
model dv = catv1 catv2 catv3;
/* at= values are the means of catv1 and catv2 from PROC MEANS above (16/33 and 17/33) */
lsmeans catv3/at (catv1 catv2)=(0.48484848485 0.5151515151515);
run;


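So the difference is not in the fitted model, which is the same either way, but in the covariate values at which the lsmeans are evaluated. With catv1 and catv2 on the CLASS statement, the lsmeans give equal weight to each class level (which was the whole point of Searle, Speed and Milliken (1980), I think), which for a 0-1 variable is the same as evaluating it at 0.5. With only catv3 on the CLASS statement, catv1 and catv2 are treated as continuous covariates and the lsmeans are calculated at their mean values, here 16/33 and 17/33.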

I would bet that the large differences in your real data arise from substantial differences in class size.
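To see how far apart the two kinds of lsmeans can sit, here is a minimal Python sketch of the arithmetic, using only the standard library. It reads the datalines from the thread, computes the sample means of catv1 and catv2, and compares the covariate contribution at (0.5, 0.5) with the contribution at the means. The slopes `B1` and `B2` are an assumption: they are the generating coefficients from the DATA step (3 and 5), used as stand-ins for the fitted estimates, which would differ slightly because of the `rannor` noise. The function name `covariate_part` is made up for this illustration.

```python
# Datalines from the thread, read as (catv1, catv2, catv3) triples.
DATALINES = """
0 1 2 0 1 1 0 1 3 1 0 1 0 0 3 0 1 3 0 1 2 1 0 3 1 0 1 0 1 2 0 0 3
1 1 2 1 1 1 0 1 3 1 1 1 1 0 3 0 1 3 1 1 2 1 0 3 1 0 1 0 1 2 1 1 3
0 0 2 1 0 1 1 1 3 0 0 1 0 0 3 0 0 3 0 0 2 1 0 3 1 0 1 0 1 2 1 1 3
"""

values = [int(tok) for tok in DATALINES.split()]
rows = [tuple(values[i:i + 3]) for i in range(0, len(values), 3)]

n = len(rows)
mean_catv1 = sum(r[0] for r in rows) / n  # proportion of catv1 = 1
mean_catv2 = sum(r[1] for r in rows) / n  # proportion of catv2 = 1

# ASSUMPTION: stand-in slopes taken from the DATA step formula,
# not the estimates PROC GLM would report.
B1, B2 = 3.0, 5.0


def covariate_part(x1, x2):
    """Contribution of catv1 and catv2 to every catv3 lsmean."""
    return B1 * x1 + B2 * x2


# Both 0-1 variables on CLASS: each level gets weight 0.5.
balanced = covariate_part(0.5, 0.5)

# Only catv3 on CLASS: covariates are held at their sample means.
at_means = covariate_part(mean_catv1, mean_catv2)

# The same shift applies to every level of catv3.
gap = balanced - at_means

print("n =", n)
print("mean catv1 =", mean_catv1, " mean catv2 =", mean_catv2)
print("shift in each catv3 lsmean =", gap)
```

Because the shift is B1 * (0.5 - mean of catv1) + B2 * (0.5 - mean of catv2), it stays tiny when the 0-1 variables are split close to 50/50, as they are here, and grows as the split becomes more lopsided or the slopes become larger.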

Good luck.
plf515
Lapis Lazuli | Level 10
Thanks Steve!

That is very clear.

Peter
