Hi All,
I have a two-level model I'm exploring with PROC MIXED. It has one CLASS variable with two levels and a continuous covariate. I can generate the LSMEANS of the CLASS variable at -1 and +1 of the z-scored covariate as follows:
proc mixed data=rq4 method=reml covtest;
class schoolid base_tch_gd5(ref=first);
model parcc_math_16=parcc_math_15_gdmn base_tch_gd5/solution ddfm=bw ;
random intercept/sub=schoolid type=un;
lsmeans base_tch_gd5/ at parcc_math_15_gdmn=-1 diff;
lsmeans base_tch_gd5/ at parcc_math_15_gdmn= 1 diff;
run;
The DIFF option compares the means for base_tch_gd5=0 vs. base_tch_gd5=1. How can I test whether the LSMEAN for base_tch_gd5=0 at -1 differs from the LSMEAN for base_tch_gd5=0 at +1?
Thanks.
J
Try
lsmeans base_tch_gd5 / at parcc_math_15_gdmn= -1;
lsmeans base_tch_gd5 / at parcc_math_15_gdmn= 1;
estimate "base_tch_gd5=0 at -1" intercept 1 base_tch_gd5 1 0 parcc_math_15_gdmn -1; /* should match LSMEANS */
estimate "base_tch_gd5=0 at 1" intercept 1 base_tch_gd5 1 0 parcc_math_15_gdmn 1; /* should match LSMEANS */
estimate "base_tch_gd5=0 at -1 v at 1" intercept 0 base_tch_gd5 0 0 parcc_math_15_gdmn -2; /* should equal (LSMEAN at -1) - (LSMEAN at 1) */
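The third ESTIMATE is just the first L vector minus the second. Writing x for parcc_math_15_gdmn, alpha_0 for the base_tch_gd5=0 effect and beta_x for the covariate slope (symbols introduced here for illustration), the arithmetic is:

```latex
\begin{aligned}
\hat\mu_0(x) &= \hat\beta_0 + \hat\alpha_0 + \hat\beta_x\,x \\
\hat\mu_0(-1) - \hat\mu_0(1) &= (\hat\beta_0 + \hat\alpha_0 - \hat\beta_x) - (\hat\beta_0 + \hat\alpha_0 + \hat\beta_x) = -2\,\hat\beta_x
\end{aligned}
```

The intercept and base_tch_gd5 coefficients cancel, leaving -2 on the covariate. Because this model has no base_tch_gd5*covariate interaction, the same difference holds for base_tch_gd5=1; once interactions with the covariate are added, every term that contains the covariate contributes to the difference.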
I guess I should have been more explicit. Is there a way to arrive at this result other than with ESTIMATE? I ask because I have a much more complex model. Specifically, the model below estimates the LSMEANS, and when I use ESTIMATE statements with the L matrix coefficients from the model, I can recover those values exactly as long as I leave out the last term, the three-way interaction. Once I include that term, I no longer get the same values.
proc mixed data=rq4 method=reml covtest;
class schoolid base_tch_gd5(ref=first);
model parcc_math_16=parcc_math_15_gdmn tx base_tch_gd5 asian15 stud_ell15 stud_speced15 stud_ed15 white15
base_tch_gd5*parcc_math_15_gdmn base_tch_gd5*tx base_tch_gd5*stud_ell15
parcc_math_15_gdmn*asian15 parcc_math_15_gdmn*stud_speced15
stud_ed15*tx
base_tch_gd5*stud_ell15*tx
/ solution ddfm=bw;
random intercept base_tch_gd5 stud_speced15/sub=schoolid type=un;
lsmeans base_tch_gd5/ at parcc_math_15_gdmn=-1 cl cov diff e;
lsmeans base_tch_gd5/ at parcc_math_15_gdmn= 1 cl cov diff e;
ods output coef=coef diffs=diffs;
run;
Ah, you are right, you should have been more explicit.
That said, if you understand your model, how the ESTIMATE statement works, and what hypothesis you want to test, then you will be able to write a statement that accomplishes what you need, even with high-order interactions. It is unclear to me how you are using the L matrix (you have not provided those details). I'm sure that you will not be able to do what you want with an LSMEANS statement: complicated questions require complicated technical details.
Generally speaking, you should include all lower order interactions associated with a high-order interaction in the MODEL statement: if A*B*C is in the MODEL statement, then A, B, C, A*B, A*C, and B*C should also be included. This is (sometimes) referred to as the "hierarchy of interaction".
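The hierarchy rule can be written as an expansion. SAS MODEL statements have a bar operator that generates it, so A|B|C is shorthand for the full set of terms below (PROC MIXED specifies effects the same way PROC GLM does, but check the documentation for your release before relying on it):

```latex
A \,|\, B \,|\, C \;=\; A + B + C + A{*}B + A{*}C + B{*}C + A{*}B{*}C
```

Applied to the a*c*tx term in the sample model later in this thread, the required lower-order terms are a, c, tx, a*c, a*tx and c*tx.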
From your point of view, meaningful variable names are useful. From my point of view, your variable names make it hard to see what is in your model: the names are long, and it takes a lot of effort to sort them out. We are working for free here, so please be considerate: using A, B, ..., H would be much clearer when you post your question.
You don't show how you tried to use an ESTIMATE statement or how it failed. Show us what you tried. It is also good practice to provide an example dataset.
I hope this helps you move forward.
I was honestly trying to keep it simple, in the hope that there was an alternative way of accomplishing this. But here goes (see the attached data; I tried attaching a SAS data file but got an error):
Below is my model, in which I predict the post-test (post) from the pre-test (pre), treatment/control (tx), and a host of other variables (a through g). When I run it, the LSMEANS for a=0 are 717.06 at pre=-1 and 718.55 at pre=+1. I also get the L coefficients from the E option on the LSMEANS statement.
proc mixed data=sample method=reml covtest;
class schoolid a(ref=first);
model post=pre tx a b c d f g
a*pre a*tx a*c
pre*b pre*d
f*tx
a*c*tx
/ solution ddfm=bw;
random intercept/sub=schoolid type=un;
lsmeans a/ at pre=-1 e;
lsmeans a/ at pre= 1 e;
run;
Now here is my code for the ESTIMATE statements. Can someone tell me why I have to enter .004 for the a*c*tx effect in the ESTIMATE statements to obtain the value for a=0, when the L matrix says the coefficient is -.004? That's where I'm stuck, I'm afraid.
proc mixed data=sample method=reml covtest;
class schoolid;
model post=pre tx a b c d f g
a*pre a*tx a*c
pre*b pre*d
f*tx
a*c*tx
/ solution ddfm=bw;
random intercept a/sub=schoolid type=un;
estimate 'pre -1/a=0' intercept 1 pre -1 tx -.013 a 0 b 0.0151 c 0.2807 d 0.1404
f 0.9524 g 0.0348
a*pre 0 a*tx 0 a*c 0
pre*b -.0151 pre*d -.1404
f*tx -.012
a*c*tx 0.004;
estimate 'pre 1/a=0' intercept 1 pre 1 tx -.013 a 0 b 0.0151 c 0.2807 d 0.1404
f 0.9524 g 0.0348
a*pre 0 a*tx 0 a*c 0
pre*b .0151 pre*d .1404
f*tx -.012
a*c*tx 0.004;
estimate 'a=0:pre -sd vs pre sd' pre 2 pre*d 0.2807 pre*b .0301/cl;
run;
I regret that I don't have much time to look at this right now, but here are a couple of comments:
You will want to decide which variables to include in the CLASS statement. From your example data set, it looks like many of the predictor variables are binary (tx, a, b, c, d, f, g). The syntax of the ESTIMATE statement depends upon the parameterization of the model, which depends upon whether a variable is categorical (CLASS) or continuous. Your CLASS statements in the two models you posted contain different variables, which means that your two models are parameterized differently.
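To make the parameterization point concrete for the a*c*tx term, here is what each choice puts into the design matrix, assuming a is coded 0/1:

```latex
\begin{aligned}
\text{CLASS } a:&\quad a{*}c{*}tx \;\to\; \bigl[\mathbf{1}\{a=0\}\,c\,tx,\;\; \mathbf{1}\{a=1\}\,c\,tx\bigr] \\
\text{continuous } a:&\quad a{*}c{*}tx \;\to\; \bigl[a\,c\,tx\bigr] = \bigl[\mathbf{1}\{a=1\}\,c\,tx\bigr] \\
&\quad \mathbf{1}\{a=0\}\,c\,tx = c\,tx - a\,c\,tx
\end{aligned}
```

With a continuous, the a*c*tx column is zero for every a=0 observation, so the coefficient consistent with a=0 in an ESTIMATE is 0, just like the zeros entered for a*pre, a*tx and a*c. The a=0 column of the CLASS version has no counterpart in the continuous model unless c*tx is also included; by the last identity, c*tx together with a*c*tx spans the same two columns. Without c*tx the two posted models are genuinely different, so there is no reason to expect one model's L coefficients to reproduce the other model's LSMEANS.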
Also, I really think you should include c*tx in your model, to maintain the hierarchy of a*c*tx.
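A sketch of the sample model with the hierarchy restored, assuming a is kept in the CLASS statement (as in the LSMEANS run) and the random-intercept-only RANDOM statement from that run. Note that the two posted runs use different RANDOM statements (random intercept vs. random intercept a), which by itself can change the fixed-effect estimates, so use the same one in every run you compare. The ESTIMATE label is illustrative; it assumes 0 is the first sorted level of a and that 0.0151 and 0.1404 are the means of b and d that LSMEANS uses (taken from the ESTIMATE statements above).

```sas
/* Sketch only: a kept as CLASS, c*tx added so that a*c*tx has all of */
/* its lower-order terms (a, c, tx, a*c, a*tx, c*tx).                 */
proc mixed data=sample method=reml covtest;
   class schoolid a;
   model post = pre tx a b c d f g
                a*pre a*tx a*c c*tx
                pre*b pre*d
                f*tx
                a*c*tx
                / solution ddfm=bw;
   /* Assumed: random intercept only, as in the LSMEANS run.          */
   random intercept / sub=schoolid type=un;
   lsmeans a / at pre=-1 e;
   lsmeans a / at pre= 1 e;
   /* a=0 at pre=-1 minus a=0 at pre=+1: only terms containing pre    */
   /* survive (pre, a*pre for level a=0, pre*b, pre*d). Terms without */
   /* pre, including c*tx and a*c*tx, cancel in this difference.      */
   estimate 'a=0: pre -1 vs pre 1' pre -2 a*pre -2 0
            pre*b -0.0302 pre*d -0.2808 / cl;
run;
```

Because the difference is built by subtracting the two LSMEAN L vectors, the a*c*tx coefficient (whether -.004 or .004) plays no role in the test of a=0 at -1 vs. +1.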