I am running a multivariable linear regression and want to compare adjusted means of the dependent variable (mean_sbp) at given values of the independent variable (joint_years). Both variables are continuous. I am using proc glm for the regression. I want to be able to easily compare the adjusted mean_sbp at 5 joint_years to 15 joint_years. Or, compare the mean_sbp of the median of joint_years to the 99th percentile of joint_years.
Googling and ChatGPT suggest using the CONTRAST v. ESTIMATE statements, but I can't seem to figure it out.
I realize I could create categorical versions of joint_years and use it as a categorical variable, but I'd like something a little more flexible.
Here are some approaches I have tried:
proc mixed data=work.finallabel_6;
class race1c gender1 income6 educ1 htn_meds lipid_meds dm_meds;
model mean_sbp = joint_years race1c gender1 age6c income6 educ1 drinks_wk pkyrs6c pamvcm6c
bmi6c tchol_hdl_ratio hba1c6 glucose6 htn_meds lipid_meds dm_meds totmed6;
estimate 'Mean at joint_years=5' joint_years 1 / atmean joint_years=5;
estimate 'Mean at joint_years=15' joint_years 1 / atmean joint_years=15;
format
race1c race1c.
gender1 gender1f.
income6 income6fmt.
educ1 educ1fmt.
htn_meds
lipid_meds
dm_meds yes_no_fmt.;
run;
proc glm data=mesa.finallabel_6;
class race1c gender1 income6 educ1 htn_meds lipid_meds dm_meds;
model mean_sbp = joint_years race1c gender1 age6c income6 educ1 drinks_wk pkyrs6c pamvcm6c
bmi6c tchol_hdl_ratio hba1c6 glucose6 htn_meds lipid_meds dm_meds totmed6 / solution clparm;
/* estimate 'Difference at joint_years=0 vs joint_years=23' joint_years 1 -1; */
/* Use the estimate statement to compare the adjusted mean SBP at any 2 values of joint_years; */
/* this essentially multiplies the beta by 10 */
/* estimate joint_years 10; */
/* contrast 'Effect of increasing joint_years from 5 to 15' joint_years 10; */
lsmeans joint_years / at = (joint_years = 5) (joint_years = 15) adjust=e;
format
race1c race1c.
gender1 gender1f.
income6 income6fmt.
educ1 educ1fmt.
htn_meds
lipid_meds
dm_meds yes_no_fmt.
;
run;
Thanks for your help.
Two ways depending on whether you compute the difference when all variables other than joint_years are fixed or you allow the other variables to use their observed values.
1) As a difference in predicted response means at joint_years=5 and 15 at any fixed values for all the other variables. Use the ESTIMATE or CONTRAST statement in your GLM step to estimate the mean at joint_years=15 minus the mean at joint_years=5. The needed statement is simple since the model only involves main effects.
estimate 'diff@15-5' joint_years 10;
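As a sketch of why a single coefficient is enough, write the fitted main-effects model with the joint_years slope as \hat\beta_1 and collect every other term into a fixed covariate vector x with coefficients \hat\gamma (this notation is assumed here for illustration and does not come from the thread). Everything except the joint_years term cancels in the difference:

```latex
\begin{aligned}
\hat{\mu}(j) &= \hat\beta_0 + \hat\beta_1\, j + \mathbf{x}^\top \hat{\boldsymbol\gamma} \\
\hat{\mu}(15) - \hat{\mu}(5) &= \hat\beta_1\,(15 - 5) = 10\,\hat\beta_1
\end{aligned}
```

So the mean at 15 minus the mean at 5 corresponds to a coefficient of 10 on joint_years, and the difference is the same at every setting of the other variables. Reversing the comparison (5 minus 15) would flip the coefficient to -10.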
2) As a difference of predictive margins at joint_years=5 and 15 using the Margins macro. The predictive margins are computed as the averages of the predicted means fixed at joint_years=5 and 15 but allowing the other variables to use their actual values rather than restricting them to fixed values.
data mdat;
do joint_years=15,5;
output;
end;
run;
%margins(data=finallabel_6,
class=race1c gender1 income6 educ1 htn_meds lipid_meds dm_meds,
response=mean_sbp,
model=joint_years race1c gender1 age6c income6 educ1 drinks_wk pkyrs6c pamvcm6c
bmi6c tchol_hdl_ratio hba1c6 glucose6 htn_meds lipid_meds dm_meds totmed6,
margins=joint_years, margindata=mdat, options=diff)
Hello,
Never use ESTIMATE or CONTRAST statements when you can achieve the desired analysis with the LSMEANS, SLICE, or LSMESTIMATE statements.
If ESTIMATE or CONTRAST statements cannot be avoided, take care because they are sometimes difficult to specify (program) correctly. Below are some blogs and usage notes on correct usage!
By Chris Daman on SAS Learning Post April 23, 2012
https://blogs.sas.com/content/sastraining/2012/04/23/the-magical-estimate-and-contrast-statements/
By Chris Daman on SAS Learning Post April 25, 2012
https://blogs.sas.com/content/sastraining/2012/04/25/easy-button-for-estimate-statements/
By Chris Daman on SAS Learning Post May 2, 2012
https://blogs.sas.com/content/sastraining/2012/05/02/estimate-statements-the-final-installment/
By Rick Wicklin on The DO Loop June 6, 2016
https://support.sas.com/kb/24/447.html
When a model contains interactions, it is often of interest to assess the effect of one of the interacting variables. When the variable of interest is categorical, and therefore is specified in the CLASS statement, this is most easily done using the LSMEANS, SLICE, or LSMESTIMATE statement. But when the variable of interest is continuous, these statements cannot be used. Two procedures, LOGISTIC and PHREG, provide statements that can estimate the effect of increasing a continuous predictor by a specified number of units. However, when the modeled response is not binomial or a time to event, these procedures are not appropriate. Nevertheless, the HAZARDRATIO statement in PROC PHREG can still be used to provide contrast coefficients that can be used in CONTRAST or ESTIMATE statements to test or estimate the effect of a continuous predictor. This can be done for a continuous predictor involved in one or more interactions and even in constructed effects such as splines. In the case of generalized linear models that don’t use the identity link, it is important to note that the estimated effect using these coefficients is not a difference in response means. For these models, an alternative and generally easier solution is provided by the Margins macro, which can estimate the required difference in means. Both of these approaches are discussed and illustrated in this note.
Koen
Thanks for the response and the helpful links.
Could you advise on how to use the LSMEANS, SLICE, or LSMESTIMATE statements to achieve the desired result?
Hello,
Here are some example LSMEANS statements.
I think you can find inspiration in the last three (with AT and SLICE).
PROC MIXED : lsmeans adhesive / diff cl alpha=0.10;
PROC MIXED : lsmeans pretrt stain / diff adjust=tukey adjdfe=row;
PROC MIXED : lsmeans drug / diff=control('p') adjust=dunnett adjdfe=row;
PROC MIXED : lsmeans Trt / e diff;
PROC MIXED : lsmeans gender*age / slice=gender slice=age;
PROC GLIMMIX : lsmeans trt / oddsratio ilink diff;
PROC MIXED : lsmeans temp / at thick=1850 diff;
PROC MIXED : lsmeans Trt / at flush0=50 diff;
PROC MIXED : lsmeans drug*hour / slice=hour diff adjust=tukey adjdfe=row;
Koen
I don't think any of these are what the OP @_maldini_ wants. They all involve a class variable, whereas the original question was about comparing the predicted values at two different values of a continuous variable, with no class variable involved.
However, I also don't think the question as stated by the OP makes any sense (yet) so I don't have an answer.
I am running a multivariable linear regression and want to compare adjusted means of the dependent variable (mean_sbp) at given values of the independent variable (joint_years). Both variables are continuous.
I think instead of "adjusted means" the OP really means "predicted values" (am I right?), but there can be no predicted values unless you specify the values of the 7 CLASS variables, which hasn't been done (or even mentioned). Could you explain more about what you want?
Also, the OP says:
Googling and ChatGPT suggest using the CONTRAST v. ESTIMATE statements, but I can't seem to figure it out.
I am glad to hear this, I will mark this down as another case of ChatGPT not knowing enough to give a correct answer, but it gave an answer anyway.
@PaigeMiller You are correct. I am asking for predicted values of the dependent variable at values of the independent variable that I specify. Apologies for the confusion.
Do I need to set reference categories for the variables in the CLASS statement? What about the continuous covariates?
Thanks.
@StatDave I'm exploring the macro approach, but I'm fairly naive to SAS macros. I read the support article, and downloaded the Margins macro definition. When I run it with the data step you provided, I get the Predictive Margins and the Differences of Margins. Great.
Just to confirm, if I wanted to compare the 95th percentile with the 50th percentile, for example, I would just replace the 5 and 15 in the DATA step with the values of the 95th and 50th percentiles, correct?
That is correct. You can put in any 2 or more values in the joint_years variable in the MDAT data set and the Margins macro will give all pairwise comparisons among those values. But as I noted earlier, predictive margins are computed differently from individual predicted values which change depending on the values of the other variables in the model. The predictive margin is the average of the predicted values from all of the observations with joint_years fixed at a given value.
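A minimal sketch of filling MDAT from percentiles rather than typing the values by hand, assuming the same finallabel_6 data set. The data set name pctl and the variable names p50 and p95 are hypothetical, chosen only for this illustration:

```sas
/* Compute the 50th and 95th percentiles of joint_years */
proc means data=finallabel_6 noprint;
  var joint_years;
  output out=pctl median=p50 p95=p95;
run;

/* One row per joint_years value at which a predictive margin is wanted */
data mdat;
  set pctl;
  joint_years = p95; output;
  joint_years = p50; output;
  keep joint_years;
run;
```

The %margins call shown earlier in the thread can then be rerun unchanged, since it reads the joint_years values from MDAT.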
<The predictive margin is the average of the predicted values from all of the observations with joint_years fixed at a given value.>
The macro description seems to indicate that for predictors not included in the margins= parameter, mean values are used for continuous variables and the reference level is used for categorical variables. There also appear to be ways to set the values of these predictors explicitly.
When I run the macro w/o setting any values, I get output that looks reasonable. I didn't set the reference level for any variables in the CLASS statement, however. According to the documentation: "Individual variable options (such as REF=) are not supported."
How would the macro know what the reference levels were?
How would I explicitly set the value of continuous predictors, if I wanted to?
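A hedged sketch of one way to fix continuous predictors, assuming the macro's at= and atdata= parameters work as its documentation describes (please confirm against the version of the macro you downloaded). The data set name adat and the values 60 and 28 are hypothetical. As noted above, any variable not fixed through margins= or at= keeps its observed values when the predicted values are averaged:

```sas
/* joint_years values at which margins are wanted (as earlier in the thread) */
data mdat;
  do joint_years = 15, 5;
    output;
  end;
run;

/* Hypothetical fixed values for two continuous covariates */
data adat;
  age6c = 60;
  bmi6c = 28;
run;

/* at= and atdata= are assumed to behave as documented for the Margins macro */
%margins(data=finallabel_6,
         class=race1c gender1 income6 educ1 htn_meds lipid_meds dm_meds,
         response=mean_sbp,
         model=joint_years race1c gender1 age6c income6 educ1 drinks_wk pkyrs6c pamvcm6c
               bmi6c tchol_hdl_ratio hba1c6 glucose6 htn_meds lipid_meds dm_meds totmed6,
         margins=joint_years, margindata=mdat,
         at=age6c bmi6c, atdata=adat,
         options=diff)
```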
Thanks again for all your help.
I don't understand. You just fitted a general linear model (with only fixed effects), so the estimated coefficient (beta) of joint_years is the change in Y (mean_sbp) for a one-year (one-unit) change in joint_years.
If you want to "compare the adjusted mean_sbp at 5 joint_years to 15 joint_years", multiplying beta by 10 gives what you are looking for.
For example:
proc mixed data=sashelp.heart(obs=1000);
class sex bp_status status;
model weight=ageatstart height sex bp_status status /s ;
estimate 'diff@15-5' ageatstart 10; /*<--suggested by @StatDave*/
run;
The beta of AgeAtStart is 0.2346, which means a one-year change in AgeAtStart changes weight by 0.2346. So is 10*0.2346 = 2.346 what you are looking for (the same result as StatDave suggested)?
@Ksharp Thanks. I have that syntax in my original post. It multiplies the beta by the number in the ESTIMATE statement, as you show. This is helpful, but it doesn't give me the adjusted means at 5 and 15 years (I now realize I used the wrong terminology in the original post. I'm looking for predicted values, as opposed to adjusted means).
Is the following what you are looking for?
proc mixed data=sashelp.heart(obs=1000);
class sex bp_status status;
model weight=ageatstart height sex bp_status status /s ;
estimate 'adjusted mean of Y at year=5' intercept 1 ageatstart 5;
estimate 'adjusted mean of Y at year=15' intercept 1 ageatstart 15;
run;
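One caveat with the statements above: any effect left out of an ESTIMATE statement gets a coefficient of zero, so those two estimates correspond to height=0 and to the last (reference) level of each CLASS variable. A minimal sketch that spells out every term instead, assuming the levels sort as Female/Male, High/Normal/Optimal, and Alive/Dead (check the Class Level Information table) and using a hypothetical height of 65:

```sas
proc mixed data=sashelp.heart(obs=1000);
  class sex bp_status status;
  model weight = ageatstart height sex bp_status status / s;
  /* Predicted weight for Female, Normal BP, Alive, height 65 (illustrative values) */
  estimate 'pred at ageatstart=5'  intercept 1 ageatstart 5  height 65
           sex 1 0 bp_status 0 1 0 status 1 0 / cl;
  estimate 'pred at ageatstart=15' intercept 1 ageatstart 15 height 65
           sex 1 0 bp_status 0 1 0 status 1 0 / cl;
run;
```

The two predicted values differ by 10 times the AgeAtStart coefficient, whatever covariate values are chosen, because the model has no interactions.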
As StatDave said, you can create a data set that contains the predicted values of Y.
proc glm data=sashelp.heart(obs=1000);
class sex bp_status status;
model weight=ageatstart height sex bp_status status /solution ;
output out=mypreds PREDICTED=p;
quit;
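The step above writes predictions only for the observed rows. A minimal sketch of getting predicted values at chosen covariate settings is to append scoring rows with a missing response: they are not used to fit the model, but PROC GLM still computes a predicted value for them. The data set names score and both, the flag scored, and the covariate values are hypothetical:

```sas
/* Scoring rows: weight is not set, so it stays missing and the rows do not affect the fit */
data score;
  length sex $6 bp_status $7 status $5;
  do ageatstart = 5, 15;
    height = 65; sex = 'Female'; bp_status = 'Normal'; status = 'Alive';
    output;
  end;
run;

/* Stack the scoring rows under the analysis data and flag them */
data both;
  set sashelp.heart(obs=1000) score(in=s);
  scored = s;
run;

proc glm data=both;
  class sex bp_status status;
  model weight = ageatstart height sex bp_status status / solution;
  output out=mypreds predicted=p;
quit;

/* Show only the predictions for the scoring rows */
proc print data=mypreds;
  where scored = 1;
  var ageatstart height sex bp_status status p;
run;
```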