CONTRAST v. ESTIMATE statements - multivariable linear regression


Posted 12-04-2023 12:02 PM
(2259 views)

I am running a multivariable linear regression and want to compare adjusted means of the dependent variable (mean_sbp) at given values of the independent variable (joint_years). Both variables are continuous. I am using proc glm for the regression. I want to be able to easily compare the adjusted mean_sbp at 5 joint_years to 15 joint_years. Or, compare the mean_sbp of the median of joint_years to the 99th percentile of joint_years.

Googling and ChatGPT suggest using the CONTRAST v. ESTIMATE statements, but I can't seem to figure it out.

I realize I could create categorical versions of joint_years and use it as a categorical variable, but I'd like something a little more flexible.

Here are some approaches I have tried:

```
proc mixed data=work.finallabel_6;
class race1c gender1 income6 educ1 htn_meds lipid_meds dm_meds;
model mean_sbp = joint_years race1c gender1 age6c income6 educ1 drinks_wk pkyrs6c pamvcm6c
bmi6c tchol_hdl_ratio hba1c6 glucose6 htn_meds lipid_meds dm_meds totmed6;
estimate 'Mean at joint_years=5' joint_years 1 / atmean joint_years=5;
estimate 'Mean at joint_years=15' joint_years 1 / atmean joint_years=15;
format
race1c race1c.
gender1 gender1f.
income6 income6fmt.
educ1 educ1fmt.
htn_meds
lipid_meds
dm_meds yes_no_fmt.;
run;
```

```
proc glm data=mesa.finallabel_6;
class race1c gender1 income6 educ1 htn_meds lipid_meds dm_meds;
model mean_sbp = joint_years race1c gender1 age6c income6 educ1 drinks_wk pkyrs6c pamvcm6c
bmi6c tchol_hdl_ratio hba1c6 glucose6 htn_meds lipid_meds dm_meds totmed6 / solution clparm;
/* estimate 'Difference at joint_years=0 vs joint_years=23' joint_years 1 -1; */
/* Use the estimate statement to compare the adjusted mean SBP at any 2 values of joint_years; */
/* this essentially multiplies the beta by 10 */
/* estimate joint_years 10; */
/* contrast 'Effect of increasing joint_years from 5 to 15' joint_years 10; */
lsmeans joint_years / at = (joint_years = 5) (joint_years = 15) adjust=e;
format
race1c race1c.
gender1 gender1f.
income6 income6fmt.
educ1 educ1fmt.
htn_meds
lipid_meds
dm_meds yes_no_fmt.
;
run;
```

Thanks for your help.

Accepted Solution


There are two ways, depending on whether you compute the difference with all variables other than joint_years held fixed or allow the other variables to take their observed values.

1) As a difference in predicted response means at joint_years=5 and 15 at any fixed values for all the other variables. Use the ESTIMATE or CONTRAST statement in your GLM step to estimate the mean at joint_years=15 minus the mean at joint_years=5. The needed statement is simple since the model only involves main effects.

```
estimate 'diff@15-5' joint_years 10;
```
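To see where the coefficient on joint_years comes from, here is a short derivation. It assumes the main-effects model from the question; the symbols are shorthand introduced here, not from the thread: $\hat\beta$ is the fitted joint_years coefficient, $j$ a value of joint_years, and $x$, $\hat\gamma$ stand for all the other model terms and their coefficients.

```latex
\begin{aligned}
\hat{\mu}(j, x) &= \hat\beta_0 + \hat\beta\, j + x^\top \hat\gamma \\
\hat{\mu}(15, x) - \hat{\mu}(5, x) &= \hat\beta\,(15 - 5) = 10\,\hat\beta
\end{aligned}
```

The covariate terms cancel, so the difference does not depend on which fixed values are chosen for the other variables. By the same argument, comparing any two values $a$ and $b$ of joint_years would use the coefficient $b - a$.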

2) As a difference of predictive margins at joint_years=5 and 15 using the Margins macro. The predictive margins are computed as the averages of the predicted means fixed at joint_years=5 and 15 but allowing the other variables to use their actual values rather than restricting them to fixed values.

```
data mdat;
  do joint_years=15,5;
    output;
  end;
run;
%margins(data=finallabel_6,
         class=race1c gender1 income6 educ1 htn_meds lipid_meds dm_meds,
         response=mean_sbp,
         model=joint_years race1c gender1 age6c income6 educ1 drinks_wk pkyrs6c pamvcm6c
               bmi6c tchol_hdl_ratio hba1c6 glucose6 htn_meds lipid_meds dm_meds totmed6,
         margins=joint_years, margindata=mdat, options=diff)
```
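In symbols, the predictive margin at a value $v$ of joint_years can be sketched as follows. The notation is an assumption made here for illustration: $n$ observations, $x_i$ the observed values of the other predictors for observation $i$, and $\hat\beta$, $\hat\gamma$ the fitted coefficients of the main-effects model.

```latex
\begin{aligned}
\mathrm{PM}(v) &= \frac{1}{n}\sum_{i=1}^{n} \hat{\mu}(v, x_i),
\qquad \hat{\mu}(v, x_i) = \hat\beta_0 + \hat\beta\, v + x_i^\top \hat\gamma \\
\mathrm{PM}(15) - \mathrm{PM}(5) &= \frac{1}{n}\sum_{i=1}^{n} \hat\beta\,(15 - 5) = 10\,\hat\beta
\end{aligned}
```

Under these assumptions (identity link, no interactions involving joint_years), the difference of margins should match the ESTIMATE result from approach 1; the two approaches can differ once the model includes interactions with joint_years or uses a non-identity link.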

14 Replies


Hello,

**Never use ESTIMATE or CONTRAST statements when you can achieve the desired analyses with LSMEANS, SLICE, or LSMESTIMATE statements.**

If ESTIMATE or CONTRAST statements cannot be avoided, take care, because they are sometimes difficult to specify correctly. Below are some blog posts and usage notes on how to write them.

**The magical ESTIMATE (and CONTRAST) statements**

By Chris Daman on SAS Learning Post April 23, 2012

https://blogs.sas.com/content/sastraining/2012/04/23/the-magical-estimate-and-contrast-statements/

**"Easy button" for ESTIMATE statements**

By Chris Daman on SAS Learning Post April 25, 2012

https://blogs.sas.com/content/sastraining/2012/04/25/easy-button-for-estimate-statements/

**ESTIMATE Statements - the final installment**

By Chris Daman on SAS Learning Post May 2, 2012

https://blogs.sas.com/content/sastraining/2012/05/02/estimate-statements-the-final-installment/

**How to write CONTRAST and ESTIMATE statements in SAS regression procedures?**

By Rick Wicklin on The DO Loop June 6, 2016

**Usage Note 24447: Examples of writing CONTRAST and ESTIMATE statements**

https://support.sas.com/kb/24/447.html

**Usage Note 67024: Using the ESTIMATE or CONTRAST statement or Margins macro to assess continuous variable effects in interactions and splines**

When a model contains interactions, it is often of interest to assess the effect of one of the interacting variables. When the variable of interest is categorical, and therefore is specified in the CLASS statement, **this is most easily done using the LSMEANS, SLICE, or LSMESTIMATE statement.** But when the variable of interest is continuous, these statements cannot be used. Two procedures, LOGISTIC and PHREG, provide statements that can estimate the effect of increasing a continuous predictor by a specified number of units. However, when the modeled response is not binomial or a time to event, these procedures are not appropriate. Nevertheless, the HAZARDRATIO statement in PROC PHREG can still be used to provide contrast coefficients that can be used in CONTRAST or ESTIMATE statements to test or estimate the effect of a continuous predictor. This can be done for a continuous predictor involved in one or more interactions and even in constructed effects such as splines. In the case of generalized linear models that don’t use the identity link, it is important to note that the estimated effect using these coefficients is not a difference in response means. For these models, an alternative and generally easier solution is provided by the Margins macro, which can estimate the required difference in means. Both of these approaches are discussed and illustrated in this note.

Koen


Thanks for the response and the helpful links.

Could you advise on how to use the LSMEANS, SLICE, or LSMESTIMATE statements to achieve the desired result?


Hello,

Here are some example LSMEANS statements.

I think you can find inspiration in the last three (with **at** and **slice**).

PROC MIXED : lsmeans adhesive / diff cl alpha=0.10;

PROC MIXED : lsmeans pretrt stain / diff adjust=tukey adjdfe=row;

PROC MIXED : lsmeans drug / diff=control('p') adjust=dunnett adjdfe=row;

PROC MIXED : lsmeans Trt / e diff;

PROC MIXED : lsmeans gender*age / slice=gender slice=age;

PROC GLIMMIX : lsmeans trt / oddsratio ilink diff;

PROC MIXED : lsmeans temp / at thick=1850 diff;

PROC MIXED : lsmeans Trt / at flush0=50 diff;

PROC MIXED : lsmeans drug*hour / slice=hour diff adjust=tukey adjdfe=row;

Koen


I don't think any of these are what the OP @_maldini_ wants. They all involve a class variable, whereas the original question was about comparing predicted values at two different values of a continuous variable, with no class variable involved.

However, I also don't think the question as stated by the OP makes sense (yet), so I don't have an answer. The OP wrote:

"I am running a multivariable linear regression and want to compare adjusted means of the dependent variable (mean_sbp) at given values of the independent variable (joint_years). Both variables are continuous."

I think instead of "adjusted means" the OP really means "predicted values" (am I right?), but there can be no predicted values unless you specify the values of the 7 CLASS variables, which hasn't been done (or even mentioned). Could you explain more about what you want?

Also, the OP says:

"Googling and ChatGPT suggest using the CONTRAST v. ESTIMATE statements, but I can't seem to figure it out."

I am glad to hear this. I will mark this down as another case of ChatGPT not knowing enough to give a correct answer but giving an answer anyway.

--

Paige Miller



@PaigeMiller You are correct. I am asking for predicted values of the dependent variable at values of the independent variable that I specify. Apologies for the confusion.

Do I need to set reference categories for the variables in the CLASS statement? What about the continuous covariates?

Thanks.



@StatDave I'm exploring the macro approach, but I'm fairly new to SAS macros. I read the support article and downloaded the Margins macro definition. When I run it with the data step you provided, I get the Predictive Margins and the Differences of Margins. Great.

Just to confirm: if I wanted to compare the 95th percentile with the 50th percentile, for example, I would just replace the 5 and 15 in the DATA step with the values of the 95th and 50th percentiles, correct?
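If that is the intent, the substitution could be sketched as below. This is only an illustration: the data set name PCT and the variables MED and P95 are made up here, and it assumes the same finallabel_6 data set that the macro call uses.

```sas
/* Sketch: compute the median and 95th percentile of joint_years.
   PCT, MED and P95 are illustrative names, not from the thread. */
proc means data=finallabel_6 noprint;
  var joint_years;
  output out=pct median=med p95=p95;
run;

/* Build the margin data set from the computed percentiles.
   The higher value comes first, mirroring the 15,5 ordering above. */
data mdat;
  set pct;
  joint_years = p95; output;
  joint_years = med; output;
  keep joint_years;
run;
```

The resulting MDAT has one variable and two rows, so it can be passed to margindata= in place of the hand-typed values. For the 99th percentile mentioned in the original question, the P99 statistic keyword could be used in the same way.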


<The predictive margin is the average of the predicted values from all of the observations with joint_years fixed at a given value.>

The macro description seems to indicate that, for predictors not included in margins=, mean values are used for continuous variables and the reference level is used for categorical variables. And there are ways to set the values of these predictors explicitly.

When I run the macro without setting any values, I get output that looks reasonable. However, I didn't set the reference level for any variables in the CLASS statement. According to the documentation: "Individual variable options (such as REF=) are not supported."

How would the macro know what the reference levels were?

How would I explicitly set the value of continuous predictors, if I wanted to?

Thanks again for all your help.


The line you quoted is correct; read through the Details section of the Margins macro documentation for more information. Predictive margins do not generally fix the values of the predictors other than those specified in margins=. If you want all of the other predictors fixed, then you want predicted means, not predictive margins. You can get predicted means by simply using the PRED= option in the OUTPUT statement as I showed in my earlier reply. That will give you a data set with predicted means for all of your observations. If you want predicted means for predictor settings other than those in your input data set, then create a data set with the desired predictor settings and then use the SCORE statement. See the example titled "Scoring data sets" in the LOGISTIC documentation.
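For a PROC GLM fit, one common way to get predicted means at settings that are not in the data is to append rows with the desired predictor values and a missing response, then read their predictions from the OUTPUT data set. The sketch below uses SASHELP.HEART as a stand-in model, as other replies in this thread do; the data set names NEWOBS and BOTH, the flag SCORING_ROW, and the chosen predictor values are all hypothetical.

```sas
/* Hypothetical predictor settings to score; values are illustrative only */
data newobs;
  length sex $6 bp_status $7 status $5;
  sex = 'Female'; bp_status = 'Normal'; status = 'Alive';
  height = 65;
  do ageatstart = 35, 45;
    output;
  end;
run;

/* Append the scoring rows. WEIGHT is missing for them, so they are
   left out of the fit but still receive predicted values. */
data both;
  set sashelp.heart(obs=1000) newobs(in=isnew);
  scoring_row = isnew;
run;

proc glm data=both;
  class sex bp_status status;
  model weight = ageatstart height sex bp_status status / solution;
  output out=mypreds predicted=p;
run;
quit;

/* Show only the appended rows with their predicted means */
proc print data=mypreds;
  where scoring_row = 1;
  var ageatstart height sex bp_status status p;
run;
```

Because every predictor is set explicitly in NEWOBS, each printed value of P is a predicted mean at one fully specified setting, which is the distinction from predictive margins drawn above.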


I don't understand. You just fitted a general linear model (with only fixed effects). The estimated coefficient (beta) of joint_years is the change in Y (mean_sbp) for a one-unit (one-year) change in joint_years.

If you want to "compare the adjusted mean_sbp at 5 joint_years to 15 joint_years", multiplying beta by 10 gives what you are looking for.

For example:

```
proc mixed data=sashelp.heart(obs=1000);
class sex bp_status status;
model weight=ageatstart height sex bp_status status /s ;
estimate 'diff@15-5' ageatstart 10; /*<--suggested by @StatDave*/
run;
```

The beta of AgeAtStart is 0.2346, which means a one-year increase in AgeAtStart changes weight by 0.2346. So 10*0.2346 = 2.346 is what you are looking for (the same result as StatDave's suggestion).


If you just want the difference in predicted means at 5 and 15, then you just need the ESTIMATE statement that I showed in my response earlier. If you want the predicted means, separately, at 5 and 15, then the easiest way (assuming that your data has observations observed at 5 and 15) is to add an OUTPUT statement in your PROC GLM step requesting predicted values:

output out=mypreds pred=p;

That will give you a data set MYPREDS with predicted means (in variable P) for each observation. See the ones where joint_years=5 or 15. Note that the predicted mean depends on the setting of all of the model variables, so they will vary.



Is the following what you are looking for?

```
proc mixed data=sashelp.heart(obs=1000);
class sex bp_status status;
model weight=ageatstart height sex bp_status status /s ;
estimate 'adjusted mean of Y at year=5' intercept 1 ageatstart 5;
estimate 'adjusted mean of Y at year=15' intercept 1 ageatstart 15;
run;
```
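As noted earlier in the thread, a predicted mean is only defined once values are chosen for every model term, including the CLASS variables. An ESTIMATE statement that gives coefficients only for the intercept and ageatstart leaves the other effects at zero and may be reported as non-estimable. A hedged sketch of a fully specified version follows; the chosen settings (height 65, Female, Normal, Alive) are illustrative, and the coefficient positions assume the default sorted level order, which should be checked against the Class Level Information table.

```sas
proc mixed data=sashelp.heart(obs=1000);
  class sex bp_status status;
  model weight = ageatstart height sex bp_status status / s;
  /* Assumed level order (verify in the Class Level Information table):
     sex: Female Male; bp_status: High Normal Optimal; status: Alive Dead */
  estimate 'mean at age 35, height 65, Female/Normal/Alive'
           intercept 1 ageatstart 35 height 65
           sex 1 0 bp_status 0 1 0 status 1 0;
  estimate 'mean at age 45, height 65, Female/Normal/Alive'
           intercept 1 ageatstart 45 height 65
           sex 1 0 bp_status 0 1 0 status 1 0;
run;
```

The two estimates differ by 10 times the ageatstart coefficient, whichever fixed settings are chosen for the other terms.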

As StatDave said, you can create a data set that contains these predicted values of Y.

```
proc glm data=sashelp.heart(obs=1000);
class sex bp_status status;
model weight=ageatstart height sex bp_status status /solution ;
output out=mypreds PREDICTED=p;
quit;
```
