Solved: Re: CONTRAST v. ESTIMATE statements - multivariable linear regression

_maldini_ · Posted 12-04-2023 12:02 PM

I am running a multivariable linear regression and want to compare adjusted means of the dependent variable (mean_sbp) at given values of the independent variable (joint_years). Both variables are continuous. I am using proc glm for the regression. I want to be able to easily compare the adjusted mean_sbp at 5 joint_years to 15 joint_years. Or, compare the mean_sbp of the median of joint_years to the 99th percentile of joint_years.

Googling and ChatGPT suggest using the CONTRAST v. ESTIMATE statements, but I can't seem to figure it out.

I realize I could create categorical versions of joint_years and use it as a categorical variable, but I'd like something a little more flexible.

Here are some approaches I have tried:

proc mixed data=work.finallabel_6;
    class race1c gender1 income6 educ1 htn_meds lipid_meds dm_meds;
    model mean_sbp = joint_years race1c gender1 age6c income6 educ1 drinks_wk pkyrs6c pamvcm6c
                   bmi6c tchol_hdl_ratio hba1c6 glucose6 htn_meds lipid_meds dm_meds totmed6;
    
    estimate 'Mean at joint_years=5' joint_years 1 / atmean joint_years=5;
    estimate 'Mean at joint_years=15' joint_years 1 / atmean joint_years=15;

    format 
    race1c race1c.
    gender1 gender1f.
    income6 income6fmt.
    educ1 educ1fmt.
    htn_meds
    lipid_meds 
    dm_meds yes_no_fmt.;
run;

proc glm data=mesa.finallabel_6;
	class race1c gender1 income6 educ1 htn_meds lipid_meds dm_meds; 
	
    model mean_sbp = joint_years race1c gender1 age6c income6 educ1 drinks_wk pkyrs6c pamvcm6c 
    bmi6c tchol_hdl_ratio hba1c6 glucose6 htn_meds lipid_meds dm_meds totmed6 / solution clparm;
    
/*  estimate 'Difference at joint_years=0 vs joint_years=23' joint_years 1 -1; */

/* 	Use the estimate statement to compare the adjusted mean SBP at any 2 values of joint_years;  */
/*  this essentially multiplies the beta by 10 */
/* 	estimate joint_years 10;  */

/* 	contrast 'Effect of increasing joint_years from 5 to 15' joint_years 10; */
	lsmeans joint_years / at = (joint_years = 5) (joint_years = 15) adjust=e;

    format 
	race1c race1c.
	gender1	gender1f.
	income6 income6fmt.
	educ1 educ1fmt.
	htn_meds
	lipid_meds 
	dm_meds yes_no_fmt.
    ;
run;

Thanks for your help.

StatDave · Posted 12-04-2023 06:11 PM

Two ways depending on whether you compute the difference when all variables other than joint_years are fixed or you allow the other variables to use their observed values.

1) As a difference in predicted response means at joint_years=5 and 15 at any fixed values for all the other variables. Use the ESTIMATE or CONTRAST statement in your GLM step to estimate the mean at joint_years=15 minus the mean at joint_years=5. The needed statement is simple since the model only involves main effects.

estimate 'diff@15-5' joint_years 10;

2) As a difference of predictive margins at joint_years=5 and 15 using the Margins macro. The predictive margins are computed as the averages of the predicted means fixed at joint_years=5 and 15 but allowing the other variables to use their actual values rather than restricting them to fixed values.

data mdat;
   do joint_years=15,5;
     output;
   end;
   run;
%margins(data=finallabel_6, 
         class=race1c gender1 income6 educ1 htn_meds lipid_meds dm_meds, 
         response=mean_sbp, 
         model=joint_years race1c gender1 age6c income6 educ1 drinks_wk pkyrs6c pamvcm6c
               bmi6c tchol_hdl_ratio hba1c6 glucose6 htn_meds lipid_meds dm_meds totmed6, 
         margins=joint_years, margindata=mdat, options=diff)
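
As a rough check on the two approaches, the algebra for this main-effects model can be written out. This is a sketch, not part of the original reply: the symbols ($\hat\beta_{jy}$ for the joint_years coefficient, $z$ for all other predictors, $\hat\gamma$ for their coefficients) are notation introduced here, and it assumes an identity link, no interactions involving joint_years, and the same set of observations in both calculations.

```latex
\begin{aligned}
\hat{y}(x, z) &= \hat\beta_0 + \hat\beta_{jy}\,x + z^{\top}\hat\gamma \\
\hat{y}(15, z) - \hat{y}(5, z) &= (15 - 5)\,\hat\beta_{jy} = 10\,\hat\beta_{jy} \\
\hat{m}(x) &= \frac{1}{n}\sum_{i=1}^{n} \hat{y}(x, z_i) \\
\hat{m}(15) - \hat{m}(5) &= \frac{1}{n}\sum_{i=1}^{n} 10\,\hat\beta_{jy} = 10\,\hat\beta_{jy}
\end{aligned}
```

Under those assumptions, a coefficient of 10 on joint_years estimates the 15-minus-5 difference, and the difference of predictive margins should match it. What differs between the two approaches is the level of each mean, not the difference between them.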

sbxkoenk · Posted 12-04-2023 04:46 PM

Hello,

Never use ESTIMATE or CONTRAST statements when you can achieve the desired analyses with the LSMEANS, SLICE, or LSMESTIMATE statements.

If ESTIMATE or CONTRAST statements cannot be avoided, take care because they are sometimes difficult to specify (program) correctly. Below are some blogs and usage notes on correct usage!

The magical ESTIMATE (and CONTRAST) statements

By Chris Daman on SAS Learning Post April 23, 2012

https://blogs.sas.com/content/sastraining/2012/04/23/the-magical-estimate-and-contrast-statements/

"Easy button" for ESTIMATE statements

By Chris Daman on SAS Learning Post April 25, 2012

https://blogs.sas.com/content/sastraining/2012/04/25/easy-button-for-estimate-statements/

ESTIMATE Statements - the final installment

By Chris Daman on SAS Learning Post May 2, 2012

https://blogs.sas.com/content/sastraining/2012/05/02/estimate-statements-the-final-installment/

How to write CONTRAST and ESTIMATE statements in SAS regression procedures?

By Rick Wicklin on The DO Loop June 6, 2016

https://blogs.sas.com/content/iml/2016/06/06/write-contrast-estimate-statements-sas-regression-proce...

Usage Note 24447: Examples of writing CONTRAST and ESTIMATE statements

https://support.sas.com/kb/24/447.html

Usage Note 67024: Using the ESTIMATE or CONTRAST statement or Margins macro to assess continuous variable effects in interactions and splines

When a model contains interactions, it is often of interest to assess the effect of one of the interacting variables. When the variable of interest is categorical, and therefore is specified in the CLASS statement, this is most easily done using the LSMEANS, SLICE, or LSMESTIMATE statement. But when the variable of interest is continuous, these statements cannot be used. Two procedures, LOGISTIC and PHREG, provide statements that can estimate the effect of increasing a continuous predictor by a specified number of units. However, when the modeled response is not binomial or a time to event, these procedures are not appropriate. Nevertheless, the HAZARDRATIO statement in PROC PHREG can still be used to provide contrast coefficients that can be used in CONTRAST or ESTIMATE statements to test or estimate the effect of a continuous predictor. This can be done for a continuous predictor involved in one or more interactions and even in constructed effects such as splines. In the case of generalized linear models that don’t use the identity link, it is important to note that the estimated effect using these coefficients is not a difference in response means. For these models, an alternative and generally easier solution is provided by the Margins macro, which can estimate the required difference in means. Both of these approaches are discussed and illustrated in this note.

Koen

_maldini_ · Posted 12-04-2023 05:02 PM

Thanks for the response and the helpful links.

Could you advise on how to use the LSMEANS, SLICE, or LSMESTIMATE statements to achieve the desired result?

sbxkoenk · Posted 12-04-2023 05:15 PM

Hello,

Here are some example LSMEANS statements.
I think you can find inspiration in the 3 last ones (with at and slice).

PROC MIXED : lsmeans adhesive / diff cl alpha=0.10;
PROC MIXED : lsmeans pretrt stain / diff adjust=tukey adjdfe=row;
PROC MIXED : lsmeans drug / diff=control('p') adjust=dunnett adjdfe=row;
PROC MIXED : lsmeans Trt / e diff;
PROC MIXED : lsmeans gender*age / slice=gender slice=age;
PROC GLIMMIX : lsmeans trt / oddsratio ilink diff;
PROC MIXED : lsmeans temp / at thick=1850 diff;
PROC MIXED : lsmeans Trt / at flush0=50 diff;
PROC MIXED : lsmeans drug*hour / slice=hour diff adjust=tukey adjdfe=row;

Koen

PaigeMiller · Posted 12-04-2023 05:33 PM

I don't think any of these are what the OP @_maldini_ wants. They all involve a class variable, and his original question was about comparing the predicted values at two different values of a continuous variable, with no class variable involved.

However, I also don't think the question as stated by the OP makes any sense (yet) so I don't have an answer.

"I am running a multivariable linear regression and want to compare adjusted means of the dependent variable (mean_sbp) at given values of the independent variable (joint_years). Both variables are continuous."

I think instead of "adjusted means" the OP really means "predicted values" (am I right?), but there can be no predicted values unless you specify the values of the 7 CLASS variables, which hasn't been done (or even mentioned). Could you explain more about what you want?

Also, the OP says:

"Googling and ChatGPT suggest using the CONTRAST v. ESTIMATE statements, but I can't seem to figure it out."

I am glad to hear this, I will mark this down as another case of ChatGPT not knowing enough to give a correct answer, but it gave an answer anyway.

--
Paige Miller

_maldini_ · Posted 12-05-2023 11:56 AM

@PaigeMiller You are correct. I am asking for predicted values of the dependent variable at values of the independent variable that I specify. Apologies for the confusion.

Do I need to set reference categories for the variables in the CLASS statement? What about the continuous covariates?

Thanks.

_maldini_ · Posted 12-12-2023 01:04 PM

@StatDave I'm exploring the macro approach, but I'm fairly naive to SAS macros. I read the support article, and downloaded the Margins macro definition. When I run it with the data step you provided, I get the Predictive Margins and the Differences of Margins. Great.

Just to confirm, if I wanted to compare the 95th percentile with the 50th percentile, for example, I would just replace the 5 and 15 in the DATA step with the values of the 95th and 50th percentiles, correct?

StatDave · Posted 12-12-2023 02:21 PM

That is correct. You can put in any 2 or more values in the joint_years variable in the MDAT data set and the Margins macro will give all pairwise comparisons among those values. But as I noted earlier, predictive margins are computed differently from individual predicted values which change depending on the values of the other variables in the model. The predictive margin is the average of the predicted values from all of the observations with joint_years fixed at a given value.
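
A minimal sketch of building MDAT from percentiles instead of typing the values by hand. The data set name PCT and the variable names P50 and P95 are choices made for this sketch, and it assumes the analysis data set is WORK.FINALLABEL_6 as in the macro call earlier in the thread.

```sas
/* Percentiles of joint_years; PCT, P50 and P95 are names chosen for this sketch. */
proc means data=finallabel_6 noprint;
   var joint_years;
   output out=pct p50=p50 p95=p95;
run;

/* One row per margin value, higher value first, mirroring the 15,5 order used earlier. */
data mdat;
   set pct;
   joint_years = p95; output;
   joint_years = p50; output;
   keep joint_years;
run;
```

The resulting MDAT can then be passed to the same %margins call through margindata=mdat. Note that these percentiles use every non-missing joint_years value, which may differ slightly from the observations that actually enter the model if other predictors have missing values.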

_maldini_ · Posted 12-13-2023 04:19 PM

The macro description seems to indicate that for predictors not included in the margins= statement, mean values are used for continuous variables and the reference level is used for categorical variables. And there are ways to set the values of these predictors explicitly.

When I run the macro w/o setting any values, I get output that looks reasonable. I didn't set the reference level for any variables in the CLASS statement, however. According to the documentation: "Individual variable options (such as REF=) are not supported."

How would the macro know what the reference levels were?

How would I explicitly set the value of continuous predictors, if I wanted to?

Thanks again for all your help.

StatDave · Posted 12-14-2023 08:48 AM

The line you quoted is correct - read through the Details section of the Margins macro documentation for more info. Predictive margins do not generally set the values of all of the predictors other than those specified in margins=. If you want all of the other predictors fixed, then you want predicted means, not predictive margins. You can get predicted means by simply using the PRED= option in the OUTPUT statement as I showed in my earlier reply. That will give you a data set with predicted means for all of your observations. If you want predicted means for predictor settings other than those in your input data set, then create a data set with the desired predictor settings and then use the SCORE statement. See the example titled "Scoring data sets" in the LOGISTIC documentation.
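
Since the model in this thread is fit with PROC GLM, which has no SCORE statement of its own, one possible way to score new predictor settings is to save the fit with a STORE statement and score it in PROC PLM. The following is a sketch under assumptions, not the method from the reply above: the item-store name SBP_MODEL and the data set names NEWOBS and SCORED are hypothetical, and the scoring row simply borrows the covariate values of the first observation.

```sas
/* Fit the thread's model and save it in an item store.
   SBP_MODEL is a hypothetical name chosen for this sketch. */
proc glm data=finallabel_6;
   class race1c gender1 income6 educ1 htn_meds lipid_meds dm_meds;
   model mean_sbp = joint_years race1c gender1 age6c income6 educ1 drinks_wk pkyrs6c pamvcm6c
                    bmi6c tchol_hdl_ratio hba1c6 glucose6 htn_meds lipid_meds dm_meds totmed6 / solution;
   store sbp_model;
run;
quit;

/* Hypothetical scoring data: borrow the covariate pattern of the first
   observation and set joint_years to the two values of interest. */
data newobs;
   set finallabel_6(obs=1);
   do joint_years = 5, 15;
      output;
   end;
run;

/* Score the new settings with the stored model. */
proc plm restore=sbp_model;
   score data=newobs out=scored predicted=p lclm=lower uclm=upper;
run;
```

SCORED should then hold one predicted mean (P) with confidence limits for each joint_years value. If the borrowed observation has a missing predictor, the prediction will be missing too, so in practice you would fill in the covariate settings you actually care about. If formats are applied to the CLASS variables when fitting, the same formats would need to be in effect for the scoring data.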

Ksharp · Posted 12-05-2023 01:15 AM

I don't understand. You just fitted a general linear model (with fixed effects only). The estimated coefficient (beta) of "joint_years" is the change in Y (mean_sbp) for a one-year (one-unit) change in "joint_years".

If you want to "compare the adjusted mean_sbp at 5 joint_years to 15 joint_years", just multiply beta by 10; that is what you are looking for.

For example:

proc mixed data=sashelp.heart(obs=1000);
class sex bp_status status;
model weight=ageatstart height sex bp_status status /s ;

estimate 'diff@15-5' ageatstart 10;  /*<--suggested by @StatDave*/
run;

The beta of "AgeAtStart" is 0.2346, which means a one-year change in AgeAtStart changes "weight" by 0.2346. So 10*0.2346 = 2.346 is what you are looking for (the same result as StatDave's suggestion).

_maldini_ · Posted 12-05-2023 12:02 PM

@Ksharp Thanks. I have that syntax in my original post. It multiplies the beta by the number in the ESTIMATE statement, as you show. This is helpful, but it doesn't give me the adjusted means at 5 and 15 years (I now realize I used the wrong terminology in the original post. I'm looking for predicted values, as opposed to adjusted means).

StatDave · Posted 12-05-2023 12:10 PM

If you just want the difference in predicted means at 5 and 15, then you just need the ESTIMATE statement that I showed in my response earlier. If you want the predicted means, separately, at 5 and 15, then the easiest way (assuming that your data has observations observed at 5 and 15) is to add an OUTPUT statement in your PROC GLM step requesting predicted values:

output out=mypreds pred=p;

That will give you a data set MYPREDS with predicted means (in variable P) for each observation. See the ones where joint_years=5 or 15. Note that the predicted mean depends on the setting of all of the model variables, so they will vary.
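
A short sketch of pulling those rows out of MYPREDS, assuming the OUTPUT statement above has been run and that some observations have joint_years exactly equal to 5 or 15:

```sas
/* List the predicted means for observations observed at joint_years = 5 or 15. */
proc print data=mypreds;
   where joint_years in (5, 15);
   var joint_years p;
run;
```

Each printed row carries its own value of P, because the other predictors differ from one observation to the next.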

Ksharp · Posted 12-05-2023 09:45 PM

Is the following what you are looking for?

proc mixed data=sashelp.heart(obs=1000);
class sex bp_status status;
model weight=ageatstart height sex bp_status status /s ;

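/* With CLASS effects in the model, INTERCEPT plus AGEATSTART alone is not estimable: */
/* every other predictor needs a setting too (here, illustratively: height=65, Female, Normal BP, Alive). */
estimate 'adjusted mean of Y at year=5' intercept 1 ageatstart 5 height 65 sex 1 0 bp_status 0 1 0 status 1 0;
estimate 'adjusted mean of Y at year=15' intercept 1 ageatstart 15 height 65 sex 1 0 bp_status 0 1 0 status 1 0;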
run;

As StatDave said, you can also create a data set that contains these predicted Y values.

proc glm data=sashelp.heart(obs=1000);
class sex bp_status status;
model weight=ageatstart height sex bp_status status /solution ;

output out=mypreds PREDICTED=p;
quit;
