Solved: Re: Estimate Command for a Three-way interaction

Heejeong · Posted 05-16-2022 10:59 PM

Hello,

I am using PROC GLM to examine a three-way interaction between: 1) Number of daily stressors (SAS var name= cAnyStressWide_sum ), 2) Emotional reactivity (SAS var name= cPAreactpost), and 3) Gender (SAS var name= ClinicSex) to predict levels of Systolic blood pressure (SAS var name= SystolicSittingMeanLastTwo). I also graphed it using effectplot slicefitas shown below.

data JH.Final2; set JH.FINAL2;format _all_;run;
proc glm data=JH.Final2 ;
model SystolicSittingMeanLastTwo = cage marriedmidus work race_orig cedu cHHtotalIncome_total EverSmokeReg Exercise20mins CVDmeds cBMI cCESD cNeuroticism cpa_mlm2 cTotalChronBio cAnyStressWide_sum|cPAreactpost|ClinicSex
;run;
; store graph1; run;

proc plm restore=graph1 noinfo;
effectplot slicefit(x=cAnyStressWide_sum sliceby=cPAreactpost = -0.0595500, 0, 0.0595500 plotby=ClinicSex = 1 2) / clm;
run;

When I run the above syntax, SAS outputs the below figure:

I want to test whether each of the "blue," "red,", and "green" cPAreactpost lines are significantly different from 0.

Using STATA, results indicated that for ClinicSex 1, none of the lines were significant but for ClinicSex 2, the green line (0.05955 line) is significantly different from 0. However, when I tried replicating the command in SAS using the syntax below, the results indicate that none of the lines are significantly different from 0.

data JH.Final2; set JH.FINAL2;format _all_;run;
proc glm data=JH.Final2 ;
model SystolicSittingMeanLastTwo = cage marriedmidus work race_orig  cedu cHHtotalIncome_total EverSmokeReg Exercise20mins CVDmeds cBMI cCESD cNeuroticism cpa_mlm2 cTotalChronBio  cAnyStressWide_sum|cPAreactpost|ClinicSex
;
estimate 'stressdays slope, PA reactivity=Quartile 1 (least reactive -SD) male' cAnyStressWide_sum 1 cPAreactpost -0.0595500 ClinicSex 1 cAnyStressWide_sum*cPAreactpost -0.0595500  ClinicSex*cAnyStressWide_sum 1 ClinicSex*cPAreactpost -0.0595500 ClinicSex*cPAreactpost*cAnyStressWide_sum -0.0595500;
estimate 'stressdays slope, PA reactivity=Quartile 1 (least reactive -SD) female'  cAnyStressWide_sum 1 cPAreactpost -0.0595500 ClinicSex 2 cAnyStressWide_sum*cPAreactpost -0.0595500  ClinicSex*cAnyStressWide_sum 2 ClinicSex*cPAreactpost -0.1191 ClinicSex*cPAreactpost*cAnyStressWide_sum -0.1191;

estimate 'stressdays slope, PA reactivity=Quartile 1 (most reactive MEAN) male'  cAnyStressWide_sum 1 cPAreactpost 0 ClinicSex 1 cAnyStressWide_sum*cPAreactpost 0  ClinicSex*cAnyStressWide_sum 1 ClinicSex*cPAreactpost 0 ClinicSex*cPAreactpost*cAnyStressWide_sum 0;
estimate 'stressdays slope, PA reactivity=Quartile 1 (most reactive MEAN) female' cAnyStressWide_sum 1 cPAreactpost 0 ClinicSex 2 cAnyStressWide_sum*cPAreactpost 0  ClinicSex*cAnyStressWide_sum 2 ClinicSex*cPAreactpost 0 ClinicSex*cPAreactpost*cAnyStressWide_sum 0;

estimate 'stressdays slope, PA reactivity=Quartile 1 (most reactive +SD) male'  cAnyStressWide_sum 1 cPAreactpost 0.0595500 ClinicSex 1 cAnyStressWide_sum*cPAreactpost 0.0595500  ClinicSex*cAnyStressWide_sum 1 ClinicSex*cPAreactpost 0.0595500 ClinicSex*cPAreactpost*cAnyStressWide_sum 0.0595500;
estimate 'stressdays slope, PA reactivity=Quartile 1 (most reactive +SD) female'  cAnyStressWide_sum 1 cPAreactpost 0.0595500 ClinicSex 2 cAnyStressWide_sum*cPAreactpost 0.0595500  ClinicSex*cAnyStressWide_sum 2 ClinicSex*cPAreactpost 0.1191 ClinicSex*cPAreactpost*cAnyStressWide_sum 0.1191/ e;
run;

My main question in this post is whether I should be editing the estimate command above and whether it might be incorrect syntax that's giving me a different result than STATA output (in case it might be helpful, I've copy-pasted the STATA syntax below:

***STATA SYNTAX****
margins, dydx(AnyStressWide_sum) at(PAreactpostc= (-.0596702 0 .0596702) Sex=(1 2))

Thank you so much in advance for any advice you might have and please let me know if you have any questions!

All the best,

Joanna

StatDave · Posted 05-17-2022 06:43 PM

I'm not very familiar with Stata and its margins statement, but from the statement you show it seems that it is estimating the marginal effect of AnyStressWide_sum at each of six settings of PAreactpot and Sex. It appears to use the derivative-based computation of the marginal effect of the continuous variable. This computation of the marginal effect gives you the instantaneous change in the response mean at the indicated settings. This computation of the marginal effect is discussed and illustrated in this note. So, to do that you would call the Margins macro like this:

proc means data=JH.Final3;
   var cPAreactpost; 
   output out=out mean=mean std=sd;
   run;
data adat;
   set out;
   keep cPAreactpost ClinicSex;
   do cPAreactpost = mean-sd, mean, mean+sd;
     ClinicSex=1; output;
     ClinicSex=2; output;
   end;
   run;
%Margins(data=JH.Final3, response=SystolicSittingMeanLastTwo, class=ClinicSex, 
         model=SystolicSittingMeanLastTwo cage marriedmidus work race_orig cedu 
         cHHtotalIncome_total EverSmokeReg Exercise20mins CVDmeds cBMI cCESD 
         cNeuroticism cpa_mlm2 cTotalChronBio 
         AnyStressWide_sum7|cPAreactpost|ClinicSex,
         effect=AnyStressWide_sum7, at=PAreactpot ClinicSex, atdata=adat, 
         options=cl)

Regarding the Warning messages, it appears that PROC GENMOD (which the macro uses to fit the model) is unable to properly converge. It has nothing to do with your predictor having negative values. Note that GENMOD fits the model using an iterative maximum likelihood algorithm while GLM uses least squares, which doesn't require iteration. Convergence problems are always possible with such algorithms. You might want to experiment with fitting the model directly in PROC GENMOD. You might have to make some alteration to the model or tweak the algorithm to get proper convergence. Once you know how to get GENMOD to converge, then specify the same in the macro.

View solution in original post

StatDave · Posted 05-16-2022 11:17 PM

The ESTIMATE statement does not compute predictive margins or marginal effects. It computes estimates of linear combinations of the model parameters with all model predictors fixed at the specified values (add the E option to see the full set of coefficients used in the linear combination). If you want to compute margins, use the Margins macro.

Heejeong · Posted 05-17-2022 03:43 PM

Hello, @StatDave,

Thank you SO much for your continued guidance (starting with your extremely helpful comments from my last post) as I learn more about the MARGINS and ESTIMATE statements in SAS. I wanted to ensure that I had a clear understanding of the two STATEMENTS so wanted to follow up with a few questions. I know you are extremely busy, so I greatly appreciate responses/thoughts on any of these questions!

Below are some thoughts and questions:

1. Would it be correct to say that both MARGINS & ESTIMATE statements are predicting the values of the DV when the predictor variables are set at specific values?

2. But the main difference is the way MARGINS & ESTIMATES treat the covariates. The MARGINS statement uses the actual observed values whereas the ESTIMATE statement sets the covariate values a 0. If this is true, then what is the benefit of using the ESTIMATE statements? Also, if the difference in these two statements are "how the covariates are calculated," then would the results of MARGINS and ESTIMATE commands be identical if a model does not have any covariates?

3. I conceptually understand what a linear prediction is but I'm having a bit of trouble understanding the difference between "predictive margins or marginal effects" and "linear combinations of the model parameters." Since this is the main conceptual difference between the MARGINS and ESTIMATE statements, I wanted to make sure that I understand the difference clearly. Would the ESTIMATE statements only be feasible when running models with linear associations between the IVs and the DV? I think I'm having a hard time understanding when to use the MARGINAL vs. the ESTIMATE statements.

Thank you so much again for all your help!

Best,

Joanna

Heejeong · Posted 05-17-2022 05:03 PM

Based on your great suggestion, @StatDave, I wrote the below MACRO statement but I'm getting the below error messages. When I tried to look this up, I learned that I would see the "The negative of the Hessian is not positive definite" error if any of my covariates or predictors values have negative values. But for one of my predictor variables (cPAreactpost),it will have to contain negative values regardless of how I code the variable.

If there is any tip/advice that could be shared with me, I would greatly appreciate it! Thank you so much!

proc means data=JH.Final3;
var cPAreactpost; 
output out=out mean=mean std=sd;
run;
data mdat;
set out;
keep cPAreactpost;
do cPAreactpost = mean-sd, mean, mean+sd;
output;
end;
run;
data adat;
ClinicSex=1;
AnyStressWide_sum7=1;
run;
%Margins(data=JH.Final3, response=SystolicSittingMeanLastTwo, class=ClinicSex, 
model=SystolicSittingMeanLastTwo cage marriedmidus work race_orig  cedu cHHtotalIncome_total EverSmokeReg Exercise20mins CVDmeds cBMI cCESD cNeuroticism cpa_mlm2 cTotalChronBio  AnyStressWide_sum7|cPAreactpost|ClinicSex,
margins=cPAreactpost, margindata=mdat, 
at=AnyStressWide_sum7 ClinicSex, atdata=adat, options=diff reverse cl)

StatDave · Posted 05-17-2022 06:43 PM

I'm not very familiar with Stata and its margins statement, but from the statement you show it seems that it is estimating the marginal effect of AnyStressWide_sum at each of six settings of PAreactpot and Sex. It appears to use the derivative-based computation of the marginal effect of the continuous variable. This computation of the marginal effect gives you the instantaneous change in the response mean at the indicated settings. This computation of the marginal effect is discussed and illustrated in this note. So, to do that you would call the Margins macro like this:

proc means data=JH.Final3;
   var cPAreactpost; 
   output out=out mean=mean std=sd;
   run;
data adat;
   set out;
   keep cPAreactpost ClinicSex;
   do cPAreactpost = mean-sd, mean, mean+sd;
     ClinicSex=1; output;
     ClinicSex=2; output;
   end;
   run;
%Margins(data=JH.Final3, response=SystolicSittingMeanLastTwo, class=ClinicSex, 
         model=SystolicSittingMeanLastTwo cage marriedmidus work race_orig cedu 
         cHHtotalIncome_total EverSmokeReg Exercise20mins CVDmeds cBMI cCESD 
         cNeuroticism cpa_mlm2 cTotalChronBio 
         AnyStressWide_sum7|cPAreactpost|ClinicSex,
         effect=AnyStressWide_sum7, at=PAreactpot ClinicSex, atdata=adat, 
         options=cl)

Regarding the Warning messages, it appears that PROC GENMOD (which the macro uses to fit the model) is unable to properly converge. It has nothing to do with your predictor having negative values. Note that GENMOD fits the model using an iterative maximum likelihood algorithm while GLM uses least squares, which doesn't require iteration. Convergence problems are always possible with such algorithms. You might want to experiment with fitting the model directly in PROC GENMOD. You might have to make some alteration to the model or tweak the algorithm to get proper convergence. Once you know how to get GENMOD to converge, then specify the same in the macro.

Heejeong · Posted 05-18-2022 12:19 PM

Thank you so much for this helpful response, @StatDave!

First, the distribution explanation (maximum likelihood algorithm vs least-squares) clarifies the error message, I will try to troubleshoot by just using a GENMOD syntax! Could I potentially think that the convergence problem could be the reason why my output produces marginal effects that are almost all 0s?

Secondly, I read through the helpful documentation you sent me and as I was making sure that I understood the approach perfectly, I had 2 quick questions. Below, I've written out the syntax from the website and the syntax you wrote for me:

*FROM THE WEBSITE:
 %Margins(data=Remiss,
          response=remiss, 
          roptions=event='1',
          model=blast smear,
          dist= binomial, 
          effect=blast,
          at= blast smear,
          atwhere=blast=1.1 and smear=.83,
          options=cl)

*FROM DAVE:
proc means data=JH.Final3;
var cPAreactpost; 
output out=out mean=mean std=sd;
run;
data adat;
set out;
keep cPAreactpost ClinicSex;
do cPAreactpost = mean-sd, mean, mean+sd;
ClinicSex=1; output;
ClinicSex=2; output;
end;
run;

%Margins(data=JH.Final3, 
response=SystolicSittingMeanLastTwo, 
class=ClinicSex, 
model=SystolicSittingMeanLastTwo cage marriedmidus work race_orig cedu cHHtotalIncome_total EverSmokeReg Exercise20mins CVDmeds cBMI cCESD 
cNeuroticism cpa_mlm2 cTotalChronBio AnyStressWide_sum7|cPAreactpost|ClinicSex,
effect=AnyStressWide_sum7, 
at=PAreactpot ClinicSex, 
atdata=adat, 
options=cl)

1) In our syntax, do we not have the “roptions=event='1'',” and “dist= binomial” syntax because our DV is a continuous variable and does not have a binomial distribution?

2) Also, the way the website wrote out the model statement is a bit different than ours. The website only wrote out the predictors (e.g., “model=blast smear”), whereas we included the DV (e.g., model=SystolicSittingMeanLastTwo cage marriedmidus work race_orig cedu cHHtotalIncome_total EverSmokeReg Exercise20mins CVDmeds cBMI cCESD cNeuroticism cpa_mlm2 cTotalChronBio AnyStressWide_sum7|cPAreactpost|ClinicSex”). Would it be safe for me to assume that both approaches are correct?

3) The website uses specific values of each predictor variable (atwhere=blast=1.1 and smear=.83) to run the macro statement but you helped me create another dataset that contains more than one specified value of the two predictor variables (data adat; set out; keep cPAreactpost ClinicSex; do cPAreactpost = mean-sd, mean, mean+sd; ClinicSex=1; output;ClinicSex=2; output; end;).Would it be correct for me to think that your approach is better because it allows me to use more than 1 specified value at which we are estimating the marginal effects?

Thank you so much in advance for your additional help, I am learning so much and so grateful for this opportunity!

StatDave · Posted 05-18-2022 12:52 PM

I didn't notice that you had the response variable in model=. That is incorrect and is likely why GENMOD had the problem fitting the model since doing this uses the response as a predictor which obviously is wrong. So, remove the response variable from model=. Yes, those roptions= and dist= options in the note I referred to are for the binary response in that example and not appropriate in your case. If the specific values at which you want to compute the marginal effects actually occur in your input data (the data= data set), then you can use atwhere=. I assume that isn't the case for your values computed from the mean and standard deviation. The approach I show using atdata= can always be used.

Heejeong · Posted 05-18-2022 10:50 PM

Thank you so much, @StatDave!

This solved the problem and I was able to replicate my findings from STATA!

I am now trying to graph the margins using PROC SGPLOT and using the syntax below. I am using the dataset "_meffect" which I thought was the final dataset outputted from the MACRO statement but I keep getting an error saying that the stress variable "Stress7" is not found? Also, the predictor variable names are different because I shortened the names due to an error message. Thank you for any tips you might have on this!

proc sgplot data=_meffect noautolegend;
 SERIES X = ESTIMATE Y = Stress7;
 SERIES X = ESTIMATE Y = cPAreact;
 SERIES X = ESTIMATE Y = Sex;
title "Effects of Stressful Days, Positive Affective Responsivity, and  Gender on Systolic Blood Pressure";
run;

StatDave · Posted 05-19-2022 10:32 AM

The macro call you used, as I mentioned earlier, produced six marginal effects of the effect= variable. Those marginal effects estimate the instantaneous rate of change in the response mean at those specific settings of the at= variables. So, the output data set (_meffect) contains the marginal effect estimates and the settings of the at= variables. A suitable plot of the marginal effects would the marginal effect value against one of the at= variables grouped by (or separately plotted by) the other at= variable. For example,

proc sgpanel data=_meffect;
  panelby ClinicSex;
  band upper=upper lower=lower x=PAreactpot  / nofill;
  scatter y=_meff x=PAreactpot;
  run;

Heejeong · Posted 05-21-2022 03:48 PM

Thank you so much! I was able to graph it with your response. I just wanted to post the exact syntax and the graph in case it might be helpful for anyone else! Thank you so much again for all your help, @StatDave

proc sort data=_meffect; 
by cPAreact;
run;
proc sgpanel data=_meffect;
  panelby Sex;
  band upper=upper lower=lower x=cPAreact  / fill;
  scatter y=_meff x=cPAreact;
  run;

Heejeong · Posted 07-27-2022 06:41 PM

Thank you so much for all your help with my questions regarding this three-way interaction analysis, @StatDave.

I was wondering if there was any way to change the syntax to make my final graph look at the graph I made in STATA below?

Below is the graph that I made in SAS and I understand that analysis creates these six marginal effects values automatically but I really want to see how the DV (systolic blood pressure) changes across 1+SD, Mean, 1-SD and across each stressful day (minimum 1 & max 7).

If there is any way that I could do this, I would greatly appreciate your help.

Best,

Joanna

StatDave · Posted 07-28-2022 12:08 PM

Then you probably need to have more than just the 3 distinct values on each horizontal axis (1-sd, mean, 1+sd). You could do that by adding more values in the DO statement in the DATA ADAT; step that I showed in my earlier reply.

The 2025 SAS Hackathon has begun!