carolinabeef
Fluorite | Level 6

Hello!

 

I'm having a few problems trying to get effects for a three-way interaction in the "Predictive margins and average marginal effects" (%margins) macro on SAS 9.4 using Windows 10: https://support.sas.com/kb/63/038.html

 

Right now, I'm attempting to run a model focusing on four variables of interest (1 DV, 3 variables involved in a 3-way interaction), with the goal of examining the effects of t7dmar_c on Mar_8 at different levels of both t7c_can_neg AND t7c_at_EC.

Mar_8 is the outcome variable and has a negative binomial distribution

Mar_7 is a continuous predictor variable

t7dmar_c is a continuous predictor variable involved in 3-way interaction

t7c_can_neg is a continuous predictor variable involved in 3-way interaction

t7c_at_EC is a continuous predictor variable involved in 3-way interaction

All other variables are simply covariates held at their means 

 

Here is my sas code for the macro:

libname ques "Enter your location"; run;

 %inc "your location\margins.txt";

/*TESTING 3 WAY IXN*/
options mlogic mprint symbolgen;
%Margins(data     = ques.atmeansdata,
         class    = t7cgender t7marorder_c,
         response = MAR_8,
         dist     = negbin,
         model    = t7cgender t7cnuage t7dmar_c t7marorder_c t7C_can_neg t7c_can_pos t7C_AT_EC  MAR_7 t7dmar_c*t7C_can_neg t7C_AT_EC*t7C_can_neg t7dmar_c*t7C_AT_EC t7dmar_c*t7C_AT_EC*t7C_can_neg,
         effect   = t7dmar_c,
         at       = t7C_can_neg t7C_AT_EC,
         atwhere  = t7c_can_neg in (3.3333333333, 3.2962962963, 2.33333, 1.00000) and t7c_at_ec in (5.5263157895, 3.7368421053, 3.6842105263, 2),
         options  = cl atmeans);
		

Unfortunately I keep getting two errors (see log attached).

 

1. One states that "The ID value "b_t7dmar_c_t7C_can_neg" occurs twice in the input data set." I suspected that this might have to do with the 3-way interaction in the model (as this has not occurred in two-way interaction models). After removing the 3-way interaction term, the error goes away. This leads me to the question: Should I include interaction terms in the model statement? From what I can tell, including the interaction terms DOES affect the effects of Mar_7 on Mar_8, so it concerns me that the model won't run with the 3-way interaction term. Any information and thoughts on this would be helpful.

 

2. Another error states that "A character operand was found in the %EVAL function or %IF condition where a numeric operand is required. The condition was: &nfix=0". I've used this macro with two "atmeans = " variables and not had this happen. As far as I can tell the syntax for the locations of the effects is correct. The locations of the "atmeans = " variables are very specific, as I'm looking for effects of Mar_7 at approximately 10th and 90th percentiles of both variables. I located levels of these variables where there are values present in the data (i.e., someone in the data endorsed a specific combination of these levels of these variables), so the problem doesn't appear to be that there is no data for which to compute an effect at these combinations. Why might I be getting this error for this model?

 

Log and sample data are attached.

 

Thank you!

 

 

1 ACCEPTED SOLUTION

StatDave
SAS Super FREQ

The "occurs twice" error happens because the parameter names for the interactions become long, particularly for the three-way interaction. The name gets truncated, which makes it identical to the name of one of the other interactions. This appears to be fixed in the current release, 1.06, which is available at the Margins macro web page.

 

To get the marginal effects at the set of level combinations you want, create a data set of those combinations and then specify that in atdata= rather than using atwhere=. Also, the noprintbyat option presents the results more compactly.

 

data atdat;
 do t7c_can_neg = 3.3333333333, 3.2962962963, 2.33333, 1.00000;
  do t7c_at_ec = 5.5263157895, 3.7368421053, 3.6842105263, 2;
   output;
  end;
 end;
run;
%Margins(data = atmeansdata,
 class    = t7cgender t7marorder_c,
 response = MAR_8,
 dist     = negbin,
 model    = t7cgender t7cnuage t7dmar_c t7marorder_c t7C_can_neg t7c_can_pos t7C_AT_EC  MAR_7 t7dmar_c*t7C_can_neg t7C_AT_EC*t7C_can_neg t7dmar_c*t7C_AT_EC t7dmar_c*t7C_AT_EC*t7C_can_neg,
 effect   = t7dmar_c, 
 at       = t7C_can_neg t7C_AT_EC,
 atdata   = atdat,
 options  = cl atmeans noprintbyat)
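
Note that the nested DO loops above produce the full 4 x 4 grid, so 16 combinations. If only particular pairs are of interest, a minimal sketch of an alternative is to list them directly. The data set name atdat_pairs and the pairing of values below are hypothetical and chosen only for illustration; substitute the pairs you actually want.

```sas
/* Hypothetical alternative to the crossed grid: one row per pair.   */
/* The data set name and the specific pairings are illustrative.     */
data atdat_pairs;
 input t7c_can_neg t7c_at_ec;
 datalines;
3.3333333333 5.5263157895
1.00000 2
;
run;

/* Confirm the rows before passing the data set to atdata= */
proc print data=atdat_pairs noobs;
run;
```

The macro call would then use atdata = atdat_pairs in place of atdata = atdat.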


7 REPLIES
SteveDenham
Jade | Level 19

Here is something to test my idea that the macro flatly refuses to handle three way interactions. Try fitting all 2 way interactions using something like this:

 

t7dmar_c|t7c_can_neg|t7c_at_EC @2

 

I don't know that this will help at all, as it looks like your current MODEL statement includes all of the two way interactions.  My experience with 3 way interactions where all of the variables are continuous is pretty slim, but I worry that it may be fitting noise. I was going to suggest perhaps categorizing one of the three variables, but that doesn't really solve the issue, as you still would have a 3 way interaction in the model.  I hope someone else can help here.
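
For reference, here is a rough sketch of what the bar notation expands to when the model is fit directly in PROC GENMOD. It includes only the three interacting variables and assumes the data set name atmeansdata used elsewhere in this thread; whether the macro's model= parameter accepts this shorthand is something to test rather than assume.

```sas
/* Sketch only: the three interacting variables, no other covariates.  */
/* A|B|C @2 expands to all main effects and all two-way interactions:  */
/*   t7dmar_c t7c_can_neg t7dmar_c*t7c_can_neg                         */
/*   t7c_at_EC t7dmar_c*t7c_at_EC t7c_can_neg*t7c_at_EC                */
/* The @2 drops the three-way term.                                    */
proc genmod data=atmeansdata;
 model MAR_8 = t7dmar_c|t7c_can_neg|t7c_at_EC @2 / dist=negbin;
run;
```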

 

SteveDenham

 

 

carolinabeef
Fluorite | Level 6

Thank you! This worked for getting the marginal effects at these points. This makes me wonder one more related question that perhaps you could help with. Is there a way to get marginal effects condensed across multiple values within the data? So, for example, here I've selected the exact data points for two variables to get the marginal effect of t7dmar_c on Mar_8 (e.g., at 3.33 of one, 5.52 of the other). From what I can tell, this provides the effects for maybe one person in the data. Is there a way to get a single average marginal effect for, say, everyone who scored between a 3 and 4 on one, and 5 and 6 on the other?

 

Thanks again!

StatDave
SAS Super FREQ

I assume you want to essentially compute the marginal effect for all of the observed combinations of t7C_can_neg and t7C_AT_EC where t7C_can_neg is (say) between 3 and 4 and t7C_AT_EC is (say) between 5 and 6. This can be done by specifying that condition in atwhere=:

 

%Margins(data = atmeansdata,
 class    = t7cgender t7marorder_c,
 response = MAR_8,
 dist     = negbin,
 model    = t7cgender t7cnuage t7dmar_c t7marorder_c t7C_can_neg t7c_can_pos t7C_AT_EC  
            MAR_7 t7dmar_c*t7C_can_neg t7C_AT_EC*t7C_can_neg t7dmar_c*t7C_AT_EC t7dmar_c*t7C_AT_EC*t7C_can_neg,
 effect   = t7dmar_c, 
 at       = t7C_can_neg t7C_AT_EC,
 atwhere  = 3<t7c_can_neg<4 and 5<t7c_at_ec<6,
 options  = cl atmeans noprintbyat)

The result of this is a set of 19 marginal effect estimates. If you then want the average of those, you can obtain that by specifying a contrast vector, f, that multiplies the vector of marginal effect estimates. To get the average, the f vector is just a set of 19 values all equal to 1/19. The following DATA step creates the appropriate data set, C, for use in contrasts=. Note the data set has just two character variables: LABEL, which can be anything you want to label the result, and F, which is the f vector. If you repeat the above Margins call adding contrasts=c, the average of the marginal effects is provided.

 

data c;
 length label f $32767; 
 label="avg ME";
 array c c1-c19;
 do i=1 to 19; c(i)=1/19; end;
 f=catx(' ',of c1-c19);
 keep label f;
 run;
%Margins(data = atmeansdata,
 class    = t7cgender t7marorder_c,
 response = MAR_8,
 dist     = negbin,
 model    = t7cgender t7cnuage t7dmar_c t7marorder_c t7C_can_neg t7c_can_pos t7C_AT_EC  
            MAR_7 t7dmar_c*t7C_can_neg t7C_AT_EC*t7C_can_neg t7dmar_c*t7C_AT_EC t7dmar_c*t7C_AT_EC*t7C_can_neg,
 effect   = t7dmar_c, 
 at       = t7C_can_neg t7C_AT_EC,
 atwhere  = 3<t7c_can_neg<4 and 5<t7c_at_ec<6,
 contrasts= c,
 options  = cl atmeans noprintbyat)
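
The count of 19 is specific to this data and this range. As a hedged sketch, the count could be derived instead of hard-coded, assuming the number of marginal effect estimates equals the number of distinct observed (t7c_can_neg, t7c_at_ec) combinations inside the atwhere= range. The macro variable name n is an assumption, and it is worth checking the result against the number of rows the macro actually reports.

```sas
/* Count distinct observed combinations in the range (assumed to match */
/* the number of marginal effect estimates the macro returns).         */
proc sql noprint;
 select count(*) into :n trimmed
 from (select distinct t7c_can_neg, t7c_at_ec
       from atmeansdata
       where t7c_can_neg > 3 and t7c_can_neg < 4
         and t7c_at_ec > 5 and t7c_at_ec < 6);
quit;

/* Build the contrast data set with &n weights of 1/&n each */
data c;
 length label f $32767;
 label = "avg ME";
 do i = 1 to &n;
  f = catx(' ', f, 1/&n);   /* append one weight per estimate */
 end;
 keep label f;
run;
```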
carolinabeef
Fluorite | Level 6

Wonderful--thank you! 

 

Perhaps this should begin a new thread, although it's a related set of questions coming from this output:

 

1. I've noticed in the effects I got from following your advice that the standard errors are very odd. This makes me question how these standard errors are computed, and whether they are reliable. Are the SEs for these marginal effects trustworthy for these continuous interaction models? Or should I simply be looking at the magnitude of these effects?

 

2. I've also noticed that the output does not provide intercepts. Is there any option to request intercepts for these models?

 

Thank you again for your help!

SteveDenham
Jade | Level 19

I can address number 2.  The output from the macro provides the marginal estimates on the original scale, based on the Results tab here, i.e. as a mean count.  It isn't the parameter estimate that you would see in a solution vector, so there is no intercept to report. If you need an intercept, I think the place to get it would be the PROC GENMOD ODS table ParameterEstimates. Note that it will be on the log scale.
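
A minimal sketch of pulling the intercept from that table, assuming the same data set name (atmeansdata) and model as in the macro calls earlier in this thread. The output data set name pe is arbitrary.

```sas
/* Capture the GENMOD parameter table in a data set named pe (arbitrary) */
ods output ParameterEstimates=pe;
proc genmod data=atmeansdata;
 class t7cgender t7marorder_c;
 model MAR_8 = t7cgender t7cnuage t7dmar_c t7marorder_c t7C_can_neg t7c_can_pos
               t7C_AT_EC MAR_7 t7dmar_c*t7C_can_neg t7C_AT_EC*t7C_can_neg
               t7dmar_c*t7C_AT_EC t7dmar_c*t7C_AT_EC*t7C_can_neg
       / dist=negbin;
run;

/* Show only the intercept row; the estimate is on the log scale */
proc print data=pe noobs;
 where Parameter = "Intercept";
 var Parameter Estimate StdErr;
run;
```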

 

I will guess at number 1.  If the variables involved are not centered, then the value of the continuous by continuous by continuous interaction is the product of the three covariates.  So, if each of the continuous variables were on a range of x1=(0, 10), x2=(20, 50), x3=(100, 1000), the 3 way interaction is defined on (0, 500000).  Two things will happen: the observations at the high end of the range are going to have a lot of leverage in the fit.  High leverage points lead to inflated standard errors. Your slopes will be greatly influenced by these points as well.

I would recommend recentering the variables (and not rescaling, so that the regression coefficients make more sense) so that the ranges look like x1=(-5, 5), x2=(-15, 15) and x3=(-450, 450).  Now the interaction is defined on (-33750, 33750).  I suspect that this would reduce the standard errors by an order of magnitude.
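
One way this could be sketched, if the three variables are not already centered, is with PROC STDIZE. METHOD=MEAN subtracts the mean and leaves the scale alone, so it centers at the mean rather than at the midpoint of the range used in the example ranges above. The output data set name centered is an assumption.

```sas
/* Center (do not rescale) the three interacting variables.          */
/* METHOD=MEAN uses location = mean and scale = 1.                    */
/* The OUT= data set holds the centered values under the same names.  */
proc stdize data=atmeansdata out=centered method=mean;
 var t7dmar_c t7c_can_neg t7c_at_ec;
run;
```

Any at= or atwhere= values would then need to be expressed on the centered scale.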

 

SteveDenham

 

StatDave
SAS Super FREQ

As mentioned in the Details section in the Margins macro documentation, the standard errors are computed using the delta method. I don't see anything odd about the standard error estimates and believe they are correctly computed.
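
As a rough sketch of what is being estimated, assume a log link (the GENMOD default for the negative binomial) and write x1, x2, x3 for t7dmar_c, t7c_can_neg, and t7c_at_EC. The beta notation is mine, not the macro's, and the other covariates are collapsed into the trailing dots.

```latex
\eta = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3
     + \beta_{12} x_1 x_2 + \beta_{13} x_1 x_3 + \beta_{23} x_2 x_3
     + \beta_{123} x_1 x_2 x_3 + \dots, \qquad \mu = e^{\eta}

\frac{\partial \mu}{\partial x_1}
  = \mu \left( \beta_1 + \beta_{12} x_2 + \beta_{13} x_3 + \beta_{123} x_2 x_3 \right)

\widehat{\operatorname{Var}}\!\left[ g(\hat\beta) \right]
  \approx \nabla g(\hat\beta)^{\top} \, \hat\Sigma \, \nabla g(\hat\beta)
```

Here g is the marginal effect viewed as a function of the parameters and Sigma-hat is their estimated covariance matrix. The second line shows why the effect of x1 changes with the values at which x2 and x3 are fixed.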

 

All of the computations for predictive margins and/or marginal effects occur following the fitting of the model that you specify in the macro. That model always has an intercept unless you tell it to omit the intercept. The code earlier will show the results of the model fit in PROC GENMOD and it includes an intercept (4.2604). 

 

I should note that there is an alternative approach to estimating the marginal effect in the range of the two variables that you mentioned. This alternative does not estimate the same quantity as the average of the marginal effects that I showed earlier using contrasts=. Note that after the specified model is fit, the macro uses the fitted model to compute the marginal effect at each observation in the data set and then averages them. The method shown earlier fixes the two at= variables at each observed combination in the atwhere= range, obtains the marginal effect on each observation, and then averages them; that average is what the results show. An alternative is to compute the marginal effect for all observations without fixing the two variables (although they end up being fixed at their means if you use atmeans) and then to compute a single average marginal effect by averaging only over the observations that are in the specified range. That can be done using within= instead of at=, atwhere=, and contrasts=, as shown below. You will have to decide which of these two methods is best for your situation.

 

%Margins(data = x.atmeansdata,
 class    = t7cgender t7marorder_c,
 response = MAR_8,
 dist     = negbin,
 model    = t7cgender t7cnuage t7dmar_c t7marorder_c t7C_can_neg t7c_can_pos t7C_AT_EC  
            MAR_7 t7dmar_c*t7C_can_neg t7C_AT_EC*t7C_can_neg t7dmar_c*t7C_AT_EC t7dmar_c*t7C_AT_EC*t7C_can_neg,
 effect   = t7dmar_c, 
 within   = 3<t7c_can_neg<4 and 5<t7c_at_ec<6,
 options  = cl atmeans )
