Solved: Marginal effect for binary variables with SURVEYREG?

kain · Posted 11-25-2020 07:32 PM

I am running PROC SURVEYREG with industry and year fixed effects, along with firm clustering. I am trying to estimate a statement such as "a one-unit increase in (an independent variable) leads to an X% increase/decrease in (a dependent variable)", which is related to marginal effects.

A friend of mine says that the statement above is estimated with regression, not with marginal effects.

Is this true? If so, can I estimate the statement above with SURVEYREG?

StatDave · Posted 11-26-2020 01:09 PM

SURVEYREG assumes that the response is normally distributed and this is most definitely not true for a binary response. For a binary response, you should fit an appropriate model such as a logistic model. If your data is survey data, then use PROC SURVEYLOGISTIC.

The proportion change in response probability for a unit increase in one of the predictors can be computed as (P_{x+1} - P_x)/P_x = (P_{x+1}/P_x) - 1, where P_x is the response probability at some level of a predictor, X, and P_{x+1} is the probability after a unit increase in X. It can be expressed as a percent change by multiplying by 100. This change in probability over a range is not, strictly speaking, a marginal effect.

Marginal effects and the change in probability are discussed in detail in this note. The note discusses how marginal effects can be estimated using the Margins macro (which cannot be used with survey data) and how the change (difference) in probabilities can be estimated using the NLMeans macro.

The proportion change can also be computed at fixed values of the other predictors in the model using the NLEST macro. See the section of the above note titled "Estimating the difference in probability at specific points". Using the logistic model fit there, the following NLEST macro call estimates the proportion change in response probability from BLAST=0.5 to 1.5 with SMEAR fixed at 0.63:

%nlest(instore=log, f=(logistic(b_p1+1.5*b_p2+.63*b_p3)/logistic(b_p1+0.5*b_p2+.63*b_p3))-1, label=PropChng .5 to 1.5)

The resulting estimate is 1.5085, or as a percentage, 150.85%, indicating about a 150% increase in the response probability from BLAST=0.5 to 1.5 at SMEAR=0.63. A standard error, test of equality to zero, and confidence limits are also provided. While PROC LOGISTIC was used here, the same can be done with a model fit in PROC SURVEYLOGISTIC.

An estimate that would be more like a marginal effect would be the average of the above proportions evaluated for each observation using each observation's particular value of SMEAR. The proportion change in each observation can be obtained using this macro call:

%nlest(instore=log, f=(logistic(b_p1+1.5*b_p2+smear*b_p3)/logistic(b_p1+0.5*b_p2+smear*b_p3))-1, 
   score=remiss, outscore=out)

and then averaging the estimated proportions (in variable PRED)

proc means data=out mean; var pred; run;

which yields a similar value, 1.512 or 151.2%. A proper standard error and confidence interval for this average estimate are not available, though.
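
The proportion-change formula itself can be checked by hand with a short DATA step. This is only a minimal sketch: the coefficients b0 and b1 and the data set name PROPCHANGE are made-up values chosen for illustration, not estimates from the remission model or from any real fit.

/* Illustration only: b0 and b1 are hypothetical, not fitted values */
data propchange;
   b0 = -2;     /* assumed intercept */
   b1 = 0.8;    /* assumed slope for predictor X */
   do x = 0 to 3;
      p_x  = logistic(b0 + b1*x);         /* response probability at X = x */
      p_x1 = logistic(b0 + b1*(x + 1));   /* probability after a unit increase in X */
      prop_change = p_x1/p_x - 1;         /* (P_{x+1} - P_x)/P_x */
      pct_change  = 100*prop_change;      /* same change expressed as a percent */
      odds_change = exp(b1) - 1;          /* proportion change in the odds, not the probability */
      output;
   end;
run;

proc print data=propchange noobs;
   var x p_x p_x1 prop_change pct_change odds_change;
run;

In the printed output, prop_change differs from row to row because the change in probability depends on the starting value of X. In contrast, odds_change, computed as exp(b1) - 1, is the same in every row because in a logistic model it describes the proportional change in the odds rather than in the probability.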

ballardw · Posted 11-25-2020 08:25 PM

Where does the binary variable in your subject line come in?

Your description sure sounds like a regression. The slope of the regression line is that increase per unit.

But Surveyreg is more for continuous variables like the number of square feet in a house affecting sale price.

kain · Posted 11-25-2020 08:33 PM

Hi ballardw!
Okay so basically my pseudo statement is like this:
proc surveyreg data = data;
   class year industry;
   model dependent_var = independent_var year industry / adjrsq solution;
run;
Here the dependent variable is the binary variable (0 or 1). What I want to know is: if independent_var increases by 1, by what percentage does the dependent variable increase? I thought of using the slope, as exp(slope) - 1, which would give me the percentage change of the dependent variable given a one-unit increase in independent_var. Would you mind advising me on which direction I should be looking in, please?

kain · Posted 11-26-2020 01:12 PM

I see. Thank you so much!
