Saba1
Quartz | Level 8
Hi

 

I want to run a regression where Y is a dichotomous variable (1, 0). All independent variables are continuous. I want standard errors clustered at the firm level, with a fixed effect at the industry level. I need to obtain propensity scores for the observations, so my preference is to use a logit or probit model.

 

I am currently using the code below, but it accounts for neither clustered standard errors nor the fixed effect.

ods exclude all;

proc logistic data=have descending;
   model Y = X1 X2 / link=probit rsquare;
   output out=prob pred=ps;
   ods output ParameterEstimates=coef;
run;

ods exclude none;

 

Kindly suggest some solution.

 

Thanks. 


StatDave
SAS Super FREQ

You could account for clustering by either fitting a random effects logistic model in PROC GLIMMIX, or a Generalized Estimating Equations (GEE) logistic model in PROC GEE (or GENMOD). 
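As a rough sketch of the random effects option, assuming the data set and variable names used elsewhere in this thread (have, Y, X1, X2, firm_ID), a random-intercept probit model in PROC GLIMMIX might look like the following. The output data set name prob_re is only illustrative, and METHOD=LAPLACE is one possible choice of estimation method, not a recommendation from the thread.

```sas
/* Sketch only: random intercept per firm, probit link.            */
/* HAVE, Y, X1, X2 and firm_ID are the names used in the question; */
/* PROB_RE is a hypothetical output data set name.                 */
proc glimmix data=have method=laplace;
   class firm_ID;
   model Y(event='1') = X1 X2 / dist=binary link=probit solution;
   random intercept / subject=firm_ID;
   /* PRED(ILINK) gives predicted probabilities that include the   */
   /* estimated firm effect; PRED(NOBLUP ILINK) leaves it out.     */
   output out=prob_re pred(ilink)=ps pred(noblup ilink)=ps_marginal;
run;
```

Which of the two predicted probabilities is the more suitable propensity score depends on whether the firm-specific effect should enter the score.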

Saba1
Quartz | Level 8

@StatDave Thanks for your reply. I am using the following code to get clustered standard errors, but all the estimates, standard errors, and probabilities are similar to what the "proc logistic" model above gives. Please correct me if this code is wrong. Moreover, kindly advise on how to modify it to include a fixed effect along with clustered standard errors. Thanks.

 

proc genmod data=have descending;
   class firm_ID / param=ref;
   model Y = X1 X2 / dist=bin link=probit r;
   repeated subject=firm_ID;
   output out=prob pred=ps;
run;


StatDave
SAS Super FREQ

The code is fine, as you can see by running it. Without the TYPE= option in the REPEATED statement, you are specifying an independence working correlation structure for the repeated measures within a cluster, but a robust (empirical) variance estimator is still used. Under independence, the GEE point estimates and predicted probabilities should match those from PROC LOGISTIC; only the standard errors are computed differently. You could use other structures such as exchangeable (TYPE=EXCH), autoregressive (TYPE=AR), or even unstructured (TYPE=UN), though this last can require estimating many correlations and can cause fitting problems depending on the size of your clusters. But with any of these, the final parameter estimates and standard errors might still not differ substantially from PROC LOGISTIC, depending on the correlation that exists in your data.
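For illustration, here is a minimal sketch of the same GENMOD model with an exchangeable working correlation, again assuming the names from the question. CORRW prints the estimated working correlation matrix and MODELSE adds the model-based standard errors next to the empirical (robust) ones, which makes it easier to judge how much within-firm correlation is present. The output data set name prob_exch is hypothetical.

```sas
/* Sketch only: GEE probit with an exchangeable working correlation. */
/* PROB_EXCH is a hypothetical output data set name.                 */
proc genmod data=have descending;
   class firm_ID;
   model Y = X1 X2 / dist=bin link=probit;
   /* TYPE=EXCH assumes one common correlation within each firm.     */
   /* CORRW displays the working correlation; MODELSE adds           */
   /* model-based standard errors to the empirical ones.             */
   repeated subject=firm_ID / type=exch corrw modelse;
   output out=prob_exch pred=ps;
run;
```

With TYPE=AR the order of observations within a firm matters, so the data would need to be sorted by time within firm_ID (or a WITHIN= effect supplied) for that structure to be meaningful.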

 

Please define what you mean by "fixed effect." The GEE model you are fitting in GENMOD provides the prime advantage of a fixed effects model, since it doesn't require estimating a parameter for each individual cluster and adjusts the variance of the estimates for the within-cluster correlation. But so do the random effects model I mentioned earlier and the conditional logistic model (fit with the STRATA statement in GENMOD or LOGISTIC). The conditional model is the one most often referred to as a "fixed effects" model, but predicted values are more problematic with it. See the book "Fixed Effects Regression Methods for Longitudinal Data Using SAS" (Allison, P., SAS Institute, 2005).
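Depending on what "fixed effect" means here, two possible readings can be sketched. Both assume an industry identifier named industry, which is a hypothetical variable name not given in the thread, and that each firm belongs to a single industry. The first adds industry indicators to the GEE probit model while keeping firm-level clustering; the second is a conditional logistic model stratified by industry, which uses the logit link and does not by itself give firm-clustered standard errors.

```sas
/* Reading 1 (assumption): industry dummies plus firm-clustered SEs. */
/* INDUSTRY and PROB_FE are hypothetical names.                       */
proc genmod data=have descending;
   class firm_ID industry / param=ref;
   model Y = X1 X2 industry / dist=bin link=probit;
   repeated subject=firm_ID;        /* robust SEs clustered by firm   */
   output out=prob_fe pred=ps;      /* predicted probabilities        */
run;

/* Reading 2 (assumption): conditional logistic model, stratified by */
/* industry. Industry intercepts are conditioned out, not estimated. */
proc logistic data=have;
   strata industry;
   model Y(event='1') = X1 X2;
run;
```

The first form estimates one parameter per industry, which is usually manageable when the number of industries is modest, and it yields predicted probabilities directly. The second does not estimate the industry intercepts, which is why predicted probabilities from it are harder to obtain.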
