Saba1
Quartz | Level 8

I have the two models shown below. In Model 1, the dependent variable (Buy) and the independent variable of interest (Gender) are categorical; the remaining control variables are continuous. In Model 2, the dependent variable and all independent variables (i.e., the interaction terms of gender and executive type) are dummy variables. I also need to include two-way fixed effects, i.e., time and firm fixed effects, in both regression models. My data cover 12 months and 4000 unique firms.

 

I am not sure which procedure ("proc") to use for these two models to get accurate estimates. For Model 2, I also need to test whether the coefficients for male and female executives differ significantly within each of the 3 executive positions, for example the difference between the coefficients of Female_Chairman and Male_Chairman.

 

Model - 1:

Buy = α + β1·Gender + β2·Age + β3·Experience + β4·Return + β5·Firm_Size + β6·Volatility + β7·shares + ∑ βmonth + ∑ βfirm + ε

 

Model - 2:

Buy = α + β1·Female_Chairman + β2·Male_Chairman + β3·Female_Director + β4·Male_Director + β5·Female_Officer + β6·Male_Officer + ∑ βmonth + ∑ βfirm + ε

 

Buy (dependent) = dummy variable equal to “1” when shares are bought and “0” when shares are sold.

Gender (independent) = dummy variable equal to “1” if the executive is female and “0” if male.

 

Previously I used proc glm, but then realized that due to a binary dependent variable the coefficients would be misleading.

proc glm data=Model_1;
   absorb firm;
   class month;
   model Buy = Gender Age Experience Return Firm_Size Volatility shares month / solution noint;
run;

Kindly suggest an appropriate syntax for the solution. Thanks

Accepted Solutions
StatDave
SAS Super FREQ

As I discussed in one of your other postings on this same question (would be better to not create multiple postings), GLM is NOT appropriate for a binary response since it assumes the response is normal and a binary response is about as far from normal as you can get. I suggested using either PROC GEE with a REPEATED statement or PROC LOGISTIC with a STRATA statement. See the examples of both of these in the documentation for those procedures. These are two approaches for modeling data with repeated measurements where you have clusters of correlated observations. If you consider all of the observations within a firm as correlated, then you would specify your firm variable in the SUBJECT= option in GEE's REPEATED statement or in LOGISTIC's STRATA statement. Again, as I mentioned before, the model you say you want would include Gender and Position type variables (NOT dummy variables for their individual levels) and their interaction. These two variables should appear in both the CLASS and MODEL statements in either procedure and then the SLICE statement can be used to do the comparisons. I gave an example of this using PROC LOGISTIC.
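The two approaches described above could be sketched roughly as follows. This is a minimal sketch, not the example StatDave refers to from the other thread: the data set name Model_2 and the variable Position (a single variable with levels such as Chairman, Director, Officer) are assumptions for illustration, and firm and month are taken from the question. PARAM=GLM is used because the SLICE statement in PROC LOGISTIC requires GLM parameterization of the CLASS variables. The exchangeable working correlation in the PROC GEE step is likewise only one possible choice.

```sas
/* Sketch 1: conditional (fixed effects) logistic model, stratified by firm. */
/* Position and Model_2 are hypothetical names, not from the original post.  */
proc logistic data=Model_2;
   strata firm;                              /* firm effects are conditioned out */
   class Gender Position month / param=glm;  /* GLM coding is needed for SLICE   */
   model Buy(event='1') = Gender Position Gender*Position month;
   /* Compare female vs. male within each executive position */
   slice Gender*Position / sliceby=Position diff oddsratio cl;
run;

/* Sketch 2: GEE model treating observations within a firm as correlated. */
proc gee data=Model_2;
   class firm Gender Position month;
   model Buy(event='1') = Gender Position Gender*Position month
         / dist=bin link=logit;
   repeated subject=firm / type=exch;        /* working correlation: an assumption */
run;
```

In the first sketch, each slice tests the Gender difference at one level of Position, which corresponds to comparisons such as Female_Chairman versus Male_Chairman in Model 2.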

sld
Rhodochrosite | Level 12

Elsewhere I recall that you (@Saba1) say there are 4000+ firms. I'm guessing that will be too many for most procedures (might work in HPMIXED; @StatDave might know). And regardless, I wonder how you would interpret the effect of a fixed effects factor with over 4000 levels.

 

Would it be better to think of firm (PERMNO, elsewhere) as a random effects factor, where the 4000 firms in your dataset represent a random sample of all possible firms that you would want to make inference to? This would be a shift in how you envision your study and the questions it is meant to address.

 

I agree with @StatDave that you should use a procedure that is appropriate for a binary response, and that you should incorporate your categorical predictor variables in a form that can be used in the CLASS statement.

 

Other aspects of your study suggest that observations are not independent and consequently a mixed model may be necessary. A mixed model with a binary response is complicated; SAS® for Mixed Models: Introduction and Basic Applications is an excellent resource.
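As a rough illustration of what a mixed model with a binary response might look like for Model 1, here is a minimal sketch using PROC GLIMMIX with a random intercept for each firm. The choice of PROC GLIMMIX, the Laplace estimation method, and the random-intercept structure are assumptions made for this example rather than recommendations from the thread; the variable names come from the original question, with PERMNO as the firm identifier.

```sas
/* Sketch: logistic mixed model with firm as a random effect.        */
/* method=laplace and the random-intercept structure are assumptions. */
proc glimmix data=Model_1 method=laplace;
   class PERMNO month;
   model Buy(event='1') = Gender Age Experience Return Firm_Size
                          Volatility shares month
         / dist=binary link=logit solution;
   random intercept / subject=PERMNO;   /* one random intercept per firm */
run;
```

Here month stays in the model as a fixed effect, while the 4000 firm effects are summarized by a single variance component instead of being estimated individually.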

 

I hope this helps. 

 

Saba1
Quartz | Level 8

@sld Thanks for your thoughtful reply. I am using "proc logistic" and need to include time and firm fixed effects, i.e., create dummy variables for each of them and take the last category as the reference. You are right: when I put PERMNO in the CLASS statement, the code runs for around two hours before it creates the output tables. Everything else is fine, but the run time is the main concern here.

Saba1
Quartz | Level 8

@StatDave Thank you so much. I am using proc logistic for the models mentioned above. I am a bit confused about the difference between the STRATA and CLASS statements. My purpose is to include time and firm fixed effects in the models, i.e., to create dummies for each of them with the last category as the reference. The CLASS statement does the trick when I put both "Month" and "PERMNO" in it. I expected the same results when using PERMNO in the STRATA statement and Month in the CLASS statement; however, the results are different. Using PERMNO in the CLASS statement takes around 2 to 2.5 hours to create the output tables.

Need your suggestion in this regard. Thanks

StatDave
SAS Super FREQ

The CLASS statement simply creates coded design (or "dummy") variables for the specified variables to represent them in the model. It does not fundamentally affect the method of analysis. The STRATA statement does not create design variables. Instead, it specifies the variable(s) that define groups of correlated observations and instructs PROC LOGISTIC to estimate the model using conditional maximum likelihood estimation. This is the so-called fixed effects logistic model, and it has the effect of estimating the other parameters in the model while conditioning out the parameters that indicate the groups of correlated observations. With or without the STRATA statement, you can use the CLASS statement to indicate the categorical predictors in your model. But the analyses with and without the STRATA statement are entirely different, as is obvious since the group parameters are not estimated in the conditional model. The fixed effects model is discussed and illustrated in detail in the book "Fixed Effects Regression Methods for Longitudinal Data Using SAS" (Allison, P., SAS Institute, 2005).
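To make the contrast concrete, the two specifications discussed in this exchange might look roughly like the sketch below. It assumes the Model_1 data set and variable names from the original question, with PERMNO as the firm identifier; PARAM=REF with REF=LAST is an assumption chosen to match the "last category as reference" coding described above.

```sas
/* (a) Unconditional model: CLASS builds dummy variables for every month */
/*     and every firm, and all of their parameters are estimated.        */
proc logistic data=Model_1;
   class Month PERMNO / param=ref ref=last;
   model Buy(event='1') = Gender Age Experience Return Firm_Size
                          Volatility shares Month PERMNO;
run;

/* (b) Conditional model: STRATA conditions the firm parameters out of  */
/*     the likelihood, so no firm coefficients appear in the output.    */
proc logistic data=Model_1;
   strata PERMNO;
   class Month / param=ref ref=last;
   model Buy(event='1') = Gender Age Experience Return Firm_Size
                          Volatility shares Month;
run;
```

Because (a) uses unconditional and (b) conditional maximum likelihood, their estimates for the remaining predictors are not expected to match, and (b) avoids fitting roughly 4000 firm parameters.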

Saba1
Quartz | Level 8
@StatDave: Thank you so much for this detailed explanation. I am very clear now about the workings of class and strata statements. Thanks again.
