Dianac
Calcite | Level 5

 

Hello,

I have the following code for my logistic regression. However, I need to add industry fixed effects and year fixed effects using dummy variables. Could someone help me with this? I am trying to build a longitudinal dataset covering 4 years. I used the STRATA statement, but all the dummy variables are dropped because of redundancy. Is that correct? Two of the dummy variables do not change over time, whereas one does.

Thank you very much in advance.

Proc logistic data=Exam.Alltables
      plots(only)=(effect oddsratio);
   Strata Year Hight_sensitive EPA Emissions_trading;
   class High_sensitive (param=ref ref='No') EPA (param=ref ref='No') Emissions_trading (param=ref ref='No');
   model disclosure (event='1') = High_sensitive Div_emissions EPA Emissions_trading Assets_LN /
         selection=backward;
run;

RyanKCarr
SAS Employee

Hi,

Without some sample data, it is difficult to tell for sure whether the variables should be dropping out. However, I did notice one thing in the code that I wanted to ask about.

In the STRATA statement you have the following:

   Strata Year Hight_sensitive EPA Emissions_trading;

 

but in the class and model statements you have:

 

   class High_sensitive ...

 

Is it possible that a simple typo, "Hight_sensitive" rather than "High_sensitive", is the source of your issues?

Best of luck!

Ryan

Dianac
Calcite | Level 5

Hello Ryan,

Thanks for your answer. Yes, I mistyped the variable High_sensitive here, but not in the SAS program.

In fact, after reviewing the book Fixed Effects Regression Methods for Longitudinal Data Using SAS, I tried the following code for year fixed effects:

Proc logistic data=Exam.Alltables Desc;
class Year /PARAM=REF ;
model disclosure = Year Div_emissions Emissions_trading_num Assets_LN High_sensitive_num EPA_num;
STRATA Year;
run;

 

However, my concern is about firm fixed effects. There are around 600 companies in each of the 4 years. With the code below, the results are completely different: all the variables drop out.

Proc logistic data=Exam.Alltables Desc;
class Company_ID /PARAM=REF ;
model disclosure = Company_ID Div_emissions Emissions_trading_num Assets_LN High_sensitive_num EPA_num;
STRATA Company_ID;
run;

 

Please find the sample data attached.

jjsingh04
Obsidian | Level 7

I'm not 100% sure of my answer, so take it with a grain of salt, but I think you should have one fewer dummy variable (i.e., k-1) than the number of levels (k) of the variable you want fixed effects for.

For example, 4 years -> 3 year dummy variables (the one year not assigned a dummy variable has the values 0, 0, 0 on the 3 dummies),

600 firms -> 599 firm dummy variables (the one firm not assigned a dummy variable has the value 0 on all 599 firm dummies),

etc.
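A minimal sketch of creating the k-1 year dummies by hand in a DATA step. It assumes Year is numeric and takes the values 2015 to 2018; those years are hypothetical, so substitute the four years actually in Exam.Alltables. The output data set name Work.Alltables_dummies is also made up for illustration.

```sas
/* Hypothetical years 2015-2018; replace with the years in your data. */
data Work.Alltables_dummies;
   set Exam.Alltables;
   /* A comparison in SAS evaluates to 1 (true) or 0 (false),  */
   /* so each assignment creates one 0/1 dummy variable.        */
   y1 = (Year = 2016);
   y2 = (Year = 2017);
   y3 = (Year = 2018);
   /* 2015 gets no dummy: it is the reference year (y1=y2=y3=0). */
run;
```

One caveat of this pattern: a row with a missing Year also gets 0, 0, 0 and would be treated as the reference year, so it is worth checking for missing years first.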

 

You would put the names of the dummy variables in both your CLASS statement and your MODEL statement.

For example, if your year dummies are y1 to y3 and your firm dummies are f1 to f599:

CLASS y1-y3 f1-f599 /* plus any other categorical explanatory variables */ / PARAM=REF;

MODEL disclosure = Div_emissions /* other covariates */ y1-y3 f1-f599;

I wouldn't include Company_ID itself in the model above, because it's not an explanatory variable in the regression. However, the firm fixed effects DO explain, so they ARE included (as the firm dummies).
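Rather than building 3 + 599 dummies by hand, here is a sketch of letting PROC LOGISTIC do it. Listing Year and Company_ID in the CLASS statement with PARAM=REF makes the procedure create k-1 reference-coded design columns for each (by default the last level is the reference), which is equivalent to entering the year and firm dummies yourself. The covariate names are copied from the earlier code in the thread.

```sas
/* Year and firm fixed effects as reference-coded CLASS effects. */
proc logistic data=Exam.Alltables descending;
   class Year Company_ID / param=ref;
   model disclosure = Div_emissions Emissions_trading_num Assets_LN
                      High_sensitive_num EPA_num
                      Year Company_ID;
   /* Covariates that never change within a firm are linear        */
   /* combinations of the firm dummies, so their effects cannot be  */
   /* separated from the firm effects; PROC LOGISTIC notes the      */
   /* linear dependence and sets the redundant parameters to 0.     */
run;
```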

 

I found some code on how to automate the dummy variable creation process in SAS: 

https://blogs.sas.com/content/iml/2016/02/24/create-a-design-matrix-in-sas.html
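A sketch along the lines of the PROC GLMSELECT approach described in that blog post: the OUTDESIGN= option writes the reference-coded dummy columns to a data set, which can then be used on a MODEL statement like any other numeric covariates. It assumes disclosure is numeric (GLMSELECT needs a numeric response even though nothing is being selected); Work.Design is a made-up name.

```sas
/* Write the design matrix (reference-coded dummies) to Work.Design. */
proc glmselect data=Exam.Alltables
               outdesign(addinputvars)=Work.Design
               noprint;
   /* PARAM=REFERENCE gives k-1 columns per CLASS variable. */
   class Year Company_ID / param=reference;
   /* SELECTION=NONE: no model selection, just build the design. */
   model disclosure = Year Company_ID / selection=none;
run;
```

ADDINPUTVARS keeps the other input variables (Div_emissions, Assets_LN and so on) in Work.Design, so the covariates and the dummy columns end up in one data set.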

 

J.J.

 

 

Our lives are enriched by the people around us.
SteveDenham
Jade | Level 19

I am interested in your statement that "All the variables are dropped out." Are you using some sort of variable selection method? If so, I am not surprised that all of the variables drop out, because there are more variables than observations per Company_ID: there are only four years per firm, but you are trying to fit coefficients for five variables plus the STRATA variable, which also appears in the MODEL statement as an effect. This statement from the documentation of the STRATA statement applies:

 

STRATA variables can also be specified in the MODEL statement as classification or continuous covariates; however, the effects are nondegenerate only when crossed with a nonstratification variable.

 

In this case the effects are degenerate.
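To make the contrast concrete, here is a sketch of a nondegenerate version of the firm fixed-effects model. Company_ID appears only in the STRATA statement, so the firm effects are conditioned out instead of estimated, while Year stays in the MODEL because it varies within each firm. Covariates that never change within a firm over the four years (per the original post, two of the dummies) are degenerate under STRATA Company_ID in the same way, so this sketch assumes, hypothetically, that only Div_emissions, Emissions_trading_num and Assets_LN vary within firm; check that against the data.

```sas
/* Conditional logit: firm fixed effects via STRATA only. */
proc logistic data=Exam.Alltables descending;
   class Year / param=ref;
   strata Company_ID;
   /* Company_ID is NOT repeated here. Year varies within firm, so  */
   /* its k-1 effects are estimable. The covariates listed are      */
   /* ASSUMED to vary within firm (hypothetical choice).            */
   model disclosure = Year Div_emissions Emissions_trading_num Assets_LN;
run;
```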

 

SteveDenham

StatDave
SAS Super FREQ

The conditional analysis conditions out the variables in the STRATA statement. This allows you to estimate the effects of the other variables without also having to estimate the (usually) large number of parameters implied by the STRATA variables. So, the variables in the STRATA statement should not appear in the MODEL statement unless, as Steve mentioned, they interact with non-STRATA variables.
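A sketch of what "unless they interact with non-STRATA variables" can look like, using the year-strata model from earlier in the thread. Year alone is constant within each Year stratum and is therefore degenerate, but Year*Div_emissions lets the Div_emissions slope differ by year and can be estimated. Using Div_emissions for the interaction is only an illustration.

```sas
proc logistic data=Exam.Alltables descending;
   class Year / param=ref;
   strata Year;
   /* Year is NOT a main effect here: the year intercepts are     */
   /* conditioned out. With PARAM=REF, Year*Div_emissions adds     */
   /* k-1 columns giving each year's difference in the             */
   /* Div_emissions slope relative to the reference year.          */
   model disclosure = Div_emissions Year*Div_emissions
                      Emissions_trading_num Assets_LN
                      High_sensitive_num EPA_num;
run;
```

Because the strata here are years rather than firms, firm-constant covariates such as High_sensitive_num still vary within each stratum (across firms) and remain estimable in this model.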

jjsingh04
Obsidian | Level 7

If I understand correctly, we use Stratification AND / OR Matching (like Propensity Score Matching, etc.) to deal with CONFOUNDING, right?

 

If we're using BOTH stratification AND matching, is that what's called conditional logistic regression? Is that correct?

 

For example, if testing a drug (given/not given--treatment variable) to treat a patient (health improves/doesn't improve--the dependent variable), you could match by smoker (yes/no--1 matching criterion), eats healthy (yes/no--another matching criterion), and exerciser (yes/no--another matching criterion). Is that right? Are these matching criteria also called strata? 

 

And what do degenerate and non-degenerate mean in this context?

 

Thanks,

J.J.

SteveDenham
Jade | Level 19

I'll take a try at the last part, about degenerate/non-degenerate. Suppose you list a group of variables as STRATA variables and include the exact same variables in the MODEL statement (without interactions, etc.). Within each stratum those variables take a single value, so when the model is fit, the design matrix is singular: the levels of the STRATA variables completely determine those columns. To find out about this, add the CHECKDEPENDENCY= option to the STRATA statement, probably with the COVARIATES or ALL keyword. Covariates that are dependent on the strata variables are then eliminated from the analysis, which identifies the degenerate variables.
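A sketch of adding that option to the firm-level model from earlier in the thread. CHECKDEPENDENCY=ALL checks the strata and all covariates together, so covariates that depend on the strata (for example, ones that never change within a firm) should be reported and removed before fitting. With roughly 600 strata, this check may take noticeably longer than CHECKDEPENDENCY=COVARIATES.

```sas
proc logistic data=Exam.Alltables descending;
   class Year / param=ref;
   /* Check strata and covariates jointly for linear dependence;   */
   /* dependent (degenerate) covariates are dropped before fitting. */
   strata Company_ID / checkdependency=all;
   model disclosure = Year Div_emissions Emissions_trading_num Assets_LN
                      High_sensitive_num EPA_num;
run;
```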

 

Does that make any sense at all?

 

SteveDenham

StatDave
SAS Super FREQ

Confounding variables are variables that can affect both the response and the predictor(s) of main interest, such as a treatment variable. In observational, nonrandomized studies, confounders can make the treatment appear to have more or less effect on the response than it really does. One way of dealing with that is to match subjects on possible confounding variables. The blocks of matched subjects are considered strata, and the variables that define the strata are specified in the STRATA statement in PROC LOGISTIC. You can then fit a model that includes the main variable(s) of interest. The analysis maximizes a conditional likelihood and is called conditional logistic regression. Another way of dealing with confounders is causal analysis, as can be done with PROC CAUSALTRT. See the discussion and examples in the documentation of that procedure.
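A sketch of the drug example from earlier in the thread as a conditional logistic regression. The data set Work.Trial and the variables Improved, Drug, Smoker, HealthyDiet and Exerciser are hypothetical, as are the 'Yes'/'No' values. Each combination of the three matching variables defines one stratum (up to 2 x 2 x 2 = 8), the stratum-specific intercepts are conditioned out, and only the drug effect is estimated.

```sas
/* Hypothetical data set and variables for the drug example. */
proc logistic data=Work.Trial;
   class Drug(ref='No') / param=ref;
   /* The matching variables define the strata; they are NOT     */
   /* repeated in the MODEL statement.                            */
   strata Smoker HealthyDiet Exerciser;
   model Improved(event='Yes') = Drug;
run;
```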
