Hello all,
I have panel data that is aggregated (stacked) from multiple panels.
I use a difference-in-differences design to test how a certain variable influences performance in the 3 years before and after an event. The variable is calculated separately for each event year and is held constant over the 3 years before and after it. (Post and treatment dummies are also in the model, but they are not relevant for this discussion.)
The data has the following structure:
year_variable gvkey year_data variable sales
1993 1 1990 0.5 1
1993 1 1991 0.5 2
1993 1 1992 0.5 3
1993 1 1993 0.5 4
1993 1 1994 0.5 5
1993 1 1995 0.5 6
1993 1 1996 0.5 7
1993 2 1990 1 11
1993 2 1991 1 12
1993 2 1992 1 13
1993 2 1993 1 14
1993 2 1994 1 15
1993 2 1995 1 16
1993 2 1996 1 17
....
1994 1 1991 10 2
1994 1 1992 10 3
1994 1 1993 10 4
1994 1 1994 10 5
1994 1 1995 10 6
1994 1 1996 10 7
1994 1 1997 10 8
This means that the data line for a given firm-year can appear multiple times in the sample (up to 7 times). For example, gvkey 1 with year_data 1994 appears under both year_variable 1993 and year_variable 1994.
I want to run a regression with year and gvkey (firm) fixed effects, but I am not sure how to define them in this panel structure.
When I ran the following code, it analyzed only part of the data (about 5,000 observations out of 115,000):
proc sort data=&data_name;
BY year_variable gvkey year_data;
run;
and then:
proc glm data=&data_name;
CLASS gvkey year_data;
MODEL &var_name=**variables*.... gvkey year_data /SOLUTION;
ods output ParameterEstimates=&var_name._&data_name;
RUN;
ods trace off;
What should I do?
Hello,
> When I did the following command it only analyzed part of the data (5000 observations out of 115,000)
How do you know that?
Can you provide us with the LOG?
Please use the "Insert Code" ( </> ) icon in the toolbar and paste the LOG in the pop-up window that appears. This way the LOG does not lose formatting and structure.
Also if you have panel data (time series cross sectional data) with YEAR and FIRM I think you should use PROC TSCSREG (SAS/ETS) or PROC PANEL (SAS/ETS) or PROC CPANEL (SAS VIYA Econometrics) or PROC GLIMMIX (SAS/STAT).
Have you considered using one of these 4 procedures?
Kind regards,
Koen
Class Levels Values
year_data 28 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017
Number of Observations Read 115205
Number of Observations Used 5629
I can tell from the number of observations used compared to the number read.
Regarding the procedures you mentioned, I will check them, but I am not sure how they would solve my problem.
What is their advantage over PROC GLM? I want a regression with time and firm fixed effects, and later on I want to add two-way clustering.
Hello,
On the n° of observations used << n° of observations read:
For an analysis involving one dependent variable, PROC GLM uses an observation if values are non-missing for that dependent variable and ALL the independent variables.
Hence: Check for missing values in your observations.
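A minimal sketch of such a check, reusing the &data_name and &var_name macro variables from the original post. The regressor list in CMISS is a placeholder (only the column named variable is taken from the sample data), and work.complete_flag and has_missing are hypothetical names chosen for illustration:

```sas
/* Count non-missing (N) and missing (NMISS) values per numeric variable */
proc means data=&data_name n nmiss;
   var _numeric_;
run;

/* Flag rows that have at least one missing model variable.           */
/* Assumption: extend the CMISS list with every regressor in MODEL.    */
data work.complete_flag;
   set &data_name;
   has_missing = (cmiss(&var_name, variable) > 0);
run;

/* has_missing = 0 should roughly match "Number of Observations Used" */
proc freq data=work.complete_flag;
   tables has_missing;
run;
```

If the count of rows with has_missing = 0 is close to the number of observations used, missing values explain the gap.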
On the suitability of PROC GLM:
When regression is performed on time series data, the errors might not be independent. Often errors are autocorrelated; that is, each error is correlated with the error immediately before it.
You violate the assumption of independent observations in PROC GLM. The estimation methods in SAS/ETS and SAS Econometrics (VIYA) are made to "correct" for this.
You absolutely need to use PROC PANEL (or PROC TSCSREG) or PROC CPANEL, also to get cluster-robust standard errors in your panel data analysis.
If you do not have SAS/ETS, nor SAS Econometrics, you can do PROC GLIMMIX (SAS/STAT) for panel data analysis.
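As a rough sketch only, a two-way fixed-effects model in PROC PANEL could look like the code below. It assumes a data set (here called work.panel_unique, a hypothetical name) in which every combination of cross-section ID and year occurs at most once, and it assumes dummies named post and treat, which are not named in the original post. The HCCME= and CLUSTER options depend on your SAS/ETS release, so please check the MODEL statement documentation for your version:

```sas
/* PROC PANEL expects the data sorted by cross section, then time */
proc sort data=work.panel_unique;
   by gvkey year_data;
run;

proc panel data=work.panel_unique;
   /* First ID variable = cross section (firm), second = time (year) */
   id gvkey year_data;

   /* FIXTWO requests firm and year fixed effects.                   */
   /* sales and variable come from the sample data; post and treat   */
   /* are assumed names for the DiD dummies.                         */
   model sales = variable post treat / fixtwo;

   /* Assumption: HCCME=1 with CLUSTER requests standard errors      */
   /* clustered by cross section (one-way, not two-way).             */
   model sales = variable post treat / fixtwo hccme=1 cluster;
run;
```

Like PROC GLM, this still drops observations with missing values in the model variables; it changes how the fixed effects and standard errors are handled, not how missing data is treated.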
Cheers,
Koen
Regarding the first part: there are indeed missing values for some of the independent variables.
I don't want to remove these observations.
Does any of the other procedures take these observations into account?
Hello,
(Almost) every (statistical) SAS procedure has documentation on how it handles missing values:
The PANEL Procedure > Details > Missing Values
https://go.documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/etsug/etsug_panel_details03.htm?homeOnFail
The CPANEL Procedure > Details > Missing Values
The COUNTREG procedure > Details > Missing Values
https://go.documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/etsug/etsug_countreg_details04.htm
I mention the COUNTREG procedure as it supports:
- fixed-effects and random-effects Poisson models for panel data
- fixed-effects and random-effects negative binomial models for panel data
I don't know the nature of your dependent (outcome) variable. What is its measurement scale? Is it binary, nominal (>2 categories), ordinal, interval, ratio, proportion [0,1], or count?
Kind regards,
Koen
Thank you. The outcome is a nominal (continuous) variable.
And another important question:
In this data structure each combination of year and gvkey appears more than once. It is an "appended" panel, and from my perspective these rows are not duplicates but independent observations.
Does this structure require additional settings in these procedures?
To illustrate, see these lines:
year_variable gvkey year_data variable sales
1993 1 1991 0.5 2
1994 1 1991 10 2
Is the way I sorted the dataset beforehand OK?
proc sort data=&data_name;
BY year_variable gvkey year_data;
run;
Or should I rather sort it by gvkey and year_data only?
proc sort data=&data_name; by gvkey year_data;
run;
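One possible way to handle the repeated gvkey and year_data pairs, sketched under the assumption that each (year_variable, gvkey) combination should be treated as its own cross section. The names work.stacked, work.check, work.dups and cohort_firm are hypothetical, and whether this unit of analysis suits the design is a modeling decision, not something SAS requires:

```sas
/* Build one cross-section ID per cohort-firm pair, e.g. "1993_1" */
data work.stacked;
   set &data_name;
   length cohort_firm $ 32;
   cohort_firm = catx('_', year_variable, gvkey);
run;

/* Sort by the new cross-section ID, then by calendar year */
proc sort data=work.stacked;
   by cohort_firm year_data;
run;

/* Sanity check: work.dups should be empty if every              */
/* (cohort_firm, year_data) pair occurs only once.               */
proc sort data=work.stacked out=work.check nodupkey dupout=work.dups;
   by cohort_firm year_data;
run;
```

With the two sample lines above, gvkey 1 in year_data 1991 would get the IDs 1993_1 and 1994_1, so the two rows are no longer identical on the panel keys.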
Hello,
I have read your Original Post again:
> I test how a certain variable (calculated for each year separately and is defined for this year and remains constant for the 3 years prior and after the event) influences the performance 3 years before and after using Diff-In-Diff method (post and treatment dummies also appear here, not relevant for the discussion...).
How is performance (your dependent variable) expressed? Is it a continuous measure (interval-scaled)?
I now see better what you want to do.
I have to think about this in a free moment (later this weekend).
I will come back to you.
Still, I think you will have to turn to the econometrics procedures (SAS/ETS or SAS Econometrics) instead of the SAS/STAT procedures.
Cheers,
Koen
Thank you for the detailed answer!
It is continuous.