Statistical Procedures

margalits1 · Posted 08-28-2021 03:16 AM

Hello all,

I have panel data that is aggregated (appended) from multiple panels.

I test how a certain variable (calculated separately for each year; it is defined for that year and remains constant for the 3 years before and after the event) influences performance in the 3 years before and after the event, using a difference-in-differences (DiD) method (post and treatment dummies are also included, but they are not relevant for this discussion).

The data has the following structure:

year_variable  gvkey  year_data  variable  sales
1993           1      1990       0.5       1
1993           1      1991       0.5       2
1993           1      1992       0.5       3
1993           1      1993       0.5       4
1993           1      1994       0.5       5
1993           1      1995       0.5       6
1993           1      1996       0.5       7
1993           2      1990       1         11
1993           2      1991       1         12
1993           2      1992       1         13
1993           2      1993       1         14
1993           2      1994       1         15
1993           2      1995       1         16
1993           2      1996       1         17
....
1994           1      1991       10        2
1994           1      1992       10        3
1994           1      1993       10        4
1994           1      1994       10        5
1994           1      1995       10        6
1994           1      1996       10        7
1994           1      1997       10        8

Meaning, the data for a given firm-year appears multiple times in this sample (7 times). For example, gvkey 1 in year 1994 appears more than once.
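To make the appended layout concrete, here is a minimal sketch that loads the gvkey 1 rows shown above into a data set and adds an event-time column. The data set name WORK.STACKED and the variable EVENT_TIME are hypothetical, and the sketch assumes that year_variable is the event year of each window.

```sas
/* gvkey 1 rows from the sample above (the "...." gap is not reproduced) */
data work.stacked;
   input year_variable gvkey year_data variable sales;
   /* Hypothetical: data year relative to the event year of its window */
   event_time = year_data - year_variable;
   datalines;
1993 1 1990 0.5 1
1993 1 1991 0.5 2
1993 1 1992 0.5 3
1993 1 1993 0.5 4
1993 1 1994 0.5 5
1993 1 1995 0.5 6
1993 1 1996 0.5 7
1994 1 1991 10 2
1994 1 1992 10 3
1994 1 1993 10 4
1994 1 1994 10 5
1994 1 1995 10 6
1994 1 1996 10 7
1994 1 1997 10 8
;
run;

/* The same firm-year appears once per window that covers it */
proc print data=work.stacked noobs;
   where year_data = 1994;
   var year_variable gvkey year_data event_time variable sales;
run;

/* EVENT_TIME should span -3 to +3 within every window */
proc means data=work.stacked min max;
   class year_variable;
   var event_time;
run;
```

Here PROC PRINT lists two rows for gvkey 1 and year_data 1994: one from the 1993 window (event_time 1, variable 0.5) and one from the 1994 window (event_time 0, variable 10).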

I want to run a regression with year and gvkey (firm) fixed effects, but I am not sure how to define them in this panel structure.

When I ran the following code, it analyzed only part of the data (about 5,000 observations out of 115,000):

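proc sort data=&data_name;
   by year_variable gvkey year_data;
run;

and then:

/* gvkey (firm) and year_data (year) enter as CLASS effects, i.e. fixed effects */
proc glm data=&data_name;
   class gvkey year_data;
   model &var_name = **variables*.... gvkey year_data / solution;
   ods output ParameterEstimates=&var_name._&data_name;
run;
quit;   /* PROC GLM is interactive; QUIT ends it */
ods trace off;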

What should I do?

sbxkoenk · Posted 08-28-2021 07:41 AM

Hello,

> When I did the following command it only analyzed part of the data (5000 observations out of 115,000)

How do you know that?

Can you provide us with the LOG?
Please use the "Insert Code" ( </> ) icon in the toolbar and paste the LOG in the pop-up window that appears. This way the LOG does not lose formatting and structure.

Also, if you have panel data (time-series cross-sectional data) with YEAR and FIRM, I think you should use PROC TSCSREG (SAS/ETS), PROC PANEL (SAS/ETS), PROC CPANEL (SAS Viya Econometrics) or PROC GLIMMIX (SAS/STAT).
Have you considered using one of these 4 procedures?
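As a rough sketch of how the PROC PANEL route might be set up for this data (an assumption, not a recommendation established in this thread): PROC PANEL identifies observations by a cross-section ID and a time ID in its ID statement and expects the data sorted by those two variables, with each pair occurring once. In the appended data, gvkey and year_data repeat across windows, so the sketch treats each firm within each event window as its own cross-section through a hypothetical PANEL_ID. POST and POST_X_VAR are hypothetical stand-ins for the elided right-hand side; FIXTWO requests two-way (cross-section and time) fixed effects.

```sas
/* Sample rows from the first post in a hypothetical WORK.STACKED */
data work.stacked;
   input year_variable gvkey year_data variable sales;
   /* Hypothetical DiD terms: POST = data year after the window's event year */
   post = (year_data > year_variable);
   post_x_var = post * variable;
   datalines;
1993 1 1990 0.5 1
1993 1 1991 0.5 2
1993 1 1992 0.5 3
1993 1 1993 0.5 4
1993 1 1994 0.5 5
1993 1 1995 0.5 6
1993 1 1996 0.5 7
1993 2 1990 1 11
1993 2 1991 1 12
1993 2 1992 1 13
1993 2 1993 1 14
1993 2 1994 1 15
1993 2 1995 1 16
1993 2 1996 1 17
1994 1 1991 10 2
1994 1 1992 10 3
1994 1 1993 10 4
1994 1 1994 10 5
1994 1 1995 10 6
1994 1 1996 10 7
1994 1 1997 10 8
;
run;

proc sort data=work.stacked out=work.stacked_ids;
   by year_variable gvkey year_data;
run;

/* One cross-section per firm within each window, so each
   (panel_id, year_data) pair occurs only once */
data work.stacked_ids;
   set work.stacked_ids;
   by year_variable gvkey;
   if first.gvkey then panel_id + 1;
run;

/* PROC PANEL expects the data sorted by cross-section ID, then time ID */
proc sort data=work.stacked_ids;
   by panel_id year_data;
run;

/* FIXTWO: fixed effects for panel_id and for year_data */
proc panel data=work.stacked_ids;
   id panel_id year_data;
   model sales = post_x_var / fixtwo;
run;
```

With only these 21 sample rows the estimates themselves are not meaningful; the point is the ID and sort setup. Note that with PANEL_ID the cross-section effects are firm-by-window effects rather than plain firm effects.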

Kind regards,

Koen

margalits1 · Posted 08-28-2021 09:00 AM

Class        Levels  Values

year_data        28  1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001
                     2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013
                     2014 2015 2016 2017


                    Number of Observations Read      115205
                    Number of Observations Used        5629

@sbxkoenk

I can tell from the number of observations used compared to the number read.

Regarding the procedures you mentioned, I will check them. But I am not sure how they would solve my problem.

What is their advantage over PROC GLM? I want a regression with time and firm fixed effects, and later on to add two-way clustered standard errors.

sbxkoenk · Posted 08-28-2021 09:11 AM

Hello,

On the n° of observations used being far smaller than the n° of observations read:

For an analysis involving one dependent variable, PROC GLM uses an observation if values are non-missing for that dependent variable and ALL the independent variables.
Hence: Check for missing values in your observations.
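A minimal sketch of that check, using made-up rows purely to show the mechanics: SALES stands in for the outcome and X1 to X3 for the regressors (all hypothetical names; substitute your own model variables). PROC MEANS reports the missing count per variable, and the DATA step counts the rows with no missing value in any model variable, CLASS variables included.

```sas
/* Made-up rows for illustration only; "." marks a missing value */
data work.demo;
   input gvkey year_data sales x1 x2 x3;
   datalines;
1 1990 1  0.5 .  3
1 1991 2  0.5 2  3
2 1990 11 1   1  .
2 1991 12 1   1  2
;
run;

/* Missing-value count for each model variable */
proc means data=work.demo n nmiss;
   var sales x1 x2 x3;
run;

/* Complete cases: no missing value in the outcome, a regressor, or a
   CLASS variable (CMISS counts missing numeric and character values) */
data work.usable;
   set work.demo end=last;
   n_read + 1;
   n_complete + (cmiss(gvkey, year_data, sales, x1, x2, x3) = 0);
   if last then output;
   keep n_read n_complete;
run;

proc print data=work.usable noobs;
run;
```

For these four rows, N_READ is 4 and N_COMPLETE is 2, which correspond to the "Number of Observations Read" and "Number of Observations Used" lines that PROC GLM writes for such a model.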

On the suitability of PROC GLM:

When regression is performed on time series data, the errors might not be independent. Often errors are autocorrelated; that is, each error is correlated with the error immediately before it.
Then you violate the assumption of independent observations behind PROC GLM. The estimation methods in SAS/ETS and SAS Econometrics (Viya) are made to "correct" for this.
I think you really need PROC PANEL (or PROC TSCSREG) or PROC CPANEL, also for the cluster-robust standard errors in your panel data analysis.
If you do not have SAS/ETS, nor SAS Econometrics, you can do PROC GLIMMIX (SAS/STAT) for panel data analysis.

Cheers,
Koen

margalits1 · Posted 08-28-2021 09:37 AM

Thanks for the quick answer!
Regarding the first part, there are indeed missing values for some of the independent variables.
I don't wish to remove these observations.
Does one of the other procedures take these observations into account?

sbxkoenk · Posted 08-28-2021 11:16 AM

Hello,

(Almost) every (statistical) SAS procedure has documentation on how it handles missing values:

The PANEL Procedure > Details > Missing Values
https://go.documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/etsug/etsug_panel_details03.htm?homeOnFail

The CPANEL Procedure > Details > Missing Values

https://go.documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/casecon/casecon_cpanel_details03.htm?homeO...

The COUNTREG procedure > Details > Missing Values

https://go.documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/etsug/etsug_countreg_details04.htm

I mention the COUNTREG procedure as it supports:

fixed-effects and random-effects Poisson models for panel data
fixed-effects and random-effects negative binomial models for panel data

I don't know the nature of your dependent (outcome) variable. What is its measurement scale? Is it binary, nominal (>2 categories), ordinal, interval, ratio, proportion [0,1], or count?

Kind regards,

Koen

margalits1 · Posted 08-29-2021 05:05 AM

Thank you. The outcome is a continuous (numeric) variable.

margalits1 · Posted 08-28-2021 09:49 AM

And another important question: this data structure, where each gvkey-year combination appears more than once (an "appended" panel, in which the repeated rows are not duplicates but, from my perspective, independent observations), does it require additional settings in these procedures?

To illustrate, see these rows:

year_variable  gvkey  year_data  variable  sales
1993           1      1991       0.5       2
1994           1      1991       10        2

Is the way I sorted the data set beforehand OK?

proc sort data=&data_name;
   by year_variable gvkey year_data;
run;

Or should I rather sort it by:

proc sort data=&data_name;
   by gvkey year_data;
run;
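One way to see which sort key actually identifies a row is PROC SORT with NODUPKEY and DUPOUT=, sketched here on four of the sample rows (including the two shown above) in a hypothetical data set WORK.STACKED. Writing to OUT= leaves the input untouched; running NODUPKEY without OUT= on the real data would delete the repeated firm-year rows.

```sas
/* Four rows from the sample: gvkey 1 in the 1993 and 1994 windows */
data work.stacked;
   input year_variable gvkey year_data variable sales;
   datalines;
1993 1 1991 0.5 2
1993 1 1992 0.5 3
1994 1 1991 10 2
1994 1 1992 10 3
;
run;

/* Full key: DUPOUT= collects any row whose key was already seen */
proc sort data=work.stacked out=work.by_window nodupkey dupout=work.dup_window;
   by year_variable gvkey year_data;
run;

/* Firm-year key alone */
proc sort data=work.stacked out=work.by_firmyear nodupkey dupout=work.dup_firmyear;
   by gvkey year_data;
run;
```

Here WORK.DUP_WINDOW is empty while WORK.DUP_FIRMYEAR receives two rows, so only (year_variable, gvkey, year_data) identifies a row uniquely in this layout.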

sbxkoenk · Posted 08-28-2021 11:25 AM

Hello,

I have read your Original Post again:

> I test how a certain variable (calculated for each year separately and is defined for this year and remains constant for the 3 years prior and after the event) influences the performance 3 years before and after using Diff-In-Diff method (post and treatment dummies also appear here, not relevant for the discussion...).

How is performance (your dependent variable) expressed? Is it a continuous measure (interval-scaled)?

I now see better what you want to do.
I have to think about this in a free moment (later this weekend).
I will get back to you.

Still, I think you will have to turn to the econometrics procedures instead of the SAS/STAT procedures.

Cheers,

Koen

margalits1 · Posted 08-29-2021 05:06 AM

Thank you for the detailed answer!

It is continuous.
