Statistical Procedures

Programming the statistical procedures from SAS
margalits1
Obsidian | Level 7

Hello all,

I have panel data that is assembled from multiple panels.

I am testing how a certain variable (calculated separately for each year, defined for that event year, and held constant for the 3 years before and after the event) influences performance in the 3 years before and after the event, using a difference-in-differences (DiD) design. Post and treatment dummies are also in the data, but they are not relevant for this discussion.

The data has the following structure:

year_variable  gvkey  year_data  variable  sales
1993           1      1990       0.5       1
1993           1      1991       0.5       2
1993           1      1992       0.5       3
1993           1      1993       0.5       4
1993           1      1994       0.5       5
1993           1      1995       0.5       6
1993           1      1996       0.5       7
1993           2      1990       1         11
1993           2      1991       1         12
1993           2      1992       1         13
1993           2      1993       1         14
1993           2      1994       1         15
1993           2      1995       1         16
1993           2      1996       1         17
....
1994           1      1991       10        2
1994           1      1992       10        3
1994           1      1993       10        4
1994           1      1994       10        5
1994           1      1995       10        6
1994           1      1996       10        7
1994           1      1997       10        8

 

In other words, the row for a given firm and year appears multiple times in this sample (7 times). For example, gvkey 1, year 1994 appears multiple times.
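To make the layout easy to reproduce, here is a minimal SAS sketch that rebuilds only the rows shown above (the "...." rows are not filled in) and then lists every gvkey/year_data pair that occurs in more than one event window. The data set names work.panel_sample and work.repeated_firm_years are hypothetical.

```sas
/* Rebuild only the sample rows shown in the question (hypothetical data set name) */
data work.panel_sample;
  input year_variable gvkey year_data variable sales;
  datalines;
1993 1 1990 0.5 1
1993 1 1991 0.5 2
1993 1 1992 0.5 3
1993 1 1993 0.5 4
1993 1 1994 0.5 5
1993 1 1995 0.5 6
1993 1 1996 0.5 7
1993 2 1990 1 11
1993 2 1991 1 12
1993 2 1992 1 13
1993 2 1993 1 14
1993 2 1994 1 15
1993 2 1995 1 16
1993 2 1996 1 17
1994 1 1991 10 2
1994 1 1992 10 3
1994 1 1993 10 4
1994 1 1994 10 5
1994 1 1995 10 6
1994 1 1996 10 7
1994 1 1997 10 8
;
run;

/* Firm-years that appear in more than one event-year window */
proc sql;
  create table work.repeated_firm_years as
  select gvkey, year_data, count(*) as n_copies
  from work.panel_sample
  group by gvkey, year_data
  having count(*) > 1;
quit;
```

On these rows the query returns gvkey 1 for 1991 through 1996, each with two copies (one from the 1993 window and one from the 1994 window).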

I want to run a regression with year and gvkey (firm) fixed effects, but I am not sure how to define them in this panel structure.

When I ran the following code, it analyzed only part of the data (5,000 observations out of 115,000):

proc sort data=&data_name;
BY year_variable gvkey year_data;
run;

 

and then:

proc glm data=&data_name;
CLASS gvkey year_data;
MODEL &var_name = /* independent variables .... */ gvkey year_data / SOLUTION;
ods output ParameterEstimates=&var_name._&data_name;
RUN;
QUIT;
ods trace off;

 

What should I do?

 

 

sbxkoenk
SAS Super FREQ

Hello,

 

> When I did the following command it only analyzed part of the data (5000 observations out of 115,000)

How do you know that?

Can you provide us with the LOG?
Please use the "Insert Code" ( </> ) icon in the toolbar and paste the LOG in the pop-up window that appears. This way the LOG does not lose formatting and structure.

 

Also, if you have panel data (time-series cross-sectional data) with YEAR and FIRM, I think you should use PROC TSCSREG (SAS/ETS), PROC PANEL (SAS/ETS), PROC CPANEL (SAS Viya Econometrics), or PROC GLIMMIX (SAS/STAT).
Have you considered using one of these 4 procedures?

 

Kind regards,

Koen

margalits1
Obsidian | Level 7
Class        Levels  Values

year_data        28  1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001
                     2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013
                     2014 2015 2016 2017


                    Number of Observations Read      115205
                    Number of Observations Used        5629

@sbxkoenk 

I can tell from the number of observations used compared to the number read.

Regarding the procedures you mentioned: I will check them. But I am not sure how they solve my problem.

What is their advantage compared to PROC GLM? I want a regression with time and firm fixed effects, and later on I want to add two-way clustering.
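On the fixed-effects part: with thousands of gvkey levels, CLASS gvkey makes PROC GLM estimate one dummy per firm. The ABSORB statement is one alternative that sweeps out the firm effects without printing a coefficient for each firm. Below is a minimal sketch, assuming the sample rows stand in for &data_name and that sales and variable play the roles of &var_name and the regressors; all data set names are hypothetical. As far as I know, PROC GLM has no option for clustered standard errors, so the two-way clustering would still need another procedure.

```sas
/* Stand-in for &data_name: rebuilds the 21 sample rows from the question */
data work.panel_sample;
  input year_variable gvkey variable first_year;
  do year_data = first_year to first_year + 6;
    sales = year_data - 1989 + 10 * (gvkey = 2);  /* reproduces the sales column shown */
    output;
  end;
  drop first_year;
  datalines;
1993 1 0.5 1990
1993 2 1 1990
1994 1 10 1991
;
run;

/* ABSORB requires the data to be grouped by the absorbed variable */
proc sort data=work.panel_sample out=work.panel_by_firm;
  by gvkey;
run;

/* Firm effects are absorbed (not printed); year effects stay in CLASS */
proc glm data=work.panel_by_firm;
  absorb gvkey;
  class year_data;
  model sales = variable year_data / solution;
  ods output ParameterEstimates=work.fe_estimates;
run;
quit;
```

The point estimates for variable and the year effects should be the same as with CLASS gvkey year_data; ABSORB mainly saves memory and output when there are many firms.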

sbxkoenk
SAS Super FREQ

Hello,

 

On the n° of observations used << n° of observations read:

For an analysis involving one dependent variable, PROC GLM uses an observation only if the values are non-missing for that dependent variable and for ALL the independent variables (CLASS variables included).
Hence: check for missing values in your observations.
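To see in advance how many rows PROC GLM will actually use, one option is to count complete cases with the CMISS function. A minimal sketch: it rebuilds the sample rows from the question and then blanks out variable for 1990 purely to simulate missing values (that gap is hypothetical). With the real data you would list &var_name, all independent variables, and the CLASS variables inside CMISS. The data set names are hypothetical.

```sas
/* Stand-in data: the 21 sample rows from the question */
data work.panel_sample;
  input year_variable gvkey variable first_year;
  do year_data = first_year to first_year + 6;
    sales = year_data - 1989 + 10 * (gvkey = 2);
    output;
  end;
  drop first_year;
  datalines;
1993 1 0.5 1990
1993 2 1 1990
1994 1 10 1991
;
run;

/* Hypothetical gap: pretend the regressor is missing in 1990 */
data work.panel_gaps;
  set work.panel_sample;
  if year_data = 1990 then variable = .;
run;

/* CMISS counts missing values among the listed variables */
data work.panel_flagged;
  set work.panel_gaps;
  n_missing = cmiss(sales, variable, gvkey, year_data);
  complete_case = (n_missing = 0);
run;

/* Frequency of complete_case = 1 should match "Number of Observations Used" */
proc freq data=work.panel_flagged;
  tables complete_case;
run;
```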

 

On the suitability of PROC GLM:

When regression is performed on time series data, the errors might not be independent. Often errors are autocorrelated; that is, each error is correlated with the error immediately before it.
With PROC GLM you violate the assumption of independent observations. The estimation methods in SAS/ETS and SAS Econometrics (Viya) are designed to "correct" for this.
You really should use PROC PANEL (or PROC TSCSREG) or PROC CPANEL, also to get cluster-robust standard errors in your panel data analysis.
If you have neither SAS/ETS nor SAS Econometrics, you can use PROC GLIMMIX (SAS/STAT) for panel data analysis.

 

Cheers,
Koen

margalits1
Obsidian | Level 7
Thanks for the quick answer!
Regarding its first part, there are indeed missing values for some of the independent variables.
I don't wish to remove these observations.
Does one of the other procedures take these observations into account?
sbxkoenk
SAS Super FREQ

Hello,

 

(Almost) every (statistical) SAS procedure has documentation on how it handles missing values:

The PANEL Procedure > Details > Missing Values
https://go.documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/etsug/etsug_panel_details03.htm?homeOnFail

The CPANEL Procedure > Details > Missing Values

https://go.documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/casecon/casecon_cpanel_details03.htm?homeO...

 

The COUNTREG procedure > Details > Missing Values

https://go.documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/etsug/etsug_countreg_details04.htm

I mention the COUNTREG procedure as it supports:

  • fixed-effects and random-effects Poisson models for panel data

  • fixed-effects and random-effects negative binomial models for panel data

I don't know the nature of your dependent (outcome) variable. What is its measurement scale? Is it binary, nominal (>2 categories), ordinal, interval, ratio, a proportion [0,1], or a count?

 

Kind regards,

Koen

margalits1
Obsidian | Level 7

Thank you. The outcome is a continuous variable.

margalits1
Obsidian | Level 7

Another important question: does this data structure, where each gvkey/year combination appears more than once, require additional settings in these procedures? It is an "appended" panel; the repeated rows are not duplicates but, from my perspective, independent observations.

To illustrate, see these lines:

year_variable  gvkey  year_data  variable  sales
1993           1      1991       0.5       2
1994           1      1991       10        2

 

Is the way I sorted the data set beforehand OK?

proc sort data=&data_name;
BY year_variable gvkey year_data;
run;

 

Or should I rather sort it by:

proc sort data=&data_name;
BY gvkey year_data;
run;
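One way to confirm that the repeated rows are not true duplicates is PROC SORT with NODUPKEY and DUPOUT=, which writes any repeated key combination to a separate data set. A minimal sketch on the sample rows; stack_id and the data set names are hypothetical. As far as I know, the sort order does not change PROC GLM estimates when gvkey and year_data are CLASS variables; sorting matters for BY-group processing and for ABSORB. The stack_id counter is one hypothetical way to give each firm's event window its own identifier, in case a procedure needs one row per cross section and period.

```sas
/* Stand-in data: the 21 sample rows from the question */
data work.panel_sample;
  input year_variable gvkey variable first_year;
  do year_data = first_year to first_year + 6;
    sales = year_data - 1989 + 10 * (gvkey = 2);
    output;
  end;
  drop first_year;
  datalines;
1993 1 0.5 1990
1993 2 1 1990
1994 1 10 1991
;
run;

/* Event window x firm x calendar year: expected to be unique */
proc sort data=work.panel_sample out=work.sorted_3key
          nodupkey dupout=work.dups_3key;
  by year_variable gvkey year_data;
run;

/* Firm x calendar year: repeats across event windows */
proc sort data=work.panel_sample out=work.sorted_2key
          nodupkey dupout=work.dups_2key;
  by gvkey year_data;
run;

/* Number each firm's event window (hypothetical panel ID) */
data work.panel_stacked;
  set work.sorted_3key;
  by year_variable gvkey;
  if first.gvkey then stack_id + 1;
run;
```

On the sample rows, work.dups_3key is empty, work.dups_2key holds the 6 repeated gvkey 1 firm-years, and stack_id takes 3 values.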

 

 

sbxkoenk
SAS Super FREQ

Hello,

 

I have read your Original Post again:

> I test how a certain variable (calculated for each year separately and is defined for this year and remains constant for the 3 years prior and after the event)  influences the performance 3 years before and after using Diff-In-Diff method (post and treatment dummies also appear here, not relevant for the discussion...).

How is performance (your dependent variable) expressed? Is it a continuous measure (interval-scaled)?

Now I better understand what you want to do.
I have to think about this when I have a free moment (later this weekend).
I will come back to you.

Still, I think you will have to turn to the econometrics procedures instead of the SAS/STAT procedures.

 

Cheers,

Koen

margalits1
Obsidian | Level 7

Thank you for the detailed answer!

It is continuous.

 
