Dear all,
i have to perform panel data regression with industry-fixed effects and year fixed effects
my sample is having cross-section of companies and time series of years. moreover, industry classification of each company is also mentioned in the data set.
the format of my data set is as follows.
company_name | year | industry_code | y | x1 | x2 | x3 |
20 Microns Ltd | 2011 | 1 | 956.5 | 69.8 | 610.1 | 542.7 |
20 Microns Ltd. | 2012 | 1 | 82.3 | 5.1 | 142.8 | 281.8 |
20 Microns Ltd. | 2013 | 1 | -25.3 | -38.2 | -98.7 | 123 |
20 Microns Ltd. | 2014 | 1 | 413.7 | 26.1 | 130.4 | 65.8 |
7Seas Entertainment Ltd. | 2011 | 2 | -121.2 | 100.9 | -96.5 | 88 |
7Seas Entertainment Ltd. | 2012 | 2 | -20.9 | -80.7 | 18.6 | 246 |
7Seas Entertainment Ltd. | 2013 | 2 | -20.9 | -80.7 | 18.6 | 246 |
7Seas Entertainment Ltd. | 2014 | 2 | -2.8 | -21.8 | -104.8 | 41.4 |
7Seas Entertainment Ltd. | 2015 | 2 | 7871.9 | 1086.5 | 5092.9 | 5177.8 |
A B B India Ltd. | 2011 | 3 | -5687.4 | -686 | -4771.1 | 1111.3 |
A B B India Ltd. | 2012 | 3 | 678.6 | -263.9 | -134.7 | 657.5 |
A B B India Ltd. | 2013 | 3 | 3450.1 | 26.3 | 1066.3 | 1654.9 |
A B B India Ltd. | 2014 | 3 | 1010.7 | -41.6 | 1379.2 | 41.8 |
A B B India Ltd. | 2015 | 3 | 4309.7 | 88.4 | -2329.2 | 6173.1 |
3I Infotech Ltd. | 2011 | 1 | 4309.7 | 88.4 | -2329.2 | 6173.1 |
3I Infotech Ltd. | 2012 | 1 | 39.2 | -636 | -20 | 235.2 |
3I Infotech Ltd. | 2013 | 1 | 1056.3 | 32.1 | 611.9 | 555.8 |
3I Infotech Ltd. | 2014 | 1 | 887.9 | 56.9 | 992.2 | 3359.1 |
3I Infotech Ltd. | 2015 | 1 | -174.9 | 1.2 | -519.6 | 232.9 |
3I Infotech Ltd. | 2016 | 1 | -828.7 | 542.1 | -1084.5 | 2738.2 |
industry fixed effects is to be estimated based on industry code and time series fixed effects has to be estimated based on year
please suggest me SAS code
thanks in advance
This is a system error, can you check your SAS and whether it has the ETS license? By the way I just noticed your industry id has duplicated time sequence. Proc Panel suppose you id time look like:
id | time |
1 | 1 |
1 | 2 |
1 | 3 |
1 | 4 |
2 | 1 |
2 | 2 |
2 | 3 |
2 | 4 |
… | … |
If you have duplicated time periods in one industry I suggest you use dummy directly. Also in proc panel the id statement requires cross section id comes first and then time id:
data testdat;
input company_name $ year industry_code y x1 x2 x3;
datalines;
20_Microns_Ltd 2011 1 956.5 69.8 610.1 542.7
20_Microns_Ltd 2012 1 82.3 5.1 142.8 281.8
20_Microns_Ltd 2013 1 -25.3 -38.2 -98.7 123
20_Microns_Ltd 2014 1 413.7 26.1 130.4 65.8
7Seas_Entertai 2011 2 -121.2 100.9 -96.5 88
7Seas_Entertai 2012 2 -20.9 -80.7 18.6 246
7Seas_Entertai 2013 2 -20.9 -80.7 18.6 246
7Seas_Entertai 2014 2 -2.8 -21.8 -104.8 41.4
7Seas_Entertai 2015 2 7871.9 1086.5 5092.9 5177.8
A_B_B_India_Ltd. 2011 3 -5687.4 -686 -4771.1 1111.3
A_B_B_India_Ltd. 2012 3 678.6 -263.9 -134.7 657.5
A_B_B_India_Ltd. 2013 3 3450.1 26.3 1066.3 1654.9
A_B_B_India_Ltd. 2014 3 1010.7 -41.6 1379.2 41.8
A_B_B_India_Ltd. 2015 3 4309.7 88.4 -2329.2 6173.1
3I_Infotech_Ltd. 2011 1 4309.7 88.4 -2329.2 6173.1
3I_Infotech_Ltd. 2012 1 39.2 -636 -20 235.2
3I_Infotech_Ltd. 2013 1 1056.3 32.1 611.9 555.8
3I_Infotech_Ltd. 2014 1 887.9 56.9 992.2 3359.1
3I_Infotech_Ltd. 2015 1 -174.9 1.2 -519.6 232.9
3I_Infotech_Ltd. 2016 1 -828.7 542.1 -1084.5 2738.2
run;
proc sort data=testdat;
by company_name year;
run;
proc panel data=testdat;
id company_name year;
class industry_code year;
model y = x1 x2 year industry_code/ pooled;
run;
If you can specify an assumed response distribution, then you can fit an appropriate model that takes account of the correlation among the observations within each company by using PROC GEE. You could fit a model that allows you to see if there are differences among the industries and differences (or trend) over time. For example, something like the following can test for industry differences and a linear trend over time, assuming that the response is normally distributed and an autoregressive correlation structure within companies. Note that validity of the GEE model requires a large number of clusters (level of SUBJECT=), so more would be needed than you show in your example data.
proc gee;
class company_name industry_code;
model y=industry_code year x1-x3 / type3;
repeated subject=company_name / type=ar;
run;
I'm not sure what you want, for cross section, you just want industry fix effect only or you want both industry and company effect???
For the first one you can do:
proc panel data=your_data_name;
id industry year;
model y = x1 x2 x3 / fixtwo;
run;
For the second one there's no built in model, you can add all dummies to the pooled model:
proc panel data=your_data_name;
id company_name year;
class company_name year industry;
model y = x1 x2 x3 company_name year industry/ pooled;
run;
The error indicate you may need to sort the data first.
proc sort data=your_data;
by industry year;
run;
PROC PANEL assumes you've already sorted data by cross section first then time.
dear xyz0505
greeting of the day
when i run the panel regression with the code suggested by you.
But, i got the following error.
proc panel data=Model_file;
35 model Y = X1- X13 / fixtwo;
36 id year Industry;
37 run;
ERROR: Unable to create an instance of the TKEPAN TK Extension.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE PANEL used (Total process time):
real time 0.09 seconds
cpu time 0.00 seconds
I am not able to understand it .
please help me in this regard
thanks in advance
This is a system error, can you check your SAS and whether it has the ETS license? By the way I just noticed your industry id has duplicated time sequence. Proc Panel suppose you id time look like:
id | time |
1 | 1 |
1 | 2 |
1 | 3 |
1 | 4 |
2 | 1 |
2 | 2 |
2 | 3 |
2 | 4 |
… | … |
If you have duplicated time periods in one industry I suggest you use dummy directly. Also in proc panel the id statement requires cross section id comes first and then time id:
data testdat;
input company_name $ year industry_code y x1 x2 x3;
datalines;
20_Microns_Ltd 2011 1 956.5 69.8 610.1 542.7
20_Microns_Ltd 2012 1 82.3 5.1 142.8 281.8
20_Microns_Ltd 2013 1 -25.3 -38.2 -98.7 123
20_Microns_Ltd 2014 1 413.7 26.1 130.4 65.8
7Seas_Entertai 2011 2 -121.2 100.9 -96.5 88
7Seas_Entertai 2012 2 -20.9 -80.7 18.6 246
7Seas_Entertai 2013 2 -20.9 -80.7 18.6 246
7Seas_Entertai 2014 2 -2.8 -21.8 -104.8 41.4
7Seas_Entertai 2015 2 7871.9 1086.5 5092.9 5177.8
A_B_B_India_Ltd. 2011 3 -5687.4 -686 -4771.1 1111.3
A_B_B_India_Ltd. 2012 3 678.6 -263.9 -134.7 657.5
A_B_B_India_Ltd. 2013 3 3450.1 26.3 1066.3 1654.9
A_B_B_India_Ltd. 2014 3 1010.7 -41.6 1379.2 41.8
A_B_B_India_Ltd. 2015 3 4309.7 88.4 -2329.2 6173.1
3I_Infotech_Ltd. 2011 1 4309.7 88.4 -2329.2 6173.1
3I_Infotech_Ltd. 2012 1 39.2 -636 -20 235.2
3I_Infotech_Ltd. 2013 1 1056.3 32.1 611.9 555.8
3I_Infotech_Ltd. 2014 1 887.9 56.9 992.2 3359.1
3I_Infotech_Ltd. 2015 1 -174.9 1.2 -519.6 232.9
3I_Infotech_Ltd. 2016 1 -828.7 542.1 -1084.5 2738.2
run;
proc sort data=testdat;
by company_name year;
run;
proc panel data=testdat;
id company_name year;
class industry_code year;
model y = x1 x2 year industry_code/ pooled;
run;
PROC PANEL drop the redundant dummies for you.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!
Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.
Find more tutorials on the SAS Users YouTube channel.