Solved: Re: panel data regression with fixed effects

srikanthyadav44 · Posted 03-03-2020 02:36 AM

Dear all,

i have to perform panel data regression with industry-fixed effects and year fixed effects

my sample is having cross-section of companies and time series of years. moreover, industry classification of each company is also mentioned in the data set.

the format of my data set is as follows.

company_name	year	industry_code	y	x1	x2	x3
20 Microns Ltd	2011	1	956.5	69.8	610.1	542.7
20 Microns Ltd.	2012	1	82.3	5.1	142.8	281.8
20 Microns Ltd.	2013	1	-25.3	-38.2	-98.7	123
20 Microns Ltd.	2014	1	413.7	26.1	130.4	65.8
7Seas Entertainment Ltd.	2011	2	-121.2	100.9	-96.5	88
7Seas Entertainment Ltd.	2012	2	-20.9	-80.7	18.6	246
7Seas Entertainment Ltd.	2013	2	-20.9	-80.7	18.6	246
7Seas Entertainment Ltd.	2014	2	-2.8	-21.8	-104.8	41.4
7Seas Entertainment Ltd.	2015	2	7871.9	1086.5	5092.9	5177.8
A B B India Ltd.	2011	3	-5687.4	-686	-4771.1	1111.3
A B B India Ltd.	2012	3	678.6	-263.9	-134.7	657.5
A B B India Ltd.	2013	3	3450.1	26.3	1066.3	1654.9
A B B India Ltd.	2014	3	1010.7	-41.6	1379.2	41.8
A B B India Ltd.	2015	3	4309.7	88.4	-2329.2	6173.1
3I Infotech Ltd.	2011	1	4309.7	88.4	-2329.2	6173.1
3I Infotech Ltd.	2012	1	39.2	-636	-20	235.2
3I Infotech Ltd.	2013	1	1056.3	32.1	611.9	555.8
3I Infotech Ltd.	2014	1	887.9	56.9	992.2	3359.1
3I Infotech Ltd.	2015	1	-174.9	1.2	-519.6	232.9
3I Infotech Ltd.	2016	1	-828.7	542.1	-1084.5	2738.2

industry fixed effects is to be estimated based on industry code and time series fixed effects has to be estimated based on year

please suggest me SAS code

thanks in advance

xyz0505 · Posted 03-18-2020 09:33 AM

This is a system error, can you check your SAS and whether it has the ETS license? By the way I just noticed your industry id has duplicated time sequence. Proc Panel suppose you id time look like:

id	time
1	1
1	2
1	3
1	4
2	1
2	2
2	3
2	4
…	…

If you have duplicated time periods in one industry I suggest you use dummy directly. Also in proc panel the id statement requires cross section id comes first and then time id:

data testdat;
input company_name $ year industry_code y x1 x2 x3;
datalines;
20_Microns_Ltd 2011 1 956.5 69.8 610.1 542.7
20_Microns_Ltd 2012 1 82.3 5.1 142.8 281.8
20_Microns_Ltd 2013 1 -25.3 -38.2 -98.7 123
20_Microns_Ltd 2014 1 413.7 26.1 130.4 65.8
7Seas_Entertai 2011 2 -121.2 100.9 -96.5 88
7Seas_Entertai 2012 2 -20.9 -80.7 18.6 246
7Seas_Entertai 2013 2 -20.9 -80.7 18.6 246
7Seas_Entertai 2014 2 -2.8 -21.8 -104.8 41.4
7Seas_Entertai 2015 2 7871.9 1086.5 5092.9 5177.8
A_B_B_India_Ltd. 2011 3 -5687.4 -686 -4771.1 1111.3
A_B_B_India_Ltd. 2012 3 678.6 -263.9 -134.7 657.5
A_B_B_India_Ltd. 2013 3 3450.1 26.3 1066.3 1654.9
A_B_B_India_Ltd. 2014 3 1010.7 -41.6 1379.2 41.8
A_B_B_India_Ltd. 2015 3 4309.7 88.4 -2329.2 6173.1
3I_Infotech_Ltd. 2011 1 4309.7 88.4 -2329.2 6173.1
3I_Infotech_Ltd. 2012 1 39.2 -636 -20 235.2
3I_Infotech_Ltd. 2013 1 1056.3 32.1 611.9 555.8
3I_Infotech_Ltd. 2014 1 887.9 56.9 992.2 3359.1
3I_Infotech_Ltd. 2015 1 -174.9 1.2 -519.6 232.9
3I_Infotech_Ltd. 2016 1 -828.7 542.1 -1084.5 2738.2
run;
proc sort data=testdat;
by company_name year;
run;

proc panel data=testdat;
id company_name year;
class industry_code year;
model y = x1 x2 year industry_code/ pooled;
run;

View solution in original post

StatDave · Posted 03-05-2020 04:17 PM

If you can specify an assumed response distribution, then you can fit an appropriate model that takes account of the correlation among the observations within each company by using PROC GEE. You could fit a model that allows you to see if there are differences among the industries and differences (or trend) over time. For example, something like the following can test for industry differences and a linear trend over time, assuming that the response is normally distributed and an autoregressive correlation structure within companies. Note that validity of the GEE model requires a large number of clusters (level of SUBJECT=), so more would be needed than you show in your example data.

proc gee;
class company_name industry_code;
model y=industry_code year  x1-x3 / type3;
repeated subject=company_name / type=ar;
run;

xyz0505 · Posted 03-06-2020 10:52 AM

I'm not sure what you want, for cross section, you just want industry fix effect only or you want both industry and company effect???

For the first one you can do:

proc panel data=your_data_name;
id industry year;
model y = x1 x2 x3 / fixtwo;
run;

For the second one there's no built in model, you can add all dummies to the pooled model:

proc panel data=your_data_name;
id company_name year;

class company_name year industry;
model y = x1 x2 x3 company_name year industry/ pooled;
run;

srikanthyadav44 · Posted 03-09-2020 06:47 AM

Dear Mr.xyz0505
I want both industry and year fixed effects, but not company fixed effects.
When I use proc panel, I am getting error, because, industry code is repeated for more than one company in the same year.
It is becoming repeated measures panel data and I don't know how to handle it.
Please help me in this regard.

xyz0505 · Posted 03-11-2020 11:05 AM

The error indicate you may need to sort the data first.

proc sort data=your_data;
by industry year;
run;

PROC PANEL assumes you've already sorted data by cross section first then time.

srikanthyadav44 · Posted 03-18-2020 04:53 AM

dear xyz0505

greeting of the day

when i run the panel regression with the code suggested by you.

But, i got the following error.

proc panel data=Model_file;
35 model Y = X1- X13 / fixtwo;
36 id year Industry;
37 run;

ERROR: Unable to create an instance of the TKEPAN TK Extension.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE PANEL used (Total process time):
real time 0.09 seconds
cpu time 0.00 seconds

I am not able to understand it .

please help me in this regard

thanks in advance

xyz0505 · Posted 03-18-2020 09:33 AM

This is a system error, can you check your SAS and whether it has the ETS license? By the way I just noticed your industry id has duplicated time sequence. Proc Panel suppose you id time look like:

id	time
1	1
1	2
1	3
1	4
2	1
2	2
2	3
2	4
…	…

If you have duplicated time periods in one industry I suggest you use dummy directly. Also in proc panel the id statement requires cross section id comes first and then time id:

data testdat;
input company_name $ year industry_code y x1 x2 x3;
datalines;
20_Microns_Ltd 2011 1 956.5 69.8 610.1 542.7
20_Microns_Ltd 2012 1 82.3 5.1 142.8 281.8
20_Microns_Ltd 2013 1 -25.3 -38.2 -98.7 123
20_Microns_Ltd 2014 1 413.7 26.1 130.4 65.8
7Seas_Entertai 2011 2 -121.2 100.9 -96.5 88
7Seas_Entertai 2012 2 -20.9 -80.7 18.6 246
7Seas_Entertai 2013 2 -20.9 -80.7 18.6 246
7Seas_Entertai 2014 2 -2.8 -21.8 -104.8 41.4
7Seas_Entertai 2015 2 7871.9 1086.5 5092.9 5177.8
A_B_B_India_Ltd. 2011 3 -5687.4 -686 -4771.1 1111.3
A_B_B_India_Ltd. 2012 3 678.6 -263.9 -134.7 657.5
A_B_B_India_Ltd. 2013 3 3450.1 26.3 1066.3 1654.9
A_B_B_India_Ltd. 2014 3 1010.7 -41.6 1379.2 41.8
A_B_B_India_Ltd. 2015 3 4309.7 88.4 -2329.2 6173.1
3I_Infotech_Ltd. 2011 1 4309.7 88.4 -2329.2 6173.1
3I_Infotech_Ltd. 2012 1 39.2 -636 -20 235.2
3I_Infotech_Ltd. 2013 1 1056.3 32.1 611.9 555.8
3I_Infotech_Ltd. 2014 1 887.9 56.9 992.2 3359.1
3I_Infotech_Ltd. 2015 1 -174.9 1.2 -519.6 232.9
3I_Infotech_Ltd. 2016 1 -828.7 542.1 -1084.5 2738.2
run;
proc sort data=testdat;
by company_name year;
run;

proc panel data=testdat;
id company_name year;
class industry_code year;
model y = x1 x2 year industry_code/ pooled;
run;

srikanthyadav44 · Posted 03-22-2020 08:11 AM

dear Mr. xyz0505
thank you for your wonderful suggestion to use dummy variables for the industries groups to capture the fixed effects.
As we know, we cannot introduce the dummy variables for all the groups of industries present in the sample, because it leads to dummy variable trap.
here, i have two alternative ways
1. select one industry group as reference group and not introduce dummy variable to that reference group; or
2. introduce all the dummy variables, but not include constant in the model,

can be please suggest me which alternative is better.
thanks in advance.

xyz0505 · Posted 03-23-2020 03:42 PM

PROC PANEL drop the redundant dummies for you.