DATA Step, Macro, Functions and more

Compute mean by Industry excluding ith firm

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 102
Accepted Solution

Compute mean by Industry excluding ith firm

Dear all,

this might be an easy question for those experienced programmers.

thanks in advance for your consideration!

a small sample (real data is huge in size)  is given below. I want to create two variables:  (1) industry_id : if firms share the same SIC (based on first three digits), it belongs to the same industry. in my example, we would have industry_id=1 for sic 3100, 3101 ; industry_id=2 for sic 3300, 3303 ; (2) compute industry mean return based on industry_id and year, but such mean must exclude ith firm,

for example, firm id 1001, 1002, 1003 belong to industry_id=1 , for year 1990, firmid 1001, mean_ret =(.7+.2)/2=.45,

for year 1991, firmid 1001, mean_ret =(-0.5+.4)/2=-.05

please help

firm_idSICYearReturn
1001310019900.01
1001310019910.2
1001310019920.3
1001310019930.7
100131001994-0.5
1001310019950.2
1001310019960.4
1002310119900.7
100231011991-0.5
1002310119920.2
1002310119930.4
1002310119940.4
1002310119950.2
1002310119960.2
1003310119900.2
1003310119910.4
1003310119920.7
100331011993-0.5
1003310119940.2
1003310119950.2
1003310119960.2
100633001990-0.5
1006330019910.9
1006330019920.2
1006330019930.09
1006330019940.09
1006330019950.22
1006330019960.1
1004330319900.4
1004330319910.2
1004330319920.2
100433031993-0.5
1004330319940.9
1004330319950.22
1004330319960.1

-Lan


Accepted Solutions
Solution
‎11-02-2013 04:24 PM
Super User
Posts: 19,791

Re: Compute mean by Industry excluding ith firm

A modified self join should work.


data have;

    input firm_id    SIC    Year    Return;

    id=substr(put(sic,4.),1,3);

cards;

1001    3100    1990    0.01

1001    3100    1991    0.2

1001    3100    1992    0.3

1001    3100    1993    0.7

1001    3100    1994    -0.5

1001    3100    1995    0.2

1001    3100    1996    0.4

1002    3101    1990    0.7

1002    3101    1991    -0.5

1002    3101    1992    0.2

1002    3101    1993    0.4

1002    3101    1994    0.4

1002    3101    1995    0.2

1002    3101    1996    0.2

1003    3101    1990    0.2

1003    3101    1991    0.4

1003    3101    1992    0.7

1003    3101    1993    -0.5

1003    3101    1994    0.2

1003    3101    1995    0.2

1003    3101    1996    0.2

1006    3300    1990    -0.5

1006    3300    1991    0.9

1006    3300    1992    0.2

1006    3300    1993    0.09

1006    3300    1994    0.09

1006    3300    1995    0.22

1006    3300    1996    0.1

1004    3303    1990    0.4

1004    3303    1991    0.2

1004    3303    1992    0.2

1004    3303    1993    -0.5

1004    3303    1994    0.9

1004    3303    1995    0.22

1004    3303    1996    0.1

;

run;

PROC SQL;

  CREATE TABLE WANT AS

  SELECT A.FIRM_ID, A.SIC, A.ID, A.YEAR, A.RETURN, AVG(B.RETURN) AS AVG_RETURN

  FROM HAVE AS A

  LEFT JOIN HAVE AS B

  ON A.ID=B.ID

  AND A.YEAR=B.YEAR

  AND A.FIRM_ID NE B.FIRM_ID

  GROUP BY A.FIRM_ID, A.SIC, A.ID, A.YEAR, A.RETURN;

QUIT;

View solution in original post


All Replies
Respected Advisor
Posts: 4,173

Re: Compute mean by Industry excluding ith firm

I haven't understood what you meant by "but such mean must exclude ith firm" and haven't seen something like that in your example - so this is not covered in below (untested) SQL.

For the industry_id: You say that this are the first 3 digits of the SIC. If so the I believe it's best to create an industry_id containing exactly that - the first 3 digits of SIC.

proc sql;

create table want as

   select

    firm_id,

    SIC,

    Year,

    Return,

    substrn(SIC,1,3) as industry_id,

    avg(return) as mean_ret

   from have

   group by substrn(SIC,1,3), year

   ;

quit;

Frequent Contributor
Posts: 102

Re: Compute mean by Industry excluding ith firm

thank you, Patrick!

"but such mean must exclude ith firm" what I am asking here is that when I compute mean return by industry for firm i, we will use on firm i's peers return divided by the industry size minus 1 (i.e. excluding i).

how do I exclude firm i ?

-Lan

Trusted Advisor
Posts: 1,137

Re: Compute mean by Industry excluding ith firm

alternatively by datastep,

data have;

    input firm_id    SIC$    Year    Return;

    id=substr(sic,1,3);

cards;

1001    3100    1990    0.01

1001    3100    1991    0.2

1001    3100    1992    0.3

1001    3100    1993    0.7

1001    3100    1994    -0.5

1001    3100    1995    0.2

1001    3100    1996    0.4

1002    3101    1990    0.7

1002    3101    1991    -0.5

1002    3101    1992    0.2

1002    3101    1993    0.4

1002    3101    1994    0.4

1002    3101    1995    0.2

1002    3101    1996    0.2

1003    3101    1990    0.2

1003    3101    1991    0.4

1003    3101    1992    0.7

1003    3101    1993    -0.5

1003    3101    1994    0.2

1003    3101    1995    0.2

1003    3101    1996    0.2

1006    3300    1990    -0.5

1006    3300    1991    0.9

1006    3300    1992    0.2

1006    3300    1993    0.09

1006    3300    1994    0.09

1006    3300    1995    0.22

1006    3300    1996    0.1

1004    3303    1990    0.4

1004    3303    1991    0.2

1004    3303    1992    0.2

1004    3303    1993    -0.5

1004    3303    1994    0.9

1004    3303    1995    0.22

1004    3303    1996    0.1

;

run;

proc sort data = have;

    by id year;

run;

proc means data=have mean nway;

class id year;

var return;

output out=means(drop=_type_ _freq_) mean= /autoname;

run;

data  want;

    merge means have;

    by id year;

run;

proc sort data=want;

    by firm_id    SIC;

run;

Thanks,

Jagadish

Thanks,
Jag
Trusted Advisor
Posts: 1,137

Re: Compute mean by Industry excluding ith firm

Posted in reply to Jagadishkatam

Hi Lan,

i saw your post to Patrick after i submitted the above code, So accordingly i modified the my code. Please try

data have;

    input firm_id    SIC    Year    Return;

    id=substr(put(sic,4.),1,3);

cards;

1001    3100    1990    0.01

1001    3100    1991    0.2

1001    3100    1992    0.3

1001    3100    1993    0.7

1001    3100    1994    -0.5

1001    3100    1995    0.2

1001    3100    1996    0.4

1002    3101    1990    0.7

1002    3101    1991    -0.5

1002    3101    1992    0.2

1002    3101    1993    0.4

1002    3101    1994    0.4

1002    3101    1995    0.2

1002    3101    1996    0.2

1003    3101    1990    0.2

1003    3101    1991    0.4

1003    3101    1992    0.7

1003    3101    1993    -0.5

1003    3101    1994    0.2

1003    3101    1995    0.2

1003    3101    1996    0.2

1006    3300    1990    -0.5

1006    3300    1991    0.9

1006    3300    1992    0.2

1006    3300    1993    0.09

1006    3300    1994    0.09

1006    3300    1995    0.22

1006    3300    1996    0.1

1004    3303    1990    0.4

1004    3303    1991    0.2

1004    3303    1992    0.2

1004    3303    1993    -0.5

1004    3303    1994    0.9

1004    3303    1995    0.22

1004    3303    1996    0.1

;

run;

proc sort data = have;

    by year firm_id ;

run;

data have_;

    set have;

    by     year firm_id ;

    if first.year then flag=1;

run;

proc means data=have_(where=(flag^=1)) mean nway;

class id year;

var return;

output out=means(drop=_type_ _freq_) mean= /autoname;

run;

proc sort data = have;

    by id year  ;

run;

data  want;

    merge means have;

    by id year;

run;

proc sort data=want;

    by firm_id    SIC;

run;

Thanks,

Jagadish

Thanks,
Jag
Frequent Contributor
Posts: 102

Re: Compute mean by Industry excluding ith firm

Posted in reply to Jagadishkatam

Jagadish,

thanks so much ! !

I should have provided an expected output, what I need for the mean return is going to be each firm-industry-year (see example below), it is my mistake not to make it clear-that is why your code produces  mean return per industry-year

want: i computed 2 of them as illustration

the first one: 0.45 = [0.7 (from firm id 1002, year 1990)+0.2(from firm id 1003, year 1990) ] / 2=.45

firm_idSICYearReturnIndustry_IDMean_return
1001310019900.013100.45
1001310019910.2310-0.05
1001310019920.3310
1001310019930.7310
100131001994-0.5310
1001310019950.2310
1001310019960.4310
1002310119900.7310
100231011991-0.5310
1002310119920.2310
1002310119930.4310
1002310119940.4310
1002310119950.2310
1002310119960.2310
1003310119900.2310
1003310119910.4310
1003310119920.7310
100331011993-0.5310
1003310119940.2310
1003310119950.2310
1003310119960.2310
100633001990-0.5330
1006330019910.9330
1006330019920.2330
1006330019930.09330
1006330019940.09330
1006330019950.22330
1006330019960.1330
1004330319900.4330
1004330319910.2330
1004330319920.2330
100433031993-0.5330
1004330319940.9330
1004330319950.22330
1004330319960.1330

could you help modify your code?

-lan

Solution
‎11-02-2013 04:24 PM
Super User
Posts: 19,791

Re: Compute mean by Industry excluding ith firm

A modified self join should work.


data have;

    input firm_id    SIC    Year    Return;

    id=substr(put(sic,4.),1,3);

cards;

1001    3100    1990    0.01

1001    3100    1991    0.2

1001    3100    1992    0.3

1001    3100    1993    0.7

1001    3100    1994    -0.5

1001    3100    1995    0.2

1001    3100    1996    0.4

1002    3101    1990    0.7

1002    3101    1991    -0.5

1002    3101    1992    0.2

1002    3101    1993    0.4

1002    3101    1994    0.4

1002    3101    1995    0.2

1002    3101    1996    0.2

1003    3101    1990    0.2

1003    3101    1991    0.4

1003    3101    1992    0.7

1003    3101    1993    -0.5

1003    3101    1994    0.2

1003    3101    1995    0.2

1003    3101    1996    0.2

1006    3300    1990    -0.5

1006    3300    1991    0.9

1006    3300    1992    0.2

1006    3300    1993    0.09

1006    3300    1994    0.09

1006    3300    1995    0.22

1006    3300    1996    0.1

1004    3303    1990    0.4

1004    3303    1991    0.2

1004    3303    1992    0.2

1004    3303    1993    -0.5

1004    3303    1994    0.9

1004    3303    1995    0.22

1004    3303    1996    0.1

;

run;

PROC SQL;

  CREATE TABLE WANT AS

  SELECT A.FIRM_ID, A.SIC, A.ID, A.YEAR, A.RETURN, AVG(B.RETURN) AS AVG_RETURN

  FROM HAVE AS A

  LEFT JOIN HAVE AS B

  ON A.ID=B.ID

  AND A.YEAR=B.YEAR

  AND A.FIRM_ID NE B.FIRM_ID

  GROUP BY A.FIRM_ID, A.SIC, A.ID, A.YEAR, A.RETURN;

QUIT;

Frequent Contributor
Posts: 102

Re: Compute mean by Industry excluding ith firm

Many thanks to Reeza, Jagadishkatam, and Patrick !!!

-lan

Trusted Advisor
Posts: 1,137

Re: Compute mean by Industry excluding ith firm

Hi Lan,

I left my answer incomplete in my earlier post, please add the below code to my previous code and you will get the desired output.

proc sort data=want;

    by year firm_id;

run;

data want_;

    set want;

    by year firm_id;

    if first.year then Return_Mean=Return_Mean;

    else Return_Mean=.;

run;

Thanks,

Jagadish

Thanks,
Jag
Frequent Contributor
Posts: 102

Re: Compute mean by Industry excluding ith firm

Posted in reply to Jagadishkatam

Thank you, Jagadish. I will try it.

Frequent Contributor
Posts: 102

Re: Compute mean by Industry excluding ith firm

Reeza,

I used your code for my real data which contains  2717655 observations. I got "ERROR SORT EXECUTION FAILURE" my guess is sas ran out of memory. my thought is to break the data into smaller sets by industry id, then run your code, after that, merge all the data back into a big dataset.

Do you have an efficient way of doing this?

this is  the part of the code I got error - my guess is sas ran out of memory, it look over an hour for sas to run.

* note: id=substr(put(sic,4.),1,3);

PROC SQL;

  CREATE TABLE OUT AS

  SELECT A.FIRM_ID, A.SIC, A.ID, A.YEAR, A.RETURN, AVG(B.RETURN) AS AVG_RETURN

  FROM HAVE AS A

  LEFT JOIN HAVE AS B

  ON A.ID=B.ID

  AND A.YEAR=B.YEAR

  AND A.FIRM_ID NE B.FIRM_ID

  GROUP BY A.FIRM_ID, A.SIC, A.ID, A.YEAR, A.RETURN;

QUIT;

thanks very much,

-Lan

Respected Advisor
Posts: 3,799

Re: Compute mean by Industry excluding ith firm

Isyour input data is not sorted by or index on any of the key variables.  What is the data type of SIC?  Do you have a lot of extra variable in the input data that you did not mention?

Super Contributor
Super Contributor
Posts: 444

Re: Compute mean by Industry excluding ith firm

hi guys,

i have a question on this problem.

So if i sort the dataset given above  by indt_id year firm_id to get what i have shown here and run the following code i am not getting the right avg

Whats  wrong with my code ? Smiley Happy

data sorted;

set have;

by indt_id year firm_id;

count=0;

if not first.year then do;

count=count+1;

sum+return;end;

if last.year then do;

avg=sum/count;output; end;

proc print;run;

                                            Obs    firm_id    SIC     Year    Return    Indt_id

                                              1     1001      3100    1990      0.01      310

                                              2     1002      3101    1990      0.70      310

                                              3     1003      3101    1990      0.20      310

                                              4     1001      3100    1991      0.20      310

                                              5     1002      3101    1991     -0.50      310

                                              6     1003      3101    1991      0.40      310

                                              7     1001      3100    1992      0.30      310

                                              8     1002      3101    1992      0.20      310

                                              9     1003      3101    1992      0.70      310

                                             10     1001      3100    1993      0.70      310

                                             11     1002      3101    1993      0.40      310

                                             12     1003      3101    1993     -0.50      310

                                             13     1001      3100    1994     -0.50      310

                                             14     1002      3101    1994      0.40      310

                                             15     1003      3101    1994      0.20      310

                                             16     1001      3100    1995      0.20      310

                                             17     1002      3101    1995      0.20      310

                                             18     1003      3101    1995      0.20      310

                                             19     1001      3100    1996      0.40      310

                                             20     1002      3101    1996      0.20      310

                                             21     1003      3101    1996      0.20      310

                                             22     1004      3303    1990      0.40      330

                                             23     1006      3300    1990     -0.50      330

                                             24     1004      3303    1991      0.20      330

                                             25     1006      3300    1991      0.90      330

                                             26     1004      3303    1992      0.20      330

                                             27     1006      3300    1992      0.20      330

                                             28     1004      3303    1993     -0.50      330

                                             29     1006      3300    1993      0.09      330

                                             30     1004      3303    1994      0.90      330

                                             31     1006      3300    1994      0.09      330

                                             32     1004      3303    1995      0.22      330

                                             33     1006      3300    1995      0.22      330

                                             34     1004      3303    1996      0.10      330

                                             35     1006      3300    1996      0.10      330

Respected Advisor
Posts: 3,799

Re: Compute mean by Industry excluding ith firm

If you can create SIC that groups properly $3 (SIC2) and sort the data and make a reasonable guess at the maximum number of FIRMS per YEAR this data step should work with your actual data.  I don't know about performance.

data firm;
   input firm:$4. SIC:$4. Year Return;
   length sic2 $3;
   sic2 = sic;
  
cards;
1001    3100    1990    0.01
1001    3100    1991    0.2
1001    3100    1992    0.3
1001    3100    1993    0.7
1001    3100    1994    -0.5
1001    3100    1995    0.2
1001    3100    1996    0.4
1002    3101    1990    0.7
1002    3101    1991    -0.5
1002    3101    1992    0.2
1002    3101    1993    0.4
1002    3101    1994    0.4
1002    3101    1995    0.2
1002    3101    1996    0.2
1003    3101    1990    0.2
1003    3101    1991    0.4
1003    3101    1992    0.7
1003    3101    1993    -0.5
1003    3101    1994    0.2
1003    3101    1995    0.2
1003    3101    1996    0.2
1006    3300    1990    -0.5
1006    3300    1991    0.9
1006    3300    1992    0.2
1006    3300    1993    0.09
1006    3300    1994    0.09
1006    3300    1995    0.22
1006    3300    1996    0.1
1004    3303    1990    0.4
1004    3303    1991    0.2
1004    3303    1992    0.2
1004    3303    1993    -0.5
1004    3303    1994    0.9
1004    3303    1995    0.22
1004    3303    1996    0.1
;
run;
proc sort;
  
by sic2 year firm;
   run;
proc print;
  
run;

%let d=5; *max firms per year;
data mean;
   i = 0;
  
do until(last.year);
      set firm;
      by sic2 year;
      array s[&d] $4 _temporary_;
     
array f[&d] $4 _temporary_;
     
array r[&d] 8  _temporary_;
      i +
1;
      s = sic;
      f = firm;
      r = return;
     
end;
  
do i = 1 to i;
      return = r;
      r   = .;
      sic    = s;
      firm   = f;
      mean   = mean(of r
  • );
         
  • output;
          r   = return;
         
    end;
      
    call missing(of s
  • f
  • r
  • );
  •    drop i;
       run;
    proc print;
      
    run;
         

    Message was edited by: data _null_ I forgot to add CALL MISSING when I switched to temporary arrays.

    Super Contributor
    Super Contributor
    Posts: 444

    Re: Compute mean by Industry excluding ith firm

    Posted in reply to data_null__

    Thank you sir.but if i want to use dows without arrays after sorting the set by sic2 year firm_id and running the dow code to get the mean for which by “ sic2 year firm_id” group excluding the 1st record of each group i don’t get the expected results.

    Count and sum don’t get to missing when a new “by sic2 year firm_id” group starts, why?

    🔒 This topic is solved and locked.

    Need further help from the community? Please ask a new question.

    Discussion stats
    • 21 replies
    • 1051 views
    • 7 likes
    • 8 in conversation