Solved: Re: Calculating the score average of other firms in the same year

joon1 · Posted 02-25-2024 02:04 PM

Dear Madam/Sir,

I would like to calculate the score average of other firms in the same year. For instance, for gvkey1 # 1011, the average score = score sum (gvkey # 1013, 1034,1036,1045,1065,1072,1073)/7.

Any programming help would be greatly appreciated.

gvkey1	year	score
1011	1988	0.1508
1011	1988	0.0851
1011	1988	0.0258
1011	1988	0.0097
1013	1988	0.0215
1013	1988	0.0154
1013	1988	0.0842
1013	1988	0.0006
1034	1988	0.0434
1034	1988	0.0193
1034	1988	0.0324
1036	1988	0.004
1036	1988	0.0393
1036	1988	0.0177
1036	1988	0.0103
1045	1988	0.0712
1045	1988	0.0059
1045	1988	0.0636
1045	1988	0.007
1045	1988	0.0619
1065	1988	0.0144
1065	1988	0.0201
1065	1988	0.0006
1065	1988	0.0101
1072	1988	0.0663
1073	1988	0.0021
1073	1988	0.0462
1073	1988	0.006

Thank you

Joon1

PaigeMiller · Posted 02-26-2024 06:32 AM

This is just math. The average you want is the (sum score of all - sum score of this GVKEY1)/(n of all - n of this GVKEY1);

proc summary data=have;
    class year gvkey1;
    var score;
    types year year*gvkey1;
    output out=sums sum=sum n=n;
run;

data want;
    merge sums(where=(_type_=2) drop=_freq_ rename=(sum=total_sum n=total_n))
        sums(where=(_type_=3) drop=_freq_);
    by year;
    avg_of_other_groups=(total_sum-sum)/(total_n-n);
run;

--
Paige Miller

View solution in original post

PaigeMiller · Posted 02-25-2024 03:09 PM

For instance, for gvkey1 # 1011, the average score = score sum (gvkey # 1013, 1034,1036,1045,1065,1072,1073)/7.

Is that the right formula? Or do you mean this:

For instance, for gvkey1 # 1011, the average score = score sum (mean(gvkey # 1013), mean(gvkey1 # 1034), mean(gvkey1 # 1036),mean(gvkey1 # 1045),mean(gvkey1 # 1065),mean(gvkey1 # 1072),mean(gvkey1 # 1073)/7.

--
Paige Miller

joon1 · Posted 02-25-2024 05:12 PM

Thanks a lot for your prompt reply.

I meant for gvkey1 # 1011, the average score = score sum of other gvkeys (gvkey # 1013, 1034,1036,1045,1065,1072,1073)/ 24 (the number of all other observations, not 7)

Thank you for your help

Sincerely,

Joon1

mkeintz · Posted 02-26-2024 12:20 AM

These program are untested.

If your data are sorted by year, then:

Each year is read twice, the first time to generate sum of score, and count of scores. The second time to calculate the desired mean.

data want (drop=_:);
  set have (in=firstpass)
      have (in=secondpass) ;
  by year;
  if first.year then call missing(_n,_sum);
  if firstpass then do;
    _n+1;
    _sum+score;
  end;
  if secondpass;
  if _n>1 then mean_of_other_gvkeys=(_sum-score)/(_n-1);
run;

But if the data are not sorted, you still have to read data twice, but not year by year. So the hash object is used to store the sum and count during the first pass, and the secondpass to retrieve _N and _SUM for the year-in-hand:

data want (drop=_:);
  set have (in=firstpass)
      have (in=secondpass) ;

  if _n_=1 then do;
    call missing(_n,_sum);
    declare hash h ();
      h.definekey('year');
      h.definedata('_n','_sum');
      h.definedone();
  end;
  if firstpass then do;
    if h.find()^=0 then call missing(_n,_sum);
    _n=sum(_n,1);
    _sum=sum(_sum,score);
    h.replace();
  end;
  if secondpass;
  h.find();
  if _n>1 then mean_of_other_gvkeys=(_sum-score)/(_n-1);
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Ksharp · Posted 02-26-2024 12:21 AM

/*You want this ?*/
data have;
infile cards expandtabs truncover;
input gvkey1	year	score;
cards;
1011	1988	0.1508
1011	1988	0.0851
1011	1988	0.0258
1011	1988	0.0097
1013	1988	0.0215
1013	1988	0.0154
1013	1988	0.0842
1013	1988	0.0006
1034	1988	0.0434
1034	1988	0.0193
1034	1988	0.0324
1036	1988	0.004
1036	1988	0.0393
1036	1988	0.0177
1036	1988	0.0103
1045	1988	0.0712
1045	1988	0.0059
1045	1988	0.0636
1045	1988	0.007
1045	1988	0.0619
1065	1988	0.0144
1065	1988	0.0201
1065	1988	0.0006
1065	1988	0.0101
1072	1988	0.0663
1073	1988	0.0021
1073	1988	0.0462
1073	1988	0.006
;


proc sql;
create table want as
select *,(select mean(score) from have where year=a.year and gvkey1 ne a.gvkey1) as average_score
 from have as a;
quit;

PaigeMiller · Posted 02-26-2024 06:32 AM

This is just math. The average you want is the (sum score of all - sum score of this GVKEY1)/(n of all - n of this GVKEY1);

proc summary data=have;
    class year gvkey1;
    var score;
    types year year*gvkey1;
    output out=sums sum=sum n=n;
run;

data want;
    merge sums(where=(_type_=2) drop=_freq_ rename=(sum=total_sum n=total_n))
        sums(where=(_type_=3) drop=_freq_);
    by year;
    avg_of_other_groups=(total_sum-sum)/(total_n-n);
run;

--
Paige Miller

joon1 · Posted 02-26-2024 01:47 PM

Thank you so much for your help, PaigeMiller. Have a great week.

Joon1

joon1 · Posted 02-26-2024 01:48 PM

Thank you for your help, Ksharp. Have a great week.

SAS Innovate 2025: Call for Content

Classroom Training Available!