Dear Madam/Sir,
I would like to calculate the score average of other firms in the same year. For instance, for gvkey1 # 1011, the average score = score sum (gvkey # 1013, 1034,1036,1045,1065,1072,1073)/7.
Any programming help would be greatly appreciated.
gvkey1 | year | score |
1011 | 1988 | 0.1508 |
1011 | 1988 | 0.0851 |
1011 | 1988 | 0.0258 |
1011 | 1988 | 0.0097 |
1013 | 1988 | 0.0215 |
1013 | 1988 | 0.0154 |
1013 | 1988 | 0.0842 |
1013 | 1988 | 0.0006 |
1034 | 1988 | 0.0434 |
1034 | 1988 | 0.0193 |
1034 | 1988 | 0.0324 |
1036 | 1988 | 0.004 |
1036 | 1988 | 0.0393 |
1036 | 1988 | 0.0177 |
1036 | 1988 | 0.0103 |
1045 | 1988 | 0.0712 |
1045 | 1988 | 0.0059 |
1045 | 1988 | 0.0636 |
1045 | 1988 | 0.007 |
1045 | 1988 | 0.0619 |
1065 | 1988 | 0.0144 |
1065 | 1988 | 0.0201 |
1065 | 1988 | 0.0006 |
1065 | 1988 | 0.0101 |
1072 | 1988 | 0.0663 |
1073 | 1988 | 0.0021 |
1073 | 1988 | 0.0462 |
1073 | 1988 | 0.006 |
Thank you
Joon1
This is just math. The average you want is the (sum score of all - sum score of this GVKEY1)/(n of all - n of this GVKEY1);
proc summary data=have;
class year gvkey1;
var score;
types year year*gvkey1;
output out=sums sum=sum n=n;
run;
data want;
merge sums(where=(_type_=2) drop=_freq_ rename=(sum=total_sum n=total_n))
sums(where=(_type_=3) drop=_freq_);
by year;
avg_of_other_groups=(total_sum-sum)/(total_n-n);
run;
For instance, for gvkey1 # 1011, the average score = score sum (gvkey # 1013, 1034,1036,1045,1065,1072,1073)/7.
Is that the right formula? Or do you mean this:
For instance, for gvkey1 # 1011, the average score = score sum (mean(gvkey # 1013), mean(gvkey1 # 1034), mean(gvkey1 # 1036),mean(gvkey1 # 1045),mean(gvkey1 # 1065),mean(gvkey1 # 1072),mean(gvkey1 # 1073)/7.
Thanks a lot for your prompt reply.
I meant for gvkey1 # 1011, the average score = score sum of other gvkeys (gvkey # 1013, 1034,1036,1045,1065,1072,1073)/ 24 (the number of all other observations, not 7)
Thank you for your help
Sincerely,
Joon1
These program are untested.
If your data are sorted by year, then:
Each year is read twice, the first time to generate sum of score, and count of scores. The second time to calculate the desired mean.
data want (drop=_:);
set have (in=firstpass)
have (in=secondpass) ;
by year;
if first.year then call missing(_n,_sum);
if firstpass then do;
_n+1;
_sum+score;
end;
if secondpass;
if _n>1 then mean_of_other_gvkeys=(_sum-score)/(_n-1);
run;
But if the data are not sorted, you still have to read data twice, but not year by year. So the hash object is used to store the sum and count during the first pass, and the secondpass to retrieve _N and _SUM for the year-in-hand:
data want (drop=_:);
set have (in=firstpass)
have (in=secondpass) ;
if _n_=1 then do;
call missing(_n,_sum);
declare hash h ();
h.definekey('year');
h.definedata('_n','_sum');
h.definedone();
end;
if firstpass then do;
if h.find()^=0 then call missing(_n,_sum);
_n=sum(_n,1);
_sum=sum(_sum,score);
h.replace();
end;
if secondpass;
h.find();
if _n>1 then mean_of_other_gvkeys=(_sum-score)/(_n-1);
run;
/*You want this ?*/
data have;
infile cards expandtabs truncover;
input gvkey1 year score;
cards;
1011 1988 0.1508
1011 1988 0.0851
1011 1988 0.0258
1011 1988 0.0097
1013 1988 0.0215
1013 1988 0.0154
1013 1988 0.0842
1013 1988 0.0006
1034 1988 0.0434
1034 1988 0.0193
1034 1988 0.0324
1036 1988 0.004
1036 1988 0.0393
1036 1988 0.0177
1036 1988 0.0103
1045 1988 0.0712
1045 1988 0.0059
1045 1988 0.0636
1045 1988 0.007
1045 1988 0.0619
1065 1988 0.0144
1065 1988 0.0201
1065 1988 0.0006
1065 1988 0.0101
1072 1988 0.0663
1073 1988 0.0021
1073 1988 0.0462
1073 1988 0.006
;
proc sql;
create table want as
select *,(select mean(score) from have where year=a.year and gvkey1 ne a.gvkey1) as average_score
from have as a;
quit;
This is just math. The average you want is the (sum score of all - sum score of this GVKEY1)/(n of all - n of this GVKEY1);
proc summary data=have;
class year gvkey1;
var score;
types year year*gvkey1;
output out=sums sum=sum n=n;
run;
data want;
merge sums(where=(_type_=2) drop=_freq_ rename=(sum=total_sum n=total_n))
sums(where=(_type_=3) drop=_freq_);
by year;
avg_of_other_groups=(total_sum-sum)/(total_n-n);
run;
Thank you so much for your help, PaigeMiller. Have a great week.
Joon1
Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.