BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
joon1
Quartz | Level 8

Dear Madam/Sir,

 

I would like to calculate the score average of other firms in the same year. For instance, for gvkey1 # 1011, the average score = score sum (gvkey # 1013, 1034,1036,1045,1065,1072,1073)/7. 

Any programming help would be greatly appreciated.

 

gvkey1yearscore
101119880.1508
101119880.0851
101119880.0258
101119880.0097
101319880.0215
101319880.0154
101319880.0842
101319880.0006
103419880.0434
103419880.0193
103419880.0324
103619880.004
103619880.0393
103619880.0177
103619880.0103
104519880.0712
104519880.0059
104519880.0636
104519880.007
104519880.0619
106519880.0144
106519880.0201
106519880.0006
106519880.0101
107219880.0663
107319880.0021
107319880.0462
107319880.006

 

Thank you

Joon1

1 ACCEPTED SOLUTION

Accepted Solutions
PaigeMiller
Diamond | Level 26

This is just math. The average you want is the (sum score of all - sum score of this GVKEY1)/(n of all - n of this GVKEY1);

 

proc summary data=have;
    class year gvkey1;
    var score;
    types year year*gvkey1;
    output out=sums sum=sum n=n;
run;

data want;
    merge sums(where=(_type_=2) drop=_freq_ rename=(sum=total_sum n=total_n))
        sums(where=(_type_=3) drop=_freq_);
    by year;
    avg_of_other_groups=(total_sum-sum)/(total_n-n);
run;
--
Paige Miller

View solution in original post

7 REPLIES 7
PaigeMiller
Diamond | Level 26

For instance, for gvkey1 # 1011, the average score = score sum (gvkey # 1013, 1034,1036,1045,1065,1072,1073)/7.

 

Is that the right formula? Or do you mean this:

 

For instance, for gvkey1 # 1011, the average score = score sum (mean(gvkey # 1013), mean(gvkey1 # 1034), mean(gvkey1 # 1036),mean(gvkey1 # 1045),mean(gvkey1 # 1065),mean(gvkey1 # 1072),mean(gvkey1 # 1073)/7.

--
Paige Miller
joon1
Quartz | Level 8

Thanks a lot for your prompt reply. 

I meant  for gvkey1 # 1011, the average score = score sum of other gvkeys (gvkey # 1013, 1034,1036,1045,1065,1072,1073)/ 24 (the number of all other observations, not 7)

Thank you for your help

Sincerely,

Joon1

 

mkeintz
PROC Star

These program are untested. 

 

If your data are sorted by year, then:

 

 

Each year is read twice, the first time to generate sum of score, and count of scores.  The second time to calculate the desired mean.

data want (drop=_:);
  set have (in=firstpass)
      have (in=secondpass) ;
  by year;
  if first.year then call missing(_n,_sum);
  if firstpass then do;
    _n+1;
    _sum+score;
  end;
  if secondpass;
  if _n>1 then mean_of_other_gvkeys=(_sum-score)/(_n-1);
run;

 

But if the data are not sorted, you still have to read data twice, but not year by year.  So the hash object is used to store the sum and count during the first pass, and the secondpass to retrieve _N and _SUM for the year-in-hand:

 

data want (drop=_:);
  set have (in=firstpass)
      have (in=secondpass) ;

  if _n_=1 then do;
    call missing(_n,_sum);
    declare hash h ();
      h.definekey('year');
      h.definedata('_n','_sum');
      h.definedone();
  end;
  if firstpass then do;
    if h.find()^=0 then call missing(_n,_sum);
    _n=sum(_n,1);
    _sum=sum(_sum,score);
    h.replace();
  end;
  if secondpass;
  h.find();
  if _n>1 then mean_of_other_gvkeys=(_sum-score)/(_n-1);
run;

 

 

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Ksharp
Super User
/*You want this ?*/
data have;
infile cards expandtabs truncover;
input gvkey1	year	score;
cards;
1011	1988	0.1508
1011	1988	0.0851
1011	1988	0.0258
1011	1988	0.0097
1013	1988	0.0215
1013	1988	0.0154
1013	1988	0.0842
1013	1988	0.0006
1034	1988	0.0434
1034	1988	0.0193
1034	1988	0.0324
1036	1988	0.004
1036	1988	0.0393
1036	1988	0.0177
1036	1988	0.0103
1045	1988	0.0712
1045	1988	0.0059
1045	1988	0.0636
1045	1988	0.007
1045	1988	0.0619
1065	1988	0.0144
1065	1988	0.0201
1065	1988	0.0006
1065	1988	0.0101
1072	1988	0.0663
1073	1988	0.0021
1073	1988	0.0462
1073	1988	0.006
;


proc sql;
create table want as
select *,(select mean(score) from have where year=a.year and gvkey1 ne a.gvkey1) as average_score
 from have as a;
quit;
PaigeMiller
Diamond | Level 26

This is just math. The average you want is the (sum score of all - sum score of this GVKEY1)/(n of all - n of this GVKEY1);

 

proc summary data=have;
    class year gvkey1;
    var score;
    types year year*gvkey1;
    output out=sums sum=sum n=n;
run;

data want;
    merge sums(where=(_type_=2) drop=_freq_ rename=(sum=total_sum n=total_n))
        sums(where=(_type_=3) drop=_freq_);
    by year;
    avg_of_other_groups=(total_sum-sum)/(total_n-n);
run;
--
Paige Miller
joon1
Quartz | Level 8

Thank you so much for your help, PaigeMiller. Have a great week.

Joon1

joon1
Quartz | Level 8
Thank you for your help, Ksharp. Have a great week.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 506 views
  • 0 likes
  • 4 in conversation