Obsidian | Level 7

## How do I calculate average by grouping data ?

data compustat;
input firm sales year;

datalines;
A 2400 2001
A 2500 2002
A 2600 2003
A 2700 2004
A 2800 2005
B 1500 2001
B 1600 2002
B 1700 2003
C 3100 2001
C 3200 2002
C 3300 2003
C 3400 2004
;

I want a column with average for each group of firm for every sales. can someone please help with that ? here as an example of the output.

 firm sales year Average sales A 2400 2001 2600 A 2500 2002 2600 A 2600 2003 2600 A 2700 2004 2600 A 2800 2005 2600 B 1500 2001 1600 B 1600 2002 1600 B 1700 2003 1600 C 3100 2001 3250 C 3200 2002 3250 C 3300 2003 3250 C 3400 2004 3250
1 ACCEPTED SOLUTION

Accepted Solutions
Tourmaline | Level 20

## Re: How do I calculate average by grouping data ?

``````data compustat;
input firm \$ sales year;

datalines;
A 2400 2001
A 2500 2002
A 2600 2003
A 2700 2004
A 2800 2005
B 1500 2001
B 1600 2002
B 1700 2003
C 3100 2001
C 3200 2002
C 3300 2003
C 3400 2004
;

proc sql;
create table want as
select *, avg(sales) as average_sales
from compustat
group by firm
order by firm,year;
quit;
``````
6 REPLIES 6
Tourmaline | Level 20

## Re: How do I calculate average by grouping data ?

``````data compustat;
input firm \$ sales year;

datalines;
A 2400 2001
A 2500 2002
A 2600 2003
A 2700 2004
A 2800 2005
B 1500 2001
B 1600 2002
B 1700 2003
C 3100 2001
C 3200 2002
C 3300 2003
C 3400 2004
;

proc sql;
create table want as
select *, avg(sales) as average_sales
from compustat
group by firm
order by firm,year;
quit;
``````
Calcite | Level 5

## Re: How do I calculate average by grouping data ?

One follow-up question...how would the code change if you only wanted to group by Firm and only show the average for each firm grouping once?  See the end result is below.

Firm    average_sales

A         2600

B         1600

C         3250

PROC Star

## Re: How do I calculate average by grouping data ?

@neveragain wrote:

One follow-up question...how would the code change if you only wanted to group by Firm and only show the average for each firm grouping once?  See the end result is below.

Firm    average_sales

A         2600

B         1600

C         3250

Then remove implicit or explicit references to year and/or sales in the result table by changing

``select *, avg(sales) as average_sales``

to

``select firm, avg(sales) as average_sales``

and drop the

``order by firm,year``

clause.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Calcite | Level 5

## Re: How do I calculate average by grouping data ?

Excellent! Thank you for the help.  I just thought of another question...what if I wanted to derive the standard deviation or another calculation?  End result below:

Firm    average_sales               standard deviation

A         2600                            158

B         1600                            100

C         3250                            129

PROC Star

## Re: How do I calculate average by grouping data ?

"One followup question"

...

"I just thought of another question"

This is not a good habit for a person with the userid @neveragain .

I think it's time to start researching the documentation on functions, which you can apply in a similar way to @novinosrin 's use of the AVG function.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Diamond | Level 26

## Re: How do I calculate average by grouping data ?

``````proc summary data=compustat nway;
class firm;
var sales;
output out=_stats_ mean=average stddev=std;
run;
data want;
merge compustat _stats_;
by firm;
run;``````

The data set _stats_ that is created is the one you asked for with one record per firm.

--
Paige Miller
Discussion stats
• 6 replies
• 18393 views
• 1 like
• 5 in conversation