## proc means by group

Hi,

I am using SAS 9.2 and would like to calculate the mean of variable x for each value of the variable id. For some reason, the code below creates an extra observation in the output data set, so I would appreciate knowing why and how to fix it.

Thanks,

Brent Fulton

UC Berkeley

data1: N=1000, with 300 unique id values. id is numeric and never missing. x is never missing.

data2: N=301, where one observation has a missing id (that is: .).

My problem is that data2 should have 300 observations, one per id.

proc means data=data1 noprint;

class id;

var x;

output out=data2 mean=x_mean;

run;

## Re: proc means by group

proc means data=data1 noprint nway ;

class id;

var x;

output out=data2 mean=x_mean;

run;

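The extra observation is the mean over all the observations you have (the grand mean), and id is missing on that row because it does not belong to any single id. You can see this in the variable _TYPE_, which identifies the level of each summary row: _TYPE_=0 is the overall row and _TYPE_=1 is the per-id rows. The NWAY option keeps only the rows with the highest _TYPE_ value.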
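To see the _TYPE_ levels for yourself, something like the sketch below should work. It reuses data1, id and x from the question; the data set name all_levels is made up for illustration. With the data described in the question, the frequency table should show one _TYPE_=0 row and 300 _TYPE_=1 rows.

```sas
/* Without NWAY, the output data set holds every _TYPE_ level */
proc means data=data1 noprint;
  class id;
  var x;
  output out=all_levels mean=x_mean;  /* all_levels: hypothetical name */
run;

/* Count the rows at each level: _TYPE_=0 is the grand mean,
   _TYPE_=1 is one row per id */
proc freq data=all_levels;
  tables _type_;
run;

/* An alternative to NWAY: filter the output rows with a
   WHERE= data set option on OUT= */
proc means data=data1 noprint;
  class id;
  var x;
  output out=data2(where=(_type_ = 1)) mean=x_mean;
run;
```

With a single CLASS variable, NWAY and the _TYPE_=1 filter should give the same rows; NWAY is shorter and keeps working if more CLASS variables are added later.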

## proc means by group

ieva's approach gets rid of the grand mean, but a missing id would still be treated as a valid value. To exclude it, use a WHERE= data set option on the input data set in PROC MEANS:

proc means data=data1(WHERE=(id ^= .)) noprint nway ;

class id;

var x;

output out=data2 mean=x_mean;

run;
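The same filter can also be written as a WHERE statement inside the step instead of as a data set option. This is only a sketch of the alternative syntax, using the same data1, id and x as above:

```sas
proc means data=data1 noprint nway;
  where not missing(id);   /* same effect as WHERE=(id ^= .) on the input */
  class id;
  var x;
  output out=data2 mean=x_mean;
run;
```

MISSING() works for both numeric and character variables, so this form does not have to change if id turns out to be character.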

## proc means by group

This was a helpful answer. My data doesn't have a missing id, but I'm likely to have a dataset that does in the future.

## Re: proc means by group

Hi ... the default behavior of PROC MEANS (and SUMMARY) is to exclude observations with missing values for variables in a CLASS statement (both numeric and character). So, a WHERE= data set option is not needed to leave out observations with missing IDs. You have to be proactive to have missing CLASS values included, by adding the MISSING option ...

data x;

input id x @@;

datalines;

1 10 1 20 2 20 2 30 . 40 . 50

;

run;

proc means data=x noprint nway;

class id;

var x;

output out=y mean=x_mean;

run;

proc means data=x noprint nway missing ;

class id;

var x;

output out=z mean=x_mean;

run;

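DATA SET Y: NO MISSING OPTION

| id | `_TYPE_` | `_FREQ_` | x_mean |
|----|----------|----------|--------|
| 1  | 1        | 2        | 15     |
| 2  | 1        | 2        | 25     |

DATA SET Z: MISSING OPTION ADDED

| id | `_TYPE_` | `_FREQ_` | x_mean |
|----|----------|----------|--------|
| .  | 1        | 2        | 45     |
| 1  | 1        | 2        | 15     |
| 2  | 1        | 2        | 25     |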

P.S. If the goal is only to produce a data set, why not just use PROC SUMMARY, where NOPRINT is the default ...

proc summary data=x nway;

class id;

var x;

output out=y mean=x_mean;

run;
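If the automatic _TYPE_ and _FREQ_ variables are not wanted in the result, one option is a DROP= data set option on OUT=. A minimal sketch, reusing data set x from the example above; the output name y_slim is made up for illustration:

```sas
proc summary data=x nway;
  class id;
  var x;
  /* y_slim: hypothetical name; DROP= removes the automatic variables */
  output out=y_slim(drop=_type_ _freq_) mean=x_mean;
run;

/* Print the result to check it: one row per non-missing id */
proc print data=y_slim noobs;
run;
```

Keeping _FREQ_ can still be useful, since it shows how many observations went into each mean.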

