Hi,
I am using SAS 9.2 and would like to calculate the mean of variable x by the variable id. For some reason, I am creating an extra observation with the code below, so would appreciate knowing why and how to fix it.
Thanks,
Brent Fulton
UC Berkeley
data1: N=1000, with 300 unique id values. id is numeric and none are missing. No x is missing.
data2: N=301, where one observation has a missing id (that is, id = .).
My problem is that data2 should have 300 observations.
proc means data=data1 noprint;
class id;
var x;
output out=data2 mean=x_mean;
run;
You should add the NWAY option:
proc means data=data1 noprint nway ;
class id;
var x;
output out=data2 mean=x_mean;
run;
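The extra observation is the mean of x over all observations in data1, which is why its id is missing. You can see this in the variable _TYPE_, which identifies the level of each summary row: _TYPE_ = 0 is the overall mean and _TYPE_ = 1 is the mean for each id. NWAY keeps only the rows with the highest _TYPE_ value.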
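To see the _TYPE_ levels for yourself, something like the following sketch may help. It assumes the same data1 as in the question; data2_all is only an illustrative data set name. The second step shows a possible alternative to NWAY that filters the output with a WHERE= data set option.

```sas
/* Without NWAY, the output data set holds every _TYPE_ level. */
/* data2_all is a hypothetical name used only for this illustration. */
proc means data=data1 noprint;
  class id;
  var x;
  output out=data2_all mean=x_mean;
run;

/* The _TYPE_ = 0 row is the overall mean (id is missing on that row); */
/* the _TYPE_ = 1 rows hold one mean per id. */
proc print data=data2_all(obs=5);
run;

/* Alternative to NWAY: keep only the per-id rows in the output. */
proc means data=data1 noprint;
  class id;
  var x;
  output out=data2(where=(_type_ = 1)) mean=x_mean;
run;
```

With a single CLASS variable the two approaches should give the same rows. NWAY is shorter and keeps working unchanged if you add more CLASS variables later.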
ieva's approach would get rid of the grand mean, but a missing id is still a valid value. To exclude it, use a WHERE= data set option on the input data set in PROC MEANS:
proc means data=data1(WHERE=(id ^= .)) noprint nway ;
class id;
var x;
output out=data2 mean=x_mean;
run;
This was a helpful answer. My data doesn't have a missing id, but I'm likely to have a dataset that does in the future.
Hi ... the default behavior of PROC MEANS (and SUMMARY) is to ignore missing values for variables in a CLASS statement (both numeric and character). So, a WHERE data set option is not needed to leave out observations with missing IDs. You have to be proactive to have missing CLASS values included ...
data x;
input id x @@;
datalines;
1 10 1 20 2 20 2 30 . 40 . 50
;
run;
proc means data=x noprint nway;
class id;
var x;
output out=y mean=x_mean;
run;
proc means data=x noprint nway missing ;
class id;
var x;
output out=z mean=x_mean;
run;
DATA SET Y: NO MISSING OPTION
id    _TYPE_    _FREQ_    x_mean
 1       1         2        15
 2       1         2        25

DATA SET Z: MISSING OPTION ADDED
id    _TYPE_    _FREQ_    x_mean
 .       1         2        45
 1       1         2        15
 2       1         2        25
P.S. If the objective is to produce a data set, why not just use PROC SUMMARY, where NOPRINT is the default ...
proc summary data=x nway;
class id;
var x;
output out=y mean=x_mean;
run;