Hi,
I am using SAS 9.2 and would like to calculate the mean of variable x by the variable id. For some reason, I am creating an extra observation with the code below, so would appreciate knowing why and how to fix it.
Thanks,
Brent Fulton
UC Berkeley
data1: N=1000, with 300 unique ids. id is numeric and none are missing. No x is missing.
data2: N=301, where one id is missing (that is, id = .).
My problem is that data2 should have 300 observations, one per id.
proc means data=data1 noprint;
class id;
var x;
output out=data2 mean=x_mean;
run;
You should add the NWAY option:
proc means data=data1 noprint nway ;
class id;
var x;
output out=data2 mean=x_mean;
run;
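The extra observation is the mean over all of your observations (the grand mean), which is why its id is missing. You can see this in the variable _TYPE_, which identifies the level of each output row: with one CLASS variable, _TYPE_=0 is the overall statistic and _TYPE_=1 is the per-id statistic. NWAY keeps only the rows with the highest _TYPE_ value.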
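To see where the extra row comes from, it may help to look at _TYPE_ directly. The sketch below reuses data1, id, and x from the original post; the output name data2_by_id is just an illustrative choice. It first counts the output rows per _TYPE_ value, then shows a WHERE= data set option on the output data set as one possible alternative to NWAY.

```sas
/* Without NWAY, the output holds one _TYPE_=0 row (all observations) */
/* plus one _TYPE_=1 row per id.                                      */
proc means data=data1 noprint;
  class id;
  var x;
  output out=data2 mean=x_mean;
run;

/* Count output rows per _TYPE_ value to locate the extra observation. */
proc freq data=data2;
  tables _type_;
run;

/* Alternative to NWAY: keep only the per-id rows when writing the output. */
/* data2_by_id is a hypothetical name used for illustration.               */
proc means data=data1 noprint;
  class id;
  var x;
  output out=data2_by_id(where=(_type_ = 1)) mean=x_mean;
run;
```

With the data described in the question, the PROC FREQ table should show a single row for _TYPE_=0 and 300 rows for _TYPE_=1. With only one CLASS variable the WHERE= filter and NWAY give the same rows; NWAY is shorter and keeps working if more CLASS variables are added later.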
ieva's approach gets rid of the grand mean, but a missing id is still a valid value. To exclude it, use a WHERE= data set option on the input data set in PROC MEANS:
proc means data=data1(WHERE=(id ^= .)) noprint nway ;
class id;
var x;
output out=data2 mean=x_mean;
run;
This was a helpful answer. My data doesn't have a missing id, but I'm likely to have a dataset that does in the future.
Hi ... the default behavior of PROC MEANS (and SUMMARY) is to ignore missing values for variables in a CLASS statement (both numeric and character). So, a WHERE data set option is not needed to leave out observations with missing IDs. You have to be proactive to have missing CLASS values included ...
data x;
input id x @@;
datalines;
1 10 1 20 2 20 2 30 . 40 . 50
;
run;
proc means data=x noprint nway;
class id;
var x;
output out=y mean=x_mean;
run;
proc means data=x noprint nway missing ;
class id;
var x;
output out=z mean=x_mean;
run;
DATA SET Y: NO MISSING OPTION
id    _TYPE_    _FREQ_    x_mean
 1       1         2        15
 2       1         2        25
DATA SET Z: MISSING OPTION ADDED
id    _TYPE_    _FREQ_    x_mean
 .       1         2        45
 1       1         2        15
 2       1         2        25
PS: If the goal is to produce a data set, why not just use PROC SUMMARY, where NOPRINT is the default ...
proc summary data=x nway;
class id;
var x;
output out=y mean=x_mean;
run;