Hi,
I am using SAS 9.2 and would like to calculate the mean of variable x by the variable id. For some reason, I am creating an extra observation with the code below, so would appreciate knowing why and how to fix it.
Thanks,
Brent Fulton
UC Berkeley
data1: N=1000, with 300 unique ids. id is numeric and never missing. x is never missing.
data2: N=301, where one observation has a missing id, that is: .
My problem is that data2 should have 300 observations.
proc means data=data1 noprint;
class id;
var x;
output out=data2 mean=x_mean;
run;
You should add the NWAY option:
proc means data=data1 noprint nway ;
class id;
var x;
output out=data2 mean=x_mean;
run;
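The extra observation you create is the mean of x over all observations (the overall mean), which is why its id is missing. You can see that in the variable _TYPE_, which identifies the level of each calculation: with one CLASS variable, _TYPE_=0 is the overall row and _TYPE_=1 rows are the per-id means. NWAY keeps only the rows with the highest _TYPE_ value.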
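To see the _TYPE_ levels described above, here is a minimal sketch using the data1 from the question. The output data set name data2_all is made up for illustration, and the WHERE= data set option on OUT= is shown only as one possible alternative to NWAY, not as the recommended fix.

```sas
/* Without NWAY: the output holds the overall row (_TYPE_=0, id missing) */
/* followed by one row per id (_TYPE_=1).                                */
proc means data=data1 noprint;
class id;
var x;
output out=data2_all mean=x_mean;   /* data2_all is a hypothetical name */
run;

/* Look at the first few rows; the _TYPE_=0 row comes first. */
proc print data=data2_all(obs=5);
run;

/* Alternative to NWAY: keep only the per-id rows when writing the output. */
proc means data=data1 noprint;
class id;
var x;
output out=data2(where=(_type_=1)) mean=x_mean;
run;
```

With a single CLASS variable both approaches should give the same 300 rows. NWAY is shorter and keeps working if more CLASS variables are added later, since the highest _TYPE_ value changes with the number of CLASS variables.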
ieva's approach would get rid of the grand mean, but a missing id is still a valid value. To exclude it, use a WHERE= data set option on the input data set in PROC MEANS:
proc means data=data1(WHERE=(id ^= .)) noprint nway ;
class id;
var x;
output out=data2 mean=x_mean;
run;
This was a helpful answer. My data doesn't have a missing id, but I'm likely to have a dataset that does in the future.
Hi ... the default behavior of PROC MEANS (and SUMMARY) is to exclude observations with missing values for variables in a CLASS statement (both numeric and character). So, a WHERE data set option is not needed to leave out observations with missing IDs. You have to be proactive and add the MISSING option to have missing CLASS values included ...
data x;
input id x @@;
datalines;
1 10 1 20 2 20 2 30 . 40 . 50
;
run;
proc means data=x noprint nway;
class id;
var x;
output out=y mean=x_mean;
run;
proc means data=x noprint nway missing ;
class id;
var x;
output out=z mean=x_mean;
run;
DATA SET Y: NO MISSING OPTION
id    _TYPE_    _FREQ_    x_mean
 1       1         2        15
 2       1         2        25

DATA SET Z: MISSING OPTION ADDED
id    _TYPE_    _FREQ_    x_mean
 .       1         2        45
 1       1         2        15
 2       1         2        25
P.S. If the objective is to produce a data set, why not just use PROC SUMMARY, where NOPRINT is the default ...
proc summary data=x nway;
class id;
var x;
output out=y mean=x_mean;
run;
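If only id and x_mean are wanted in the result, the automatic variables can be dropped as the output data set is written. This is a small sketch reusing the example data set x from above. The DROP= data set option is standard SAS, but whether you want to discard _FREQ_ (the number of observations per id) is an assumption about your needs.

```sas
/* NWAY keeps only the per-id rows; DROP= removes the automatic variables. */
proc summary data=x nway;
class id;
var x;
output out=y(drop=_type_ _freq_) mean=x_mean;
run;
```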