Variable | N | Mean | Std Dev | Minimum | Maximum |
---|---|---|---|---|---|
sleep
age
|
4
4
|
6.2500000
45.0000000
|
5.1234754
31.0912635
|
1.0000000
20.0000000
|
12.0000000
90.0000000
|
Hi, I'm new to SAS programming, and I'm having some trouble figuring out how to run PROC MEANS on multiple variables while limiting a WHERE statement to only one variable.
Here's the context: My sample has 80 observations. I have two variables that I would like to run PROC MEANS on. 'Sleep' is the number of hours slept per day, and 'Age' is age in years. 'Sleep' has two outliers due to data entry errors (0 and 100), and the variable 'Sleep2' is dichotomous and excludes the two outliers (that is, if 'Sleep'<1 or 'Sleep'>12, then 'Sleep2'=.). 'Age' has no outliers that need to be excluded.
The following code runs:
proc means;
var sleep age;
where sleep2^=.;
run;
However, this WHERE statement takes out the two observations for the entire procedure (it affects PROC MEANS on 'sleep' and 'age'). I only want to restrict this to the variable 'sleep' (PROC MEANS for 'sleep' has the 78 observations without the outliers, whereas PROC MEANS for 'age' has all 80 observations). I tried out something like this:
proc means;
var sleep (where=(sleep2^=.)) age;
run;
This did not work. I know I can do this in two separate procedures (one procedure for 'sleep', another procedure for 'age'), but I would prefer to do this at once. Any advice or help?
Thanks!
Missing values aren't included in proc means calculations by default.
A where statment applies to the whole dataset. You can't have it selectively apply to some variables and not orhers.
Thanks for your input. Since a WHERE statement can't do it, is there any way to do one PROC MEANS procedure for multiple variables while restricting one variable (ex: 'sleep' if 'sleep'>0) but not for another?
Not SAS code exactly, but imagine something like this:
proc means;
var sleep (if sleep>0) age;
run;
No, you can't do what you're trying to do.
You need to run multiple proc means.
Create an intermediate view that will effectively remove outlying values :
data validData / view=validData;
set wholeData;
if sleep <1 or sleep > 12 then call missing (sleep);
run;
proc means data=validData;
var sleep age;
run;
Try this 🙂
/*Using for example instream dataset otherwise if you have SAS dataset, then SET statement*/
/* OPTION 1- Set outliers to missing in DATA step if you want the outliers to be replaced with missing values*/
/* OPTION 2- You need not set outliers to missing but subsetting WHERE statement in PROC MEANS works fine*/
DATA outliers;
INFILE DATALINES;
INPUT sleep age @@;
IF (sleep LT 1 OR sleep GT 12) THEN sleep=.;
DATALINES;
1 20 3 40 12 30
13 40 9 90
;
RUN;
/*Print data to see if outliers are replaced with missing values*/
PROC PRINT DATA=outliers NOOBS;
RUN;
PROC MEANS DATA=outliers;
VAR sleep age;
WHERE sleep GE 1 AND sleep LE 12;
RUN;
Record 13 is excluded from age calculations as well. Which isn't what the OP wants.
Yes, but age is also restricted.
With the WHERE clause the average age is 45.
Without a WHERE clause the average age is 44. The WHERE removes that entire record from the analysis, which is what the OP would like to avoid.
title 'With WHERE';
PROC MEANS DATA=outliers;
VAR sleep age;
WHERE sleep GE 1 AND sleep LE 12;
RUN;
title 'Without WHERE';
proc means data=outliers;
var sleep age;
run;
Results here:
With WHERE
Variable | N | Mean | Std Dev | Minimum | Maximum |
---|---|---|---|---|---|
sleep
age
|
4
4
|
6.2500000
45.0000000
|
5.1234754
31.0912635
|
1.0000000
20.0000000
|
12.0000000
90.0000000
|
Without WHERE
Variable | N | Mean | Std Dev | Minimum | Maximum |
---|---|---|---|---|---|
sleep
age
|
4
5
|
6.2500000
44.0000000
|
5.1234754
27.0185122
|
1.0000000
20.0000000
|
12.0000000
90.0000000
|
Thanks Reeza. Much clearer now. I think the author of the question didn't make state it clearly from the start.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.