Help using Base SAS procedures

PROC MEANS: Restricting 'WHERE' statement to one variable

Reply
New Contributor
Posts: 2

PROC MEANS: Restricting 'WHERE' statement to one variable

Hi, I'm new to SAS programming, and I'm having some trouble figuring out how to run PROC MEANS on multiple variables while limiting a WHERE statement to only one variable. 

 

Here's the context: My sample has 80 observations. I have two variables that I would like to run PROC MEANS on. 'Sleep' is the number of hours slept per day, and 'Age' is age in years. 'Sleep' has two outliers due to data entry errors (0 and 100), and the variable 'Sleep2' is dichotomous and excludes the two outliers (that is, if 'Sleep'<1 or 'Sleep'>12, then 'Sleep2'=.). 'Age' has no outliers that need to be excluded. 

 

The following code runs: 

 

proc means;

var sleep age;

where sleep2^=.;

run;

 

However, this WHERE statement takes out the two observations for the entire procedure (it affects PROC MEANS on 'sleep' and 'age'). I only want to restrict this to the variable 'sleep' (PROC MEANS for 'sleep' has the 78 observations without the outliers, whereas PROC MEANS for 'age' has all 80 observations). I tried out something like this:

 

proc means;

var sleep (where=(sleep2^=.)) age;

run;

 

This did not work. I know I can do this in two separate procedures (one procedure for 'sleep', another procedure for 'age'), but I would prefer to do this at once. Any advice or help?

 

Thanks!

 

Super User
Posts: 17,948

Re: PROC MEANS: Restricting 'WHERE' statement to one variable

Missing values aren't included in proc means calculations by default. 

 

A where statment applies to the whole dataset. You can't have it selectively apply to some variables and not orhers. 

New Contributor
Posts: 2

Re: PROC MEANS: Restricting 'WHERE' statement to one variable

[ Edited ]

Thanks for your input. Since a WHERE statement can't do it, is there any way to do one PROC MEANS procedure for multiple variables while restricting one variable (ex: 'sleep' if 'sleep'>0) but not for another?

 

Not SAS code exactly, but imagine something like this:

 

proc means;

var sleep (if sleep>0) age;

run;

 

 

Super User
Posts: 17,948

Re: PROC MEANS: Restricting 'WHERE' statement to one variable

[ Edited ]

No, you can't do what you're trying to do. 

You need to run multiple proc means. 

Respected Advisor
Posts: 4,660

Re: PROC MEANS: Restricting 'WHERE' statement to one variable

Create an intermediate view that will effectively remove outlying values :

 

data validData / view=validData;
set wholeData;
if sleep <1 or sleep > 12 then call missing (sleep);
run;

proc means data=validData;
var sleep age;
run;
PG
Occasional Contributor
Posts: 5

Re: PROC MEANS: Restricting 'WHERE' statement to one variable

Try this Smiley Happy

 

 

/*Using for example instream dataset otherwise if you have SAS dataset, then SET statement*/
/* OPTION 1- Set outliers to missing in DATA step if you want the outliers to be replaced with missing values*/
/* OPTION 2- You need not set outliers to missing but subsetting WHERE statement in PROC MEANS works fine*/
DATA outliers;
INFILE DATALINES;
INPUT sleep age @@;
IF (sleep LT 1 OR sleep GT 12) THEN sleep=.;
DATALINES;
1 20 3 40 12 30
13 40 9 90
;
RUN;

/*Print data to see if outliers are replaced with missing values*/
PROC PRINT DATA=outliers NOOBS;
RUN;


PROC MEANS DATA=outliers;
VAR sleep age;
WHERE sleep GE 1 AND sleep LE 12;
RUN;

Super User
Posts: 17,948

Re: PROC MEANS: Restricting 'WHERE' statement to one variable

Record 13 is excluded from age calculations as well. Which isn't what the OP wants. 

Occasional Contributor
Posts: 5

Re: PROC MEANS: Restricting 'WHERE' statement to one variable

From subsetting condition "... that is, if 'Sleep'<1 or 'Sleep'>12, then 'Sleep2'=. " seems sleep takes on positive values in the range 1 to 12 ?
Super User
Posts: 17,948

Re: PROC MEANS: Restricting 'WHERE' statement to one variable

Yes, but age is also restricted. 

 

With the WHERE clause the average age is 45.

Without a WHERE clause the average age is 44. The WHERE removes that entire record from the analysis, which is what the OP would like to avoid.

 

 


title 'With WHERE';
PROC MEANS DATA=outliers; 
VAR sleep age; 
WHERE sleep GE 1 AND sleep LE 12;
RUN;


title 'Without WHERE';
proc means data=outliers;
var sleep age;
run;

 

Results here:

 

With WHERE

Variable N Mean Std Dev Minimum Maximum
sleep
age
4
4
6.2500000
45.0000000
5.1234754
31.0912635
1.0000000
20.0000000
12.0000000
90.0000000

 

Without WHERE

Variable N Mean Std Dev Minimum Maximum
sleep
age
4
5
6.2500000
44.0000000
5.1234754
27.0185122
1.0000000
20.0000000
12.0000000
90.0000000

 

 

 

Occasional Contributor
Posts: 5

Re: PROC MEANS: Restricting 'WHERE' statement to one variable

Thanks Reeza. Much clearer now. I think the author of the question didn't make state it clearly from the start.

Ask a Question
Discussion stats
  • 9 replies
  • 427 views
  • 2 likes
  • 4 in conversation