BookmarkSubscribeRSS Feed
archimboldi
Calcite | Level 5

Hi, I'm new to SAS programming, and I'm having some trouble figuring out how to run PROC MEANS on multiple variables while limiting a WHERE statement to only one variable. 

 

Here's the context: My sample has 80 observations. I have two variables that I would like to run PROC MEANS on. 'Sleep' is the number of hours slept per day, and 'Age' is age in years. 'Sleep' has two outliers due to data entry errors (0 and 100), and the variable 'Sleep2' is dichotomous and excludes the two outliers (that is, if 'Sleep'<1 or 'Sleep'>12, then 'Sleep2'=.). 'Age' has no outliers that need to be excluded. 

 

The following code runs: 

 

proc means;

var sleep age;

where sleep2^=.;

run;

 

However, this WHERE statement takes out the two observations for the entire procedure (it affects PROC MEANS on 'sleep' and 'age'). I only want to restrict this to the variable 'sleep' (PROC MEANS for 'sleep' has the 78 observations without the outliers, whereas PROC MEANS for 'age' has all 80 observations). I tried out something like this:

 

proc means;

var sleep (where=(sleep2^=.)) age;

run;

 

This did not work. I know I can do this in two separate procedures (one procedure for 'sleep', another procedure for 'age'), but I would prefer to do this at once. Any advice or help?

 

Thanks!

 

9 REPLIES 9
Reeza
Super User

Missing values aren't included in proc means calculations by default. 

 

A where statment applies to the whole dataset. You can't have it selectively apply to some variables and not orhers. 

archimboldi
Calcite | Level 5

Thanks for your input. Since a WHERE statement can't do it, is there any way to do one PROC MEANS procedure for multiple variables while restricting one variable (ex: 'sleep' if 'sleep'>0) but not for another?

 

Not SAS code exactly, but imagine something like this:

 

proc means;

var sleep (if sleep>0) age;

run;

 

 

Reeza
Super User

No, you can't do what you're trying to do. 

You need to run multiple proc means. 

PGStats
Opal | Level 21

Create an intermediate view that will effectively remove outlying values :

 

data validData / view=validData;
set wholeData;
if sleep <1 or sleep > 12 then call missing (sleep);
run;

proc means data=validData;
var sleep age;
run;
PG
kenedy_yc
Fluorite | Level 6

Try this 🙂

 

 

/*Using for example instream dataset otherwise if you have SAS dataset, then SET statement*/
/* OPTION 1- Set outliers to missing in DATA step if you want the outliers to be replaced with missing values*/
/* OPTION 2- You need not set outliers to missing but subsetting WHERE statement in PROC MEANS works fine*/
DATA outliers;
INFILE DATALINES;
INPUT sleep age @@;
IF (sleep LT 1 OR sleep GT 12) THEN sleep=.;
DATALINES;
1 20 3 40 12 30
13 40 9 90
;
RUN;

/*Print data to see if outliers are replaced with missing values*/
PROC PRINT DATA=outliers NOOBS;
RUN;


PROC MEANS DATA=outliers;
VAR sleep age;
WHERE sleep GE 1 AND sleep LE 12;
RUN;

Reeza
Super User

Record 13 is excluded from age calculations as well. Which isn't what the OP wants. 

kenedy_yc
Fluorite | Level 6
From subsetting condition "... that is, if 'Sleep'<1 or 'Sleep'>12, then 'Sleep2'=. " seems sleep takes on positive values in the range 1 to 12 ?
Reeza
Super User

Yes, but age is also restricted. 

 

With the WHERE clause the average age is 45.

Without a WHERE clause the average age is 44. The WHERE removes that entire record from the analysis, which is what the OP would like to avoid.

 

 


title 'With WHERE';
PROC MEANS DATA=outliers; 
VAR sleep age; 
WHERE sleep GE 1 AND sleep LE 12;
RUN;


title 'Without WHERE';
proc means data=outliers;
var sleep age;
run;

 

Results here:

 

With WHERE

Variable N Mean Std Dev Minimum Maximum
sleep
age
4
4
6.2500000
45.0000000
5.1234754
31.0912635
1.0000000
20.0000000
12.0000000
90.0000000

 

Without WHERE

Variable N Mean Std Dev Minimum Maximum
sleep
age
4
5
6.2500000
44.0000000
5.1234754
27.0185122
1.0000000
20.0000000
12.0000000
90.0000000

 

 

 

kenedy_yc
Fluorite | Level 6

Thanks Reeza. Much clearer now. I think the author of the question didn't make state it clearly from the start.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 6493 views
  • 2 likes
  • 4 in conversation