Subsetting w/ multiple where statements

Regular Contributor
Posts: 199

I am trying to create a subset w/ outliers removed for multiple variables (outliers are defined as > 1.5 x Q3 or < Q1 / 1.5). My approach is to use multiple WHERE statements, but I am not getting the desired result. I am open to other approaches, but I'm also curious why this syntax is not working.

 

	DATA want; 
	SET have;
 		WHERE score1 BETWEEN (1.5*Q3_score1) AND (Q1_score1/1.5);
 		WHERE SAME AND score2 BETWEEN (1.5*Q3_score2) AND (Q1_score2/1.5);
 		WHERE SAME AND score3 BETWEEN (1.5*Q3_score3) AND (Q1_score3/1.5);
	RUN;
	

When I run this code w/ only the first where statement, the max value for score1 is 290. When I run it w/ the first and second where statements, the max value for score1 changes to 300. 

 

Aren't these statements independent? Why would one affect the other?

 

Thanks for your help. 


Accepted Solutions
Solution
‎05-23-2016 05:11 PM
Super User
Posts: 5,509

Re: Subsetting w/ multiple where statements

In a SAS data set, I don't think that's one of your possible choices.  When you output an observation, all the variables are output.  You can't change that from one observation to the next.

 

There are other things you can do.  You can set out of range values to missing before you output.  Or you can totally re-shape the data set along these lines:

 

ID   Score_variable  score_value
ABC  score1          25
ABC  score2          30
DEF  score2          40

 

But there is no way to change the variables that get output from one observation to the next.
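Both ideas above can be sketched roughly as follows. This is only an illustration under assumptions: the sample rows, the `id` variable, the array names `s`, `q1s` and `q3s`, and the output names `want_missing` and `want_long` are all made up, and the quartile columns are assumed to sit on every row as in the original post.

```sas
/* Hypothetical sample data: quartiles are assumed to be merged onto every row. */
data have;
    input id $ score1-score3 Q1_score1-Q1_score3 Q3_score1-Q3_score3;
    datalines;
ABC 25 30 900 20 20 20 40 40 40
DEF 2 40 35 20 20 20 40 40 40
;
run;

/* Option 1: keep every row, but set out-of-range values to missing. */
data want_missing;
    set have;
    array s{3}   score1-score3;
    array q1s{3} Q1_score1-Q1_score3;
    array q3s{3} Q3_score1-Q3_score3;
    do i = 1 to dim(s);
        /* Only the offending value is blanked; the other scores are untouched. */
        if s{i} < q1s{i}/1.5 or s{i} > 1.5*q3s{i} then s{i} = .;
    end;
    drop i;
run;

/* Option 2: one row per ID and score, written only when the value is in range. */
data want_long;
    set have;
    array s{3}   score1-score3;
    array q1s{3} Q1_score1-Q1_score3;
    array q3s{3} Q3_score1-Q3_score3;
    length Score_variable $32;
    do i = 1 to dim(s);
        if q1s{i}/1.5 <= s{i} <= 1.5*q3s{i} then do;
            Score_variable = vname(s{i});   /* name of the current score variable */
            score_value    = s{i};
            output;
        end;
    end;
    keep id Score_variable score_value;
run;
```

Either way, each score is judged against its own range, so an outlier in one score does not remove the values of the others.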



All Replies
Super User
Posts: 19,815

Re: Subsetting w/ multiple where statements

For those ones that get excluded - between 290 and 300, do they meet the other criteria for score2/score3?

 

You've used AND, so the statements are not independent: all 3 conditions must be met.

 

If you want any of the 3, use OR.
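As a rough sketch of the AND vs. OR difference, here is the same pair of conditions run both ways on sashelp.class. The cut-off values and the output data set names are arbitrary and only for illustration; they are not from the original data.

```sas
/* AND: a row is kept only if every condition holds. */
data all_conditions;
    set sashelp.class;
    where (age between 12 and 14) and (height between 55 and 65);
run;

/* OR: a row is kept if at least one condition holds. */
data any_condition;
    set sashelp.class;
    where (age between 12 and 14) or (height between 55 and 65);
run;
```

Comparing the row counts of the two outputs in the log should show that the OR version keeps at least as many rows as the AND version.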

 

 

Regular Contributor
Posts: 199

Re: Subsetting w/ multiple where statements

@Reeza I want the "AND" in the BETWEEN...AND convention...What would be the correct syntax for making the individual WHERE statements independent from one another? 

Respected Advisor
Posts: 4,925

Re: Subsetting w/ multiple where statements

If it's AND you want, then say AND:

 

WHERE score1 BETWEEN (1.5*Q3_score1) AND (Q1_score1/1.5) AND
      score2 BETWEEN (1.5*Q3_score2) AND (Q1_score2/1.5) AND
      score3 BETWEEN (1.5*Q3_score3) AND (Q1_score3/1.5);
PG
Super User
Posts: 19,815

Re: Subsetting w/ multiple where statements

Forget the Where and use explicit IF. You'll save yourself a headache and future you will thank you. 

 

You won't remember the details of this the next time you encounter it, and will have to recheck everything otherwise. Or at least that's what I do when I see things like that in prod code: first check that it's doing 1) what you think it's doing, and 2) what the original programmer thought they were doing, which may or may not have been you.
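A minimal sketch of the explicit IF approach, assuming the quartile columns are on every row as in the original post. The sample rows and the flag names `in_range1` to `in_range3` are hypothetical.

```sas
/* Hypothetical sample data with the column names from the original post. */
data have;
    input score1-score3 Q1_score1-Q1_score3 Q3_score1-Q3_score3;
    datalines;
25 30 900 20 20 20 40 40 40
25 30 35 20 20 20 40 40 40
;
run;

data want;
    set have;
    /* Each flag is 1 when the score is inside its own range, otherwise 0. */
    in_range1 = (Q1_score1/1.5 <= score1 <= 1.5*Q3_score1);
    in_range2 = (Q1_score2/1.5 <= score2 <= 1.5*Q3_score2);
    in_range3 = (Q1_score3/1.5 <= score3 <= 1.5*Q3_score3);
    /* Subsetting IF: keep the row only when all three scores are in range. */
    if in_range1 and in_range2 and in_range3;
run;
```

Unlike WHERE, the subsetting IF is evaluated after the observation has been read, so the rule is spelled out in one place and the flags can be inspected while debugging. Swapping `and` for `or` in the last statement would keep rows where any one score is in range.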

Respected Advisor
Posts: 4,925

Re: Subsetting w/ multiple where statements

Didn't you get the following Note in the Log?

 

NOTE: WHERE clause has been replaced.

indicating that only the last WHERE clause matters?

PG
Regular Contributor
Posts: 199

Re: Subsetting w/ multiple where statements

@PGStats <NOTE: WHERE clause has been replaced.>

 

Yes. I saw this message, but didn't understand that it meant that "only the last WHERE clause matters".

 

Just to confirm, you're saying that all where statements prior to the last one are disregarded?

Respected Advisor
Posts: 4,925

Re: Subsetting w/ multiple where statements

Do a little testing as I did and you will see that this is the case. I couldn't find it confirmed in the SAS documentation though.

PG
Super User
Posts: 5,509

Re: Subsetting w/ multiple where statements

You will need to inspect the exact wording in the note.  When  you use SAME AND in your WHERE clause, I would expect the note to say that the WHERE clause was AUGMENTED rather than REPLACED.

Respected Advisor
Posts: 4,925

Re: Subsetting w/ multiple where statements

Posted in reply to Astounding

Run this:

 

data test;
set sashelp.class;
where sex="M";
where sex="F";
run;

proc print; run;
PG
Regular Contributor
Posts: 199

Re: Subsetting w/ multiple where statements

@PGStats Thanks. The point is made clearly w/ that code. Only the last WHERE statement is applied, although in that example both WHERE statements apply to the same variable, whereas in mine the variables are different. Regardless, the outcome is the same when I change the code to:

data test;
set sashelp.class;
where sex="M";
where age lt 15;
run;

proc print; run;

 

Adding the WHERE-SAME-AND statement eliminates the problem, but this is not what I'm after. 

	data test;
	set sashelp.class;
	where sex="M";
	where same and age lt 15;
	run;

Is there a different way to use multiple WHERE statements when subsetting? Should I choose an entirely different approach?

 

 

Regular Contributor
Posts: 199

Re: Subsetting w/ multiple where statements

Posted in reply to Astounding

@Astounding Yes, "augmented".

 

<NOTE: WHERE clause has been augmented.>

 

Can you translate this note for me?

Super User
Posts: 5,509

Re: Subsetting w/ multiple where statements

Augmented:  The conditions from the first WHERE statement are still in effect, and the conditions from the second WHERE statement are being added as an additional set of conditions.
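To see what "augmented" means in practice, a small sketch on sashelp.class compares two WHERE statements joined by SAME AND against a single combined WHERE. The output data set names are made up for the illustration.

```sas
/* Two statements: the second one augments the first via SAME AND. */
data two_statements;
    set sashelp.class;
    where sex = "M";
    where same and age lt 15;
run;

/* One statement with both conditions written out. */
data one_statement;
    set sashelp.class;
    where sex = "M" and age lt 15;
run;

/* The two outputs are expected to match row for row. */
proc compare base=two_statements compare=one_statement;
run;
```

Dropping `same and` from the second WHERE in the first step turns it back into a replacement, which is the case the "replaced" note describes.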

Super User
Posts: 5,509

Re: Subsetting w/ multiple where statements

Are you sure you didn't get the results mixed up?  It would make all the sense in the world to get a maximum of 300 with just one WHERE statement, but a maximum of 290 when you add a second WHERE statement.  The second WHERE statement would remove a few more observations, which could include the one that has the value of 300.

Regular Contributor
Posts: 199

Re: Subsetting w/ multiple where statements

Posted in reply to Astounding

@Astounding Yes, I did get them mixed up. Sorry about that.

 

I guess I'm confused about how to subset w/o narrowing the dataset to meet the conditions in all the prior WHERE statements.

 

I want a dataset that contains the values for each variable between the parameters outlined in the BETWEEN...AND statement. And I want the WHERE statements to be independent of each other.

 

In other words, I want all the values of score1 included if they are between 1.5*Q3 and Q1/1.5. And then separately, I want all the values of score2 included if they are between the same parameters for score2, etc.

 

Any suggestions?

 

Thanks for your help.

☑ This topic is solved.
