_maldini_
Barite | Level 11

I am trying to create a subset with outliers removed for multiple variables (outliers are defined as > 1.5 x Q3 or < Q1 / 1.5). My approach is to use multiple WHERE statements, but I am not getting the desired result. I am open to other approaches, but I'm also curious why this syntax is not working.

 

	DATA want;
		SET have;
		WHERE score1 BETWEEN (1.5*Q3_score1) AND (Q1_score1/1.5);
		WHERE SAME AND score2 BETWEEN (1.5*Q3_score2) AND (Q1_score2/1.5);
		WHERE SAME AND score3 BETWEEN (1.5*Q3_score3) AND (Q1_score3/1.5);
	RUN;
	

When I run this code w/ only the first where statement, the max value for score1 is 290. When I run it w/ the first and second where statements, the max value for score1 changes to 300. 

 

Aren't these statements independent? Why would one affect the other?

 

Thanks for your help. 

1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

In a SAS data set, I don't think that's one of your possible choices.  When you output an observation, all the variables are output.  You can't change that from one observation to the next.

 

There are other things you can do.  You can set out of range values to missing before you output.  Or you can totally re-shape the data set along these lines:

 

ID   Score_variable  score_value
ABC  score1          25
ABC  score2          30
DEF  score2          40

 

But there is no way to change the variables that get output from one observation to the next.
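Both options can be sketched roughly as follows. This is only a sketch under some assumptions: the Q1_ and Q3_ variables are present on every observation of have (as the WHERE statements in the question imply), the identifier variable is called ID as in the layout above, and the output data set names want_missing and want_long are made up for illustration.

```sas
/* Option 1: keep every observation, but blank out any score that  */
/* falls outside its own range of Q1/1.5 to 1.5*Q3.                */
data want_missing;
   set have;
   array score {3} score1-score3;
   array q1    {3} Q1_score1-Q1_score3;
   array q3    {3} Q3_score1-Q3_score3;
   do i = 1 to dim(score);
      if not (q1{i}/1.5 <= score{i} <= 1.5*q3{i}) then score{i} = .;
   end;
   drop i;
run;

/* Option 2: reshape to one row per ID and score variable,         */
/* writing out only the in-range values.                           */
data want_long;
   set have;
   array score {3} score1-score3;
   array q1    {3} Q1_score1-Q1_score3;
   array q3    {3} Q3_score1-Q3_score3;
   length Score_variable $32;
   do i = 1 to dim(score);
      if q1{i}/1.5 <= score{i} <= 1.5*q3{i} then do;
         Score_variable = vname(score{i});  /* e.g. "score1" */
         score_value    = score{i};
         output;
      end;
   end;
   keep ID Score_variable score_value;
run;
```

With option 1 each score is judged only against its own limits, so an out-of-range score2 does not cost you the score1 value on the same observation.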


20 REPLIES 20
Reeza
Super User

For those ones that get excluded - between 290 and 300, do they meet the other criteria for score2/score3?

 

You've used AND, so the statements are not independent; all 3 conditions must be met.

 

If you want any of the 3, use OR.

 

 

_maldini_
Barite | Level 11

@Reeza I want the "AND" in the BETWEEN...AND convention...What would be the correct syntax for making the individual WHERE statements independent from one another? 

PGStats
Opal | Level 21

If it's AND you want, then say AND:

 

WHERE score1 BETWEEN (1.5*Q3_score1) AND (Q1_score1/1.5) AND
      score2 BETWEEN (1.5*Q3_score2) AND (Q1_score2/1.5) AND
      score3 BETWEEN (1.5*Q3_score3) AND (Q1_score3/1.5);
PG
Reeza
Super User

Forget the Where and use explicit IF. You'll save yourself a headache and future you will thank you. 

 

You won't remember the details of this the next time you encounter it and will have to recheck everything otherwise. Or at least that's what I do when I see things like that in prod code. First check to see that it's doing 1) what you think it's doing, 2) what the original programmer thought they were doing, which may or may not have been you.
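A minimal sketch of what the explicit subsetting IF could look like, assuming the same variable names as in the original post and the intended range of Q1/1.5 to 1.5*Q3 for each score. Like the chained WHERE statements, it keeps an observation only when all three scores are in range.

```sas
data want;
   set have;
   /* Subsetting IF: the observation is kept only when every */
   /* condition is true. Lower bound is written first.       */
   if  (Q1_score1/1.5 <= score1 <= 1.5*Q3_score1)
   and (Q1_score2/1.5 <= score2 <= 1.5*Q3_score2)
   and (Q1_score3/1.5 <= score3 <= 1.5*Q3_score3);
run;
```

Everything sits in one statement, so there is nothing to be replaced or augmented and the logic can be read in one place.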

PGStats
Opal | Level 21

Didn't you get the following Note in the Log?

 

NOTE: WHERE clause has been replaced.

indicating that only the last WHERE clause matters?

PG
_maldini_
Barite | Level 11

@PGStats <NOTE: WHERE clause has been replaced.>

 

Yes. I saw this message, but didn't understand that it meant that "only the last WHERE clause matters".

 

Just to confirm, you're saying that all where statements prior to the last one are disregarded?

PGStats
Opal | Level 21

Do a little testing as I did and you will see that this is the case. I couldn't find it confirmed in the SAS documentation though.

PG
Astounding
PROC Star

You will need to inspect the exact wording in the note.  When  you use SAME AND in your WHERE clause, I would expect the note to say that the WHERE clause was AUGMENTED rather than REPLACED.

PGStats
Opal | Level 21

Run this:

 

data test;
set sashelp.class;
where sex="M";
where sex="F";
run;

proc print; run;
PG
_maldini_
Barite | Level 11

@PGStats Thanks. The point is made clearly w/ that code. Only the last WHERE statement is applied, although in that example both WHERE statements refer to the same variable, whereas in mine the variables are different. Regardless, the outcome is the same when I change the code to:

data test;
set sashelp.class;
where sex="M";
where age lt 15;
run;

proc print; run;

 

Adding the WHERE-SAME-AND statement eliminates the problem, but this is not what I'm after. 

	data test;
	set sashelp.class;
	where sex="M";
	where same and age lt 15;
	run;

Is there a different way to use multiple WHERE statements when subsetting? Should I choose an entirely different approach?

 

 

_maldini_
Barite | Level 11

@Astounding Yes, "augmented".

 

<NOTE: WHERE clause has been augmented.>

 

Can you translate this note for me?

Astounding
PROC Star

Augmented:  The conditions from the first WHERE statement are still in effect, and the conditions from the second WHERE statement are being added as an additional set of conditions.
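As a rough illustration using the sashelp.class example from earlier in the thread, a WHERE statement followed by WHERE SAME AND should select the same rows as a single WHERE with both conditions joined by AND. The data set names test_augmented and test_single are made up for this sketch.

```sas
/* Two statements: the second one augments the first */
data test_augmented;
   set sashelp.class;
   where sex="M";
   where same and age lt 15;
run;

/* One statement carrying both conditions */
data test_single;
   set sashelp.class;
   where sex="M" and age lt 15;
run;

/* Check that the two data sets match */
proc compare base=test_augmented compare=test_single;
run;
```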

Astounding
PROC Star

Are you sure you didn't get the results mixed up?  It would make all the sense in the world to get a maximum of 300 with just one WHERE statement, but a maximum of 290 when you add a second WHERE statement.  The second WHERE statement would remove a few more observations, which could include the one that has the value of 300.

_maldini_
Barite | Level 11

@Astounding Yes, I did get them mixed up. Sorry about that.

 

I guess I'm confused about how to subset w/o narrowing the dataset to meet the conditions in all the prior WHERE statements.

 

I want a dataset that contains the values for each variable between the parameters outlined in the BETWEEN...AND statement. And I want the WHERE statements to be independent of each other.

 

In other words, I want all the values of score1 included if they are between 1.5*Q3 and Q1/1.5. And then separately, I want all the values of score2 included if they are between the same parameters for score2, etc.

 

Any suggestions?

 

Thanks for your help.
