## Subsetting data using "If" statements

Solved
Occasional Contributor
Posts: 13

If I am trying to subset my data to include all females that have variable A < 5 OR variable B >= 10 OR variable C = "blue", can I accomplish this correctly using two separate "if" statements? That is:

data mydata;

set path01.mydata;

if gender = "F";

drop D;

if A < 5 or B >= 10 or C = "blue";

run;

When I run a print procedure on the result, it correctly prints only females, but it doesn't seem to keep every data point that meets any one of the three criteria. It keeps most of them, and it doesn't produce any error. What could be going wrong?

Accepted Solutions
Solution
‎11-27-2016 05:08 PM
Super User
Posts: 6,774

## Re: Subsetting data using "If" statements

The most likely culprit is C.  Character values are case sensitive, so all of these values would be different:

blue

Blue

BLUE

You could always change the third check to be:

or upcase(C) = 'BLUE'
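One way to confirm that case is the issue is to list the distinct values stored in C before subsetting. The following is a minimal sketch, assuming the dataset and variable names from the question (path01.mydata, gender, A, B, C), with A and B numeric and C character:

```sas
/* List the distinct stored values of C to spot case variants */
proc freq data=path01.mydata;
  tables c / missing;
run;

/* Subset with a case-insensitive comparison on C */
data mydata;
  set path01.mydata;
  if gender = "F";
  if a < 5 or b >= 10 or upcase(c) = "BLUE";
run;
```

If the frequency table shows values with leading blanks, wrapping the variable as upcase(strip(c)) may also be needed, since leading blanks are significant in character comparisons.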

All Replies
Super User
Posts: 23,724

## Re: Subsetting data using "If" statements

Your code looks correct. Post a sample of records that you think should be included but aren't or vice versa.

Super User
Posts: 5,878

## Re: Subsetting data using "If" statements

Side note: because you are not relying on calculated variables, you should use WHERE instead. It is generally more efficient, since it filters observations before they are read into the data step.
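As a rough sketch of that suggestion, assuming the variable names from the question (gender, A, B, C) and keeping the case-insensitive check on C:

```sas
data mydata;
  set path01.mydata;
  /* WHERE is applied as observations are read, so rows that fail
     the condition never enter the data step */
  where gender = "F"
    and (a < 5 or b >= 10 or upcase(c) = "BLUE");
run;
```

WHERE can only reference variables that already exist in the input dataset. A condition on a variable computed inside the step would still need a subsetting IF.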
Super User
Posts: 9,599

## Re: Subsetting data using "If" statements

At a glance:

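```sas
data mydata;
  /* Keep females meeting any one of the three criteria;
     UPCASE makes the character comparisons case-insensitive */
  set path01.mydata (where=(upcase(gender)="F" and (a < 5 or b >= 10 or upcase(c)="BLUE")));
run;
```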
Super User
Posts: 13,542

## Re: Subsetting data using "If" statements

Question about the "A < 5" requirement: do you also want observations where A is missing? SAS treats a missing numeric value as less than any number, so a missing A satisfies A < 5. If you do not want missing values for A, you will need either an explicit check such as (not missing(A) and A < 5) or a lower bound of acceptable values such as 0 le A lt 5.
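A sketch of the first option, again assuming the names from the question and that observations with a missing A should not qualify through the A criterion:

```sas
data mydata;
  set path01.mydata;
  if gender = "F";
  /* not missing(a) stops a missing A from satisfying a < 5 */
  if (not missing(a) and a < 5) or b >= 10 or upcase(c) = "BLUE";
run;
```

An observation with a missing A is still kept if it meets the B or C criterion. If A has a known lower bound, the range form (0 le a lt 5) could replace the first parenthesized condition.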

☑ This topic is solved.