Conditionally output to Datasets across multiple conditions

Occasional Contributor
Posts: 6

I recently created a DATA step to parse observations and conditionally output them based upon a series of "Checks"...

But it's not working, and no syntax error appears. It only removes the 500 obs that fail the Email check; observations with blank address, city, zip, etc. still remain in my "Clean" set. The "Dirty" dataset, however, does get written to and captures what I want.

Essentially, only the observations caught by the last "check" are removed from the "Clean" output.

data Clean Dirty;

set SourceData;

if PRXMATCH("/\d/", address) = 0 and PRXMATCH("/\d/", address) = 0 then output Dirty;

if City = '' or PRXMATCH("/\d/", City) ne 0 then output Dirty;

if length(State) ne 2 or PRXMATCH("/\d/", State) ne 0 then output Dirty;

if country='US' and length(Zip) < 5  then output Dirty;

if country='CA' and length(Zip) > 7 then output Dirty;

if missing(Name) then output Dirty;

if Location=. then output Dirty;

if (index(Email,'@') = 0 and index(Email,'.') = 0 and Email ne '') then output Dirty; 

else output Clean;

run;

I really like this concept of outputting to multiple datasets from one Data step. But maybe this is not designed to do this??

Looks like I am going to have to create these via PROC SQL and WHERE clauses :(

Super User
Posts: 5,516

Re: Conditionally output to Datasets across multiple conditions

Posted in reply to bbpatterson

The problem is likely with your IF/THEN/ELSE logic.

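An observation that fails several checks is output to the DIRTY data set once per failed check, so DIRTY can contain duplicates.

ELSE applies only to the most recent IF/THEN statement. So CLEAN will contain every observation that passes the EMAIL check (including those with a blank EMAIL), no matter what the earlier checks found.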
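To see the effect on a small scale, here is a minimal sketch. The data set names SourceToy, CleanToy and DirtyToy and the three sample rows are made up for illustration; only the IF/ELSE pattern comes from the original step.

```sas
/* Hypothetical toy data: one good row, one blank City, one blank Name with a bad Email */
data SourceToy;
  infile datalines dsd truncover;
  input Name :$10. City :$10. Email :$30.;
  datalines;
Ann,Boston,ann@example.com
Bob,,bob@example.com
,Denver,not-an-email
;
run;

data CleanToy DirtyToy;
  set SourceToy;
  if City = '' then output DirtyToy;      /* independent IF, no ELSE attached */
  if missing(Name) then output DirtyToy;  /* independent IF, no ELSE attached */
  if index(Email,'@') = 0 and index(Email,'.') = 0 and Email ne '' then output DirtyToy;
  else output CleanToy;                   /* pairs only with the Email IF above */
run;
```

Bob has a blank City, so he goes to DirtyToy, but his Email passes, so the final ELSE sends him to CleanToy as well. The row with the missing Name fails two checks and should land in DirtyToy twice.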

The general form of the solution is to string together a series of ELSE IF statements.  In abbreviated form:

if PRXMATCH ... then output Dirty;

else if City = ... then output Dirty;

...

else if (index(Email ... then output Dirty;

else output Clean;
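Written out in full, the chain might look like the sketch below. It assumes the elided middle checks are the ones from the original post, unchanged, and it copies each condition as posted rather than reviewing whether the condition itself is what was intended.

```sas
data Clean Dirty;
  set SourceData;

  /* Each observation stops at the first check it fails, so it is output exactly once */
  /* First condition copied as posted (the same address test appears twice) */
  if PRXMATCH("/\d/", address) = 0 and PRXMATCH("/\d/", address) = 0 then output Dirty;
  else if City = '' or PRXMATCH("/\d/", City) ne 0 then output Dirty;
  else if length(State) ne 2 or PRXMATCH("/\d/", State) ne 0 then output Dirty;
  else if country='US' and length(Zip) < 5 then output Dirty;
  else if country='CA' and length(Zip) > 7 then output Dirty;
  else if missing(Name) then output Dirty;
  else if Location=. then output Dirty;
  else if (index(Email,'@') = 0 and index(Email,'.') = 0 and Email ne '') then output Dirty;
  else output Clean;  /* reached only when every check above passed */
run;
```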

Good luck.

Regular Contributor
Posts: 180

Re: Conditionally output to Datasets across multiple conditions

Posted in reply to bbpatterson

I would combine the checks with OR in a single IF statement:

data Clean Dirty;

set SourceData;

if PRXMATCH("/\d/", address) = 0 and PRXMATCH("/\d/", address) = 0

OR City = '' or PRXMATCH("/\d/", City) ne 0

OR length(State) ne 2 or PRXMATCH("/\d/", State) ne 0

OR country='US' and length(Zip) < 5

OR country='CA' and length(Zip) > 7

OR missing(Name)

OR Location=.

OR (index(Email,'@') = 0 and index(Email,'.') = 0 and Email ne '') then output Dirty;

else output Clean;

run;
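The step above relies on SAS evaluating AND before OR. If you prefer the grouping to be visible, one possible variant puts each check in parentheses and stores the result in a flag. IsDirty is a hypothetical helper variable, not something from the original post, and the repeated address test is collapsed into one since both halves are identical.

```sas
data Clean Dirty;
  set SourceData;

  /* IsDirty is a hypothetical helper; the whole expression evaluates to 1 (true) or 0 (false) */
  IsDirty = (PRXMATCH("/\d/", address) = 0)
         or (City = '' or PRXMATCH("/\d/", City) ne 0)
         or (length(State) ne 2 or PRXMATCH("/\d/", State) ne 0)
         or (country = 'US' and length(Zip) < 5)
         or (country = 'CA' and length(Zip) > 7)
         or missing(Name)
         or (Location = .)
         or (index(Email,'@') = 0 and index(Email,'.') = 0 and Email ne '');

  if IsDirty then output Dirty;
  else output Clean;

  drop IsDirty;  /* keep the helper out of both output data sets */
run;
```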

Regards,
