bbpatterson
Calcite | Level 5

I recently wrote a DATA step that runs each observation through a series of "checks" and conditionally routes it to one of two output data sets.

It's not working as intended, and no syntax error appears. Only the 500 obs that fail the Email check are removed from my "Clean" set; observations with blank addresses, cities, zips, etc. still remain there. The "Dirty" data set, however, does get written to and captures the failures as desired.

Essentially, only the observations failing the last "check" are removed from the "Clean" output.

data Clean Dirty;

set SourceData;

if PRXMATCH("/\d/", address) = 0 and PRXMATCH("/\d/", address) = 0 then output Dirty;

if City = '' or PRXMATCH("/\d/", City) ne 0 then output Dirty;

if length(State) ne 2 or PRXMATCH("/\d/", State) ne 0 then output Dirty;

if country='US' and length(Zip) < 5  then output Dirty;

if country='CA' and length(Zip) > 7 then output Dirty;

if missing(Name) then output Dirty;

if Location=. then output Dirty;

if (index(Email,'@') = 0 and index(Email,'.') = 0 and Email ne '') then output Dirty; 

else output Clean;

run;

I really like this concept of outputting to multiple data sets from one DATA step. But maybe it is not designed to do this?

Looks like I am going to have to create these via PROC SQL and WHERE clauses :(

2 REPLIES
Astounding
PROC Star

The problem is likely with your IF/THEN/ELSE logic.

An observation that fails more than one check is written to the DIRTY data set once for each check it fails.

ELSE only applies to the most recent IF/THEN statement. So CLEAN will contain every observation that passes the EMAIL check, whether or not it failed one of the earlier checks.
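To see both effects on a small scale, here is a minimal sketch. The Sample data set and its three rows are made up for illustration and are not from the original post; only two of the checks are reproduced.

```sas
/* Hypothetical three-row sample; names and values are invented for illustration */
data Sample;
  infile datalines dsd truncover;
  input Name :$10. City :$10. Email :$20.;
  datalines;
Ann,Boston,ann@x.com
Bob,,bob@x.com
Cy,,cy-at-x
;
run;

data Clean Dirty;
  set Sample;
  /* Standalone IF: no ELSE is attached to this statement */
  if City = '' then output Dirty;
  /* The ELSE below pairs only with this Email IF */
  if index(Email,'@') = 0 and index(Email,'.') = 0 and Email ne ''
    then output Dirty;
  else output Clean;
run;

/* Dirty gets 3 obs: Bob (blank City), Cy (blank City), Cy again (bad Email) */
/* Clean gets 2 obs: Ann, and Bob even though his City is blank              */
```

Bob lands in both data sets because the City check writes him to Dirty and the Email ELSE then writes him to Clean.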

The general form of the solution is to string together a series of ELSE IF statements.  In abbreviated form:

if PRXMATCH ... then output Dirty;

else if City = ... then output Dirty;

...

else if (index(Email ... then output Dirty;

else output Clean;
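Written out in full, the chain might look like the sketch below. The conditions are copied from the original post, with the duplicated address test collapsed into one; it has not been run against the poster's SourceData.

```sas
data Clean Dirty;
  set SourceData;
  /* Each ELSE IF is evaluated only when every earlier check passed, */
  /* so an observation is output exactly once.                       */
  if PRXMATCH("/\d/", address) = 0 then output Dirty;
  else if City = '' or PRXMATCH("/\d/", City) ne 0 then output Dirty;
  else if length(State) ne 2 or PRXMATCH("/\d/", State) ne 0 then output Dirty;
  else if country='US' and length(Zip) < 5 then output Dirty;
  else if country='CA' and length(Zip) > 7 then output Dirty;
  else if missing(Name) then output Dirty;
  else if Location=. then output Dirty;
  else if index(Email,'@') = 0 and index(Email,'.') = 0 and Email ne ''
    then output Dirty;
  /* Reached only when none of the checks above fired */
  else output Clean;
run;
```

With this structure the number of observations in Clean plus the number in Dirty should equal the number in SourceData.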

Good luck.

CTorres
Quartz | Level 8

I would use multiple OR conditions in only one IF statement:

data Clean Dirty;

set SourceData;

if PRXMATCH("/\d/", address) = 0

OR City = '' or PRXMATCH("/\d/", City) ne 0

OR length(State) ne 2 or PRXMATCH("/\d/", State) ne 0

OR (country='US' and length(Zip) < 5)

OR (country='CA' and length(Zip) > 7)

OR missing(Name)

OR Location=.

OR (index(Email,'@') = 0 and index(Email,'.') = 0 and Email ne '') then output Dirty;

else output Clean;

run;
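One trade-off of a single combined IF is that Dirty does not show which check an observation failed. If that matters, a possible variant is to record the first failed check in a variable and output once at the end. This is only a sketch: FailedCheck and its labels are hypothetical names that do not appear in the original post, and the conditions are copied from the question.

```sas
/* FailedCheck is a hypothetical variable; it is dropped from Clean, */
/* where it would always be blank.                                   */
data Clean(drop=FailedCheck) Dirty;
  set SourceData;
  length FailedCheck $8;

  /* Record the first check that fails; later checks are skipped */
  if PRXMATCH("/\d/", address) = 0 then FailedCheck = 'Address';
  else if City = '' or PRXMATCH("/\d/", City) ne 0 then FailedCheck = 'City';
  else if length(State) ne 2 or PRXMATCH("/\d/", State) ne 0 then FailedCheck = 'State';
  else if (country='US' and length(Zip) < 5)
       or (country='CA' and length(Zip) > 7) then FailedCheck = 'Zip';
  else if missing(Name) then FailedCheck = 'Name';
  else if Location=. then FailedCheck = 'Location';
  else if index(Email,'@') = 0 and index(Email,'.') = 0 and Email ne ''
    then FailedCheck = 'Email';

  /* One routing decision per observation */
  if FailedCheck ne '' then output Dirty;
  else output Clean;
run;
```

A PROC FREQ on FailedCheck in Dirty would then give a count of failures per check.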

Regards,
