DATA Step, Macro, Functions and more

wildcard % in if statement

Accepted Solution Solved
Reply
Super Contributor
Posts: 395
Accepted Solution

wildcard % in if statement

Hi,

I have a dataset with 1000's of accounts and most have a field of Canadian cities and some are inputted with wonky characters before and after, for example TORONTO could have 56TORONTO or 37TORONTO sddre56 or 93f MONTREA 8d , etc.. endless combination.  I want to be able to say if it contains TORONTO in the field then change the value to TORONTO, etc.  I can do this in either proc sql or data statement.  Thanks.


Accepted Solutions
Solution
‎06-13-2017 09:59 PM
PROC Star
Posts: 253

Re: wildcard % in if statement

[ Edited ]

case when columnname like '%TOronto89% or  columnname like '%TOwhatever% then TORONTO else '' end as newcolumn. just added when more or condition

View solution in original post


All Replies
PROC Star
Posts: 253

Re: wildcard % in if statement

in proc dql

case when columnname like '%TORONTO% then TORONTO else '' end as newcolumn

 

for datastep use below example picked from SAS website

 

if find(name,'Toronto','i') ge 1 then newcolumn=';
   /* equivalent WHERE clause */
   *where upcase(name) contains 'TORONTO';

For this kind, best option is to put all your values in a reference table and use it.

Super User
Posts: 17,837

Re: wildcard % in if statement

FINDW or INDEXW

Super User
Posts: 5,083

Re: wildcard % in if statement

With just a handful of cities, it's easy to construct this sort of program;

 

data want;

set have;

if index(city, 'TORONTO') then city='TORONTO';

run;

 

With hundreds of cities, we might want to look at how to automate constructing so many IF/THEN/ELSE statements.

Super User
Posts: 3,108

Re: wildcard % in if statement

Try the INDEX function  - if INDEX(address, 'TORONTO') > 0

Super User
Posts: 10,500

Re: wildcard % in if statement

How many values do you have to search for?

 

A brief data step example with if

 

data want;

    set have;

   if index(upcase(city),'TORONTO')>0 then City='Toronto';

run;

 

Your example showing MONTREA  is going to have issues if that is supposed to match MONTREAL though.

If there is a clean postal code it may be easier to match on that value if you have a set with city and postal code available anywhere.

Trusted Advisor
Posts: 1,381

Re: wildcard % in if statement

From your examples next code may give you partial solution

data temp;
  set have;
        city = compress(city_in, '0123456789');
run;

but that code is not a solution forh "93f MONTREA 8d". 

You need analyze more issues to add other treatment.

Trusted Advisor
Posts: 1,130

Re: wildcard % in if statement

of the many ways suggested this is another 

 

data want;
    set have;
   if prxmatch('m/TORONTO/',var)>0 then City='Toronto';
run;
Thanks,
Jag
Super Contributor
Posts: 395

Re: wildcard % in if statement

[ Edited ]

Thanks everyone,

 

What if I have more than 1 city that I want to be considered in TORONTO?  for example 54ETOBICOKE57 and 839TORONTO9827 then CITY = GTA, etc...

PROC Star
Posts: 253

Re: wildcard % in if statement

You need come up with what is your match, otherwise wildcard can take many values, sometimes you may not be aware of. 

Solution
‎06-13-2017 09:59 PM
PROC Star
Posts: 253

Re: wildcard % in if statement

[ Edited ]

case when columnname like '%TOronto89% or  columnname like '%TOwhatever% then TORONTO else '' end as newcolumn. just added when more or condition

Super User
Posts: 17,837

Re: wildcard % in if statement

Then you put them all in a master list - use a master list of cities from your province - and use SOUNDEX/LIKE or COMPGED functions to calculate distance and correct the data. 

 

I believe there are address cleaning routines on Lexjansen.com for Canada specifically.

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 11 replies
  • 248 views
  • 7 likes
  • 8 in conversation