BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
podarum
Quartz | Level 8

Hi,

I have a dataset with 1000's of accounts and most have a field of Canadian cities and some are inputted with wonky characters before and after, for example TORONTO could have 56TORONTO or 37TORONTO sddre56 or 93f MONTREA 8d , etc.. endless combination.  I want to be able to say if it contains TORONTO in the field then change the value to TORONTO, etc.  I can do this in either proc sql or data statement.  Thanks.

1 ACCEPTED SOLUTION

Accepted Solutions
kiranv_
Rhodochrosite | Level 12

case when columnname like '%TOronto89% or  columnname like '%TOwhatever% then TORONTO else '' end as newcolumn. just added when more or condition

View solution in original post

11 REPLIES 11
kiranv_
Rhodochrosite | Level 12

in proc dql

case when columnname like '%TORONTO% then TORONTO else '' end as newcolumn

 

for datastep use below example picked from SAS website

 

if find(name,'Toronto','i') ge 1 then newcolumn=';
   /* equivalent WHERE clause */
   *where upcase(name) contains 'TORONTO';

For this kind, best option is to put all your values in a reference table and use it.

Reeza
Super User

FINDW or INDEXW

Astounding
PROC Star

With just a handful of cities, it's easy to construct this sort of program;

 

data want;

set have;

if index(city, 'TORONTO') then city='TORONTO';

run;

 

With hundreds of cities, we might want to look at how to automate constructing so many IF/THEN/ELSE statements.

SASKiwi
PROC Star

Try the INDEX function  - if INDEX(address, 'TORONTO') > 0

ballardw
Super User

How many values do you have to search for?

 

A brief data step example with if

 

data want;

    set have;

   if index(upcase(city),'TORONTO')>0 then City='Toronto';

run;

 

Your example showing MONTREA  is going to have issues if that is supposed to match MONTREAL though.

If there is a clean postal code it may be easier to match on that value if you have a set with city and postal code available anywhere.

Shmuel
Garnet | Level 18

From your examples next code may give you partial solution

data temp;
  set have;
        city = compress(city_in, '0123456789');
run;

but that code is not a solution forh "93f MONTREA 8d". 

You need analyze more issues to add other treatment.

Jagadishkatam
Amethyst | Level 16

of the many ways suggested this is another 

 

data want;
    set have;
   if prxmatch('m/TORONTO/',var)>0 then City='Toronto';
run;
Thanks,
Jag
podarum
Quartz | Level 8

Thanks everyone,

 

What if I have more than 1 city that I want to be considered in TORONTO?  for example 54ETOBICOKE57 and 839TORONTO9827 then CITY = GTA, etc...

kiranv_
Rhodochrosite | Level 12

You need come up with what is your match, otherwise wildcard can take many values, sometimes you may not be aware of. 

kiranv_
Rhodochrosite | Level 12

case when columnname like '%TOronto89% or  columnname like '%TOwhatever% then TORONTO else '' end as newcolumn. just added when more or condition

Reeza
Super User

Then you put them all in a master list - use a master list of cities from your province - and use SOUNDEX/LIKE or COMPGED functions to calculate distance and correct the data. 

 

I believe there are address cleaning routines on Lexjansen.com for Canada specifically.

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 11 replies
  • 28775 views
  • 7 likes
  • 8 in conversation