BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
podarum
Quartz | Level 8

Hi,

I have a dataset with 1000's of accounts and most have a field of Canadian cities and some are inputted with wonky characters before and after, for example TORONTO could have 56TORONTO or 37TORONTO sddre56 or 93f MONTREA 8d , etc.. endless combination.  I want to be able to say if it contains TORONTO in the field then change the value to TORONTO, etc.  I can do this in either proc sql or data statement.  Thanks.

1 ACCEPTED SOLUTION

Accepted Solutions
kiranv_
Rhodochrosite | Level 12

case when columnname like '%TOronto89% or  columnname like '%TOwhatever% then TORONTO else '' end as newcolumn. just added when more or condition

View solution in original post

11 REPLIES 11
kiranv_
Rhodochrosite | Level 12

in proc dql

case when columnname like '%TORONTO% then TORONTO else '' end as newcolumn

 

for datastep use below example picked from SAS website

 

if find(name,'Toronto','i') ge 1 then newcolumn=';
   /* equivalent WHERE clause */
   *where upcase(name) contains 'TORONTO';

For this kind, best option is to put all your values in a reference table and use it.

Reeza
Super User

FINDW or INDEXW

Astounding
PROC Star

With just a handful of cities, it's easy to construct this sort of program;

 

data want;

set have;

if index(city, 'TORONTO') then city='TORONTO';

run;

 

With hundreds of cities, we might want to look at how to automate constructing so many IF/THEN/ELSE statements.

SASKiwi
PROC Star

Try the INDEX function  - if INDEX(address, 'TORONTO') > 0

ballardw
Super User

How many values do you have to search for?

 

A brief data step example with if

 

data want;

    set have;

   if index(upcase(city),'TORONTO')>0 then City='Toronto';

run;

 

Your example showing MONTREA  is going to have issues if that is supposed to match MONTREAL though.

If there is a clean postal code it may be easier to match on that value if you have a set with city and postal code available anywhere.

Shmuel
Garnet | Level 18

From your examples next code may give you partial solution

data temp;
  set have;
        city = compress(city_in, '0123456789');
run;

but that code is not a solution forh "93f MONTREA 8d". 

You need analyze more issues to add other treatment.

Jagadishkatam
Amethyst | Level 16

of the many ways suggested this is another 

 

data want;
    set have;
   if prxmatch('m/TORONTO/',var)>0 then City='Toronto';
run;
Thanks,
Jag
podarum
Quartz | Level 8

Thanks everyone,

 

What if I have more than 1 city that I want to be considered in TORONTO?  for example 54ETOBICOKE57 and 839TORONTO9827 then CITY = GTA, etc...

kiranv_
Rhodochrosite | Level 12

You need come up with what is your match, otherwise wildcard can take many values, sometimes you may not be aware of. 

kiranv_
Rhodochrosite | Level 12

case when columnname like '%TOronto89% or  columnname like '%TOwhatever% then TORONTO else '' end as newcolumn. just added when more or condition

Reeza
Super User

Then you put them all in a master list - use a master list of cities from your province - and use SOUNDEX/LIKE or COMPGED functions to calculate distance and correct the data. 

 

I believe there are address cleaning routines on Lexjansen.com for Canada specifically.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 25518 views
  • 7 likes
  • 8 in conversation