Hi,
I have a dataset with 1000's of accounts and most have a field of Canadian cities and some are inputted with wonky characters before and after, for example TORONTO could have 56TORONTO or 37TORONTO sddre56 or 93f MONTREA 8d , etc.. endless combination. I want to be able to say if it contains TORONTO in the field then change the value to TORONTO, etc. I can do this in either proc sql or data statement. Thanks.
case when columnname like '%TOronto89% or columnname like '%TOwhatever% then TORONTO else '' end as newcolumn. just added when more or condition
in proc dql
case when columnname like '%TORONTO% then TORONTO else '' end as newcolumn
for datastep use below example picked from SAS website
if find(name,'Toronto','i') ge 1 then newcolumn='; /* equivalent WHERE clause */ *where upcase(name) contains 'TORONTO';
For this kind, best option is to put all your values in a reference table and use it.
FINDW or INDEXW
With just a handful of cities, it's easy to construct this sort of program;
data want;
set have;
if index(city, 'TORONTO') then city='TORONTO';
run;
With hundreds of cities, we might want to look at how to automate constructing so many IF/THEN/ELSE statements.
Try the INDEX function - if INDEX(address, 'TORONTO') > 0
How many values do you have to search for?
A brief data step example with if
data want;
set have;
if index(upcase(city),'TORONTO')>0 then City='Toronto';
run;
Your example showing MONTREA is going to have issues if that is supposed to match MONTREAL though.
If there is a clean postal code it may be easier to match on that value if you have a set with city and postal code available anywhere.
From your examples next code may give you partial solution
data temp;
set have;
city = compress(city_in, '0123456789');
run;
but that code is not a solution forh "93f MONTREA 8d".
You need analyze more issues to add other treatment.
of the many ways suggested this is another
data want;
set have;
if prxmatch('m/TORONTO/',var)>0 then City='Toronto';
run;
Thanks everyone,
What if I have more than 1 city that I want to be considered in TORONTO? for example 54ETOBICOKE57 and 839TORONTO9827 then CITY = GTA, etc...
You need come up with what is your match, otherwise wildcard can take many values, sometimes you may not be aware of.
case when columnname like '%TOronto89% or columnname like '%TOwhatever% then TORONTO else '' end as newcolumn. just added when more or condition
Then you put them all in a master list - use a master list of cities from your province - and use SOUNDEX/LIKE or COMPGED functions to calculate distance and correct the data.
I believe there are address cleaning routines on Lexjansen.com for Canada specifically.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.