Hi,
I have a dataset with 1000's of accounts and most have a field of Canadian cities and some are inputted with wonky characters before and after, for example TORONTO could have 56TORONTO or 37TORONTO sddre56 or 93f MONTREA 8d , etc.. endless combination. I want to be able to say if it contains TORONTO in the field then change the value to TORONTO, etc. I can do this in either proc sql or data statement. Thanks.
case when columnname like '%TOronto89% or columnname like '%TOwhatever% then TORONTO else '' end as newcolumn. just added when more or condition
in proc dql
case when columnname like '%TORONTO% then TORONTO else '' end as newcolumn
for datastep use below example picked from SAS website
if find(name,'Toronto','i') ge 1 then newcolumn='; /* equivalent WHERE clause */ *where upcase(name) contains 'TORONTO';
For this kind, best option is to put all your values in a reference table and use it.
FINDW or INDEXW
With just a handful of cities, it's easy to construct this sort of program;
data want;
set have;
if index(city, 'TORONTO') then city='TORONTO';
run;
With hundreds of cities, we might want to look at how to automate constructing so many IF/THEN/ELSE statements.
Try the INDEX function - if INDEX(address, 'TORONTO') > 0
How many values do you have to search for?
A brief data step example with if
data want;
set have;
if index(upcase(city),'TORONTO')>0 then City='Toronto';
run;
Your example showing MONTREA is going to have issues if that is supposed to match MONTREAL though.
If there is a clean postal code it may be easier to match on that value if you have a set with city and postal code available anywhere.
From your examples next code may give you partial solution
data temp;
set have;
city = compress(city_in, '0123456789');
run;
but that code is not a solution forh "93f MONTREA 8d".
You need analyze more issues to add other treatment.
of the many ways suggested this is another
data want;
set have;
if prxmatch('m/TORONTO/',var)>0 then City='Toronto';
run;
Thanks everyone,
What if I have more than 1 city that I want to be considered in TORONTO? for example 54ETOBICOKE57 and 839TORONTO9827 then CITY = GTA, etc...
You need come up with what is your match, otherwise wildcard can take many values, sometimes you may not be aware of.
case when columnname like '%TOronto89% or columnname like '%TOwhatever% then TORONTO else '' end as newcolumn. just added when more or condition
Then you put them all in a master list - use a master list of cities from your province - and use SOUNDEX/LIKE or COMPGED functions to calculate distance and correct the data.
I believe there are address cleaning routines on Lexjansen.com for Canada specifically.
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.