I have two tables as below.
have1 has three columns. name, date and city_name
have2 has six columns. First col has correct city names. Later columns have listed incorrect city names.
How can I replace misspelled city_name in have1 from correct name in have2
Note: somehow have2 table is not creating properly
data have1;
infile datalines dsd dlm=',' ;
input name :$6. id :$3. date :MMDDYY10. city_name :$12. ;
datalines;
Max,010,01/10/2017,Almeda
Robin,023,.,Sacramanto
Sandra,125,08/10/2013,LA
Bob,200,01/12/2016,Woodland
John,202,05/30/2014,L.A.
;
run;
data have2;
infile datalines dsd dlm=',' ;
input correct_name :$15. correct_name :$15. incorrect_name1 :$15. incorrect_name2 :$15. incorrect_name3 :$15. incorrect_name4 :$15. incorrect_name5 :$15. ;
datalines;
Sacramento,Sacraminto,Sacrament,Sacramintu,Scraminto,Sacramanto
Los Angeles,Loss Angeles,LA,LAngeles,L.A.,-
Woodland,Woodland,Woodland,-,-,-
Richmond,Rchmond,Richmand,Richmon,-,-
Alameda,Almeda,-,-,-,-
;
run;
On executing the step creating have2 i get some interesting messages in the log indicating that there should be only one variable named correct_name. So after fixing the input-statement, it seems that the problem could be solved with a format.
data NameFormat;
set have2(rename=(correct_name=Label));
length Start $ 15;
retain FmtName "$CorrectCityName";
array wrong[5] incorrect_name:;
do i = 1 to dim(wrong) until(wrong[i] = '-');
Start = wrong[i];
output;
end;
run;
proc sort data=NameFormat nodupkey;
by Start;
run;
proc format cntlin=NameFormat;
run;
data want;
set have1;
city_name = put(city_name, $CorrectCityName.);
run;
This should give what you're looking for:
data have1;
infile datalines dsd dlm=',' ;
input name :$6. id :$3. date :MMDDYY10. city_name :$12. ;
datalines;
Max,010,01/10/2017,Almeda
Robin,023,.,Sacramanto
Sandra,125,08/10/2013,LA
Bob,200,01/12/2016,Woodland
John,202,05/30/2014,L.A.
;
run;
data have2;
infile datalines dsd dlm=',' ;
input correct_name :$15. incorrect_name1 :$15. incorrect_name2 :$15. incorrect_name3 :$15. incorrect_name4 :$15. incorrect_name5 :$15. ;
datalines;
Sacramento, Sacraminto, Sacrament, Sacramintu, Scraminto, Sacramanto
Los Angeles,Loss Angeles,LA,LAngeles,L.A.,-
Woodland,Woodland,Woodland,-,-,-
Richmond,Rchmond,Richmand,Richmon,-,-
Alameda,Almeda,-,-,-,-
;
run;
proc sql;
create table want as
select a.name
,a.id
,a.date format MMDDYY10.
,case
when b.correct_name is null
then a.city_name
else b.correct_name
end as city_name
from have1 a
left join have2 b
on (a.city_name = b.incorrect_name1
or a.city_name = b.incorrect_name2
or a.city_name = b.incorrect_name3
or a.city_name = b.incorrect_name4
or a.city_name = b.incorrect_name5)
;
quit;
Wow! excellent!
Is there any way to use do loop for the following. Because some of the city names have about 50 incorrect names.
left join have2 b
on (a.city_name = b.incorrect_name1
or a.city_name = b.incorrect_name2
or a.city_name = b.incorrect_name3
or a.city_name = b.incorrect_name4
or a.city_name = b.incorrect_name5)
@Barkat wrote:
Wow! excellent!
Is there any way to use do loop for the following. Because some of the city names have about 50 incorrect names.
left join have2 b on (a.city_name = b.incorrect_name1 or a.city_name = b.incorrect_name2 or a.city_name = b.incorrect_name3 or a.city_name = b.incorrect_name4 or a.city_name = b.incorrect_name5)
First think would be to create a data set that has one record per correct_name and incorrect_name.
That can be done with something like:
data need; set have2; array d incorrect: ; do i= 1 to dim(d); if d[i] ne '-' then do; inc = d[i]; output; end; end; keep correct_name inc; run;
I won't claim that will work with your data, partially because you have TWO input correct_names and mostly will have the first incorrect name in the place of the correct name.
Then you can match on the value of inc to the city name, use the NEED data set instead of HAVE2
left join need b on a.city_name = b.inc
@Barkat wrote:Is there any way to use do loop for the following. Because some of the city names have about 50 incorrect names.
Yes, here are two modified versions using loops:
1. If the columns are always INCORRECT_NAME1 to INCORRECT_NAME50:
%macro loop_incorrect_names;
proc sql;
create table want as
select a.name
,a.id
,a.date format MMDDYY10.
,case
when b.correct_name is null
then a.city_name
else b.correct_name
end as city_name
from have1 a
left join have2 b
on (a.city_name = b.incorrect_name1
%do i = 1 %to 50; /* If the columns are always INCORRECT_NAME1 to INCORRECT_NAME50 */
or a.city_name = b.incorrect_name&i
%end;)
;
quit;
%mend;
%loop_incorrect_names
2. Or if there are not always 50 INCORRECT_NAME% columns, this will go up to the last (i.e. rightmost) INCORRECT_NAME column. (Note: I am assuming HAVE2 contains only the CORRECT_NAME column, followed by all the INCORRECT_NAME columns, ordered and named per your example.)
proc contents data=have2 out=meta (keep=NAME) noprint; run;
proc sql noprint;
select substr(name, 15) into: max_incorrect_name_number
from meta
having monotonic()=max(monotonic());
quit;
%put &=max_incorrect_name_number;
%macro loop_incorrect_names;
proc sql;
create table want as
select a.name
,a.id
,a.date format MMDDYY10.
,case
when b.correct_name is null
then a.city_name
else b.correct_name
end as city_name
from have1 a
left join have2 b
on (a.city_name = b.incorrect_name1
%do i = 1 %to &max_incorrect_name_number.; /* Using the last INCORRECT_NAME column number */
or a.city_name = b.incorrect_name&i
%end;)
;
quit;
%mend;
%loop_incorrect_names
On executing the step creating have2 i get some interesting messages in the log indicating that there should be only one variable named correct_name. So after fixing the input-statement, it seems that the problem could be solved with a format.
data NameFormat;
set have2(rename=(correct_name=Label));
length Start $ 15;
retain FmtName "$CorrectCityName";
array wrong[5] incorrect_name:;
do i = 1 to dim(wrong) until(wrong[i] = '-');
Start = wrong[i];
output;
end;
run;
proc sort data=NameFormat nodupkey;
by Start;
run;
proc format cntlin=NameFormat;
run;
data want;
set have1;
city_name = put(city_name, $CorrectCityName.);
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.