I have two tables as below.
have1 has three columns. name, date and city_name
have2 has six columns. First col has correct city names. Later columns have listed incorrect city names.
How can I replace misspelled city_name in have1 from correct name in have2
Note: somehow have2 table is not creating properly
data have1;
infile datalines dsd dlm=',' ;
input name :$6. id :$3. date :MMDDYY10. city_name :$12. ;
datalines;
Max,010,01/10/2017,Almeda
Robin,023,.,Sacramanto
Sandra,125,08/10/2013,LA
Bob,200,01/12/2016,Woodland
John,202,05/30/2014,L.A.
;
run;
data have2;
infile datalines dsd dlm=',' ;
input correct_name :$15. correct_name :$15. incorrect_name1 :$15. incorrect_name2 :$15. incorrect_name3 :$15. incorrect_name4 :$15. incorrect_name5 :$15. ;
datalines;
Sacramento,Sacraminto,Sacrament,Sacramintu,Scraminto,Sacramanto
Los Angeles,Loss Angeles,LA,LAngeles,L.A.,-
Woodland,Woodland,Woodland,-,-,-
Richmond,Rchmond,Richmand,Richmon,-,-
Alameda,Almeda,-,-,-,-
;
run;
On executing the step creating have2 i get some interesting messages in the log indicating that there should be only one variable named correct_name. So after fixing the input-statement, it seems that the problem could be solved with a format.
data NameFormat;
   set have2(rename=(correct_name=Label));
   
   length Start $ 15;
   retain FmtName "$CorrectCityName";
   
   array wrong[5] incorrect_name:;
   
   do i = 1 to dim(wrong) until(wrong[i] = '-');
      Start = wrong[i];
      output;
   end;
run;
proc sort data=NameFormat nodupkey;
   by Start;
run;
proc format cntlin=NameFormat;
run;
data want;
   set have1;
   
   city_name = put(city_name, $CorrectCityName.);
run;This should give what you're looking for:
data have1;
    infile datalines dsd dlm=',' ;
    input name :$6. id :$3. date :MMDDYY10. city_name :$12. ;
    datalines;
    Max,010,01/10/2017,Almeda
    Robin,023,.,Sacramanto
    Sandra,125,08/10/2013,LA
    Bob,200,01/12/2016,Woodland
    John,202,05/30/2014,L.A.
    ;
run;
data have2;
    infile datalines dsd dlm=',' ;
    input correct_name :$15. incorrect_name1 :$15. incorrect_name2 :$15. incorrect_name3 :$15. incorrect_name4 :$15. incorrect_name5 :$15. ;
    datalines;
    Sacramento, Sacraminto, Sacrament, Sacramintu, Scraminto, Sacramanto
    Los Angeles,Loss Angeles,LA,LAngeles,L.A.,-
    Woodland,Woodland,Woodland,-,-,-
    Richmond,Rchmond,Richmand,Richmon,-,-
    Alameda,Almeda,-,-,-,-
    ;
run;
proc sql;
    create table want as
    select a.name
          ,a.id
          ,a.date format MMDDYY10.
          ,case 
               when b.correct_name is null
                   then a.city_name
               else b.correct_name
           end as city_name
    from have1 a
    left join have2 b
        on (a.city_name = b.incorrect_name1
        or a.city_name = b.incorrect_name2
        or a.city_name = b.incorrect_name3
        or a.city_name = b.incorrect_name4
        or a.city_name = b.incorrect_name5)
    ;
quit;
Wow! excellent!
Is there any way to use do loop for the following. Because some of the city names have about 50 incorrect names.
left join have2 b
        on (a.city_name = b.incorrect_name1
        or a.city_name = b.incorrect_name2
        or a.city_name = b.incorrect_name3
        or a.city_name = b.incorrect_name4
        or a.city_name = b.incorrect_name5)
@Barkat wrote:
Wow! excellent!
Is there any way to use do loop for the following. Because some of the city names have about 50 incorrect names.
left join have2 b on (a.city_name = b.incorrect_name1 or a.city_name = b.incorrect_name2 or a.city_name = b.incorrect_name3 or a.city_name = b.incorrect_name4 or a.city_name = b.incorrect_name5)
First think would be to create a data set that has one record per correct_name and incorrect_name.
That can be done with something like:
data need;
   set have2;
   array d incorrect: ;
   do i= 1 to dim(d);
      if d[i] ne '-' then do;
         inc = d[i];
         output;
      end;
   end;
   keep correct_name inc;
run;
I won't claim that will work with your data, partially because you have TWO input correct_names and mostly will have the first incorrect name in the place of the correct name.
Then you can match on the value of inc to the city name, use the NEED data set instead of HAVE2
    left join need b
        on a.city_name = b.inc
					
				
			
			
				
			
			
			
			
			
			
			
		
@Barkat wrote:Is there any way to use do loop for the following. Because some of the city names have about 50 incorrect names.
Yes, here are two modified versions using loops:
1. If the columns are always INCORRECT_NAME1 to INCORRECT_NAME50:
%macro loop_incorrect_names;
    proc sql;
        create table want as
        select a.name
              ,a.id
              ,a.date format MMDDYY10.
              ,case 
                   when b.correct_name is null
                       then a.city_name
                   else b.correct_name
               end as city_name
        from have1 a
        left join have2 b
            on (a.city_name = b.incorrect_name1
            %do i = 1 %to 50;     /* If the columns are always INCORRECT_NAME1 to INCORRECT_NAME50 */
                or a.city_name = b.incorrect_name&i
            %end;)
        ;
    quit;
%mend;
%loop_incorrect_names2. Or if there are not always 50 INCORRECT_NAME% columns, this will go up to the last (i.e. rightmost) INCORRECT_NAME column. (Note: I am assuming HAVE2 contains only the CORRECT_NAME column, followed by all the INCORRECT_NAME columns, ordered and named per your example.)
proc contents data=have2 out=meta (keep=NAME) noprint; run;
proc sql noprint; 
    select substr(name, 15) into: max_incorrect_name_number
    from meta
    having monotonic()=max(monotonic());
quit;
%put &=max_incorrect_name_number;
%macro loop_incorrect_names;
    proc sql;
        create table want as
        select a.name
              ,a.id
              ,a.date format MMDDYY10.
              ,case 
                   when b.correct_name is null
                       then a.city_name
                   else b.correct_name
               end as city_name
        from have1 a
        left join have2 b
            on (a.city_name = b.incorrect_name1
            %do i = 1 %to &max_incorrect_name_number.;    /* Using the last INCORRECT_NAME column number */
                or a.city_name = b.incorrect_name&i
            %end;)
        ;
    quit;
%mend;
%loop_incorrect_names
On executing the step creating have2 i get some interesting messages in the log indicating that there should be only one variable named correct_name. So after fixing the input-statement, it seems that the problem could be solved with a format.
data NameFormat;
   set have2(rename=(correct_name=Label));
   
   length Start $ 15;
   retain FmtName "$CorrectCityName";
   
   array wrong[5] incorrect_name:;
   
   do i = 1 to dim(wrong) until(wrong[i] = '-');
      Start = wrong[i];
      output;
   end;
run;
proc sort data=NameFormat nodupkey;
   by Start;
run;
proc format cntlin=NameFormat;
run;
data want;
   set have1;
   
   city_name = put(city_name, $CorrectCityName.);
run;It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.
