BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Barkat
Pyrite | Level 9

I have two tables as below.

have1 has three columns. name, date and city_name

have2 has six columns. First col has correct city names. Later columns have listed incorrect city names.

How can I replace misspelled city_name in have1 from correct name in have2

 

Note: somehow have2 table is not creating properly

 

data have1;
infile datalines dsd dlm=',' ;
input name :$6. id :$3. date :MMDDYY10. city_name :$12. ;
datalines;
Max,010,01/10/2017,Almeda
Robin,023,.,Sacramanto
Sandra,125,08/10/2013,LA
Bob,200,01/12/2016,Woodland
John,202,05/30/2014,L.A.
;
run;


data have2;
infile datalines dsd dlm=',' ;
input correct_name :$15. correct_name :$15. incorrect_name1 :$15. incorrect_name2 :$15. incorrect_name3 :$15. incorrect_name4 :$15. incorrect_name5 :$15. ;
datalines;
Sacramento,Sacraminto,Sacrament,Sacramintu,Scraminto,Sacramanto
Los Angeles,Loss Angeles,LA,LAngeles,L.A.,-
Woodland,Woodland,Woodland,-,-,-
Richmond,Rchmond,Richmand,Richmon,-,-
Alameda,Almeda,-,-,-,-
;
run;

1 ACCEPTED SOLUTION

Accepted Solutions
andreas_lds
Jade | Level 19

On executing the step creating have2 i get some interesting messages in the log indicating that there should be only one variable named correct_name. So after fixing the input-statement, it seems that the problem could be solved with a format.

 

data NameFormat;
   set have2(rename=(correct_name=Label));
   
   length Start $ 15;
   retain FmtName "$CorrectCityName";
   
   array wrong[5] incorrect_name:;
   
   do i = 1 to dim(wrong) until(wrong[i] = '-');
      Start = wrong[i];
      output;
   end;
run;

proc sort data=NameFormat nodupkey;
   by Start;
run;

proc format cntlin=NameFormat;
run;

data want;
   set have1;
   
   city_name = put(city_name, $CorrectCityName.);
run;

View solution in original post

5 REPLIES 5
mklangley
Lapis Lazuli | Level 10

This should give what you're looking for:

data have1;
    infile datalines dsd dlm=',' ;
    input name :$6. id :$3. date :MMDDYY10. city_name :$12. ;
    datalines;
    Max,010,01/10/2017,Almeda
    Robin,023,.,Sacramanto
    Sandra,125,08/10/2013,LA
    Bob,200,01/12/2016,Woodland
    John,202,05/30/2014,L.A.
    ;
run;


data have2;
    infile datalines dsd dlm=',' ;
    input correct_name :$15. incorrect_name1 :$15. incorrect_name2 :$15. incorrect_name3 :$15. incorrect_name4 :$15. incorrect_name5 :$15. ;
    datalines;
    Sacramento, Sacraminto, Sacrament, Sacramintu, Scraminto, Sacramanto
    Los Angeles,Loss Angeles,LA,LAngeles,L.A.,-
    Woodland,Woodland,Woodland,-,-,-
    Richmond,Rchmond,Richmand,Richmon,-,-
    Alameda,Almeda,-,-,-,-
    ;
run;

proc sql;
    create table want as
    select a.name
          ,a.id
          ,a.date format MMDDYY10.
          ,case 
               when b.correct_name is null
                   then a.city_name
               else b.correct_name
           end as city_name
    from have1 a
    left join have2 b
        on (a.city_name = b.incorrect_name1
        or a.city_name = b.incorrect_name2
        or a.city_name = b.incorrect_name3
        or a.city_name = b.incorrect_name4
        or a.city_name = b.incorrect_name5)
    ;
quit;

 

Barkat
Pyrite | Level 9

Wow! excellent!

Is there any way to use do loop for the following. Because some of the city names have about 50 incorrect names.

 

left join have2 b
        on (a.city_name = b.incorrect_name1
        or a.city_name = b.incorrect_name2
        or a.city_name = b.incorrect_name3
        or a.city_name = b.incorrect_name4
        or a.city_name = b.incorrect_name5)
ballardw
Super User

@Barkat wrote:

Wow! excellent!

Is there any way to use do loop for the following. Because some of the city names have about 50 incorrect names.

 

left join have2 b
        on (a.city_name = b.incorrect_name1
        or a.city_name = b.incorrect_name2
        or a.city_name = b.incorrect_name3
        or a.city_name = b.incorrect_name4
        or a.city_name = b.incorrect_name5)

First think would be to create a data set that has one record per correct_name and incorrect_name.

That can be done with something like:

data need;
   set have2;
   array d incorrect: ;
   do i= 1 to dim(d);
      if d[i] ne '-' then do;
         inc = d[i];
         output;
      end;
   end;
   keep correct_name inc;
run;

I won't claim that will work with your data, partially because you have TWO input correct_names and mostly will have the first incorrect name in the place of the correct name.

Then you can match on the value of inc to the city name, use the NEED data set instead of HAVE2

    left join need b
        on a.city_name = b.inc
mklangley
Lapis Lazuli | Level 10

@Barkat wrote:

Is there any way to use do loop for the following. Because some of the city names have about 50 incorrect names.


 Yes, here are two modified versions using loops:

 

1. If the columns are always INCORRECT_NAME1 to INCORRECT_NAME50:

%macro loop_incorrect_names;
    proc sql;
        create table want as
        select a.name
              ,a.id
              ,a.date format MMDDYY10.
              ,case 
                   when b.correct_name is null
                       then a.city_name
                   else b.correct_name
               end as city_name
        from have1 a
        left join have2 b
            on (a.city_name = b.incorrect_name1
            %do i = 1 %to 50;     /* If the columns are always INCORRECT_NAME1 to INCORRECT_NAME50 */
                or a.city_name = b.incorrect_name&i
            %end;)
        ;
    quit;
%mend;
%loop_incorrect_names

2. Or if there are not always 50 INCORRECT_NAME% columns, this will go up to the last (i.e. rightmost) INCORRECT_NAME column. (Note: I am assuming HAVE2 contains only the CORRECT_NAME column, followed by all the INCORRECT_NAME columns, ordered and named per your example.)

proc contents data=have2 out=meta (keep=NAME) noprint; run;

proc sql noprint; 
    select substr(name, 15) into: max_incorrect_name_number
    from meta
    having monotonic()=max(monotonic());
quit;

%put &=max_incorrect_name_number;

%macro loop_incorrect_names;
    proc sql;
        create table want as
        select a.name
              ,a.id
              ,a.date format MMDDYY10.
              ,case 
                   when b.correct_name is null
                       then a.city_name
                   else b.correct_name
               end as city_name
        from have1 a
        left join have2 b
            on (a.city_name = b.incorrect_name1
            %do i = 1 %to &max_incorrect_name_number.;    /* Using the last INCORRECT_NAME column number */
                or a.city_name = b.incorrect_name&i
            %end;)
        ;
    quit;
%mend;
%loop_incorrect_names

 

 

andreas_lds
Jade | Level 19

On executing the step creating have2 i get some interesting messages in the log indicating that there should be only one variable named correct_name. So after fixing the input-statement, it seems that the problem could be solved with a format.

 

data NameFormat;
   set have2(rename=(correct_name=Label));
   
   length Start $ 15;
   retain FmtName "$CorrectCityName";
   
   array wrong[5] incorrect_name:;
   
   do i = 1 to dim(wrong) until(wrong[i] = '-');
      Start = wrong[i];
      output;
   end;
run;

proc sort data=NameFormat nodupkey;
   by Start;
run;

proc format cntlin=NameFormat;
run;

data want;
   set have1;
   
   city_name = put(city_name, $CorrectCityName.);
run;

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 5 replies
  • 854 views
  • 5 likes
  • 4 in conversation