BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
bgb
Fluorite | Level 6 bgb
Fluorite | Level 6

I want to write a MCARO to compare the 2 strings and return a number of letter difference by position.

For example, I have 2 strings.

Str1 ="ABCDEFG";      /*** Length=7 **/

Str2="ACDEFGHJ"     /*** Length=8 **/

By using the MACRO:

Diff=%DiffNum (Str1, Str2); /*** return value of Diff will be 7, Since only the first letters are the same. ***/

 

Unfortunately, seems that none of the SAS comparing FUNCTIONs  (COMPARE, COMPLEV, COMPGED) can do this.

 

Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions
bgb
Fluorite | Level 6 bgb
Fluorite | Level 6

This way should work. 

Thank you so much. It will save me a lot time trying to write a MACRO to do the comparing.

View solution in original post

9 REPLIES 9
SASKiwi
PROC Star

Writing your own comparison isn't difficult:

data Want;
  length str1 str2 $200;
  Str1 = "ABCDEFG";     
  Str2 = "ACDEFGHJ"; 
  do position = 1 to max(length(Str1), length(Str2));
    if substr(Str1,position,1) ne substr(Str2,position,1)
    then Diff+1;
  end;
  put _all_;
run;
bgb
Fluorite | Level 6 bgb
Fluorite | Level 6

Thanks  SASKiwi,

In fact, I had written an almost identical program as yours to do it.

However, what I really want is to embed it into a proc SQL procedure for merging two datasets together, and allow at most 2 difference between 2 strings. That why initially I wanted to use the SAS comparing functions directly, and then found that none of them works my situation, so I consider writing a macro to do it. It would be something like the following. Would it be possible?

 

Data Data1;
      Str1="ABCDEFG";  Var1=88; Var2="yes"; output;
      Str1="ABCDDDD";  Var1=78; Var2="no"; output;
run;

Data Data2;
      Str2="ABCDEFH";   Var3="High"; Var4=2; output;
      Str2="ACDEFGHJ";  Var3="Low"; Var4=1; output;
run;

PROC SQL;
	Create table MergingData as
	select a.*, b.*
        from Data1  as a
        left join Data2 as b on %DiffNum(a.Str1, b.Str2)<=2; 
    /*** Originally I had thought that I can use COMPLEV(a.Str1, b.Str2)<=2., but later found I can not use it. ***/
Quit;
SASKiwi
PROC Star

Why does it need to be SQL? You can do it all in a DATA step:

Data Data1;
      Str1="ABCDEFG";  Var1=88; Var2="yes"; output;
      Str1="ABCDDDD";  Var1=78; Var2="no"; output;
run;

Data Data2;
      Str2="ABCDEFH";   Var3="High"; Var4=2; output;
      Str2="ACDEFGHJ";  Var3="Low"; Var4=1; output;
run;

data Want;
  length str1 str2 $200;
  set data1;
  set data2;
  do position = 1 to max(length(Str1), length(Str2));
    if substr(Str1,position,1) ne substr(Str2,position,1)
    then Diff+1;
  end;
  put _all_;
  if Diff <= 2;
run;
bgb
Fluorite | Level 6 bgb
Fluorite | Level 6

Thank you so much, SASKIwi,

I didn't know it can be done  that way. The code seems so easy.

1) Is there any problem using this way for merging 2 large datasets? For example, 1 dataset with 5K observations, and the other one with 3 millions records.

2) I would even like to consider comparing 3  variables and add the difference together, with the maximum allowed difference of 2. So the code will be something like the following?

data Want (drop=i j k);
  length str1 str2 var1 var2 cose1 code2 $20;
  set data1;
  set data2;
  do i= 1 to max(length(Str1), length(Str2));
    if substr(Str1,i,1) ne substr(Str2,i,1)
    then Diff1+1;
  end;
  do j = 1 to max(length(Var1), length(Var2));
    if substr(var1,j,1) ne substr(Var2,j,1)
    then Diff2+1;
  end;
  do k = 1 to max(length(code1), length(code2));
    if substr(code1,k,1) ne substr(code2,k,1)
    then Diff3+1;
  end;
  put _all_;
  if (Diff1+Diff2+Diff3) <= 2;
run;

 

SASKiwi
PROC Star

DATA steps can deal with large datasets. You will probably need to change the merging logic as you haven't explained how that works. Are you comparing all rows in one table with all rows in the second table or is it some other joining method? The way I've done it is I assume the two input tables have the same number of rows and I'm joining row 1 with row 1, row 2 with row 2 and so on.

 

The code you have written seems to be on the right track.

bgb
Fluorite | Level 6 bgb
Fluorite | Level 6

Yes, I am comparing (and merging) all rows in one table with all rows in the second table.  One table is patients list (~5k), and one table is hospital data (~4M).   So I think I will need to change the merging logic, and that's why I initially try to use PROC SQL.  Thanks!

SASKiwi
PROC Star

Try this:

data Want;
  length str1 str2 $200;
  set data1;
  do row = 1 to obsnum;
    set data2 point = row nobs = obsnum;
	diff = 0;
    do position = 1 to max(length(Str1), length(Str2));
      if substr(Str1,position,1) ne substr(Str2,position,1)
      then Diff+1;
    end;
	if Diff <= 2 then output;
  end;
run;
bgb
Fluorite | Level 6 bgb
Fluorite | Level 6

This way should work. 

Thank you so much. It will save me a lot time trying to write a MACRO to do the comparing.

FreelanceReinh
Jade | Level 19

Hello @bgb,

 

Glad to see that SASKiwi's solution should work for you. Then it would be fair and help later readers if you marked his helpful reply as the accepted solution, not your own "thank you" post. Could you please change that? It's very easy: Select his post as the solution after clicking "Not the Solution" in the option menu (see icon below) of the current solution.
show_option_menu.png

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 980 views
  • 2 likes
  • 3 in conversation