I want to write a MCARO to compare the 2 strings and return a number of letter difference by position.
For example, I have 2 strings.
Str1 ="ABCDEFG"; /*** Length=7 **/
Str2="ACDEFGHJ" /*** Length=8 **/
By using the MACRO:
Diff=%DiffNum (Str1, Str2); /*** return value of Diff will be 7, Since only the first letters are the same. ***/
Unfortunately, seems that none of the SAS comparing FUNCTIONs (COMPARE, COMPLEV, COMPGED) can do this.
Thanks!
This way should work.
Thank you so much. It will save me a lot time trying to write a MACRO to do the comparing.
Writing your own comparison isn't difficult:
data Want;
  length str1 str2 $200;
  Str1 = "ABCDEFG";     
  Str2 = "ACDEFGHJ"; 
  do position = 1 to max(length(Str1), length(Str2));
    if substr(Str1,position,1) ne substr(Str2,position,1)
    then Diff+1;
  end;
  put _all_;
run;Thanks SASKiwi,
In fact, I had written an almost identical program as yours to do it.
However, what I really want is to embed it into a proc SQL procedure for merging two datasets together, and allow at most 2 difference between 2 strings. That why initially I wanted to use the SAS comparing functions directly, and then found that none of them works my situation, so I consider writing a macro to do it. It would be something like the following. Would it be possible?
Data Data1; Str1="ABCDEFG"; Var1=88; Var2="yes"; output; Str1="ABCDDDD"; Var1=78; Var2="no"; output; run; Data Data2; Str2="ABCDEFH"; Var3="High"; Var4=2; output; Str2="ACDEFGHJ"; Var3="Low"; Var4=1; output; run; PROC SQL; Create table MergingData as select a.*, b.* from Data1 as a left join Data2 as b on %DiffNum(a.Str1, b.Str2)<=2; /*** Originally I had thought that I can use COMPLEV(a.Str1, b.Str2)<=2., but later found I can not use it. ***/ Quit;
Why does it need to be SQL? You can do it all in a DATA step:
Data Data1;
      Str1="ABCDEFG";  Var1=88; Var2="yes"; output;
      Str1="ABCDDDD";  Var1=78; Var2="no"; output;
run;
Data Data2;
      Str2="ABCDEFH";   Var3="High"; Var4=2; output;
      Str2="ACDEFGHJ";  Var3="Low"; Var4=1; output;
run;
data Want;
  length str1 str2 $200;
  set data1;
  set data2;
  do position = 1 to max(length(Str1), length(Str2));
    if substr(Str1,position,1) ne substr(Str2,position,1)
    then Diff+1;
  end;
  put _all_;
  if Diff <= 2;
run;Thank you so much, SASKIwi,
I didn't know it can be done that way. The code seems so easy.
1) Is there any problem using this way for merging 2 large datasets? For example, 1 dataset with 5K observations, and the other one with 3 millions records.
2) I would even like to consider comparing 3 variables and add the difference together, with the maximum allowed difference of 2. So the code will be something like the following?
data Want (drop=i j k);
  length str1 str2 var1 var2 cose1 code2 $20;
  set data1;
  set data2;
  do i= 1 to max(length(Str1), length(Str2));
    if substr(Str1,i,1) ne substr(Str2,i,1)
    then Diff1+1;
  end;
  do j = 1 to max(length(Var1), length(Var2));
    if substr(var1,j,1) ne substr(Var2,j,1)
    then Diff2+1;
  end;
  do k = 1 to max(length(code1), length(code2));
    if substr(code1,k,1) ne substr(code2,k,1)
    then Diff3+1;
  end;
  put _all_;
  if (Diff1+Diff2+Diff3) <= 2;
run;
DATA steps can deal with large datasets. You will probably need to change the merging logic as you haven't explained how that works. Are you comparing all rows in one table with all rows in the second table or is it some other joining method? The way I've done it is I assume the two input tables have the same number of rows and I'm joining row 1 with row 1, row 2 with row 2 and so on.
The code you have written seems to be on the right track.
Yes, I am comparing (and merging) all rows in one table with all rows in the second table. One table is patients list (~5k), and one table is hospital data (~4M). So I think I will need to change the merging logic, and that's why I initially try to use PROC SQL. Thanks!
Try this:
data Want;
  length str1 str2 $200;
  set data1;
  do row = 1 to obsnum;
    set data2 point = row nobs = obsnum;
	diff = 0;
    do position = 1 to max(length(Str1), length(Str2));
      if substr(Str1,position,1) ne substr(Str2,position,1)
      then Diff+1;
    end;
	if Diff <= 2 then output;
  end;
run;This way should work.
Thank you so much. It will save me a lot time trying to write a MACRO to do the comparing.
Hello @bgb,
Glad to see that SASKiwi's solution should work for you. Then it would be fair and help later readers if you marked his helpful reply as the accepted solution, not your own "thank you" post. Could you please change that? It's very easy: Select his post as the solution after clicking "Not the Solution" in the option menu (see icon below) of the current solution.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
