chelm24
Calcite | Level 5

Hello,

I need help comparing two character variables in SAS to determine whether they partially match.

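data have;
length VAR1 $100 VAR2 $100;
/* INFILE must execute before INPUT; TRUNCOVER sets VAR2 to missing */
/* when a line has no second value instead of reading the next line */
infile datalines dlm='|' truncover;
input VAR1 $ VAR2 $;
datalines;
1.DR. MORRISON|
1.MORRISON| ABCFG MORRISON
1.DR. MORRISON| MORRISON
1.DR. MORRISON| DR. WRIGHT
1. LA HOSPITAL| SAN DIEGO
;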

Expected result (data want):

VAR1              VAR2              PARTIAL MATCH?
1.DR. MORRISON                      NO
1.MORRISON        ABCFG MORRISON    YES
1.DR. MORRISON    MORRISON          YES
1.DR. MORRISON    DR. WRIGHT        NO
1. LA HOSPITAL    SAN DIEGO         NO
ballardw
Super User

Define your criteria for "partial match": 4 letters the same in sequence? 5? 6? Some other rule?

There are several SAS functions, COMPGED, COMPLEV and SPEDIS, that return a score measuring the spelling "distance" between two strings: COMPGED computes a generalized edit distance, COMPLEV a Levenshtein edit distance, and SPEDIS a spelling distance. I would try all three, and read the documentation, to see which best fits your data and needs. The lower the score, the more similar the two values are; a score of 0 means they are identical.

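data have;
length VAR1 $100 VAR2 $100;
/* INFILE must execute before INPUT; TRUNCOVER sets VAR2 to missing */
/* when a line has no second value instead of reading the next line */
infile datalines dlm='|' truncover;
input VAR1 $ VAR2 $;
/* Distance scores: 0 = identical, larger = less similar */
Compgedscore = compged(var1, var2);
Complevscore = complev(var1, var2);
Spedisscore  = spedis(var1, var2);
datalines;
1.DR. MORRISON|
1.MORRISON| ABCFG MORRISON
1.DR. MORRISON| MORRISON
1.DR. MORRISON| DR. WRIGHT
1. LA HOSPITAL| SAN DIEGO
;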
chelm24
Calcite | Level 5

@ballardw, I need to identify the words that match between the 2 variables, not a similarity score. A partial match means at least 4 letters the same in sequence.

VAR1              VAR2              PARTIAL MATCH?    MATCH
1.DR. MORRISON                      NO
1.MORRISON        ABCFG MORRISON    YES               MORRISON
1.DR. MORRISON    MORRISON          YES               MORRISON
1.DR. MORRISON    DR. WRIGHT        NO
1. LA HOSPITAL    SAN DIEGO         NO

Patrick
Opal | Level 21

@chelm24 wrote:

@ballardw, I need to identify the words that match between the 2 variables, not a similarity score. A partial match means at least 4 letters the same in sequence.


Do you mean an exact match of a whole WORD, or an exact match of any string of 4 characters within a WORD?

Both options should be doable, BUT with the 4-character option any two words would count as a match as long as they share a sequence of 4 identical characters (for example, HOSPITAL and HOSPICE share HOSP).
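A minimal sketch of the 4-character option, assuming the rule is "at least 4 consecutive letters shared by VAR1 and VAR2, case-insensitive, within a single word, ignoring digits and punctuation". MINLEN, SHARED and PARTIAL_MATCH are hypothetical names chosen for illustration, and SHARED holds only the first shared run of letters found, not the whole word.

```sas
/* Hypothetical sketch: flag rows where VAR1 and VAR2 share a run of */
/* at least &MINLEN consecutive letters inside a single word.        */
%let minlen = 4;

data have;
  length VAR1 $100 VAR2 $100;
  infile datalines dlm='|' truncover;
  input VAR1 $ VAR2 $;
datalines;
1.DR. MORRISON|
1.MORRISON| ABCFG MORRISON
1.DR. MORRISON| MORRISON
1.DR. MORRISON| DR. WRIGHT
1. LA HOSPITAL| SAN DIEGO
;

data want;
  set have;
  length a b $100 chunk SHARED $&minlen PARTIAL_MATCH $3;
  /* Keep only letters and blanks, upper-cased, so digits and */
  /* punctuation such as "1." or "DR." are ignored            */
  a = upcase(compress(var1, ' ', 'ka'));
  b = upcase(compress(var2, ' ', 'ka'));
  SHARED = ' ';
  /* Slide a &MINLEN-character window along A; stop at the first hit */
  do p = 1 to length(a) - &minlen + 1 while (SHARED = ' ');
    chunk = substr(a, p, &minlen);
    /* A window containing a blank spans two words, so skip it */
    if indexc(chunk, ' ') = 0 and find(b, chunk) > 0 then SHARED = chunk;
  end;
  PARTIAL_MATCH = ifc(SHARED = ' ', 'NO', 'YES');
  drop a b p chunk;
run;
```

Changing the %let value to 5 or 6 tightens the rule without touching the rest of the step.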

Tom
Super User

Just test each word.

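data want ;
  set have;
  /* Loop over the words of VAR1 and stop at the first one found in VAR2 */
  do i=1 to countw(var1,' ,.()-') until(found);
    word=scan(var1,i,' ,.()-');
    /* Only words longer than 3 characters count; FINDW modifiers: */
    /* i = ignore case, t = trim trailing blanks                    */
    if length(word)>3 then found = 0<findw(var2,word,' ,.()-','it');
  end;
  /* No match: clear WORD and reset I */
  if not found then do; 
     word=' ';
     i=0;
  end;
run;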
Obs         VAR1         VAR2              i    found      word

 1     1.DR. MORRISON                      0      0
 2     1.MORRISON        ABCFG MORRISON    2      1      MORRISON
 3     1.DR. MORRISON    MORRISON          3      1      MORRISON
 4     1.DR. MORRISON    DR. WRIGHT        0      0
 5     1. LA HOSPITAL    SAN DIEGO         0      0
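The loop above stops at the first matching word. To get the two columns in the expected result (YES/NO plus the matched word), one option is to keep scanning and collect every matching word. This is a sketch that reuses the same delimiters and FINDW modifiers; PARTIAL_MATCH and MATCH are illustrative names, and listing every matching word (space-separated) when several match is an assumption about what MATCH should hold.

```sas
data have;
  length VAR1 $100 VAR2 $100;
  infile datalines dlm='|' truncover;
  input VAR1 $ VAR2 $;
datalines;
1.DR. MORRISON|
1.MORRISON| ABCFG MORRISON
1.DR. MORRISON| MORRISON
1.DR. MORRISON| DR. WRIGHT
1. LA HOSPITAL| SAN DIEGO
;

data want;
  set have;
  length word $100 MATCH $200 PARTIAL_MATCH $3;
  MATCH = ' ';
  /* Check every word of VAR1, not just the first hit */
  do i = 1 to countw(var1, ' ,.()-');
    word = scan(var1, i, ' ,.()-');
    /* Words of 4+ characters found in VAR2 (i = ignore case,   */
    /* t = trim trailing blanks) and not already listed in MATCH */
    if length(word) > 3 and findw(var2, word, ' ,.()-', 'it') > 0
       and findw(MATCH, word, ' ', 'it') = 0
      then MATCH = catx(' ', MATCH, word);
  end;
  PARTIAL_MATCH = ifc(missing(MATCH), 'NO', 'YES');
  drop i word;
run;
```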
Ksharp
Super User
data have;
length VAR1 $100 VAR2 $100;
infile datalines dlm='|' truncover;
input VAR1 $ VAR2 $;

/* compress(...,,'ka') keeps only letters; FIND modifiers: */
/* i = ignore case, t = trim trailing blanks               */
if not missing(var1) and not missing(var2) and
   (find(compress(var1,,'ka'),compress(var2,,'ka'),'it') or
    find(compress(var2,,'ka'),compress(var1,,'ka'),'it')) then MATCH='Yes' ;
  else MATCH='No ' ;

datalines;
1.DR. MORRISON|
1.MORRISON| ABCFG MORRISON
1.DR. MORRISON| MORRISON
1.DR. MORRISON| DR. WRIGHT
1. LA HOSPITAL| SAN DIEGO
;
