BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
rwu7
Calcite | Level 5

Hello,

 

I want to find exact matches in two sets of book titles without regard to case-sensitive and abbreviations. Also I am new to SAS. 

 

Title 1:

AM J CLIN ONCOL CANC
ADV DRUG DELIVER REV
ADV ECOL RES
ABDOM RADIOL
ADV ELECTRON MATER
ENDOCR RELAT CANCER
MAYO CLIN PROC
MBIO
MCN AM J MATERN CHIL

 

Title 2:
Abdom Radiol (NY)
ACS Appl Mater Interfaces
Adv Drug Deliv Rev
AJR Am J Roentgenol
Am J Clin Oncol
Endocr Relat Cancer

 

***Expected Output something like this that shows exact title matches disregarding case-sensitve or abbreviations

 

Exact matches

 

ADV DRUG DELIVER REV : Adv Drug Deliv Rev
ABDOM RADIOL : Abdom Radiol (NY)
ENDOCR RELAT CANCER : Endocr Relat Cancer
AM J CLIN ONCOL CANC: Am J Clin Oncol

 

How to write a code that does this?

 

Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions
RM6
Obsidian | Level 7 RM6
Obsidian | Level 7

Its the same code as above, if you want only exact match variable , you can change sql query select statement  and keep only Exact_Match variable. And am not sure why your getting only title, any error or warning in logs?

 

 
PROC SQL;
Create table dummy as
select catx(' ',title1,' : ',title2) as Exact_match
from (
select a.*, 
substr(compress(lowcase(title1),' '),1,min_len) as temp_title1,
substr(compress(lowcase(title2),' '),1,min_len) as temp_title2,
spedis(calculated temp_title1, calculated temp_title2) as dis
from (
 
Select *, 
length(compress(title1,' ')) as len1,
length(compress(title2,' ')) as len2, 
case when calculated len1 < calculated len2 then calculated len1
when calculated len2 < calculated len1 then calculated len2 
else calculated len1 end as min_len 
from A as x cross join B as y
) a
where calculated temp_title1 = calculated temp_title2 or calculated dis<=15
)
;
Quit;

View solution in original post

8 REPLIES 8
RM6
Obsidian | Level 7 RM6
Obsidian | Level 7
This might help you start , this might not work in all cases for matching. Give it a try.
 
data a;
input title1 & $100.;
datalines;
AM J CLIN ONCOL CANC
ADV DRUG DELIVER REV
ADV ECOL RES
ABDOM RADIOL
ADV ELECTRON MATER
ENDOCR RELAT CANCER
MAYO CLIN PROC
MBIO
MCN AM J MATERN CHIL
;
run;
 
data b;
input title2 & $100.;
datalines;
Abdom Radiol (NY)
ACS Appl Mater Interfaces
Adv Drug Deliv Rev
AJR Am J Roentgenol
Am J Clin Oncol
Endocr Relat Cancer
;
run;
 
PROC SQL;
Create table dummy as
select title1, title2, catx(' ',title1,' : ',title2) as Exact_match
from (
select a.*, 
substr(compress(lowcase(title1),' '),1,min_len) as temp_title1,
substr(compress(lowcase(title2),' '),1,min_len) as temp_title2,
spedis(calculated temp_title1, calculated temp_title2) as dis
from (
 
Select *, 
length(compress(title1,' ')) as len1,
length(compress(title2,' ')) as len2, 
case when calculated len1 < calculated len2 then calculated len1
when calculated len2 < calculated len1 then calculated len2 
else calculated len1 end as min_len 
from A1 as x cross join B1 as y
) a
where calculated temp_title1 = calculated temp_title2 or calculated dis<=15
)
;
Quit;
rwu7
Calcite | Level 5

I tried running the code and got this error:

ERROR: File WORK.A1.DATA does not exist.
ERROR: File WORK.B1.DATA does not exist.
 
Did it work when you ran that code?
RM6
Obsidian | Level 7 RM6
Obsidian | Level 7
try A & B , instead of A1 & B1
rwu7
Calcite | Level 5

Though the error isn't there, but I'm not getting results. Can you share what your result look like?

 

Thanks

RM6
Obsidian | Level 7 RM6
Obsidian | Level 7

Snap shot of output.

 

image.png

rwu7
Calcite | Level 5

Can you post your code?

 

I'm not getting the same output as yours when I ran it. It only shows the title 1 data. 

 

Here's the code I used:

 

data a;
input title1 & $100.;
datalines;
AM J CLIN ONCOL CANC
ADV DRUG DELIVER REV
ADV ECOL RES
ABDOM RADIOL
ADV ELECTRON MATER
ENDOCR RELAT CANCER
MAYO CLIN PROC
MBIO
MCN AM J MATERN CHIL
;
run;

data b;
input title2 & $100.;
datalines;
Abdom Radiol (NY)
ACS Appl Mater Interfaces
Adv Drug Deliv Rev
AJR Am J Roentgenol
Am J Clin Oncol
Endocr Relat Cancer
;
run;

PROC SQL;
Create table dummy as
select title1, title2, catx(' ',title1,' : ',title2) as Exact_match
from (
select a.*,
substr(compress(lowcase(title1),' '),1,min_len) as temp_title1,
substr(compress(lowcase(title2),' '),1,min_len) as temp_title2,
spedis(calculated temp_title1, calculated temp_title2) as dis
from (

Select *,
length(compress(title1,' ')) as len1,
length(compress(title2,' ')) as len2,
case when calculated len1 < calculated len2 then calculated len1
when calculated len2 < calculated len1 then calculated len2
else calculated len1 end as min_len
from A as x cross join B as y
) a
where calculated temp_title1 = calculated temp_title2 or calculated dis<=15
)
;
Quit;

 

 

 

RM6
Obsidian | Level 7 RM6
Obsidian | Level 7

Its the same code as above, if you want only exact match variable , you can change sql query select statement  and keep only Exact_Match variable. And am not sure why your getting only title, any error or warning in logs?

 

 
PROC SQL;
Create table dummy as
select catx(' ',title1,' : ',title2) as Exact_match
from (
select a.*, 
substr(compress(lowcase(title1),' '),1,min_len) as temp_title1,
substr(compress(lowcase(title2),' '),1,min_len) as temp_title2,
spedis(calculated temp_title1, calculated temp_title2) as dis
from (
 
Select *, 
length(compress(title1,' ')) as len1,
length(compress(title2,' ')) as len2, 
case when calculated len1 < calculated len2 then calculated len1
when calculated len2 < calculated len1 then calculated len2 
else calculated len1 end as min_len 
from A as x cross join B as y
) a
where calculated temp_title1 = calculated temp_title2 or calculated dis<=15
)
;
Quit;
rwu7
Calcite | Level 5

I added this statement and it worked!

 

proc print noobs;run;

 

Not sure why, but I'd be curious to know why.

 

Thank you again for your help! 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 8 replies
  • 1375 views
  • 0 likes
  • 2 in conversation