Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

🔒 This topic is **solved** and **locked**.
Need further help from the community? Please
sign in and ask a **new** question.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Posted 07-02-2019 01:15 AM
(1085 views)

Hello,

I want to find exact matches in two sets of book titles without regard to case-sensitive and abbreviations. Also I am new to SAS.

Title 1:

AM J CLIN ONCOL CANC

ADV DRUG DELIVER REV

ADV ECOL RES

ABDOM RADIOL

ADV ELECTRON MATER

ENDOCR RELAT CANCER

MAYO CLIN PROC

MBIO

MCN AM J MATERN CHIL

Title 2:

Abdom Radiol (NY)

ACS Appl Mater Interfaces

Adv Drug Deliv Rev

AJR Am J Roentgenol

Am J Clin Oncol

Endocr Relat Cancer

***Expected Output something like this that shows exact title matches disregarding case-sensitve or abbreviations

Exact matches

ADV DRUG DELIVER REV : Adv Drug Deliv Rev

ABDOM RADIOL : Abdom Radiol (NY)

ENDOCR RELAT CANCER : Endocr Relat Cancer

AM J CLIN ONCOL CANC: Am J Clin Oncol

How to write a code that does this?

Thanks!

- Tags:
- matching

1 ACCEPTED SOLUTION

Accepted Solutions

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Its the same code as above, if you want only exact match variable , you can change sql query select statement and keep only Exact_Match variable. And am not sure why your getting only title, any error or warning in logs?

```
PROC SQL;
Create table dummy as
select catx(' ',title1,' : ',title2) as Exact_match
from (
select a.*,
substr(compress(lowcase(title1),' '),1,min_len) as temp_title1,
substr(compress(lowcase(title2),' '),1,min_len) as temp_title2,
spedis(calculated temp_title1, calculated temp_title2) as dis
from (
Select *,
length(compress(title1,' ')) as len1,
length(compress(title2,' ')) as len2,
case when calculated len1 < calculated len2 then calculated len1
when calculated len2 < calculated len1 then calculated len2
else calculated len1 end as min_len
from A as x cross join B as y
) a
where calculated temp_title1 = calculated temp_title2 or calculated dis<=15
)
;
Quit;
```

8 REPLIES 8

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

This might help you start , this might not work in all cases for matching. Give it a try.

data a;

input title1 & $100.;

datalines;

AM J CLIN ONCOL CANC

ADV DRUG DELIVER REV

ADV ECOL RES

ABDOM RADIOL

ADV ELECTRON MATER

ENDOCR RELAT CANCER

MAYO CLIN PROC

MBIO

MCN AM J MATERN CHIL

;

run;

data b;

input title2 & $100.;

datalines;

Abdom Radiol (NY)

ACS Appl Mater Interfaces

Adv Drug Deliv Rev

AJR Am J Roentgenol

Am J Clin Oncol

Endocr Relat Cancer

;

run;

PROC SQL;

Create table dummy as

select title1, title2, catx(' ',title1,' : ',title2) as Exact_match

from (

select a.*,

substr(compress(lowcase(title1),' '),1,min_len) as temp_title1,

substr(compress(lowcase(title2),' '),1,min_len) as temp_title2,

spedis(calculated temp_title1, calculated temp_title2) as dis

from (

Select *,

length(compress(title1,' ')) as len1,

length(compress(title2,' ')) as len2,

case when calculated len1 < calculated len2 then calculated len1

when calculated len2 < calculated len1 then calculated len2

else calculated len1 end as min_len

from A1 as x cross join B1 as y

) a

where calculated temp_title1 = calculated temp_title2 or calculated dis<=15

)

;

Quit;

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

I tried running the code and got this error:

ERROR: File WORK.A1.DATA does not exist.

ERROR: File WORK.B1.DATA does not exist.

ERROR: File WORK.B1.DATA does not exist.

Did it work when you ran that code?

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

try A & B , instead of A1 & B1

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Though the error isn't there, but I'm not getting results. Can you share what your result look like?

Thanks

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Snap shot of output.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Can you post your code?

I'm not getting the same output as yours when I ran it. It only shows the title 1 data.

Here's the code I used:

data a;

input title1 & $100.;

datalines;

AM J CLIN ONCOL CANC

ADV DRUG DELIVER REV

ADV ECOL RES

ABDOM RADIOL

ADV ELECTRON MATER

ENDOCR RELAT CANCER

MAYO CLIN PROC

MBIO

MCN AM J MATERN CHIL

;

run;

data b;

input title2 & $100.;

datalines;

Abdom Radiol (NY)

ACS Appl Mater Interfaces

Adv Drug Deliv Rev

AJR Am J Roentgenol

Am J Clin Oncol

Endocr Relat Cancer

;

run;

PROC SQL;

Create table dummy as

select title1, title2, catx(' ',title1,' : ',title2) as Exact_match

from (

select a.*,

substr(compress(lowcase(title1),' '),1,min_len) as temp_title1,

substr(compress(lowcase(title2),' '),1,min_len) as temp_title2,

spedis(calculated temp_title1, calculated temp_title2) as dis

from (

Select *,

length(compress(title1,' ')) as len1,

length(compress(title2,' ')) as len2,

case when calculated len1 < calculated len2 then calculated len1

when calculated len2 < calculated len1 then calculated len2

else calculated len1 end as min_len

from A as x cross join B as y

) a

where calculated temp_title1 = calculated temp_title2 or calculated dis<=15

)

;

Quit;

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Its the same code as above, if you want only exact match variable , you can change sql query select statement and keep only Exact_Match variable. And am not sure why your getting only title, any error or warning in logs?

```
PROC SQL;
Create table dummy as
select catx(' ',title1,' : ',title2) as Exact_match
from (
select a.*,
substr(compress(lowcase(title1),' '),1,min_len) as temp_title1,
substr(compress(lowcase(title2),' '),1,min_len) as temp_title2,
spedis(calculated temp_title1, calculated temp_title2) as dis
from (
Select *,
length(compress(title1,' ')) as len1,
length(compress(title2,' ')) as len2,
case when calculated len1 < calculated len2 then calculated len1
when calculated len2 < calculated len1 then calculated len2
else calculated len1 end as min_len
from A as x cross join B as y
) a
where calculated temp_title1 = calculated temp_title2 or calculated dis<=15
)
;
Quit;
```

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

I added this statement and it worked!

proc print noobs;run;

Not sure why, but I'd be curious to know why.

Thank you again for your help!

**Available on demand!**

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

Upcoming Events

- Why Data Matters – Building Efficient Foundations | 18-Jun-2024
- RTSUG Webinar: Intro to Coding in SAS Viya | 20-Jun-2024
- SAS® Viya® Workbench – Available on AWS Marketplace | 25-Jun-2024
- DCSUG Virtual Meeting with Kirk Paul Lafler | 26-Jun-2024
- Ask the Expert: How Can I Use SAS® Optimization From Python? | 09-Jul-2024
- WUSS Virtual: Mastering Oncology Studies: A Comprehensive Guide for Programmers & Biostatisticians | 12-Jul-2024
- Ask the Expert: Jupyter Notebook: Your Coding Canvas | 16-Jul-2024

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.