DeepakSwain
Pyrite | Level 9

Hi there,

I am trying to identify records that contain a specific word. The word may be misspelled, with 1-2 characters altered. A sample dataset is attached for your kind consideration. I am trying to flag records containing the word "DUODENUM" as well as "DUODENAL" and "DUODENOL" without hard-coding each variant.

I am not sure whether the SPEDIS function or the SOUNDEX function will help. To begin with, I used the SPEDIS function, but I realized that it compares the whole statement against the word rather than each word within the statement. So I don't know how to apply the matching criterion of 1-2 altered characters to the word itself. SAS code used:

data test1;
set test;
if find(report, 'DUODENUM')  then value1=spedis(report, 'DUODENUM');
if value1 ne .;
run;

proc sort data =test1(obs=1) out =test2 ; by value1 ;run;


proc sql;
create table test3 as 
select id,report,  (select value1 from test2) as value1, spedis(report, 'DUODENUM') as value2
from test 
where (calculated value2 -  calculated value1) le 2 ;
quit;

Later I tried the following code, as advised by Ksharp:

proc sql;
create table test3 as 
select value1, value2
from test as a, test as b
where value1 =* value2 ;
quit;

I got the following message in the log: ERROR: The following columns were not found in the contributing tables: value1, value2. Being a novice in SAS, I do not understand the logic of Ksharp's code, so I did not alter it to generate value1 and value2.

Can somebody help me further with this?

Thank you in advance for your kind support.

Regards,

Deepak Swain
Reeza
Super User

You need to replace value1 and value2 variable references with the names of your actual variables.

 

It's not a good idea to run code you don't understand, so I'd start by trying to understand the code.

 

Review the code and highlight the sections you don't understand and someone can help clarify. One good way to do this is to comment the code. Add comments for every proc/data step and even sections of them to highlight what they're doing. If you ever need to use this code again you'll probably be happy you did this. 

 

A quick comment on @Ksharp's code: it's what's called a self join. You're joining each record in your dataset to every other record, subject to the condition in the WHERE clause.

=* is the sounds-like operator, which is based on the SOUNDEX algorithm.

http://support.sas.com/documentation/cdl/en/lrcon/68089/HTML/default/viewer.htm#p0eaz2e63dlj17n1i5z1...
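
To make the self join and the =* operator concrete, here is a minimal sketch with real variable names substituted for value1 and value2. It assumes the dataset TEST has the variables ID and REPORT, as in the original post. The output table name PAIRS and the ID comparison are illustrative choices, not part of @Ksharp's code.

```sas
/* Assumes dataset TEST has variables ID and REPORT, as in the original post. */
proc sql;
  create table pairs as
  select a.id     as id_a,
         a.report as report_a,
         b.id     as id_b,
         b.report as report_b
  from test as a, test as b        /* self join: each row paired with every row */
  where a.id < b.id                /* skip self-pairs and mirrored duplicates   */
    and a.report =* b.report;      /* keep pairs whose values sound alike       */
quit;
```

Because the operator is applied to the whole REPORT value here, this compares entire statements, not individual words within them.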

DeepakSwain
Pyrite | Level 9

Hi Reeza,

 

Thanks for your advice as well as for explaining Ksharp's code.

 

Appreciated.

Regards,

Deepak Swain
Ksharp
Super User

proc sql;
create table test3 as 
select a.value1, b.value2
from test as a, test as b
where a.value1 =*  b.value2 ;
quit;
DeepakSwain
Pyrite | Level 9

Hi Ksharp,

 

Thanks for your incredible guidance. With your guidance as well as Reeza's, I have customized the SAS code as follows:

 

proc sql;
create table test3 as 
select a.report, soundex(a.report) as value1, soundex(b.report) as value2
from test as a, test as b
where calculated value1 =*   calculated value2 ;
quit;

 

I am getting this:

Report                                             value1                   value2
STRICTURE IN THE GASTRODUODENAL JUNCTION           S36236532236335425235    S36236532236335425235
POLYP WITHOUT DYSPLASIA AT FIRST PART OF DUODENUM  P4133321423162316313355  P4133321423162316313355
MULTIPLE ULCERS PRESENT IN THE DUODENOM            M4314426216253533355     M4314426216253533355
ULCER OVER DUODENOL MUCOSA                         U426163354522            U426163354522
NOTHING ABONRMALITY FOUND IN THE DUODENUM          N352156543153533355      N352156543153533355

 

 

Can you guide me further from this step? My expectation is to get a list of terms related to DUODENUM:

  • DUODENUM 
  • DUODENOM                    
  • DUODENOL

Thank you in advance for your kind reply. 

 

Regards,

Deepak Swain
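
One possible way to move from whole-report codes to a list of matching words is to split each report into words and score each word on its own with SPEDIS. The following is a minimal sketch, not a definitive solution. It assumes the dataset TEST has the variables ID and REPORT, as in the original post. The names DUODENUM_TERMS, TERM_LIST, TERM and DISTANCE are hypothetical, and the cutoff of 25 is an assumption that should be tuned against the real data.

```sas
/* Assumes dataset TEST has variables ID and REPORT, as in the original post. */
data duodenum_terms;
  set test;
  length term $40;                          /* assumed maximum word length       */
  do i = 1 to countw(report, ' ');          /* loop over blank-delimited words   */
    term = upcase(scan(report, i, ' '));    /* i-th word, uppercased             */
    distance = spedis(strip(term), 'DUODENUM'); /* score this word only          */
    if distance le 25 then output;          /* cutoff is an assumption to tune   */
  end;
  keep id report term distance;
run;

/* Distinct list of the spellings that passed the cutoff */
proc sql;
  create table term_list as
  select distinct term
  from duodenum_terms;
quit;
```

Two caveats apply. Words are split on blanks only, so punctuation attached to a word stays part of it. A compound such as GASTRODUODENAL is treated as a single word, so it would likely need separate handling.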
