Flag records having a specific word even with alteration in spelling by one character

Frequent Contributor
Posts: 96

Hi there,

I am trying to identify records that contain a specific word. The word may be misspelled, with 1-2 characters altered. A sample dataset is attached for your kind consideration. I want to flag records containing the word "DUODENUM" as well as "DUODENAL" and "DUODENOL" without hard coding.

I am not sure whether the SPEDIS function or the SOUNDEX function will help. To begin with, I used the SPEDIS function, but I realized that it takes the whole statement into consideration rather than the word itself. So I don't know how to apply the matching criterion of a 1-2 character alteration to the word itself. SAS code used:

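/* Keep only the records that contain the exact word and score them */
data test1;
    set test;
    if find(report, 'DUODENUM') then value1 = spedis(report, 'DUODENUM');
    if value1 ne .;
run;

/* obs=1 reads only the first record of test1 before sorting */
proc sort data=test1(obs=1) out=test2;
    by value1;
run;

/* Keep records whose score is within 2 of the reference score */
proc sql;
    create table test3 as
    select id, report,
           (select value1 from test2) as value1,
           spedis(report, 'DUODENUM') as value2
    from test
    where (calculated value2 - calculated value1) le 2;
quit;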

Later I tried the following code, as advised by Ksharp:

proc sql;
create table test3 as 
select value1, value2
from test as a, test as b
where value1 =* value2 ;
quit;

I got the following message in the log: ERROR: The following columns were not found in the contributing tables: value1, value2. Being a novice in SAS, I do not understand the logic provided by Ksharp, so I did not alter the code to generate value1 and value2.

Can somebody help me further with this?

Thank you in advance for your kind support.

Regards,

Deepak Swain
Super User
Posts: 17,865

Re: Flag records having a specific word even with alteration in spelling by one character

You need to replace the value1 and value2 variable references with the names of your actual variables.

It's not a good idea to run code you don't understand, so I'd start by trying to understand the code.

Review the code and highlight the sections you don't understand, and someone can help clarify. One good way to do this is to comment the code. Add comments for every PROC/DATA step, and even for sections within them, to describe what they're doing. If you ever need to use this code again, you'll probably be happy you did this.

A quick comment regarding @Ksharp's code: it's what's called a self join. Each record in your dataset is joined to every other record that satisfies the condition in the WHERE clause.

=* is the SOUNDS LIKE operator, which matches values that have the same SOUNDEX code.

http://support.sas.com/documentation/cdl/en/lrcon/68089/HTML/default/viewer.htm#p0eaz2e63dlj17n1i5z1...
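
To make the behavior of =* concrete, here is a minimal sketch on single words rather than whole reports. The data set name words and its contents are made up for illustration and are not part of the attached sample data.

```sas
/* Hypothetical four-row data set, used only for illustration */
data words;
    input word :$12.;
    datalines;
DUODENUM
DUODENOM
DUODENOL
STOMACH
;
run;

proc sql;
    /* =* keeps the rows whose WORD has the same SOUNDEX code as the literal */
    select word
    from words
    where word =* 'DUODENUM';
quit;
```

Because SOUNDEX drops vowels and groups similar consonants, a vowel change such as DUODENOM should still match, whereas a change in a coded consonant (the final L in DUODENOL) would not be expected to. That makes =* a phonetic test, not a count of altered characters.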

Frequent Contributor
Posts: 96

Re: Flag records having a specific word even with alteration in spelling by one character

Hi Reeza,

Thanks for your advice as well as for explaining Ksharp's code.

Appreciated.

Regards,

Deepak Swain
Super User
Posts: 9,682

Re: Flag records having a specific word even with alteration in spelling by one character


proc sql;
create table test3 as 
select a.value1, b.value2
from test as a, test as b
where a.value1 =*  b.value2 ;
quit;
Frequent Contributor
Posts: 96

Re: Flag records having a specific word even with alteration in spelling by one character

Hi Ksharp,

Thanks for your incredible guidance. With your guidance as well as Reeza's, I have customized the SAS code as follows:

proc sql;
create table test3 as 
select a.report, soundex(a.report) as value1, soundex(b.report) as value2
from test as a, test as b
where calculated value1 =*   calculated value2 ;
quit;

I am getting this:

Report                                             value1                   value2
STRICTURE IN THE GASTRODUODENAL JUNCTION           S36236532236335425235    S36236532236335425235
POLYP WITHOUT DYSPLASIA AT FIRST PART OF DUODENUM  P4133321423162316313355  P4133321423162316313355
MULTIPLE ULCERS PRESENT IN THE DUODENOM            M4314426216253533355     M4314426216253533355
ULCER OVER DUODENOL MUCOSA                         U426163354522            U426163354522
NOTHING ABONRMALITY FOUND IN THE DUODENUM          N352156543153533355      N352156543153533355

Can you guide me further from this step? My expectation is to get a list of terms related to DUODENUM:

  • DUODENUM 
  • DUODENOM                    
  • DUODENOL

Thank you in advance for your kind reply. 

 

Regards,

Deepak Swain
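
One possible way to apply the "1-2 altered characters" rule to each word instead of the whole report is sketched below. It swaps in the COMPLEV function (Levenshtein edit distance) in place of SPEDIS or SOUNDEX, which is a substitution and not something proposed in the thread. The id values, the variable names closest, best and flag, and the threshold of 2 are assumptions made for illustration.

```sas
/* Hypothetical copy of the sample reports shown above; id values are made up */
data test;
    infile datalines truncover;
    input id report $80.;
    datalines;
1 STRICTURE IN THE GASTRODUODENAL JUNCTION
2 POLYP WITHOUT DYSPLASIA AT FIRST PART OF DUODENUM
3 MULTIPLE ULCERS PRESENT IN THE DUODENOM
4 ULCER OVER DUODENOL MUCOSA
5 NOTHING ABONRMALITY FOUND IN THE DUODENUM
;
run;

data flagged;
    set test;
    length word closest $40;
    best = .;
    /* Compare every blank-delimited word of the report with the target */
    do i = 1 to countw(report, ' ');
        word = scan(report, i, ' ');
        dist = complev(word, 'DUODENUM');   /* edit distance for this word */
        if missing(best) or dist < best then do;
            best = dist;
            closest = word;                 /* remember the nearest word */
        end;
    end;
    /* Assumed rule: flag the record when some word is within 2 edits */
    flag = (not missing(best) and best <= 2);
    drop i word dist;
run;

/* List the distinct near-matches, i.e. the terms related to DUODENUM */
proc sql;
    select distinct closest
    from flagged
    where flag = 1;
quit;
```

Since the comparison is made word by word, a compound such as GASTRODUODENAL is far from DUODENUM in edit distance and would probably not be flagged by this sketch. If such compounds should count, the rule would need to be extended, for example by also testing substrings of each word.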