06-17-2016 02:46 PM
I am trying to identify records that contain a specific word. The word may be misspelled, with 1-2 characters altered. A sample dataset is attached for your kind consideration. I am trying to flag records containing the word "DUODENUM" as well as variants such as "DUODENAL" and "DUODENOL", without hard-coding them.
I am not sure whether the SPEDIS function or the SOUNDEX function will help. To begin with, I used SPEDIS, but I realized that it compares the whole statement with the word, not just the word itself. So I don't know how to apply the matching criterion (an alteration of 1-2 characters) to the word itself. SAS code used:
data test1;
   set test;
   if find(report, 'DUODENUM') then value1 = spedis(report, 'DUODENUM');
   if value1 ne .;
run;

proc sort data=test1(obs=1) out=test2;
   by value1;
run;

proc sql;
   create table test3 as
   select id, report,
          (select value1 from test2) as value1,
          spedis(report, 'DUODENUM') as value2
   from test
   where (calculated value2 - calculated value1) le 2;
quit;
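Because SPEDIS is called on the whole REPORT string above, its cost reflects the full sentence rather than the single word. A minimal sketch of applying it word by word instead, assuming a TEST dataset with a character variable REPORT as in the post: split each report with COUNTW and SCAN, score every word against the target with SPEDIS, and keep the smallest cost. The cutoff MAX_COST is a hypothetical value that would need tuning on real data, and the sample rows simply reuse the five reports shown later in this thread.

```sas
/* Sample input: the five reports shown in this thread
   (replace with your own TEST dataset) */
data test;
   infile datalines truncover;
   input report $80.;
   datalines;
STRICTURE IN THE GASTRODUODENAL JUNCTION
POLYP WITHOUT DYSPLASIA AT FIRST PART OF DUODENUM
MULTIPLE ULCERS PRESENT IN THE DUODENOM
ULCER OVER DUODENOL MUCOSA
NOTHING ABONRMALITY FOUND IN THE DUODENUM
;
run;

%let target   = DUODENUM; /* word to look for                              */
%let max_cost = 25;       /* hypothetical SPEDIS cutoff: tune on your data */

data test_flagged;
   set test;
   length word $40;
   min_cost = .;
   /* Score each word separately instead of the whole report */
   do i = 1 to countw(report, ' ,.;:');
      word = scan(report, i, ' ,.;:');
      /* SPEDIS(query, keyword): query = word from the report,
         keyword = correctly spelled target; MIN ignores missing values */
      min_cost = min(min_cost, spedis(strip(upcase(word)), "&target"));
   end;
   /* Flag only when at least one word was scored and it is close enough */
   flag = (not missing(min_cost) and min_cost <= &max_cost);
   drop i word;
run;
```

SPEDIS is asymmetric (the query is the possibly misspelled word, the keyword is the correct spelling), so the argument order above matters.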
Later I tried the following code, as advised by Ksharp:
proc sql;
   create table test3 as
   select value1, value2
   from test as a, test as b
   where value1 =* value2;
quit;
I got the following message in the log: ERROR: The following columns were not found in the contributing tables: value1, value2. Being a novice in SAS, I do not understand the logic provided by Ksharp, so I did not alter the code to generate value1 and value2.
Can somebody help me further with this?
Thank you in advance for your kind support.
06-17-2016 02:56 PM
You need to replace value1 and value2 variable references with the names of your actual variables.
It's not a good idea to run code you don't understand, so I'd start by trying to understand the code.
Review the code and highlight the sections you don't understand, and someone can help clarify. One good way to do this is to comment the code: add comments for every PROC/DATA step, and even for sections within them, to highlight what they're doing. If you ever need to use this code again, you'll probably be happy you did this.
A quick comment on @Ksharp's code: it's what's called a self join. You're joining each record in your dataset to every other record, subject to the condition in the WHERE clause.
=* is the SOUNDS-LIKE operator; it compares values using SOUNDEX.
06-18-2016 02:57 AM
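proc sql;
   create table test3 as
   /* self join: A and B are two aliases for the same table TEST */
   select a.value1, b.value2
   from test as a, test as b
   /* =* keeps pairs whose values sound alike (SOUNDEX);
      replace value1 and value2 with your own column names */
   where a.value1 =* b.value2;
quit;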
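To see what the self join does on a smaller scale, here is a sketch on a hypothetical one-word-per-row dataset WORDS (not from the original post). Every row of alias A is paired with every row of alias B, including itself; the extra condition a.term ne b.term drops those self-pairs, which would otherwise always pass the sounds-like test.

```sas
/* Hypothetical sample: one candidate word per row */
data words;
   input term :$20.;
   datalines;
DUODENUM
DUODENOM
DUODENOL
STOMACH
;
run;

proc sql;
   create table sounds_like_pairs as
   select a.term as term1,
          b.term as term2
   from words as a, words as b      /* self join: A and B are both WORDS */
   where a.term =* b.term           /* keep pairs that sound alike       */
     and a.term ne b.term;          /* drop each row paired with itself  */
quit;
```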
06-20-2016 10:56 AM
Thanks for your incredible guidance. With your guidance as well as Reeza's, I have customized the SAS code as follows:
proc sql;
   create table test3 as
   select a.report,
          soundex(a.report) as value1,
          soundex(b.report) as value2
   from test as a, test as b
   where calculated value1 =* calculated value2;
quit;
I am getting this:
Report | value1 | value2
STRICTURE IN THE GASTRODUODENAL JUNCTION | S36236532236335425235 | S36236532236335425235
POLYP WITHOUT DYSPLASIA AT FIRST PART OF DUODENUM | P4133321423162316313355 | P4133321423162316313355
MULTIPLE ULCERS PRESENT IN THE DUODENOM | M4314426216253533355 | M4314426216253533355
ULCER OVER DUODENOL MUCOSA | U426163354522 | U426163354522
NOTHING ABONRMALITY FOUND IN THE DUODENUM | N352156543153533355 | N352156543153533355
Can you guide me further from this step? My goal is to get a list of the terms related to DUODENUM.
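In the output above, value1 equals value2 on every row because the self join also compares each report with itself, and a record always sounds like itself. One way to get a list of related terms is to work at the word level instead: split each report into words, measure each word's Levenshtein edit distance to DUODENUM with COMPLEV, and keep words within two edits. COMPLEV is swapped in here for SOUNDEX because a changed consonant (such as the L in DUODENOL, which SOUNDEX codes differently from M) changes the SOUNDEX code. The stem DUODEN, used to also catch compound words such as GASTRODUODENAL, is an assumption, and the sample data reuses the five reports shown above.

```sas
/* Sample input: the five reports shown above
   (replace with your own TEST dataset) */
data test;
   infile datalines truncover;
   input report $80.;
   datalines;
STRICTURE IN THE GASTRODUODENAL JUNCTION
POLYP WITHOUT DYSPLASIA AT FIRST PART OF DUODENUM
MULTIPLE ULCERS PRESENT IN THE DUODENOM
ULCER OVER DUODENOL MUCOSA
NOTHING ABONRMALITY FOUND IN THE DUODENUM
;
run;

%let target   = DUODENUM; /* correctly spelled word                       */
%let max_dist = 2;        /* "1-2 characters" from the question            */
%let stem     = DUODEN;   /* hypothetical stem for compounds such as
                             GASTRODUODENAL                                */

/* One output row per word of each report */
data report_words;
   set test;
   length word $40;
   do i = 1 to countw(report, ' ,.;:');
      word = upcase(scan(report, i, ' ,.;:'));
      dist = complev(strip(word), "&target");  /* Levenshtein edit distance */
      output;
   end;
   drop i;
run;

/* Distinct list of words close to the target or containing the stem */
proc sql;
   create table duodenum_terms as
   select distinct word
   from report_words
   where dist <= &max_dist
      or index(word, "&stem") > 0;
quit;
```

With these five reports, DUODENUM, DUODENOM and DUODENOL are within two edits of the target, while GASTRODUODENAL is picked up only by the stem condition; dropping that condition narrows the list to near-spellings of the single word.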
Thank you in advance for your kind reply.