
Flagging pre-negation and post-negation phrase


Hi there,

I am trying to build exclusion criteria for colon cases by flagging a "pre-negation" phrase and a "post-negation" phrase.

For example, a report containing the statement "NO EVIDENCE OF CARCINOMA IS SEEN IN THE COLON" should be flagged as NO_CARCINOMA=1, because "no evidence of carcinoma" appears within 10 words before or after the term "colon". Since my reports may span multiple lines, I want to specify the word distance. Can somebody help me write code to flag this?

First I want to identify the term "COLON" in the report, and then, within 10 words on either side of it, check for the negation phrase "NO EVIDENCE OF CARCINOMA".

Sample of data:

data test;

  length text $200;
  id=1; text="NO EVIDENCE OF CARCINOMA IS SEEN IN THE COLON."; output;
  id=2; text="SPECIMEN OF COLON SHOWS NO EVIDENCE OF CARCINOMA BUT THERE IS SOME DYSPLASIC CHANGES"; output;
  id=3; text="ASCENDING COLON IS HAVING DEFINITIVE EVIDENCE OF CARCINOMA "; output;
  id=4; text="SIGMOID COLON IS HAVING MULTIPLE POLYP AS WELL AS FIRBROSIS. SPECIMEN OF LIVER IS ALSO HAVING NO EVIDENCE OF CARCINOMA "; output;
 
run;

Thank you in advance for your kind guidance. 

Regards,
Deepak

 

Swain

Accepted Solutions
Solution
‎08-15-2016 09:28 AM
Respected Advisor
Posts: 4,663

Re: Flagging pre-negation and post-negation phrase


It is better to do this with pattern matching:

 

data test;

  length id 8 text $200;
  id=1; text="NO EVIDENCE OF CARCINOMA IS SEEN IN THE COLON."; output;
  id=2; text="SPECIMEN OF COLON SHOWS NO EVIDENCE OF CARCINOMA BUT THERE IS SOME DYSPLASIC CHANGES"; output;
  id=3; text="ASCENDING COLON IS HAVING DEFINITIVE EVIDENCE OF CARCINOMA "; output;
  id=4; text="SIGMOID COLON IS HAVING MULTIPLE POLYP AS WELL AS FIRBROSIS. SPECIMEN OF LIVER IS ALSO HAVING NO EVIDENCE OF CARCINOMA "; output;
 
run;

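data check;
/* Compile the pattern once: the sum statement implicitly retains prx1 (initially 0) */
if not prx1 then 
    prx1 + prxParse("/COLON(\W+\w+){1,10}\W+NO EVIDENCE OF CARCINOMA|NO EVIDENCE OF CARCINOMA(\W+\w+){1,10}\W+COLON/i");
set test;
/* Flag is 1 when COLON and the negation phrase are 1 to 10 words apart, in either order */
NO_CARCINOMA = prxMatch(prx1, text) > 0;
drop prx1;
run;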

proc sql; select * from check; quit;
PG
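
Two edge cases in the pattern above may be worth checking. The quantifier {1,10} requires at least one word between COLON and the negation phrase, so directly adjacent terms are not flagged. The alternatives are also not anchored to word boundaries, so a longer word such as COLONOSCOPY could match in the second alternative. The following is a minimal sketch of a variant, assuming that adjacent terms should also be flagged and that only the whole word COLON should count. Neither assumption comes from the original question, and the names check2 and prx2 are hypothetical.

```sas
data check2;
   /* Keep the compiled pattern id across observations */
   retain prx2;
   /* Compile once, on the first iteration only.
      Assumptions: {0,10} also accepts adjacent terms;
      \b restricts matches to whole words. */
   if _n_ = 1 then
      prx2 = prxparse("/\bCOLON\b(\W+\w+){0,10}\W+NO EVIDENCE OF CARCINOMA\b|\bNO EVIDENCE OF CARCINOMA\b(\W+\w+){0,10}\W+COLON\b/i");
   set test;
   /* 1 when the negation phrase is within 10 words of COLON, else 0 */
   NO_CARCINOMA = prxmatch(prx2, text) > 0;
   drop prx2;
run;
```

On the four sample rows this should flag ids 1 and 2 only, the same as the original pattern. The two versions differ only on inputs that exercise the edge cases described above.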



All Replies
Contributor
Posts: 22

Re: Flagging pre-negation and post-negation phrase


The approach I would take is to split the text into an array of words with the SCAN function, then check for the target words at specific positions, along the lines of the code shown below. To explain the code: the two PROXIMITY variables limit how far apart the words COLON and CARCINOMA can be; the MAX and MIN operators keep array index values within the bounds of the array; and the CONTINUE statement skips a candidate position if the exact phrase NO EVIDENCE OF CARCINOMA does not end there.

 

data result (keep=id text NO_CARCINOMA);
   set test;
   /* Search window, in words, for CARCINOMA relative to COLON */
   retain proximity_left 10 proximity_right 13;
   array word{80} $ 16 _temporary_;
   /* Split the text into words; SCAN returns blank past the last word */
   do i = 1 to 80;
      word{i} = scan(text, i);
   end;
   NO_CARCINOMA = 0;
   do i = 1 to 80;
      if word{i} = 'COLON' then
         do j = 1 max (i - proximity_left) to (i + proximity_right) min 80;
            if word{j} ne 'CARCINOMA' then continue;
            /* CARCINOMA must be immediately preceded by NO EVIDENCE OF */
            if word{1 max (j - 3) min 80} ne 'NO' or
               word{1 max (j - 2) min 80} ne 'EVIDENCE' or
               word{1 max (j - 1) min 80} ne 'OF' then continue;
            NO_CARCINOMA = 1;
            leave;
         end;
   end;
run;
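
To see how the word-array approach above counts distances, it can help to list each word with its position. The following is a small sketch, assuming the default delimiters of SCAN and COUNTW are acceptable for these reports. The dataset name words and the variables position and word are hypothetical names used only for illustration.

```sas
/* Sketch: one output row per word, with its position in the report */
data words (keep=id position word);
   set test;
   length word $ 16;
   /* COUNTW gives the number of words, so the loop stops at the last one */
   do position = 1 to countw(text);
      word = scan(text, position);
      output;
   end;
run;

proc print data=words noobs;
run;
```

Comparing the positions of COLON and CARCINOMA in this listing is one way to choose values for proximity_left and proximity_right.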

 

 

Frequent Contributor
Posts: 96

Re: Flagging pre-negation and post-negation phrase

Hi RickAster,

Please accept my apology for the delayed reply. The solution you provided answers my question accurately.

 

Although I am going with pattern matching for the current issue, I will keep your array-based SAS code for future, more complex needs, because it is flexible enough to search for multiple words at variable word distances.

 

Once again thank you for your kind guidance. 

 

Regards,

Deepak

Swain
Frequent Contributor
Posts: 96

Re: Flagging pre-negation and post-negation phrase

Hi PGStats,

 

Please accept my apology for the late reply. Your advice answers my needs accurately. The SAS code you provided is simple and will be easy to apply to other scenarios in the future.

 

Regards,

Deepak

Swain