DeepakSwain
Pyrite | Level 9

Hi there,

I am trying to apply exclusion criteria to colon cases by flagging "pre-negation" and "post-negation" phrases.

For example, a report containing the statement "NO EVIDENCE OF CARCINOMA IS SEEN IN THE COLON" should be flagged as "NO_CARCINOMA=1", because "no evidence of" is present within 10 words before or after the term colon. Because my reports may contain multiple lines, I want to specify the word distance. Can somebody help me write code to flag this?

First I want to identify the term "COLON" within the report, and then, within 10 words on either side of it, identify the presence of the negation phrase "NO EVIDENCE OF CARCINOMA".

Sample data:

data test;

  length text $200;
  id=1; text="NO EVIDENCE OF CARCINOMA IS SEEN IN THE COLON."; output;
  id=2; text="SPECIMEN OF COLON SHOWS NO EVIDENCE OF CARCINOMA BUT THERE IS SOME DYSPLASIC CHANGES"; output;
  id=3; text="ASCENDING COLON IS HAVING DEFINITIVE EVIDENCE OF CARCINOMA "; output;
  id=4; text="SIGMOID COLON IS HAVING MULTIPLE POLYP AS WELL AS FIRBROSIS. SPECIMEN OF LIVER IS ALSO HAVING NO EVIDENCE OF CARCINOMA "; output;
 
run;

Thank you in advance for your kind guidance. 

Regards,
Deepak Swain
1 ACCEPTED SOLUTION

PGStats
Opal | Level 21

It is better to do this with pattern matching:

 

data test;

  length id 8 text $200;
  id=1; text="NO EVIDENCE OF CARCINOMA IS SEEN IN THE COLON."; output;
  id=2; text="SPECIMEN OF COLON SHOWS NO EVIDENCE OF CARCINOMA BUT THERE IS SOME DYSPLASIC CHANGES"; output;
  id=3; text="ASCENDING COLON IS HAVING DEFINITIVE EVIDENCE OF CARCINOMA "; output;
  id=4; text="SIGMOID COLON IS HAVING MULTIPLE POLYP AS WELL AS FIRBROSIS. SPECIMEN OF LIVER IS ALSO HAVING NO EVIDENCE OF CARCINOMA "; output;
 
run;

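data check;
/* The sum statement makes PRX1 retained and initialized to 0, so the pattern is compiled only once */
if not prx1 then 
    prx1 + prxParse("/COLON(\W+\w+){0,10}\W+NO EVIDENCE OF CARCINOMA|NO EVIDENCE OF CARCINOMA(\W+\w+){0,10}\W+COLON/i");
set test;
/* 1 when the phrase occurs before or after COLON with at most 10 words in between; {0,10} also accepts adjacent words */
NO_CARCINOMA = prxMatch(prx1, text) > 0;
drop prx1;
run;

proc sql; select * from check; quit;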
PG
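The flags can be checked against a few edge cases before running on real reports. The sketch below is a minimal test harness, assuming a SAS session: rows 1 to 4 are the sample data from the question, while rows 5 to 7 are hypothetical reports (not from the original thread) that probe the 10-word boundary, and `cases`, `flagged`, and `expected` are illustrative names. The pattern here uses `{0,10}`, which lets the phrase sit directly next to COLON; with `{1,10}` at least one word must separate them.

```sas
/* Rows 1-4: sample data from the question; rows 5-7: hypothetical boundary cases */
data cases;
  length id 8 text $200 expected 8;
  id=1; text="NO EVIDENCE OF CARCINOMA IS SEEN IN THE COLON."; expected=1; output;
  id=2; text="SPECIMEN OF COLON SHOWS NO EVIDENCE OF CARCINOMA BUT THERE IS SOME DYSPLASIC CHANGES"; expected=1; output;
  id=3; text="ASCENDING COLON IS HAVING DEFINITIVE EVIDENCE OF CARCINOMA"; expected=0; output;
  id=4; text="SIGMOID COLON IS HAVING MULTIPLE POLYP AS WELL AS FIRBROSIS. SPECIMEN OF LIVER IS ALSO HAVING NO EVIDENCE OF CARCINOMA"; expected=0; output;
  id=5; text="COLON NO EVIDENCE OF CARCINOMA"; expected=1; output;                      /* adjacent */
  id=6; text="COLON A B C D E F G H I J NO EVIDENCE OF CARCINOMA"; expected=1; output;   /* 10 words between */
  id=7; text="COLON A B C D E F G H I J K NO EVIDENCE OF CARCINOMA"; expected=0; output; /* 11 words between */
run;

data flagged;
  /* compile the pattern once; PRX1 is retained by the sum statement */
  if not prx1 then
    prx1 + prxParse("/COLON(\W+\w+){0,10}\W+NO EVIDENCE OF CARCINOMA|NO EVIDENCE OF CARCINOMA(\W+\w+){0,10}\W+COLON/i");
  set cases;
  NO_CARCINOMA = prxMatch(prx1, text) > 0;
  /* write a log line for any row whose flag differs from the expected value */
  if NO_CARCINOMA ne expected then put "WARNING: unexpected flag " id= NO_CARCINOMA= expected=;
  drop prx1;
run;

proc print data=flagged noobs; var id NO_CARCINOMA expected; run;
```

Row 4 stays at 0 because 14 words separate COLON from the phrase; rows 6 and 7 show that exactly 10 intervening words are accepted and 11 are not.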


4 REPLIES
RickAster
Obsidian | Level 7

The approach I would take is to divide the text into an array of words using the SCAN function, then look for the specific words at the specific positions, along the lines of the code shown below. To explain the code: the two PROXIMITY variables limit how far apart the words COLON and CARCINOMA can be (PROXIMITY_RIGHT is 13 rather than 10 because the search is anchored on CARCINOMA, the last of the four words in the phrase, so 13 positions to the right keeps NO within 10 positions of COLON); the MIN and MAX operators keep array index values within the range of the array; and the CONTINUE statement moves on to the next candidate position when the exact phrase NO EVIDENCE OF CARCINOMA is not found there.

 

data result (keep=id text NO_CARCINOMA);
   set test;
   /* search window for CARCINOMA, in word positions, around each COLON */
   retain proximity_left 10 proximity_right 13;
   array word{80} $ 16 _temporary_;
   /* split the report into words */
   do i = 1 to 80;
      word{i} = scan(text, i);
   end;
   NO_CARCINOMA = 0;
   do i = 1 to 80;
      if word{i} = 'COLON' then
         /* MAX and MIN clamp the window to the array bounds 1..80 */
         do j = 1 max (i - proximity_left) to (i + proximity_right) min 80;
            if word{j} ne 'CARCINOMA' then continue;
            /* the three words before CARCINOMA must be NO EVIDENCE OF */
            if word{1 max (j - 3) min 80} ne 'NO' or
               word{1 max (j - 2) min 80} ne 'EVIDENCE' or
               word{1 max (j - 1) min 80} ne 'OF' then continue;
            NO_CARCINOMA = 1;
            leave;
         end;
   end;
run;
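The word-array idea can be generalized so that the phrase is a parameter instead of three hard-coded comparisons, which is the kind of flexibility for "multiple words having variable word distance" mentioned later in the thread. The sketch below is one possible way to do that, not RickAster's code: `phrase`, `maxgap`, and `result2` are hypothetical names, and it uses COUNTW with SCAN's default delimiters instead of a fixed 80-element array. It counts the words *between* COLON and the nearest end of the phrase, the same measure the regular-expression solution uses.

```sas
data test;
  length id 8 text $200;
  id=1; text="NO EVIDENCE OF CARCINOMA IS SEEN IN THE COLON."; output;
  id=2; text="SPECIMEN OF COLON SHOWS NO EVIDENCE OF CARCINOMA BUT THERE IS SOME DYSPLASIC CHANGES"; output;
  id=3; text="ASCENDING COLON IS HAVING DEFINITIVE EVIDENCE OF CARCINOMA "; output;
  id=4; text="SIGMOID COLON IS HAVING MULTIPLE POLYP AS WELL AS FIRBROSIS. SPECIMEN OF LIVER IS ALSO HAVING NO EVIDENCE OF CARCINOMA "; output;
run;

data result2 (keep=id text NO_CARCINOMA);
  set test;
  length phrase $100;
  phrase = 'NO EVIDENCE OF CARCINOMA';  /* hypothetical parameter: phrase to look for */
  maxgap = 10;                         /* hypothetical parameter: max words between COLON and the phrase */
  nphr = countw(phrase, ' ');          /* number of words in the phrase */
  nwrd = countw(text);                 /* number of words in the report (default delimiters) */
  NO_CARCINOMA = 0;
  do i = 1 to nwrd;
    if upcase(scan(text, i)) ne 'COLON' then continue;
    /* candidate start positions s: the phrase ends at most maxgap words
       before COLON, or starts at most maxgap words after it */
    do s = max(1, i - maxgap - nphr) to min(nwrd - nphr + 1, i + maxgap + 1);
      hit = 1;
      do k = 1 to nphr;                /* compare the phrase word by word */
        if upcase(scan(text, s + k - 1)) ne upcase(scan(phrase, k, ' ')) then hit = 0;
      end;
      if hit then NO_CARCINOMA = 1;
    end;
  end;
run;
```

With maxgap = 10 this flags rows 1 and 2 and leaves rows 3 and 4 at 0. Changing `phrase` or `maxgap` is all that is needed to search for a different negation phrase or distance.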

 

 

DeepakSwain
Pyrite | Level 9

Hi RickAster,

Kindly accept my apology for the delayed reply. The solution you provided answers my question accurately.

 

For the current issue I am going with pattern matching, but I will keep your array-based SAS code for more complex needs in the future, because it is flexible enough to search for multiple words at variable word distances.

 

Once again thank you for your kind guidance. 

 

Regards,

Deepak Swain
DeepakSwain
Pyrite | Level 9

Hi PGStats,

 

Please accept my apology for the late reply. Your advice answers my needs accurately. The SAS code you provided is very simple and easy to apply to different scenarios in the future too.

 

Regards,

Deepak Swain

