Hi there,
For your information, I am trying to apply exclusion criteria to colon cases by flagging pre-negation and post-negation phrases.
For example, a report containing the statement "NO EVIDENCE OF CARCINOMA IS SEEN IN THE COLON" should be flagged as NO_CARCINOMA=1, because "NO EVIDENCE OF" appears within 10 words before or after the term COLON. Because my reports may contain multiple lines, I want to be able to specify the word distance. Can somebody help me write code to flag this?
First I want to identify the term "COLON" in the report, and then identify whether the negation phrase "NO EVIDENCE OF CARCINOMA" occurs within 10 words on either side of it.
Sample data:
data test;
length text $200;
id=1; text="NO EVIDENCE OF CARCINOMA IS SEEN IN THE COLON."; output;
id=2; text="SPECIMEN OF COLON SHOWS NO EVIDENCE OF CARCINOMA BUT THERE IS SOME DYSPLASIC CHANGES"; output;
id=3; text="ASCENDING COLON IS HAVING DEFINITIVE EVIDENCE OF CARCINOMA "; output;
id=4; text="SIGMOID COLON IS HAVING MULTIPLE POLYP AS WELL AS FIRBROSIS. SPECIMEN OF LIVER IS ALSO HAVING NO EVIDENCE OF CARCINOMA "; output;
run;
Thank you in advance for your kind guidance.
Regards,
Deepak
Better to do it with pattern matching:
data test;
length id 8 text $200;
id=1; text="NO EVIDENCE OF CARCINOMA IS SEEN IN THE COLON."; output;
id=2; text="SPECIMEN OF COLON SHOWS NO EVIDENCE OF CARCINOMA BUT THERE IS SOME DYSPLASIC CHANGES"; output;
id=3; text="ASCENDING COLON IS HAVING DEFINITIVE EVIDENCE OF CARCINOMA "; output;
id=4; text="SIGMOID COLON IS HAVING MULTIPLE POLYP AS WELL AS FIRBROSIS. SPECIMEN OF LIVER IS ALSO HAVING NO EVIDENCE OF CARCINOMA "; output;
run;
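data check;
  /* The sum statement makes PRX1 retained with an initial value of 0, */
  /* so the pattern is compiled only once, on the first iteration.     */
  if not prx1 then
    prx1 + prxParse("/\bCOLON(\W+\w+){0,10}\W+NO EVIDENCE OF CARCINOMA|NO EVIDENCE OF CARCINOMA(\W+\w+){0,10}\W+COLON\b/i");
  set test;
  /* 1 when the phrase occurs before or after COLON with at most 10 words in between */
  NO_CARCINOMA = prxMatch(prx1, text) > 0;
  drop prx1;
run;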
proc sql; select * from check; quit;
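Because the word distance and the terms may change from one study to the next, the same kind of pattern can also be assembled from macro variables instead of being typed in literally. A minimal sketch, assuming the TEST data set created above; the macro variable names ANCHOR, NEGATION and MAX_WORDS and the output data set CHECK_PARAM are hypothetical. The pattern accepts 0 to MAX_WORDS intervening words, and \b keeps COLON from matching inside a longer word such as MESOCOLON.

```sas
/* Hypothetical parameters: anchor word, negation phrase, max words in between */
%let anchor    = COLON;
%let negation  = NO EVIDENCE OF CARCINOMA;
%let max_words = 10;

data check_param;
  /* Compile the pattern once and keep its id for every observation */
  retain prx_id;
  if _n_ = 1 then prx_id = prxparse(cats(
       "/\b&anchor\b(\W+\w+){0,&max_words}\W+&negation\b",
       "|\b&negation(\W+\w+){0,&max_words}\W+\b&anchor\b/i"));
  set test;
  /* Anchor before the phrase, or phrase before the anchor, case-insensitive */
  NO_CARCINOMA = (prxmatch(prx_id, text) > 0);
  drop prx_id;
run;

proc print data=check_param; run;
```

With the sample data, records 1 and 2 should be flagged and records 3 and 4 should not; in record 4, 14 words separate COLON from the phrase. The phrase is inserted into the pattern unescaped, so a phrase containing regex metacharacters such as a period or a parenthesis would need escaping first.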
The approach I would take is to split the text into an array of words with the SCAN function, then look for the specific words at the specific positions, along the lines of the code shown below. To explain the code: the two PROXIMITY variables limit how far apart the words COLON and CARCINOMA can be (the right-hand limit is 13 rather than 10 to allow for the three words NO EVIDENCE OF that precede CARCINOMA in the phrase); the MIN and MAX operators keep array index values within the bounds of the array; and the CONTINUE statement moves on to the next position when the exact phrase NO EVIDENCE OF CARCINOMA is not found there.
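data result (keep=id text NO_CARCINOMA);
  set test;
  /* Largest allowed distance, in words, from COLON to CARCINOMA on each side */
  retain proximity_left 10 proximity_right 13;
  array word{80} $ 16 _temporary_;
  /* Split the text into upper-case words; SCAN returns blank past the last word */
  do i = 1 to 80;
    word{i} = upcase(scan(text, i));
  end;
  NO_CARCINOMA = 0;
  do i = 1 to 80;
    if word{i} = 'COLON' then
      /* Search the window around COLON; MAX and MIN keep J within 1..80 */
      do j = 1 max (i - proximity_left) to (i + proximity_right) min 80;
        if word{j} ne 'CARCINOMA' then continue;
        /* Require the exact phrase NO EVIDENCE OF CARCINOMA ending at J */
        if word{1 max (j - 3) min 80} ne 'NO' or
           word{1 max (j - 2) min 80} ne 'EVIDENCE' or
           word{1 max (j - 1) min 80} ne 'OF' then continue;
        NO_CARCINOMA = 1;
        leave;
      end;
  end;
run;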
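The array approach can also be generalized to any multi-word phrase and window size instead of hard-coding the offsets of NO, EVIDENCE and OF. A minimal sketch, assuming the TEST data set from the question; the macro variables ANCHOR, NEGATION and MAX_WORDS, the data set RESULT_GENERAL, and the limits of 100 words per report and 10 words per phrase are hypothetical choices. The distance is counted as the number of words lying between COLON and the phrase, on either side.

```sas
/* Hypothetical parameters: anchor word, negation phrase, max words in between */
%let anchor    = COLON;
%let negation  = NO EVIDENCE OF CARCINOMA;
%let max_words = 10;

data result_general (keep=id text NO_CARCINOMA);
  set test;
  array word{100} $ 32 _temporary_;   /* words of the report (assumed <= 100) */
  array p{10}     $ 32 _temporary_;   /* words of the phrase (assumed <= 10)  */

  /* Split the phrase and the report into upper-case words */
  n_phrase = countw("&negation");
  do k = 1 to n_phrase;
    p{k} = upcase(scan("&negation", k));
  end;
  n_words = min(countw(text), dim(word));
  do i = 1 to n_words;
    word{i} = upcase(scan(text, i));
  end;

  NO_CARCINOMA = 0;
  /* s = candidate start position of the phrase in the report */
  do s = 1 to n_words - n_phrase + 1 until (NO_CARCINOMA);
    match = 1;
    do k = 1 to n_phrase while (match);
      if word{s + k - 1} ne p{k} then match = 0;
    end;
    if match then do;
      /* Anchor may sit up to MAX_WORDS words before the phrase or after its last word */
      do i = max(1, s - &max_words - 1) to min(n_words, s + n_phrase + &max_words)
             until (NO_CARCINOMA);
        if (i < s or i >= s + n_phrase) and word{i} = upcase("&anchor")
          then NO_CARCINOMA = 1;
      end;
    end;
  end;
run;
```

The step compares the phrase word by word at every start position and only then checks the window around each match, so changing the phrase or the distance only means changing the macro variables. With the sample data it should flag records 1 and 2 only.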
Hi RickAster,
Please accept my apologies for the delayed reply. Your solution answers my question accurately.
For the current issue I am going with pattern matching, but I will keep your SAS array code for more complex needs in the future, because it is flexible enough to search for multiple words at variable word distances.
Once again thank you for your kind guidance.
Regards,
Deepak
Hi PGStats,
Please accept my apologies for the late reply. Your advice answers my needs accurately. The SAS code you provided is very simple and will be easy to apply in different scenarios in the future too.
Regards,
Deepak