Hi all,
I'm trying to locate and output a date that follows a keyword, using a regular expression. The keyword in my case is 'Addendum', and it can occur multiple times. I'm searching free text that is full of line feeds, carriage returns, etc. I've written some regex code that works, but only up to a point.
I'm using PRXNEXT to capture every instance of 'Addendum', followed by up to 15 word/non-word pairs and then a date.
/* compile the pattern once and keep its id across observations */
retain dt_pattern;
if _n_ = 1 then do;
   dt_pattern = prxparse("/(addend\w+(\W+\w+){0,15})\W+(\d{1,2}\s?(\.|\/|-)\s?\d{1,2}\s?(\.|\/|-)\s?\d{2,4})/i");
end;
array comm[8] $150 addend1-addend8;        /* matched text, keyword through date */
array comm1[8] $30 amend_out1-amend_out8;  /* date only (capture buffer 3) */
start = 1;
stop = length(imp_rep_concat);
call prxnext(dt_pattern, start, stop, imp_rep_concat, pos, len);
do i = 1 to 8 while (pos > 0);
   comm(i) = upcase(substr(imp_rep_concat, pos, len));
   comm1(i) = prxposn(dt_pattern, 3, imp_rep_concat);
   /* resume the search after the end of the previous match */
   call prxnext(dt_pattern, start, stop, imp_rep_concat, pos, len);
end;
However, in the attached example there are 3 instances of 'Addendum' and this code only picks up two. I've tried raising the limit in (\W+\w+){0,15} above 15, but then the match subsumes the next date. Any ideas/advice? Thank you.
Sample is saved here as well:
https://regex101.com/r/0eWTV9/1
How about not using regexp?
Bart
data want;
x = "
ASSESSMENT:ACR BI-RADS Category 0 - Incomplete: Need additional
imaging evaluation.
RECOMMENDATION:
1: Additional imaging Right
2: Additional imaging
COMMENTS: Spot magnification views are recommended on the right.
Jane Doe
11/14/2017 10:08 AM
Addendum:
ASSESSMENT: ACR BI-RADS 2 - Benign.
RECOMMENDATION:
1: Routine screening mammogram Bilateral in 1 Year
COMMENTS:
Jane Doe
11/22/2017 1:46 PM
REPORT_TEXT EXAM: BREAST TOMOSYNTHESIS BI
ACCESSION: xxxxxxxxxxx
EXAM DATE AND TIME: 11/13/2017 11:09 AM
COMPARISON: Prior mammograms dated 2/19/2014.
TISSUE DENSITY: The breast tissue is heterogeneously dense (51% -
75%)
FINDINGS:
Routine tomographic views were obtained bilaterally with 2-D
reconstructions. CAD was used for analysis.
Left Breast: There are no findings suspicious for malignancy.
There is no significant change from prior exam.
Right Breast: There is been interval development of calcification
in the upper outer quadrant of the right breast approximately 5
cm deep to the nipple, 3 cm lateral to the nipple, and 2 cm above
the nipple.
Addendum: EXAM: BREAST TOMOSYNTHESIS BI
ACCESSION: xxxxxxx
EXAM DATE AND TIME: 11/13/2017 11:09 AM
ADDENDUM: Prior mammographic studies dated 11/15/2016 and
10/11/2016 have now become available for comparison.
The calcifications in the upper outer quadrant of the right
breast are visualized on these prior studies including
magnification views and have shown no interval change. These
appear benign.
"
;
   n = countw(x, " :;,.-", "SM");
   put n=;
   length w $ 50;
   keep a i d;
   format d yymmdd10.;
   do i = 1 to n;
      w = scan(x, i, " :;,.-", "SM");
      if index(upcase(w), "ADDENDUM") then a = i; /* mark if ADDENDUM found */
      d = input(w, ?? mmddyy10.);
      /*put i a d w;*/
      if d and a then do;
         output;
         a = 0;
         d = .;
      end;
   end;
run;
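If you would rather stay with PRX, one option is a lazy quantifier. Going by the sample text above (not the regex101 link), the first 'Addendum' has 18 words before its date, so {0,15} can never reach it. Raising the limit on a greedy quantifier makes each match run to the last date within range, swallowing later keywords. A lazy {0,N}? stops at the first date after each keyword instead. The following is only a sketch: the limit of 40, the shortened sample string, and the names keyword_text, addend_date_txt and addend_date are assumptions for illustration.

```sas
data want;
   length imp_rep_concat $ 600 keyword_text $ 300 addend_date_txt $ 30;
   retain dt_pattern;
   /* {0,40}? is lazy: take as few word tokens as possible before a date. */
   /* (?:...) is non-capturing, so the date is capture buffer 2.          */
   if _n_ = 1 then
      dt_pattern = prxparse('/(addend\w+(?:\W+\w+){0,40}?)\W+(\d{1,2}\s?[.\/-]\s?\d{1,2}\s?[.\/-]\s?\d{2,4})/i');

   /* shortened, hypothetical stand-in for the report text */
   imp_rep_concat = "Addendum: ASSESSMENT: ACR BI-RADS 2 - Benign. RECOMMENDATION: "
      || "1: Routine screening mammogram Bilateral in 1 Year COMMENTS: Jane Doe "
      || "11/22/2017 1:46 PM Addendum: EXAM DATE AND TIME: 11/13/2017 11:09 AM "
      || "ADDENDUM: Prior mammographic studies dated 11/15/2016 and 10/11/2016";

   start = 1;
   stop = length(imp_rep_concat);
   call prxnext(dt_pattern, start, stop, imp_rep_concat, pos, len);
   do i = 1 to 8 while (pos > 0);
      keyword_text = substr(imp_rep_concat, pos, len);
      addend_date_txt = prxposn(dt_pattern, 2, imp_rep_concat);
      /* drop any blanks around the separators before converting */
      addend_date = input(compress(addend_date_txt), ?? mmddyy10.);
      output;   /* one row per keyword/date pair */
      call prxnext(dt_pattern, start, stop, imp_rep_concat, pos, len);
   end;

   format addend_date yymmdd10.;
   keep i keyword_text addend_date_txt addend_date;
run;
```

One caveat: if an 'Addendum' has no date within the token limit, nothing is returned for it, and if the limit is generous a match can still run past the next keyword to reach a date. Pick the limit with your real reports in mind.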