did the subject complete the entire course of study
did the subject complete the treatment period
did subject complete full study protocol
did subject complete hospital visit
did subject complete application form
There is a text field containing the text above. I would like to select only those records that contain the keywords "subject" and "complete" along with "study", "treatment", or "protocol".
The expected output records are:
did the subject complete the entire course of study
did the subject complete the treatment period
did subject complete full study protocol
Any help with a regex?
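data f;
  length str $ 64;
  /* words that must all be present */
  array x[2] $ 8 _temporary_ ('subject' 'complete');
  /* words of which at least one must be present */
  array y[3] $ 9 _temporary_ ('study' 'treatment' 'protocol');
  input str &;
  /* count how many of the required words FINDW locates in str */
  j=0;
  do i=1 to dim(x);
    j+findw(str,x[i],,'tsmi')>0;
  end;
  /* subsetting IF: keep the record only when every required word was found */
  /* and the regex built from y (study|treatment|protocol) also matches      */
  if j=dim(x) and prxmatch('/\b('||catx('|',of y[*])||')\b/i',str)>0;
  cards;
did the subject complete the entire course of study
did the subject complete the treatment period
did subject complete full study protocol
did subject complete hospital visit
did subject complete application form
;
run;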
Hi,
The final output should contain only these three records, as I'm looking for the combination of "subject" and "complete" along with either "study", "treatment", or "protocol":
did the subject complete the entire course of study
did the subject complete the treatment period
did subject complete full study protocol
PLEASE edit your posts, don't post duplicates.
Matt's (@FriedEgg) code did that, but showed the results in the log rather than the file. A minor change to his code puts the results in a file:
data want;
  length str $ 64;
  array x[2] $ 8 _temporary_ ('subject' 'complete');
  array y[3] $ 9 _temporary_ ('study' 'treatment' 'protocol');
  input str &;
  j=0;
  do i=1 to dim(x);
    j+findw(str,x[i],,'tsmi')>0;
  end;
  if j=dim(x) and prxmatch('/('||catx('|',of y[*])||')/i',str)>0;
  cards;
did the subject complete the entire course of study
did the subject complete the treatment period
did subject complete full study protocol
did subject complete hospital visit
did subject complete application form
;
run;
Art, CEO, AnalystFinder.com
Try this
data in;
  length str $64;
  input str &;
  cards;
did the subject complete the entire course of study
did the subject complete the treatment period
did subject complete full study protocol
did subject complete hospital visit
did subject complete application form
;
run;

data out;
  set in;
  if prxmatch("/\bsubject\b.*\bcomplete\b.*(\bstudy\b|\btreatment\b|\bprotocol\b)/",str)
    then output;
run;
The \b anchors force the regex to match only at word boundaries, and the .* between the keywords means there can be any number of other words between the matching ones. Note that this pattern expects the keywords in the order shown.
That's a good point about word order - you could probably do something with capture groups if they could appear in any order.
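For the any-order case, one option is lookahead assertions rather than capture groups. The following is only a sketch under stated assumptions: the dataset names in and out_any are illustrative, the second data line is a made-up record added to show a different word order, and the i suffix (case-insensitive matching) is my addition.

```sas
/* Illustrative input; the second record is hypothetical and  */
/* was added only to show the keywords in a different order.  */
data in;
  length str $64;
  input str &;
  cards;
did the subject complete the entire course of study
was the study protocol something the subject did complete
did subject complete hospital visit
;
run;

data out_any;
  set in;
  /* Each (?=...) lookahead checks one requirement anywhere in  */
  /* the string, so the keywords may appear in any order.       */
  if prxmatch('/^(?=.*\bsubject\b)(?=.*\bcomplete\b)(?=.*\b(study|treatment|protocol)\b)/i', str);
run;
```

With this input, out_any should keep the first two records and drop the hospital visit one. The in-order pattern above would miss the second record, because nothing follows "complete" in it.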
Thanks! And any words we don't want can be excluded with something like [^do not include], right?
To exclude a list of words, a second test of the form prxmatch('/list1|list2|.../', columnname) = 0 should work.
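One caveat on [^...]: in a regex that is a negated character class, which rejects single characters rather than whole words, so it will not exclude a word list. A minimal sketch of the second-test approach follows. The dataset name out_excl and the unwanted words period and visit are hypothetical, chosen only for illustration.

```sas
data in;
  length str $64;
  input str &;
  cards;
did the subject complete the entire course of study
did the subject complete the treatment period
did subject complete full study protocol
did subject complete hospital visit
did subject complete application form
;
run;

data out_excl;
  set in;
  /* keep records that match the wanted pattern ...             */
  if prxmatch('/\bsubject\b.*\bcomplete\b.*\b(study|treatment|protocol)\b/i', str) > 0
     /* ... and contain none of the (hypothetical) unwanted words */
     and prxmatch('/\b(period|visit)\b/i', str) = 0;
run;
```

With these five records, out_excl should keep the "course of study" and "full study protocol" records. The "treatment period" record matches the wanted pattern but is dropped because it contains "period".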