did the subject complete the entire course of study
did the subject complete the treatment period
did subject complete full study protocol
did subject complete hospital visit
did subject complete application form
There is a text field containing the text above. I would like to select only those records that contain the keywords "subject" and "complete" along with "study", "treatment", or "protocol".
The output records should be:
did the subject complete the entire course of study
did the subject complete the treatment period
did subject complete full study protocol
Any help with a regex?
data f;
  length str $ 64;
  /* x: words that must all be present; y: at least one must be present */
  array x[2] $ 8 _temporary_ ('subject' 'complete');
  array y[3] $ 9 _temporary_ ('study' 'treatment' 'protocol');
  input str &;
  /* count how many of the required words occur as whole words */
  j=0;
  do i=1 to dim(x);
    j+findw(str,x[i],,'tsmi')>0;
  end;
  /* keep the record only if all required words and one alternative match */
  if j=dim(x) and prxmatch('/\b('||catx('|',of y[*])||')\b/i',str)>0;
  cards;
did the subject complete the entire course of study
did the subject complete the treatment period
did subject complete full study protocol
did subject complete hospital visit
did subject complete application form
;
run;
Hi,
The final output should contain only these three records, since I'm looking for the combination of "subject" and "complete" along with "study", "treatment", or "protocol":
did the subject complete the entire course of study
did the subject complete the treatment period
did subject complete full study protocol
PLEASE edit your posts, don't post duplicates.
Matt's (@FriedEgg) code did that, but showed the results in the log rather than the file. A minor change to his code puts the results in a file:
data want;
  length str $ 64;
  array x[2] $ 8 _temporary_ ('subject' 'complete');
  array y[3] $ 9 _temporary_ ('study' 'treatment' 'protocol');
  input str &;
  j=0;
  do i=1 to dim(x);
    j+findw(str,x[i],,'tsmi')>0;
  end;
  if j=dim(x) and prxmatch('/('||catx('|',of y[*])||')/i',str)>0;
  cards;
did the subject complete the entire course of study
did the subject complete the treatment period
did subject complete full study protocol
did subject complete hospital visit
did subject complete application form
;
run;
Art, CEO, AnalystFinder.com
Try this
data in;
  length str $64;
  input str &;
  cards;
did the subject complete the entire course of study
did the subject complete the treatment period
did subject complete full study protocol
did subject complete hospital visit
did subject complete application form
;
run;

data out;
  set in;
  if prxmatch("/\bsubject\b.*\bcomplete\b.*(\bstudy\b|\btreatment\b|\bprotocol\b)/",str) then output;
run;
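The \b anchors force the regex to match only at word boundaries, and the .* between the keywords means there can be any number of other words between the matching ones. Note that this pattern expects the keywords to appear in this order.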
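To see what the \b anchors change, here is a minimal sketch. The two test strings are invented for illustration and are not from the original data.

```sas
/* Illustrative only: the test strings below are made up */
data _null_;
  length str $ 40;
  do str = 'full study protocol', 'understudy list';
    /* 1 if "study" occurs as a whole word, 0 otherwise */
    with_b    = prxmatch('/\bstudy\b/', str) > 0;
    /* 1 if "study" occurs anywhere, even inside another word */
    without_b = prxmatch('/study/', str) > 0;
    put str= with_b= without_b=;
  end;
run;
```

The first string should match both patterns, while the second should match only the pattern without \b, because "study" there is part of the longer word "understudy".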
That's a good point about word order. If the keywords could appear in any order, you could probably do something with capture groups.
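One possible way to make the match independent of word order is to use lookaheads rather than capture groups. This is a sketch only: it assumes the dataset `in` from the post above, and the output dataset name `out_any_order` is made up.

```sas
/* Sketch: order-independent match using lookaheads (not capture groups) */
data out_any_order;   /* hypothetical output dataset name */
  set in;
  /* Each (?=...) checks for one keyword anywhere in the string, */
  /* so the keywords may appear in any order. /i ignores case.   */
  if prxmatch('/^(?=.*\bsubject\b)(?=.*\bcomplete\b)(?=.*\b(study|treatment|protocol)\b)/i', str) > 0;
run;
```

Each lookahead starts from the beginning of the string and consumes nothing, so the three conditions are tested independently of one another.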
Thanks! And any words we don't want can be excluded with something like [^do not include], right?
To exclude a list of words, a condition of the form prxmatch(list1|list2|.., columnname)=0 should work.
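A note on the bracket syntax: [^...] is a negated character class, which matches a single character that is not in the set, so it does not exclude whole words. Testing a second prxmatch for 0 is one way to exclude words. Here is a minimal sketch, assuming the dataset `in` from the earlier post; the excluded words and the name `out_excl` are chosen only for illustration.

```sas
/* Sketch: keep wanted keywords, drop records with unwanted words */
data out_excl;   /* hypothetical output dataset name */
  set in;
  /* record must contain "subject" followed later by "complete" ... */
  if prxmatch('/\bsubject\b.*\bcomplete\b/i', str) > 0
     /* ... and none of the unwanted words (illustrative list) */
     and prxmatch('/\b(hospital|application)\b/i', str) = 0;
run;
```

With the five sample records, this should keep the three that mention study, treatment, or protocol, since the other two contain "hospital" or "application".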