SASPhile
Quartz | Level 8

did the subject complete the entire course of study
did the subject complete the treatment period
did subject complete full study protocol
did subject complete hospital visit
did subject complete application form


There is a text field with the above text. I would like to select only those records that contain the keywords "subject" and "complete" along with "study", "treatment", or "protocol".
Any help with a regex?

SASPhile
Quartz | Level 8


did the subject complete the entire course of study
did the subject complete the treatment period
did subject complete full study protocol
did subject complete hospital visit
did subject complete application form


There is a text field with the above text. I would like to select only those records that contain the keywords "subject" and "complete" along with "study", "treatment", or "protocol".
The expected output records are:
did the subject complete the entire course of study
did the subject complete the treatment period
did subject complete full study protocol

Any help with a regex?

FriedEgg
SAS Employee
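data f;
length str $ 64;
/* words that must all be present */
array x[2] $  8 _temporary_ ('subject' 'complete');
/* words of which at least one must be present */
array y[3] $  9 _temporary_ ('study' 'treatment' 'protocol');
input str &;
/* count how many of the required words occur as whole words, ignoring case */
j=0;
do i=1 to dim(x);
  j+findw(str,x[i],,'tsmi')>0;
end;
/* subsetting IF: keep the row only if every word in x was found
   and the regex built from y matches at least one of its words */
if j=dim(x) and prxmatch('/\b('||catx('|',of y[*])||')\b/i',str)>0;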
cards;
did the subject complete the entire course of study
did the subject complete the treatment period
did subject complete full study protocol
did subject complete hospital visit
did subject complete application form
;
run;
SASPhile
Quartz | Level 8

Hi,

   The final output should contain only these three records, as I'm looking for the combination of "subject" and "complete" along with either "study", "treatment", or "protocol":

did the subject complete the entire course of study
did the subject complete the treatment period
did subject complete full study protocol

 

Reeza
Super User

PLEASE edit your posts; don't post duplicates.

art297
Opal | Level 21

Matt's (@FriedEgg) code did that, but showed the results in the log rather than the file. A minor change to his code puts the results in a file:

 

data want;
  length str $ 64;
  array x[2] $  8 _temporary_ ('subject' 'complete');
  array y[3] $  9 _temporary_ ('study' 'treatment' 'protocol');
  input str &;
  j=0;
  do i=1 to dim(x);
    j+findw(str,x[i],,'tsmi')>0;
  end;
  if j=dim(x) and prxmatch('/('||catx('|',of y[*])||')/i',str)>0;
  cards;
did the subject complete the entire course of study
did the subject complete the treatment period
did subject complete full study protocol
did subject complete hospital visit
did subject complete application form
;
run;

Art, CEO, AnalystFinder.com

FriedEgg
SAS Employee
Updated my post
ChrisBrooks
Ammonite | Level 13

Try this:

 

data in;
length str $64;
input str &;
cards;
did the subject complete the entire course of study
did the subject complete the treatment period
did subject complete full study protocol
did subject complete hospital visit
did subject complete application form
;
run;

data out;
  set in;
  if prxmatch("/\bsubject\b.*\bcomplete\b.*(\bstudy\b|\btreatment\b|\bprotocol\b)/",str) then output;
run;

The \b anchors force the regex to match only on word boundaries, and the .* between them means there can be any number of other words between the matching ones.
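
To see what the \b anchors buy you, here is a minimal sketch that compares a pattern with and without them. The dataset name boundary_demo, the variables loose and strict, and the second test row are made up for illustration.

```sas
/* Sketch: compare matching with and without \b on made-up strings. */
data boundary_demo;
  length str $ 64;
  input str &;
  /* Without \b, "study" also matches inside "understudy". */
  loose  = prxmatch('/study/', str) > 0;
  /* With \b, "study" has to stand alone as a whole word. */
  strict = prxmatch('/\bstudy\b/', str) > 0;
  cards;
did the subject complete the study
did the understudy complete the form
;
run;
```

Both rows should get loose=1, but only the first row should get strict=1, because in "understudy" there is no word boundary in front of "study".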

FriedEgg
SAS Employee
It can be simplified a bit:

/\bsubject\b.*\bcomplete\b.*\b(study|treatment|protocol)\b/i

The reason I chose not to do this was to avoid assuming that the words would always appear in this order.
ChrisBrooks
Ammonite | Level 13

That's a good point about word order - you could probably do something with capture groups if they could appear in any order.
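
One way to handle arbitrary word order in a single regex is with lookahead assertions instead of capture groups. This is only a sketch: the dataset name any_order and the second test row (with the words rearranged) are made up for illustration.

```sas
data any_order;
  length str $ 64;
  input str &;
  /* Each (?=...) lookahead checks for one required word anywhere in the
     string without consuming characters, so word order does not matter. */
  if prxmatch('/^(?=.*\bsubject\b)(?=.*\bcomplete\b)(?=.*\b(study|treatment|protocol)\b)/i', str) > 0;
  cards;
did the subject complete the entire course of study
was the treatment period complete for the subject
did subject complete hospital visit
;
run;
```

The first two rows should be kept even though the second one has the words in a different order. The third row is dropped because it contains none of study, treatment, or protocol.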

SASPhile
Quartz | Level 8

Thanks! And any words we don't want can be separated out like [^do not include], right?

FriedEgg
SAS Employee
[^] creates a negated set; it applies only to the individual characters within the set, not to strings.
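
A small sketch of that behavior, using made-up test strings and a made-up dataset name (negated_set_demo). The value is stripped first because a padded trailing blank would itself count as a character outside the set.

```sas
data negated_set_demo;
  length str $ 32;
  input str &;
  /* [^not] does not mean "anything except the word not": it matches any
     single character other than n, o, or t. */
  hit = prxmatch('/[^not]/', strip(str)) > 0;
  cards;
not
no visit
toon
;
run;
```

Here "not" and "toon" should both get hit=0, since they consist only of the letters n, o, and t, while "no visit" gets hit=1 from the blank and the letters v, i, and s.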
SASPhile
Quartz | Level 8

So to exclude the list, prxmatch(list1|list2|.., columnname)=0 should work.
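
A sketch of that idea, assuming the words to exclude are "hospital" and "application" (picked from the sample rows purely for illustration; the dataset and variable names are also made up): test one pattern for the wanted words and a second pattern for the unwanted ones, then keep the row only when the second pattern does not match.

```sas
data want_excl;
  length str $ 64;
  input str &;
  /* Rows that mention subject and then complete ... */
  keep_me = prxmatch('/\bsubject\b.*\bcomplete\b/i', str) > 0;
  /* ... and rows that contain any word on the exclusion list. */
  drop_me = prxmatch('/\b(hospital|application)\b/i', str) > 0;
  /* Subsetting IF: wanted words present, excluded words absent. */
  if keep_me and not drop_me;
  cards;
did the subject complete the entire course of study
did the subject complete the treatment period
did subject complete full study protocol
did subject complete hospital visit
did subject complete application form
;
run;
```

With these sample rows, the first three records should remain in want_excl.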
