SASPhile
Quartz | Level 8

did the subject complete the entire course of study
did the subject complete the treatment period
did subject complete full study protocol
did subject complete hospital visit
did subject complete application form


There is a text field containing the text above. I would like to select only those records that contain the keywords "subject" and "complete" along with "study", "treatment", or "protocol".
Any help with a regex?

12 REPLIES
SASPhile
Quartz | Level 8


did the subject complete the entire course of study
did the subject complete the treatment period
did subject complete full study protocol
did subject complete hospital visit
did subject complete application form


There is a text field containing the text above. I would like to select only those records that contain the keywords "subject" and "complete" along with "study", "treatment", or "protocol".
The output records should be:
did the subject complete the entire course of study
did the subject complete the treatment period
did subject complete full study protocol

Any help with a regex?

FriedEgg
SAS Employee
data f;
  length str $ 64;
  /* Words that must all be present */
  array x[2] $  8 _temporary_ ('subject' 'complete');
  /* Words of which at least one must be present */
  array y[3] $  9 _temporary_ ('study' 'treatment' 'protocol');
  input str &;
  /* Count how many of the required words occur as whole words */
  j=0;
  do i=1 to dim(x);
    j+(findw(str,x[i],,'tsmi')>0);
  end;
  /* Keep the row only if every required word was found and one alternative matches */
  if j=dim(x) and prxmatch('/\b('||catx('|',of y[*])||')\b/i',str)>0;
  cards;
did the subject complete the entire course of study
did the subject complete the treatment period
did subject complete full study protocol
did subject complete hospital visit
did subject complete application form
;
run;
SASPhile
Quartz | Level 8

Hi,

The final output should contain only these three records, as I'm looking for the combination of "subject" and "complete" along with either "study", "treatment", or "protocol":

did the subject complete the entire course of study
did the subject complete the treatment period
did subject complete full study protocol

 

Reeza
Super User

PLEASE edit your posts, don't post duplicates.

art297
Opal | Level 21

Matt's (@FriedEgg) code did that, but showed the results in the log rather than the file. A minor change to his code puts the results in a file:

 

data want;
  length str $ 64;
  array x[2] $  8 _temporary_ ('subject' 'complete');
  array y[3] $  9 _temporary_ ('study' 'treatment' 'protocol');
  input str &;
  j=0;
  do i=1 to dim(x);
    j+findw(str,x[i],,'tsmi')>0;
  end;
  if j=dim(x) and prxmatch('/('||catx('|',of y[*])||')/i',str)>0;
  cards;
did the subject complete the entire course of study
did the subject complete the treatment period
did subject complete full study protocol
did subject complete hospital visit
did subject complete application form
;
run;

Art, CEO, AnalystFinder.com

FriedEgg
SAS Employee
Updated my post.
ChrisBrooks
Ammonite | Level 13

Try this

 

data in;
length str $64;
input str &;
cards;
did the subject complete the entire course of study
did the subject complete the treatment period
did subject complete full study protocol
did subject complete hospital visit
did subject complete application form
;
run;

data out;
  set in;
  if prxmatch("/\bsubject\b.*\bcomplete\b.*(\bstudy\b|\btreatment\b|\bprotocol\b)/", str) then output;
run;

The \b anchors force the regex to match only on word boundaries, and the .* between them means there can be any number of other words between the matching ones.
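
To see what the word-boundary anchors change, here is a small sketch. The data set name demo, the flag variables, and the second test string are made up for illustration; they are not part of the original data.

```sas
/* Hypothetical test strings: the second contains "study" only inside "studying". */
data demo;
  length str $64;
  input str &;
  /* With \b on both sides, "study" must stand alone as a word. */
  bounded   = prxmatch('/\bstudy\b/i', str) > 0;
  /* Without \b, any substring match counts. */
  unbounded = prxmatch('/study/i', str) > 0;
  cards;
did subject complete full study protocol
did subject complete studying for exam
;
run;

proc print data=demo;
run;
```

For the first row both flags should be 1; for the second row only the unbounded flag is 1, because "studying" contains "study" but not as a whole word.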

FriedEgg
SAS Employee
It can be simplified a bit:

/\bsubject\b.*\bcomplete\b.*\b(study|treatment|protocol)\b/i

The reason I chose not to do this was to avoid assuming that the words would always appear in this order.
ChrisBrooks
Ammonite | Level 13

That's a good point about word order - you could probably do something with capture groups if they could appear in any order.
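
One way to drop the word-order assumption in a single pattern is to use lookaheads rather than capture groups. This is a minimal sketch, assuming Perl-style (?=...) lookaheads in the PRX functions; the data set name any_order and the reordered third test string are made up for illustration.

```sas
data any_order;
  length str $64;
  input str &;
  /* Each (?=...) lookahead is evaluated from the start of the string   */
  /* and only checks that its word occurs somewhere, so the three       */
  /* conditions do not depend on the order of the words.                */
  if prxmatch('/^(?=.*\bsubject\b)(?=.*\bcomplete\b)(?=.*\b(study|treatment|protocol)\b)/i', str);
  cards;
did the subject complete the treatment period
did subject complete hospital visit
study protocol: did the subject complete it
;
run;
```

The first and third rows should be kept, and the second dropped because it contains none of the three alternative words.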

SASPhile
Quartz | Level 8

Thanks! And any words we don't want can be excluded with something like [^do not include], right?

FriedEgg
SAS Employee
[^...] creates a negated set. It applies to the individual characters within the set, not to strings.
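
A quick sketch of what that means in practice. The variable names and the two test words are made up for illustration.

```sas
data _null_;
  /* The set holds the characters d, o, n, t, i, c, l, u, e and blank, */
  /* not the three words "do", "not" and "include".                    */
  in_set  = prxmatch('/[^do not include]/', 'noted'); /* every character is in the set */
  out_set = prxmatch('/[^do not include]/', 'stop');  /* "s" is not in the set */
  put in_set= out_set=;
run;
```

The log should show in_set=0 out_set=1: "noted" has no character outside the set, so the negated set finds nothing, while "stop" matches at position 1 because of the "s".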
SASPhile
Quartz | Level 8

To exclude the list, then, prxmatch(list1|list2|..columnname)=0 should work.
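
A sketch of that exclusion test written out in full. The excluded words "hospital" and "application" are picked from the sample rows purely for illustration, and the data set name filtered is hypothetical.

```sas
data filtered;
  length str $64;
  input str &;
  /* Keep a row only when none of the unwanted words appears as a whole word. */
  if prxmatch('/\b(hospital|application)\b/i', str) = 0;
  cards;
did the subject complete the entire course of study
did the subject complete the treatment period
did subject complete full study protocol
did subject complete hospital visit
did subject complete application form
;
run;
```

With this sample data the first three rows are kept. The same = 0 test can be combined with an inclusion pattern using "and" in the subsetting IF.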

Discussion stats
  • 12 replies
  • 1338 views
  • 2 likes
  • 5 in conversation