SAS regex

Super Contributor
Posts: 648

SAS regex

did the subject complete the entire course of study
did the subject complete the treatment period
did subject complete full study protocol
did subject complete hospital visit
did subject complete application form


There is a text field containing the text above. I would like to select only those records that contain the keywords "subject" and "complete" along with "study", "treatment", or "protocol".
Any help with a regex?

Super Contributor
Posts: 648

regex


did the subject complete the entire course of study
did the subject complete the treatment period
did subject complete full study protocol
did subject complete hospital visit
did subject complete application form


There is a text field containing the text above. I would like to select only those records that contain the keywords "subject" and "complete" along with "study", "treatment", or "protocol".
The expected output records are:
did the subject complete the entire course of study
did the subject complete the treatment period
did subject complete full study protocol

Any help with a regex?

Trusted Advisor
Posts: 1,300

Re: regex

[ Edited ]
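data f;
  length str $ 64;
  /* words that must all be present */
  array x[2] $  8 _temporary_ ('subject' 'complete');
  /* words of which at least one must be present */
  array y[3] $  9 _temporary_ ('study' 'treatment' 'protocol');
  input str &;
  /* count how many of the required words occur as whole words */
  j=0;
  do i=1 to dim(x);
    j+findw(str,x[i],,'tsmi')>0;
  end;
  /* keep the row only if every required word was found and the
     regex (built as \b(study|treatment|protocol)\b) also matches */
  if j=dim(x) and prxmatch('/\b('||catx('|',of y[*])||')\b/i',str)>0;
  cards;
did the subject complete the entire course of study
did the subject complete the treatment period
did subject complete full study protocol
did subject complete hospital visit
did subject complete application form
;
run;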
Super Contributor
Posts: 648

Re: regex

Hi,

The final output should contain only these three records, as I'm looking for the combination of "subject" and "complete" along with either "study", "treatment", or "protocol":

did the subject complete the entire course of study
did the subject complete the treatment period
did subject complete full study protocol

 

Super User
Posts: 17,963

Re: SAS regex

PLEASE edit your posts, don't post duplicates.

PROC Star
Posts: 7,366

Re: SAS regex

Matt's (@FriedEgg) code did that, but showed the results in the log rather than the file. A minor change to his code puts the results in a file:

 

data want;
  length str $ 64;
  array x[2] $  8 _temporary_ ('subject' 'complete');
  array y[3] $  9 _temporary_ ('study' 'treatment' 'protocol');
  input str &;
  j=0;
  do i=1 to dim(x);
    j+findw(str,x[i],,'tsmi')>0;
  end;
  if j=dim(x) and prxmatch('/('||catx('|',of y[*])||')/i',str)>0;
  cards;
did the subject complete the entire course of study
did the subject complete the treatment period
did subject complete full study protocol
did subject complete hospital visit
did subject complete application form
;
run;

Art, CEO, AnalystFinder.com

Trusted Advisor
Posts: 1,300

Re: SAS regex

Updated my post
Regular Contributor
Posts: 190

Re: SAS regex

[ Edited ]

Try this

 

data in;
length str $64;
input str &;
cards;
did the subject complete the entire course of study
did the subject complete the treatment period
did subject complete full study protocol
did subject complete hospital visit
did subject complete application form
;
run;

data out;
  set in;
  if prxmatch("/\bsubject\b.*\bcomplete\b.*(\bstudy\b|\btreatment\b|\bprotocol\b)/",str) then output;
run;

The \b anchors force the regex to match only at word boundaries, and the .* between them means there can be any amount of text, including several words, between the matching ones.

Trusted Advisor
Posts: 1,300

Re: SAS regex

It can be simplified a bit:

/\bsubject\b.*\bcomplete\b.*\b(study|treatment|protocol)\b/i

The reason I chose not to do this was to avoid assuming that the words would always appear in this order.
Regular Contributor
Posts: 190

Re: SAS regex

[ Edited ]

That's a good point about word order - you could probably do something with capture groups if they could appear in any order.
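
One possible way to make a single regex independent of word order is to use lookahead assertions rather than capture groups. The sketch below is only an illustration and assumes the IN data set and STR variable from the earlier post; the output name WANT_ANY_ORDER is made up for the example.

```sas
data want_any_order;
  set in;  /* assumes the IN data set created earlier, with variable STR */
  /* Each (?=...) lookahead is tested from the start of the string,
     so the three conditions may be satisfied in any order. */
  if prxmatch('/^(?=.*\bsubject\b)(?=.*\bcomplete\b)(?=.*\b(study|treatment|protocol)\b)/i', str) > 0;
run;
```

Because the lookaheads consume no characters, a row such as "study protocol: did subject complete" would pass as well, which the ordered pattern would reject.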

Super Contributor
Posts: 648

Re: SAS regex

Thanks! And any words we don't want can be excluded with something like [^do not include], right?

Trusted Advisor
Posts: 1,300

Re: SAS regex

[^...] creates a negated character set; it applies to the individual characters within the set, not to whole strings.
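
A small sketch of what that means in practice, using the pattern from the question. The test strings 'study' and 'do not' are chosen only for illustration.

```sas
data _null_;
  /* [^do not include] is a single character class: it matches any ONE
     character other than d, o, space, n, t, i, c, l, u, e */
  pos1 = prxmatch('/[^do not include]/', 'study');   /* 's' is outside the set */
  pos2 = prxmatch('/[^do not include]/', 'do not');  /* every character is in the set */
  put pos1= pos2=;  /* writes pos1=1 pos2=0 to the log */
run;
```

So the class says nothing about the phrase "do not include" as a whole; it only tests one character at a time.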
Super Contributor
Posts: 648

Re: SAS regex

So to exclude a list of words, a test such as prxmatch('/list1|list2|.../', columnname)=0 should work.
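
A minimal sketch of that exclusion idea, assuming the IN data set and STR variable from the earlier post. The excluded words "hospital" and "application" are a hypothetical list taken from the sample rows, and WANT_EXCL is an invented name.

```sas
data want_excl;
  set in;  /* assumes a data set IN with character variable STR */
  /* hypothetical exclusion list: PRXMATCH returns 0 when none of the
     words occurs, so only rows without them are kept */
  if prxmatch('/\b(hospital|application)\b/i', str) = 0;
run;
```

The same condition can be joined with `and` to any of the inclusion tests above when both kinds of filtering are needed in one step.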
