Regex

Super Contributor
Posts: 642

Regex

alive
is the subject alive
died
died at hospice

I would want to get only those records that contain only "alive" and "dead" as values. For instance, rows 1 and 3 are the records to output. How do we get this using regex?


Accepted Solutions
Solution
‎05-12-2017 09:27 AM
Grand Advisor
Posts: 9,576

Re: Regex

IF PRXMATCH("/\b(died|alive)\b/oi", a) ;



All Replies
Esteemed Advisor
Posts: 7,203

Re: Regex

Why a regex? There is no pattern matching here, simply:

if upcase(variable) in ("ALIVE","DEAD") then ...

However, note that "died" is not "dead": even though we understand the two mean the same thing, there is no logical way for the computer to know that.

Super Contributor
Posts: 642

Re: Regex

There are hundreds of words, and later on the pattern matching may change, so it is easier to have a regex.

Esteemed Advisor
Posts: 7,203

Re: Regex

Sorry, I still don't see it. Regex is for pattern matching; it won't really help you find words from a list. If there are lots of them, put them in a dataset, like this:

data words;
  word="ALIVE"; output;
  word="DEAD"; output;
run;

proc sql;
  select  *
  from    HAVE
  where   upcase(YOUR_VARIABLE) in (select WORD from WORDS);
quit;

Or you could do it with a format, code generation, or a merge, but I don't see regex doing it.
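As a sketch of the format route mentioned above (not code from the thread), the example below puts the word list into a character format and keeps a record only when its whole uppercased value is in that list. The dataset names have and want, the format name $keepword, and the word list itself (which also includes DIED, given the died/dead point earlier) are illustrative assumptions.

```sas
/* Sample data: the four values from the question. */
data have;
  infile datalines truncover;
  input a $50.;
  datalines;
alive
is the subject alive
died
died at hospice
;
run;

/* Illustrative word list: listed values map to 'Y', everything else to 'N'. */
proc format;
  value $keepword
    'ALIVE', 'DEAD', 'DIED' = 'Y'
    other = 'N';
run;

/* Keep a record only when its whole uppercased value is in the list. */
data want;
  set have;
  if put(upcase(strip(a)), $keepword.) = 'Y';
run;
```

With hundreds of words, the VALUE statement could instead be generated from a dataset through PROC FORMAT's CNTLIN= option, so the list lives in data rather than in code.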

Grand Advisor
Posts: 10,210

Re: Regex


SASPhile wrote:

There are hundreds of words, and later on the pattern matching may change, so it is easier to have a regex.


Your original post says:

"I would want to get only those records that contain only "alive" and "dead" as values"

 

That is exactly one word, confirmed by your statement that rows 1 and 3 match (although the question of whether you meant "died" instead of "dead" does come up).

With "only "alive" and "dead"", why does it matter if there are hundreds of words? You said "only".

It seems you may not have clearly stated the problem or the desired result.

 

Regular Contributor
Posts: 234

Re: Regex


SASPhile wrote:

There are hundreds of words, and later on the pattern matching may change, so it is easier to have a regex.


Do you know the required changes right now?

 

If not, why use regex? With "hundreds of words" the expression will be difficult to read. Using a format or the suggested PROC SQL is, imho, the smarter solution.

 

Regular Contributor
Posts: 228

Re: Regex

Something like this should work with a regex. But what RW9 suggests is the right way to do it, as it is much cleaner and much easier to manage.

data a;
  input a $50.;
  datalines;
alive
is the subject alive
died
died at hospice
Nothing happened
;

data c;
  set a;
  IF PRXMATCH("m/.*died.*|.*alive.*/oi", a) > 0 ;
run;

Just add any other word as .*wordyouwant.* after a pipe (|). Here the dot matches any character, and * means it can occur zero or more times.

See this link for more explanation:

http://support.sas.com/kb/38/719.html
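If the list of words grows, the alternation does not have to be typed by hand. A minimal sketch, assuming a hypothetical dataset words with one word per row: PROC SQL joins the words into the macro variable &wordlist (here died|alive), and the data step uses it in the same unanchored way as the reply above, so any value containing one of the words is kept. The leading and trailing .* are left out because PRXMATCH already searches anywhere in the value. Words containing regex metacharacters, such as a period, slash or parenthesis, would need to be escaped first.

```sas
/* Sample data: the values used in the reply above. */
data have;
  infile datalines truncover;
  input a $50.;
  datalines;
alive
is the subject alive
died
died at hospice
Nothing happened
;
run;

/* Hypothetical word list: one row per word to match. */
data words;
  length word $20;
  input word $;
  datalines;
died
alive
;
run;

/* Join the words into one alternation, e.g. died|alive. */
proc sql noprint;
  select strip(word) into :wordlist separated by '|'
  from words;
quit;

/* Unanchored: keep any value that contains one of the listed words. */
data c;
  set have;
  if prxmatch("/(&wordlist)/i", a) > 0;
run;
```

Replacing the pattern with "/^(&wordlist)\s*$/i" would keep only values that consist of exactly one listed word.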

Super Contributor
Posts: 642

Re: Regex

I tried this and it worked well:

/^died\s*$|^alive\s*$/i
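A minimal sketch comparing the two patterns on the four values from the question; the dataset and flag names are illustrative. The \b pattern from the accepted answer matches a listed word anywhere in the value, so it flags all four rows, while the anchored pattern flags only rows 1 and 3. The \s*$ part matters because SAS pads fixed-length character variables with trailing blanks, so a plain $ right after the word would not match unless the value were trimmed first.

```sas
/* Sample data: the four values from the question. */
data have;
  infile datalines truncover;
  input a $50.;
  datalines;
alive
is the subject alive
died
died at hospice
;
run;

data flags;
  set have;
  /* Word anywhere in the value (pattern from the accepted answer). */
  word_anywhere = (prxmatch("/\b(died|alive)\b/i", a) > 0);
  /* Whole value only: ^ anchors the start, and \s*$ absorbs the */
  /* trailing blanks that pad the fixed-length variable a.       */
  whole_value = (prxmatch("/^(died|alive)\s*$/i", a) > 0);
run;
```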

 

Regular Contributor
Posts: 228

Re: Regex

Oh, I think I did not read your question well. You want those words at the beginning.

/^died\s*$|^alive\s*$/i

This will not catch a value like "alive is good". It matches your word followed by no space, a single space, or multiple spaces.

If you want to be safe, try the one below just in case: it takes care of your word at the beginning, followed by whitespace, followed by any other words (which may or may not be there).

PRXMATCH("m/^died\s+.*|^alive\s+.*/i", a) > 0

Super Contributor
Posts: 642

Re: Regex

I want only those records that contain just "died" or just "alive".

Regular Contributor
Posts: 228

Re: Regex

For that, your solution will work. You can also try this:

PRXMATCH("m/^died$|^alive$/i", trim(a)) > 0

Valued Guide
Posts: 2,174

Re: Regex

I imagine a table of the required words, with one row for each word string:
Proc sql ;
Create table required_subset as
Select distinct a.*
From what.you_have a
Join imagined_words b
On a.description_variable contains trim( b.word_string )
;
quit ;
I used DISTINCT expecting that there _might_ be multiple matches in some cases.
Super Contributor
Posts: 642

Re: Regex

Hi,

If there is a special character in the string and I want to match it as part of the string, how do I include it in the expression?

If var1 has the following values:

var1
NA
N-A
N/A

I want to match NA and N/A only.

Regular Contributor
Posts: 228

Re: Regex

Something like this should work:

'm/NA|N\/A/oi'

When you have N/A, you need to escape the / by putting a \ in front of it, because / is the pattern delimiter.

data abc;
  input var1 $;
  datalines;
NA
N-A
N/A
;

data abcd(drop=pat);
  set abc;
  pat = prxparse('m/NA|N\/A/oi');
  if prxmatch(pat, var1) > 0 then newval = 'yes';
  else newval = 'no';
run;

For your query, all you need is:

pat = prxparse('m/NA|N\/A/oi');
if prxmatch(pat, var1) > 0;
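One caveat worth checking: the NA|N\/A pattern is not anchored, so it keeps any value that merely contains NA, including longer strings (BANANA below is a made-up sample value). A minimal sketch, assuming the goal is to match only values that are exactly NA or N/A: ^ and \s*$ anchor the whole value (allowing for SAS's trailing blanks), and \/? makes the escaped slash optional. The flag names are illustrative.

```sas
/* Sample data from the question plus one made-up value. */
data abc;
  input var1 $;
  datalines;
NA
N-A
N/A
BANANA
;
run;

data abcd;
  set abc;
  /* Unanchored: NA or N/A anywhere in the value, so BANANA also matches. */
  anywhere = (prxmatch('/NA|N\/A/i', var1) > 0);
  /* Anchored: the whole value must be NA or N/A (the slash is optional). */
  whole = (prxmatch('/^N\/?A\s*$/i', var1) > 0);
run;
```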

☑ This topic is SOLVED.
