BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
SASPhile
Quartz | Level 8
alive
is the subject alive
died
died at hospice

 

I would want to get only those records that contain only "alive" and "dead" as values. for instance row 1 and 3 are records to output. how do we get this using regex?

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

IF PRXMATCH("/\b(died|alive)\b/oi", a) ;

View solution in original post

15 REPLIES 15
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Why a regex, there is no pattern matching simply:

if upcase(variable) in ("ALIVE","DEAD") then ...

However note that died != dead, even though we can understand the two are the same, there is no logical way the computer can know.

SASPhile
Quartz | Level 8

there are hunderds of words and later in time, the pattern matching may chnage and it is eaiser to have a regex

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Sorry, still don't see it.  Regex is for pattern matching, it wont really help you find words from a list. If there are lots then put them in a dataset thus:

data words;
  word="ALIVE"; output;
  word="DEAD"; output;
run;

proc sql;
  select  *
  from    HAVE
  where  YOUR_VARIABLE in (select * from WORDS);
quit;

Or you could do it in a format, or code generation, or by merging, but I don't see regex doing it.

ballardw
Super User

@SASPhile wrote:

there are hunderds of words and later in time, the pattern matching may chnage and it is eaiser to have a regex


Your original post says:

"I would want to get only those records that contain only "alive" and "dead" as values"

 

That is exactly one word and confirmed by statement that rows 1 and 3 match (although whether you meant "died" instead of "dead" does come up).

With "only "alive" and "dead" " what does is matter if there are hundreds of words? You said "only".

It seems like you may not have clearly stated the problem or the desired result.

 

andreas_lds
Jade | Level 19

@SASPhile wrote:

there are hunderds of words and later in time, the pattern matching may chnage and it is eaiser to have a regex


Do you know the required changes right now?

 

If not, why use regex? With "hunderds of words" the expression will be difficult to read. Using a format or the suggest proc sql is imho the smarter solution.

 

kiranv_
Rhodochrosite | Level 12

something like this in Regex should work. But what RW9 Suggests is the right way to do as it is much cleaner and very easily mangeable

 

data a;

input a $50.;

datalines;

alive

is the subject alive

died

died at hospice

Nothing happened

;

data c;

set a;

IF PRXMATCH("m/.*died.*|.*alive.*/oi", a) > 0 ;

run;

 

just add any word like .*wordyouwant.* after the pipe i.e. |. Here dot indicates any character and * means it can be there 0 or more times.

 

 

see this link for more explanation

 

http://support.sas.com/kb/38/719.html

SASPhile
Quartz | Level 8

Tried this and worked out well!

^died\s*$|^alive\s*$/i

 

kiranv_
Rhodochrosite | Level 12

oh I think i did not read your question well. you want those words in the begining.

 

^died\s*$|^alive\s*$/i.

this will not catch if your variable value is like  "alive is good".  It can search for your find word with no space, single space or multiple spaces

 

 

if you try the below one just in case to be safe aand this takes care of your word at beginning+space after that+ any other other word(may or may not be there)

PRXMATCH("m/^died\s+.*|^alive\s+.*/i", a) > 0

SASPhile
Quartz | Level 8

I want only those records contain just "died" and "alive"

kiranv_
Rhodochrosite | Level 12

for that your solution will work. you can also try this

PRXMATCH("m/^died$|^alive$/i", trim(a)) > 0

Peter_C
Rhodochrosite | Level 12
I imagine a table of the words required with one row for each word string
Proc sql ;
Create table required_subset as
Select distinct a.*
From what.you_have a
Join imagined_words b
On a.description_variable contains trim( b.word_string )
;
quit ;
I used DISTINCT expecting there _might_ be multiple matches in some cases
Ksharp
Super User

IF PRXMATCH("/\b(died|alive)\b/oi", a) ;

SASPhile
Quartz | Level 8

Hi ,

 if there is a special character in the string and  i want to match it as part of the string,how to include in expression?

 

if var1 has the following values:

var1

NA

N-A

N/A

 

I would want  to match NA and N/A only

kiranv_
Rhodochrosite | Level 12

something like this should work

 

'm/NA|N\/A/oi'

 

when you have N/A, you need to escape / by adding \ as / has different meaning

 

data abc;

input var1 $;

datalines;

NA

N-A

N/A

;

data abcd(drop=pat);

set abc;

pat =prxparse('m/NA|N\/A/oi');

if prxmatch(pat, var1)>0 then newval ='yes';

else newval ='no';

run;

 

 

for your query all you need is

pat =prxparse('m/NA|N\/A/oi');

if prxmatch(pat, var1)>0 ;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 15 replies
  • 2873 views
  • 6 likes
  • 7 in conversation