alive
is the subject alive
died
died at hospice
I would want to get only those records that contain only "alive" and "dead" as values. For instance, rows 1 and 3 are the records to output. How do we get this using regex?
IF PRXMATCH("/\b(died|alive)\b/oi", a) ;
Why a regex? There is no pattern matching here. Simply:
if upcase(variable) in ("ALIVE","DEAD") then ...
However, note that died != dead. Even though we understand that the two mean the same thing, there is no logical way for the computer to know.
There are hundreds of words, and the pattern matching may change later on, so it is easier to have a regex.
Sorry, I still don't see it. Regex is for pattern matching; it won't really help you find words from a list. If there are lots of them, put them in a dataset, thus:
data words;
word="ALIVE"; output;
word="DEAD"; output;
run;
proc sql;
select * from HAVE where YOUR_VARIABLE in (select * from WORDS);
quit;
Or you could do it in a format, or code generation, or by merging, but I don't see regex doing it.
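The format option mentioned above could look roughly like the sketch below. HAVE and YOUR_VARIABLE are the same placeholders as in the proc sql example; the format name $keepword and the output data set WANT are hypothetical names chosen for illustration.

```sas
/* Hypothetical lookup format: listed words map to "Y", everything else to "N" */
proc format;
value $keepword
"ALIVE", "DEAD" = "Y"
other = "N";
run;

data want;
set have;
/* upcase() makes the lookup case-insensitive; strip() drops surrounding blanks */
if put(upcase(strip(your_variable)), $keepword.) = "Y";
run;
```

Adding a word later means adding one entry to the value list, with no change to the data step.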
@SASPhile wrote:
There are hundreds of words, and the pattern matching may change later on, so it is easier to have a regex.
Your original post says:
"I would want to get only those records that contain only "alive" and "dead" as values"
That is exactly one word, as confirmed by your statement that rows 1 and 3 match (although the question of whether you meant "died" instead of "dead" does come up).
With "only "alive" and "dead"", what does it matter if there are hundreds of words? You said "only".
It seems like you may not have clearly stated the problem or the desired result.
@SASPhile wrote:
There are hundreds of words, and the pattern matching may change later on, so it is easier to have a regex.
Do you know the required changes right now?
If not, why use regex? With "hundreds of words" the expression will be difficult to read. Using a format or the suggested proc sql is, IMHO, the smarter solution.
Something like this in regex should work. But what RW9 suggests is the right way to do it, as it is much cleaner and very easily manageable.
data a;
input a $50.;
datalines;
alive
is the subject alive
died
died at hospice
Nothing happened
;
data c;
set a;
/* keep rows whose value contains "died" or "alive" anywhere, ignoring case */
IF PRXMATCH("m/.*died.*|.*alive.*/oi", a) > 0 ;
run;
Just add any word, like .*wordyouwant.*, after the pipe (|). Here the dot matches any character and * means it can occur 0 or more times.
see this link for more explanation
Tried this and worked out well!
/^died\s*$|^alive\s*$/i
Oh, I think I did not read your question well. You want those words at the beginning.
/^died\s*$|^alive\s*$/i
This will not match if your variable value is something like "alive is good". It matches your search word followed by no space, a single space, or multiple spaces.
To be safe, you can also try the one below. It matches your word at the beginning, followed by whitespace and then any other words (which may or may not be there).
PRXMATCH("m/^died\s+.*|^alive\s+.*/i", a) > 0
I want only those records that contain just "died" and "alive".
For that, your solution will work. You can also try this:
PRXMATCH("m/^died$|^alive$/i", trim(a)) > 0
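As a minimal sketch of the whole-value match discussed above, the two anchored alternatives can be grouped so the anchors are written once. It reuses the sample data set a from earlier in the thread; the output data set name exact_only is a hypothetical name. The \s*$ part absorbs the trailing blanks that SAS pads character values with, so trim() is not required here.

```sas
data a;
input a $50.;
datalines;
alive
is the subject alive
died
died at hospice
Nothing happened
;
run;

data exact_only;
set a;
/* ^ and $ anchor the whole value; \s* tolerates blank padding; i ignores case */
if prxmatch("/^\s*(died|alive)\s*$/i", a) > 0;
run;
```

With the sample rows, this should keep only "alive" and "died"; "is the subject alive" and "died at hospice" are rejected because of the anchors.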
Hi,
If there is a special character in the string and I want to match it as part of the string, how do I include it in the expression?
If var1 has the following values:
var1
NA
N-A
N/A
I would want to match NA and N/A only
Something like this should work:
'm/NA|N\/A/oi'
When you have N/A, you need to escape the / by putting \ in front of it, because / is also the delimiter that marks the start and end of the pattern.
data abc;
input var1 $;
datalines;
NA
N-A
N/A
;
data abcd(drop=pat);
set abc;
pat =prxparse('m/NA|N\/A/oi');
if prxmatch(pat, var1)>0 then newval ='yes';
else newval ='no';
run;
For your query, all you need is:
pat =prxparse('m/NA|N\/A/oi');
if prxmatch(pat, var1)>0 ;
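The pattern above matches NA or N/A anywhere inside a value. If the whole value has to be exactly NA or N/A, a possible variation is to anchor the pattern and make the escaped slash optional. This is only a sketch: the output data set name na_only is hypothetical, and abc is the sample data from above.

```sas
data abc;
input var1 $;
datalines;
NA
N-A
N/A
;
run;

data na_only;
set abc;
/* \/? is an optional escaped slash; ^ and \s*$ require the whole value to match */
if prxmatch('/^N\/?A\s*$/i', var1) > 0;
run;
```

With the three sample rows, N-A is dropped because the hyphen is neither a slash nor absent.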