Hi everyone,
If I have a dataset that looks like the following (with ~5,000 rows). The VariableList contains strings with certain prefix ("string*") as well as other numbers and characters, always separated by space.
ID | VariableList |
A1 | stringA stringB 123 |
A2 | stringA + stringB > stringC |
A3 | stringA |
A4 | stringB stringC 456 |
A5 | stringC stringD abc |
And I want to search for units where in the VariableList, the words with prefix include words from a list ("stringA" or "stringB"), but no other prefixed words ("StringC", "StringD", etc). I don't care about the extra strings.
So in this example, I want the new dataset to only contain
ID | VariableList |
A1 | stringA stringB 123 |
A3 | stringA |
I found a similar question (https://communities.sas.com/t5/SAS-Procedures/searching-words-in-a-variable/td-p/85846) and tried the PRX function as follows:
data have;
length ID $ 10 VariableList $ 50;
infile cards dsd dlm='|' truncover ;
input ID VariableList;
datalines;
A1|stringA stringB 123
A2|stringA + stringB > stringC
A3|stringA
A4|stringB stringC 456
A5|stringC stringD abc
; run;
%let matchString=stringA|stringB;
data want;
set have;
where prxmatch("/\b(&matchString.)\b/", VariableList);
run;
But this would find all the rows that contain stringA and stringB, including the ones that I don't want (A2 & A4). How do I go about excluding them?
Thanks a lot!
data have;
length ID $ 10 VariableList $ 50;
infile cards dsd dlm='|' truncover ;
input ID VariableList;
datalines;
A1|stringA stringB 123
A2|stringA + stringB > stringC
A3|stringA
A4|stringB stringC 456
A5|stringC stringD abc
; run;
data want;
set have;
if prxmatch('/\bstring[^AB]+\b/i',VariableList) then delete;
run;
data have;
length ID $ 10 VariableList $ 50;
infile cards dsd dlm='|' truncover ;
input ID VariableList;
datalines;
A1|stringA stringB 123
A2|stringA + stringB > stringC
A3|stringA
A4|stringB stringC 456
A5|stringC stringD abc
; run;
data want;
set have;
if prxmatch('/\bstring[^AB]+\b/i',VariableList) then delete;
run;
Thanks! Wow this is exactly what I was looking for.
Can I ask a follow up question? I forgot to mention that "stringA|stringB" is a dynamic list that I read from a metadata file, and the A and B part are not necessarily single letters, say what I wanted to match was actually "stringA13 & string BC9". I think the brackets [...] only work for characters?
So I wanted something similar to
if prxmatch('/\bstring[^(A13+BC9)]+\b/i',VariableList) then delete;
This seems more complicated than what I thought.
/*
OK. Check this one .
*/
data have;
length ID $ 10 VariableList $ 50;
infile cards dsd dlm='|' truncover ;
input ID VariableList;
datalines;
A1|stringA13 stringBC9 123
A2|stringA13 + stringBC9 > stringC
A3|stringA13
A4|stringBC9 stringC 456
A5|stringC stringD abc
A6|stringBC8 stringD abc
A7|stringBC8 stringA2 abc
A8|stringA13 stringBC9
;
data want;
set have;
if prxmatch('/string(?!A13|BC9)/i',VariableList) then delete;
run;
Try this:
%let goodPrefix=stringA|stringB;
%let badPrefix=stringC|stringD;
/* Or perhaps */
%let goodPrefix=string[AB];
%let badPrefix=string[^AB];
data want;
set have;
if prxmatch("/^(\s*\b(&goodPrefix.))+/io", variableList) and not
prxmatch("/\b(&badPrefix.)/io", variableList);
run;
Thanks!! I like how you listed both the good and bad prefixes, and defined the badPrefixes as "not good" -- this is what I needed as I don't know what to exclude beforehand.
Sorry that I set the other earlier reply as solution. This is a good one too!
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.