BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Nat4
Fluorite | Level 6

Hi everyone,

 

If I have a dataset that looks like the following (with ~5,000 rows). The VariableList contains strings with certain prefix ("string*") as well as other numbers and characters, always separated by space.

     

IDVariableList
A1stringA stringB 123
A2stringA + stringB > stringC
A3stringA 
A4stringB stringC 456
A5stringC stringD abc

 

And I want to search for units where in the VariableList, the words with prefix include words from a list ("stringA" or "stringB"), but no other prefixed words ("StringC", "StringD", etc). I don't care about the extra strings.

So in this example, I want the new dataset to only contain 

   

IDVariableList
A1stringA stringB 123
A3stringA 

 

I found a similar question (https://communities.sas.com/t5/SAS-Procedures/searching-words-in-a-variable/td-p/85846) and tried the PRX function as follows:

 

data have;
   length ID $ 10 VariableList $ 50;
   infile cards dsd dlm='|' truncover ;
   input ID VariableList;
   datalines;
A1|stringA stringB 123
A2|stringA + stringB > stringC
A3|stringA 
A4|stringB stringC 456
A5|stringC stringD abc
; run;

 

%let matchString=stringA|stringB;


data want;
   set have;
   where prxmatch("/\b(&matchString.)\b/", VariableList);
run;

 

But this would find all the rows that contain stringA and stringB, including the ones that I don't want (A2 & A4). How do I go about excluding them?

Thanks a lot!

 

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
data have;
   length ID $ 10 VariableList $ 50;
   infile cards dsd dlm='|' truncover ;
   input ID VariableList;
   datalines;
A1|stringA stringB 123
A2|stringA + stringB > stringC
A3|stringA 
A4|stringB stringC 456
A5|stringC stringD abc
; run;

data want;
 set have;
 if prxmatch('/\bstring[^AB]+\b/i',VariableList) then delete;
run;

View solution in original post

5 REPLIES 5
Ksharp
Super User
data have;
   length ID $ 10 VariableList $ 50;
   infile cards dsd dlm='|' truncover ;
   input ID VariableList;
   datalines;
A1|stringA stringB 123
A2|stringA + stringB > stringC
A3|stringA 
A4|stringB stringC 456
A5|stringC stringD abc
; run;

data want;
 set have;
 if prxmatch('/\bstring[^AB]+\b/i',VariableList) then delete;
run;
Nat4
Fluorite | Level 6

Thanks! Wow this is exactly what I was looking for.

 

Can I ask a follow up question? I forgot to mention that "stringA|stringB" is a dynamic list that I read from a metadata file, and the A and B part are not necessarily single letters, say what I wanted to match was actually "stringA13 & string BC9". I think the brackets [...] only work for characters?

 

So I wanted something similar to

 

if prxmatch('/\bstring[^(A13+BC9)]+\b/i',VariableList) then delete;

This seems more complicated than what I thought.

Ksharp
Super User
/*
OK. Check this one .
*/

data have;
   length ID $ 10 VariableList $ 50;
   infile cards dsd dlm='|' truncover ;
   input ID VariableList;
   datalines;
A1|stringA13 stringBC9 123
A2|stringA13 + stringBC9 > stringC
A3|stringA13 
A4|stringBC9 stringC 456
A5|stringC stringD abc
A6|stringBC8 stringD abc
A7|stringBC8 stringA2 abc
A8|stringA13 stringBC9
;

data want;
 set have; 
 if prxmatch('/string(?!A13|BC9)/i',VariableList) then delete;
run;
PGStats
Opal | Level 21

Try this:

 

%let goodPrefix=stringA|stringB;
%let badPrefix=stringC|stringD;

/* Or perhaps */

%let goodPrefix=string[AB];
%let badPrefix=string[^AB];

data want;
set have;
if prxmatch("/^(\s*\b(&goodPrefix.))+/io", variableList) and not
    prxmatch("/\b(&badPrefix.)/io", variableList);
run;
PG
Nat4
Fluorite | Level 6

Thanks!! I like how you listed both the good and bad prefixes, and defined the badPrefixes as "not good" -- this is what I needed as I don't know what to exclude beforehand.

 

Sorry that I set the other earlier reply as solution. This is a good one too!

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1274 views
  • 4 likes
  • 3 in conversation