BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Nat4
Fluorite | Level 6

Hi everyone,

 

If I have a dataset that looks like the following (with ~5,000 rows). The VariableList contains strings with certain prefix ("string*") as well as other numbers and characters, always separated by space.

     

IDVariableList
A1stringA stringB 123
A2stringA + stringB > stringC
A3stringA 
A4stringB stringC 456
A5stringC stringD abc

 

And I want to search for units where in the VariableList, the words with prefix include words from a list ("stringA" or "stringB"), but no other prefixed words ("StringC", "StringD", etc). I don't care about the extra strings.

So in this example, I want the new dataset to only contain 

   

IDVariableList
A1stringA stringB 123
A3stringA 

 

I found a similar question (https://communities.sas.com/t5/SAS-Procedures/searching-words-in-a-variable/td-p/85846) and tried the PRX function as follows:

 

data have;
   length ID $ 10 VariableList $ 50;
   infile cards dsd dlm='|' truncover ;
   input ID VariableList;
   datalines;
A1|stringA stringB 123
A2|stringA + stringB > stringC
A3|stringA 
A4|stringB stringC 456
A5|stringC stringD abc
; run;

 

%let matchString=stringA|stringB;


data want;
   set have;
   where prxmatch("/\b(&matchString.)\b/", VariableList);
run;

 

But this would find all the rows that contain stringA and stringB, including the ones that I don't want (A2 & A4). How do I go about excluding them?

Thanks a lot!

 

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
data have;
   length ID $ 10 VariableList $ 50;
   infile cards dsd dlm='|' truncover ;
   input ID VariableList;
   datalines;
A1|stringA stringB 123
A2|stringA + stringB > stringC
A3|stringA 
A4|stringB stringC 456
A5|stringC stringD abc
; run;

data want;
 set have;
 if prxmatch('/\bstring[^AB]+\b/i',VariableList) then delete;
run;

View solution in original post

5 REPLIES 5
Ksharp
Super User
data have;
   length ID $ 10 VariableList $ 50;
   infile cards dsd dlm='|' truncover ;
   input ID VariableList;
   datalines;
A1|stringA stringB 123
A2|stringA + stringB > stringC
A3|stringA 
A4|stringB stringC 456
A5|stringC stringD abc
; run;

data want;
 set have;
 if prxmatch('/\bstring[^AB]+\b/i',VariableList) then delete;
run;
Nat4
Fluorite | Level 6

Thanks! Wow this is exactly what I was looking for.

 

Can I ask a follow up question? I forgot to mention that "stringA|stringB" is a dynamic list that I read from a metadata file, and the A and B part are not necessarily single letters, say what I wanted to match was actually "stringA13 & string BC9". I think the brackets [...] only work for characters?

 

So I wanted something similar to

 

if prxmatch('/\bstring[^(A13+BC9)]+\b/i',VariableList) then delete;

This seems more complicated than what I thought.

Ksharp
Super User
/*
OK. Check this one .
*/

data have;
   length ID $ 10 VariableList $ 50;
   infile cards dsd dlm='|' truncover ;
   input ID VariableList;
   datalines;
A1|stringA13 stringBC9 123
A2|stringA13 + stringBC9 > stringC
A3|stringA13 
A4|stringBC9 stringC 456
A5|stringC stringD abc
A6|stringBC8 stringD abc
A7|stringBC8 stringA2 abc
A8|stringA13 stringBC9
;

data want;
 set have; 
 if prxmatch('/string(?!A13|BC9)/i',VariableList) then delete;
run;
PGStats
Opal | Level 21

Try this:

 

%let goodPrefix=stringA|stringB;
%let badPrefix=stringC|stringD;

/* Or perhaps */

%let goodPrefix=string[AB];
%let badPrefix=string[^AB];

data want;
set have;
if prxmatch("/^(\s*\b(&goodPrefix.))+/io", variableList) and not
    prxmatch("/\b(&badPrefix.)/io", variableList);
run;
PG
Nat4
Fluorite | Level 6

Thanks!! I like how you listed both the good and bad prefixes, and defined the badPrefixes as "not good" -- this is what I needed as I don't know what to exclude beforehand.

 

Sorry that I set the other earlier reply as solution. This is a good one too!

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 1029 views
  • 4 likes
  • 3 in conversation