BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Nat4
Fluorite | Level 6

Hi everyone,

 

If I have a dataset that looks like the following (with ~5,000 rows). The VariableList contains strings with certain prefix ("string*") as well as other numbers and characters, always separated by space.

     

IDVariableList
A1stringA stringB 123
A2stringA + stringB > stringC
A3stringA 
A4stringB stringC 456
A5stringC stringD abc

 

And I want to search for units where in the VariableList, the words with prefix include words from a list ("stringA" or "stringB"), but no other prefixed words ("StringC", "StringD", etc). I don't care about the extra strings.

So in this example, I want the new dataset to only contain 

   

IDVariableList
A1stringA stringB 123
A3stringA 

 

I found a similar question (https://communities.sas.com/t5/SAS-Procedures/searching-words-in-a-variable/td-p/85846) and tried the PRX function as follows:

 

data have;
   length ID $ 10 VariableList $ 50;
   infile cards dsd dlm='|' truncover ;
   input ID VariableList;
   datalines;
A1|stringA stringB 123
A2|stringA + stringB > stringC
A3|stringA 
A4|stringB stringC 456
A5|stringC stringD abc
; run;

 

%let matchString=stringA|stringB;


data want;
   set have;
   where prxmatch("/\b(&matchString.)\b/", VariableList);
run;

 

But this would find all the rows that contain stringA and stringB, including the ones that I don't want (A2 & A4). How do I go about excluding them?

Thanks a lot!

 

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
data have;
   length ID $ 10 VariableList $ 50;
   infile cards dsd dlm='|' truncover ;
   input ID VariableList;
   datalines;
A1|stringA stringB 123
A2|stringA + stringB > stringC
A3|stringA 
A4|stringB stringC 456
A5|stringC stringD abc
; run;

data want;
 set have;
 if prxmatch('/\bstring[^AB]+\b/i',VariableList) then delete;
run;

View solution in original post

5 REPLIES 5
Ksharp
Super User
data have;
   length ID $ 10 VariableList $ 50;
   infile cards dsd dlm='|' truncover ;
   input ID VariableList;
   datalines;
A1|stringA stringB 123
A2|stringA + stringB > stringC
A3|stringA 
A4|stringB stringC 456
A5|stringC stringD abc
; run;

data want;
 set have;
 if prxmatch('/\bstring[^AB]+\b/i',VariableList) then delete;
run;
Nat4
Fluorite | Level 6

Thanks! Wow this is exactly what I was looking for.

 

Can I ask a follow up question? I forgot to mention that "stringA|stringB" is a dynamic list that I read from a metadata file, and the A and B part are not necessarily single letters, say what I wanted to match was actually "stringA13 & string BC9". I think the brackets [...] only work for characters?

 

So I wanted something similar to

 

if prxmatch('/\bstring[^(A13+BC9)]+\b/i',VariableList) then delete;

This seems more complicated than what I thought.

Ksharp
Super User
/*
OK. Check this one .
*/

data have;
   length ID $ 10 VariableList $ 50;
   infile cards dsd dlm='|' truncover ;
   input ID VariableList;
   datalines;
A1|stringA13 stringBC9 123
A2|stringA13 + stringBC9 > stringC
A3|stringA13 
A4|stringBC9 stringC 456
A5|stringC stringD abc
A6|stringBC8 stringD abc
A7|stringBC8 stringA2 abc
A8|stringA13 stringBC9
;

data want;
 set have; 
 if prxmatch('/string(?!A13|BC9)/i',VariableList) then delete;
run;
PGStats
Opal | Level 21

Try this:

 

%let goodPrefix=stringA|stringB;
%let badPrefix=stringC|stringD;

/* Or perhaps */

%let goodPrefix=string[AB];
%let badPrefix=string[^AB];

data want;
set have;
if prxmatch("/^(\s*\b(&goodPrefix.))+/io", variableList) and not
    prxmatch("/\b(&badPrefix.)/io", variableList);
run;
PG
Nat4
Fluorite | Level 6

Thanks!! I like how you listed both the good and bad prefixes, and defined the badPrefixes as "not good" -- this is what I needed as I don't know what to exclude beforehand.

 

Sorry that I set the other earlier reply as solution. This is a good one too!

SAS INNOVATE 2024

Innovate_SAS_Blue.png

Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.

If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website. 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Get the $99 certification deal.jpg

 

 

Back in the Classroom!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 372 views
  • 4 likes
  • 3 in conversation