Hi everyone,
If I have a dataset that looks like the following (with ~5,000 rows). The VariableList contains strings with certain prefix ("string*") as well as other numbers and characters, always separated by space.
ID | VariableList |
A1 | stringA stringB 123 |
A2 | stringA + stringB > stringC |
A3 | stringA |
A4 | stringB stringC 456 |
A5 | stringC stringD abc |
And I want to search for units where in the VariableList, the words with prefix include words from a list ("stringA" or "stringB"), but no other prefixed words ("StringC", "StringD", etc). I don't care about the extra strings.
So in this example, I want the new dataset to only contain
ID | VariableList |
A1 | stringA stringB 123 |
A3 | stringA |
I found a similar question (https://communities.sas.com/t5/SAS-Procedures/searching-words-in-a-variable/td-p/85846) and tried the PRX function as follows:
data have;
length ID $ 10 VariableList $ 50;
infile cards dsd dlm='|' truncover ;
input ID VariableList;
datalines;
A1|stringA stringB 123
A2|stringA + stringB > stringC
A3|stringA
A4|stringB stringC 456
A5|stringC stringD abc
; run;
%let matchString=stringA|stringB;
data want;
set have;
where prxmatch("/\b(&matchString.)\b/", VariableList);
run;
But this would find all the rows that contain stringA and stringB, including the ones that I don't want (A2 & A4). How do I go about excluding them?
Thanks a lot!
data have;
length ID $ 10 VariableList $ 50;
infile cards dsd dlm='|' truncover ;
input ID VariableList;
datalines;
A1|stringA stringB 123
A2|stringA + stringB > stringC
A3|stringA
A4|stringB stringC 456
A5|stringC stringD abc
; run;
data want;
set have;
if prxmatch('/\bstring[^AB]+\b/i',VariableList) then delete;
run;
data have;
length ID $ 10 VariableList $ 50;
infile cards dsd dlm='|' truncover ;
input ID VariableList;
datalines;
A1|stringA stringB 123
A2|stringA + stringB > stringC
A3|stringA
A4|stringB stringC 456
A5|stringC stringD abc
; run;
data want;
set have;
if prxmatch('/\bstring[^AB]+\b/i',VariableList) then delete;
run;
Thanks! Wow this is exactly what I was looking for.
Can I ask a follow up question? I forgot to mention that "stringA|stringB" is a dynamic list that I read from a metadata file, and the A and B part are not necessarily single letters, say what I wanted to match was actually "stringA13 & string BC9". I think the brackets [...] only work for characters?
So I wanted something similar to
if prxmatch('/\bstring[^(A13+BC9)]+\b/i',VariableList) then delete;
This seems more complicated than what I thought.
/*
OK. Check this one .
*/
data have;
length ID $ 10 VariableList $ 50;
infile cards dsd dlm='|' truncover ;
input ID VariableList;
datalines;
A1|stringA13 stringBC9 123
A2|stringA13 + stringBC9 > stringC
A3|stringA13
A4|stringBC9 stringC 456
A5|stringC stringD abc
A6|stringBC8 stringD abc
A7|stringBC8 stringA2 abc
A8|stringA13 stringBC9
;
data want;
set have;
if prxmatch('/string(?!A13|BC9)/i',VariableList) then delete;
run;
Try this:
%let goodPrefix=stringA|stringB;
%let badPrefix=stringC|stringD;
/* Or perhaps */
%let goodPrefix=string[AB];
%let badPrefix=string[^AB];
data want;
set have;
if prxmatch("/^(\s*\b(&goodPrefix.))+/io", variableList) and not
prxmatch("/\b(&badPrefix.)/io", variableList);
run;
Thanks!! I like how you listed both the good and bad prefixes, and defined the badPrefixes as "not good" -- this is what I needed as I don't know what to exclude beforehand.
Sorry that I set the other earlier reply as solution. This is a good one too!
Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.
If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.