I have a dataset with diagnosis codes with one row for each individual. The DiagCode variable can contain multiple diagnosis codes
Data Have;
Patient_ID$ DiagCode$
1 I11 C12 C01
2 I11
3 C12 F99 I11
4 C01
5 I11 F99
I am familiar with the prxmatch-function. See below:
%let list=C12 | I11;
data new;
set have;
if prxmatch ("/(&list)/",catx(' ', of DiagCode:));
run;
In the statement above each individual with either diagnosis C12 or I11 gets chosen.
But what I want is to select individuals only with both diagnosis C12 and I11.
Data Want;
Patient_ID$ DiagCode$
1 I11 C12 C01
3 C12 F99 I11
Is this possible to do with prxmatch or is it possible to do with any other function?
Hi @Chris_LK_87
The issue with "|" is that it means "OR" and not "AND".
Here is an alternative approach:
%let list=C12 I11;
data want;
set have;
flag=0;
do i=1 to countw("&list.");
if findw(DiagCode,scan("C12 I11",i," "))>0 then flag+1;
end;
if flag=countw("&list.") then output;
drop i flag;
run;
If you want to use PRXMATCH, you can also write this:
data want;
set have;
flag=0;
do i=1 to countw("&list.");
if prxmatch(cats("/",scan("C12 I11",i," "),"/"),DiagCode)>0 then flag+1;
end;
if flag=countw("&list.") then output;
drop i flag;
run;
Best,
Hi @Chris_LK_87
The issue with "|" is that it means "OR" and not "AND".
Here is an alternative approach:
%let list=C12 I11;
data want;
set have;
flag=0;
do i=1 to countw("&list.");
if findw(DiagCode,scan("C12 I11",i," "))>0 then flag+1;
end;
if flag=countw("&list.") then output;
drop i flag;
run;
If you want to use PRXMATCH, you can also write this:
data want;
set have;
flag=0;
do i=1 to countw("&list.");
if prxmatch(cats("/",scan("C12 I11",i," "),"/"),DiagCode)>0 then flag+1;
end;
if flag=countw("&list.") then output;
drop i flag;
run;
Best,
There are many ways to check if a string contains two tokens in either order.
data have; input Patient_ID & DiagCode $20.; datalines; 1 I11 C12 C01 2 I11 3 C12 F99 I11 4 C01 5 I11 F99 6 F99 I11 7 C01 C12 I11 ; data want; set have; * regular expression 'look around' pattern - works if a target token occurs more than once; flag_way1 = prxmatch('/^(?=.*\bC12\b)(?=.*\bI11\b).*$/i', DiagCode); * AND two patterns for locating tokens - works if a target token occurs more than once; flag_way2 = prxmatch('/\bC12\b/i',DiagCode) AND prxmatch('/\bI11\b/i',DiagCode); * AND two findws for locating tokens - works if a target token occurs more than once ; flag_way3 = findw(DiagCode,'I11',' ','i') AND findw(DiagCode,'C12',' ','i'); * examine result of transformation - Presume each token only appears once in DiagCode; flag_way4 = length(DiagCode) - length(left(tranwrd(tranwrd(DiagCode,'C12',''),'I11',''))) = 8; * examine result of transformation - Presume each token only appears once in DiagCode; flag_way5 = length(DiagCode) - length(prxchange('s/\s*(C12|I11)\s*//i', -1, DiagCode)) = 8; format flag: 4.; run;
As mentioned above, "|" means OR. So, for matching the presence of two codes, you can match A followed by B OR B followed by A, as in:
prxMatch("/\bC12\b.*\bI11\b|\bI11\b.*\bC12\b/i", DiagCodes)
Note \b matches any word boundary, including the beginning or end of a string.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.