I have a dataset with diagnosis codes with one row for each individual. The DiagCode variable can contain multiple diagnosis codes
Data Have;
Patient_ID$ DiagCode$
1 I11 C12 C01
2 I11
3 C12 F99 I11
4 C01
5 I11 F99
I am familiar with the prxmatch-function. See below:
%let list=C12 | I11;
data new;
set have;
if prxmatch ("/(&list)/",catx(' ', of DiagCode:));
run;
In the statement above each individual with either diagnosis C12 or I11 gets chosen.
But what I want is to select individuals only with both diagnosis C12 and I11.
Data Want;
Patient_ID$ DiagCode$
1 I11 C12 C01
3 C12 F99 I11
Is this possible to do with prxmatch or is it possible to do with any other function?
Hi @Chris_LK_87
The issue with "|" is that it means "OR" and not "AND".
Here is an alternative approach:
%let list=C12 I11;
data want;
set have;
flag=0;
do i=1 to countw("&list.");
if findw(DiagCode,scan("C12 I11",i," "))>0 then flag+1;
end;
if flag=countw("&list.") then output;
drop i flag;
run;
If you want to use PRXMATCH, you can also write this:
data want;
set have;
flag=0;
do i=1 to countw("&list.");
if prxmatch(cats("/",scan("C12 I11",i," "),"/"),DiagCode)>0 then flag+1;
end;
if flag=countw("&list.") then output;
drop i flag;
run;
Best,
Hi @Chris_LK_87
The issue with "|" is that it means "OR" and not "AND".
Here is an alternative approach:
%let list=C12 I11;
data want;
set have;
flag=0;
do i=1 to countw("&list.");
if findw(DiagCode,scan("C12 I11",i," "))>0 then flag+1;
end;
if flag=countw("&list.") then output;
drop i flag;
run;
If you want to use PRXMATCH, you can also write this:
data want;
set have;
flag=0;
do i=1 to countw("&list.");
if prxmatch(cats("/",scan("C12 I11",i," "),"/"),DiagCode)>0 then flag+1;
end;
if flag=countw("&list.") then output;
drop i flag;
run;
Best,
There are many ways to check if a string contains two tokens in either order.
data have; input Patient_ID & DiagCode $20.; datalines; 1 I11 C12 C01 2 I11 3 C12 F99 I11 4 C01 5 I11 F99 6 F99 I11 7 C01 C12 I11 ; data want; set have; * regular expression 'look around' pattern - works if a target token occurs more than once; flag_way1 = prxmatch('/^(?=.*\bC12\b)(?=.*\bI11\b).*$/i', DiagCode); * AND two patterns for locating tokens - works if a target token occurs more than once; flag_way2 = prxmatch('/\bC12\b/i',DiagCode) AND prxmatch('/\bI11\b/i',DiagCode); * AND two findws for locating tokens - works if a target token occurs more than once ; flag_way3 = findw(DiagCode,'I11',' ','i') AND findw(DiagCode,'C12',' ','i'); * examine result of transformation - Presume each token only appears once in DiagCode; flag_way4 = length(DiagCode) - length(left(tranwrd(tranwrd(DiagCode,'C12',''),'I11',''))) = 8; * examine result of transformation - Presume each token only appears once in DiagCode; flag_way5 = length(DiagCode) - length(prxchange('s/\s*(C12|I11)\s*//i', -1, DiagCode)) = 8; format flag: 4.; run;
As mentioned above, "|" means OR. So, for matching the presence of two codes, you can match A followed by B OR B followed by A, as in:
prxMatch("/\bC12\b.*\bI11\b|\bI11\b.*\bC12\b/i", DiagCodes)
Note \b matches any word boundary, including the beginning or end of a string.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.