BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Chris_LK_87
Quartz | Level 8

I have a dataset with diagnosis codes with one row for each individual. The DiagCode variable can contain multiple diagnosis codes

 

Data Have;

 

Patient_ID$            DiagCode$          
1                              I11 C12 C01

2                             I11

3                             C12 F99 I11

4                             C01

5                          I11 F99

 

I am familiar with the prxmatch-function. See below:

 


%let list=C12 | I11;

data new;
set have;
if prxmatch ("/(&list)/",catx(' ', of DiagCode:));
run;

 

In the statement above each individual with either diagnosis C12 or I11 gets chosen.

 

But what I want is to select individuals only with both diagnosis C12 and I11.

 

Data Want;

Patient_ID$            DiagCode$          
1                              I11 C12 C01

3                             C12 F99 I11

 

 

Is this possible to do with prxmatch or is it possible to do with any other function?

1 ACCEPTED SOLUTION

Accepted Solutions
ed_sas_member
Meteorite | Level 14

Hi @Chris_LK_87 

 

The issue with "|" is that it means "OR" and not "AND".

Here is an alternative approach:

%let list=C12 I11;

data want;
	set have;
	flag=0;
	do i=1 to countw("&list.");
		if findw(DiagCode,scan("C12 I11",i," "))>0 then flag+1;
	end;
	if flag=countw("&list.") then output;
	drop i flag;
run;

 

If you want to use PRXMATCH, you can also write this:

data want;
	set have;
	flag=0;
	do i=1 to countw("&list.");
		if prxmatch(cats("/",scan("C12 I11",i," "),"/"),DiagCode)>0 then flag+1;
	end;
	if flag=countw("&list.") then output;
	drop i flag;
run;

 

Best,

View solution in original post

6 REPLIES 6
ed_sas_member
Meteorite | Level 14

Hi @Chris_LK_87 

 

The issue with "|" is that it means "OR" and not "AND".

Here is an alternative approach:

%let list=C12 I11;

data want;
	set have;
	flag=0;
	do i=1 to countw("&list.");
		if findw(DiagCode,scan("C12 I11",i," "))>0 then flag+1;
	end;
	if flag=countw("&list.") then output;
	drop i flag;
run;

 

If you want to use PRXMATCH, you can also write this:

data want;
	set have;
	flag=0;
	do i=1 to countw("&list.");
		if prxmatch(cats("/",scan("C12 I11",i," "),"/"),DiagCode)>0 then flag+1;
	end;
	if flag=countw("&list.") then output;
	drop i flag;
run;

 

Best,

Chris_LK_87
Quartz | Level 8
Thank you!
RichardDeVen
Barite | Level 11

There are many ways to check if a string contains two tokens in either order.

 

data have;
input Patient_ID & DiagCode $20.; datalines;
1   I11 C12 C01
2   I11
3   C12 F99 I11
4   C01
5   I11 F99
6   F99 I11
7   C01 C12 I11
;

data want;
  set have;

  * regular expression 'look around' pattern - works if a target token occurs more than once;
  flag_way1 = prxmatch('/^(?=.*\bC12\b)(?=.*\bI11\b).*$/i', DiagCode);

  * AND two patterns for locating tokens - works if a target token occurs more than once;
  flag_way2 = prxmatch('/\bC12\b/i',DiagCode) AND prxmatch('/\bI11\b/i',DiagCode);

  * AND two findws for locating tokens - works if a target token occurs more than once ;
  flag_way3 = findw(DiagCode,'I11',' ','i') AND findw(DiagCode,'C12',' ','i');

  * examine result of transformation - Presume each token only appears once in DiagCode;
  flag_way4 = length(DiagCode) - length(left(tranwrd(tranwrd(DiagCode,'C12',''),'I11',''))) = 8;

  * examine result of transformation - Presume each token only appears once in DiagCode;
  flag_way5 = length(DiagCode) - length(prxchange('s/\s*(C12|I11)\s*//i', -1, DiagCode)) = 8;

  format flag: 4.;
run;
Chris_LK_87
Quartz | Level 8
Thank you for helping me!
PGStats
Opal | Level 21

As mentioned above, "|" means OR. So, for matching the presence of two codes, you can match A followed by B OR B followed by A, as in:

 

prxMatch("/\bC12\b.*\bI11\b|\bI11\b.*\bC12\b/i", DiagCodes)

 

Note \b matches any word boundary, including the beginning or end of a string.

PG
Chris_LK_87
Quartz | Level 8
thanks!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1688 views
  • 0 likes
  • 4 in conversation