BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Chris_LK_87
Quartz | Level 8

I have a dataset with diagnosis codes with one row for each individual. The DiagCode variable can contain multiple diagnosis codes

 

Data Have;

 

Patient_ID$            DiagCode$          
1                              I11 C12 C01

2                             I11

3                             C12 F99 I11

4                             C01

5                          I11 F99

 

I am familiar with the prxmatch-function. See below:

 


%let list=C12 | I11;

data new;
set have;
if prxmatch ("/(&list)/",catx(' ', of DiagCode:));
run;

 

In the statement above each individual with either diagnosis C12 or I11 gets chosen.

 

But what I want is to select individuals only with both diagnosis C12 and I11.

 

Data Want;

Patient_ID$            DiagCode$          
1                              I11 C12 C01

3                             C12 F99 I11

 

 

Is this possible to do with prxmatch or is it possible to do with any other function?

1 ACCEPTED SOLUTION

Accepted Solutions
ed_sas_member
Meteorite | Level 14

Hi @Chris_LK_87 

 

The issue with "|" is that it means "OR" and not "AND".

Here is an alternative approach:

%let list=C12 I11;

data want;
	set have;
	flag=0;
	do i=1 to countw("&list.");
		if findw(DiagCode,scan("C12 I11",i," "))>0 then flag+1;
	end;
	if flag=countw("&list.") then output;
	drop i flag;
run;

 

If you want to use PRXMATCH, you can also write this:

data want;
	set have;
	flag=0;
	do i=1 to countw("&list.");
		if prxmatch(cats("/",scan("C12 I11",i," "),"/"),DiagCode)>0 then flag+1;
	end;
	if flag=countw("&list.") then output;
	drop i flag;
run;

 

Best,

View solution in original post

6 REPLIES 6
ed_sas_member
Meteorite | Level 14

Hi @Chris_LK_87 

 

The issue with "|" is that it means "OR" and not "AND".

Here is an alternative approach:

%let list=C12 I11;

data want;
	set have;
	flag=0;
	do i=1 to countw("&list.");
		if findw(DiagCode,scan("C12 I11",i," "))>0 then flag+1;
	end;
	if flag=countw("&list.") then output;
	drop i flag;
run;

 

If you want to use PRXMATCH, you can also write this:

data want;
	set have;
	flag=0;
	do i=1 to countw("&list.");
		if prxmatch(cats("/",scan("C12 I11",i," "),"/"),DiagCode)>0 then flag+1;
	end;
	if flag=countw("&list.") then output;
	drop i flag;
run;

 

Best,

Chris_LK_87
Quartz | Level 8
Thank you!
RichardDeVen
Barite | Level 11

There are many ways to check if a string contains two tokens in either order.

 

data have;
input Patient_ID & DiagCode $20.; datalines;
1   I11 C12 C01
2   I11
3   C12 F99 I11
4   C01
5   I11 F99
6   F99 I11
7   C01 C12 I11
;

data want;
  set have;

  * regular expression 'look around' pattern - works if a target token occurs more than once;
  flag_way1 = prxmatch('/^(?=.*\bC12\b)(?=.*\bI11\b).*$/i', DiagCode);

  * AND two patterns for locating tokens - works if a target token occurs more than once;
  flag_way2 = prxmatch('/\bC12\b/i',DiagCode) AND prxmatch('/\bI11\b/i',DiagCode);

  * AND two findws for locating tokens - works if a target token occurs more than once ;
  flag_way3 = findw(DiagCode,'I11',' ','i') AND findw(DiagCode,'C12',' ','i');

  * examine result of transformation - Presume each token only appears once in DiagCode;
  flag_way4 = length(DiagCode) - length(left(tranwrd(tranwrd(DiagCode,'C12',''),'I11',''))) = 8;

  * examine result of transformation - Presume each token only appears once in DiagCode;
  flag_way5 = length(DiagCode) - length(prxchange('s/\s*(C12|I11)\s*//i', -1, DiagCode)) = 8;

  format flag: 4.;
run;
Chris_LK_87
Quartz | Level 8
Thank you for helping me!
PGStats
Opal | Level 21

As mentioned above, "|" means OR. So, for matching the presence of two codes, you can match A followed by B OR B followed by A, as in:

 

prxMatch("/\bC12\b.*\bI11\b|\bI11\b.*\bC12\b/i", DiagCodes)

 

Note \b matches any word boundary, including the beginning or end of a string.

PG
Chris_LK_87
Quartz | Level 8
thanks!

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1030 views
  • 0 likes
  • 4 in conversation