BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
sas_student1
Quartz | Level 8

Hello, 

 

Have a question on how to write code a macro or an array for the following situation.

 

I have a dataset with a client ID and 25 Diagnosis codes, so 25 variables.

I would like to write code using if-then where if any of the 25 diagnosis codes meet a criterion it creates a new variable that flags that row as the criteria are met. 

 

For example, I have the following code (note for simplicity I only put 5 diagnosis codes, but in reality, I would be dealing with 25) 

 

 

data have ;               
  input client dx_cd1 $ dx_cd2 $ dx_cd3 $ dx_cd4 $ dx_cd5 $ ; 
  datalines ;                                       
1 291 2911 2915 2911 2915
2 292 2923 2924 2924 2924
3 F101 F102 F110 303 291
4 Z345 F181 F191 Z235 Z786
5 Z345 Z235 Z126 Z786 Z235
6 305 304 303 303 2915 292 
7 F191 Z786 F161 F102 F101
;                                               
run ; 

If any of the codes above have a diagnosis code staring with F or is number only (so the 292, 305 etc.) then I want to create a new column that flags that row as 1 else 0. So 0 would be if none of the codes started with an F or is a number. I the above example row 5 (client 5) would be the only one that would get a 0, the others would get a flag of 1. Even when they have codes that have a Z in them they also have codes that have F or a number. 

 

One way would be to write code for each dx_cd, 

e.g. 

 

if dx_cd1 in :('292','293','F10','30','F19') OR 
dx_cd2 in :('292','293','F10','30','F19') OR 
dx_cd3 in :('292','293','F10','30','F19') OR 
dx_cd4 in :('292','293','F10','30','F19') OR 
dx_cd5 in :('292','293','F10','30','F19') OR  then flag ='1'; else flag='0';

 

But I have 25 of these columns and the diagnosis codes are not just the five iterations I have above, they are way more.

 

Any suggestions on how to code efficiently would be appreciated!

 

Thank you!

 

1 ACCEPTED SOLUTION

Accepted Solutions
unison
Lapis Lazuli | Level 10

Hi @sas_student1.

 

if substr(dxarr,1,3)='F10'

will yield flag=1 if any of the diagnosis codes start with F10. So, yes.

 

If you wanted to add specific codes that aren't picked up in this logic, I'd suggest adding something like this in the if statement:

 

or dxarr in ('code1', 'code2', 'code3')

(replacing code1, code2, and code3, of course). This would just add other cases that would end up being flagged!

 

Cheers,

-unison

View solution in original post

10 REPLIES 10
unison
Lapis Lazuli | Level 10

Hi there. Thank you for your detailed post!

 

Try something like this out:

 

data have;
	input client dx_cd1 $ dx_cd2 $ dx_cd3 $ dx_cd4 $ dx_cd5 $;
	datalines;
1 291 2911 2915 2911 2915
2 292 2923 2924 2924 2924
3 F101 F102 F110 303 291
4 Z345 F181 F191 Z235 Z786
5 Z345 Z235 Z126 Z786 Z235
6 305 304 303 303 2915 292
7 F191 Z786 F161 F102 F101
8 313.1 Z235 Z126 Z786 Z235
;
run;

data want;
	set have;
	flag=0;
	array dxarr dx_cd:;
	do over dxarr;
		if substr(dxarr,1,1)='F' or compress(dxarr,'.','sd')='' then
			do;
				flag=1;
				leave;
			end;
	end;
run;

 

Basically the idea is that you search over your dx_cd vars and if any of them are in your list, you move onto the next observation.

 

The way compress is used here, I remove digits ('d'), spaces ('s') as well as '.' (as @ballardw pointed out, many ICD10 codes contain '.' -- notice that here client 8 receives flag=1). Since these are the only "flaggable" characters, we want to flag only those that come up empty (null) after removal.

-unison
sas_student1
Quartz | Level 8

Thanks @unison this is great! 

And also addressing @ballardw point of considering the dot (.) in the code. Luckily the variables were cleaned and the dot was removed. But the code helps to know!!

 

I do have a follow-up question that I realized I didn't put in my original post.

 

Now the following code

if substr(dxarr,1,1)='F'

 Will grab all the diagnosis that start with F and I am to guess that If I wanted to go even more into the code so that if I wanted to only flag those with F10 and not F17 then I could re-write the code as such: 

 

if substr(dxarr,1,3)='F10'

also, say I wanted a set of certain numeric diagnoses included but not include other then  I could expand on the above codes, right? replace the "=" with an "in" and write in my ranges?  

unison
Lapis Lazuli | Level 10

Hi @sas_student1.

 

if substr(dxarr,1,3)='F10'

will yield flag=1 if any of the diagnosis codes start with F10. So, yes.

 

If you wanted to add specific codes that aren't picked up in this logic, I'd suggest adding something like this in the if statement:

 

or dxarr in ('code1', 'code2', 'code3')

(replacing code1, code2, and code3, of course). This would just add other cases that would end up being flagged!

 

Cheers,

-unison
sas_student1
Quartz | Level 8

Excellent! Thank you!

sas_student1
Quartz | Level 8

Agree, a good solution too! Thank you!

novinosrin
Tourmaline | Level 20

HI @sas_student1  If understand you correctly, it's rather straight forward and simple

 

data have ;               
  input client dx_cd1 $ dx_cd2 $ dx_cd3 $ dx_cd4 $ dx_cd5 $ ; 
  datalines ;                                       
1 291 2911 2915 2911 2915
2 292 2923 2924 2924 2924
3 F101 F102 F110 303 291
4 Z345 F181 F191 Z235 Z786
5 Z345 Z235 Z126 Z786 Z235
6 305 304 303 303 2915 292 
7 F191 Z786 F161 F102 F101
;                                               
run ; 

data want;
set have;
array d dx_cd1-dx_cd5;
length temp $32767;
temp=(cats(of d(*)));
Flag=0;
if notdigit(strip(temp))=0 or index(temp,"F") then flag=1;
drop temp;
run;

 

 

 

 

ballardw
Super User

Are you going to have any of the diagnostics codes with periods in them such as 1.23 (not claiming this as valid but if these are ICD-10 some do have periods).

 

If so does such a value with a period still count "as number only"? If so we'll need a bit more logic than the simple NOTDIGIT function=0 to account for such periods.

hashman
Ammonite | Level 13

@sas_student1:

You need neither a macro nor an array. A single character expression can give you what you want:

data have ;                                                                                                                             
  input client (dx_cd1-dx_cd5) (:$) ;                                                                                                   
  datalines ;                                                                                                                           
1 291  2911 2915 2911 2915                                                                                                              
2 292  2923 2924 2924 2924                                                                                                              
3 F101 F102 F110 303  291                                                                                                               
4 Z345 F181 F191 Z235 Z786                                                                                                              
5 Z345 Z235 Z126 Z786 Z235                                                                                                              
6 305  304  303  303  2915                                                                                                              
7 F191 Z786 F161 F102 F101                                                                                                              
;                                                                                                                                       
run ;                                                                                                                                   
                                                                                                                                        
data want ;                                                                                                                             
  set have ;                                                                                                                            
  flag = ^ findc (compress (tranwrd ("" || catx ("", of dx_cd:), " F", "")), , "kd") ;                                                 
run ;                

Result:

client  dx_cd1  dx_cd2  dx_cd3  dx_cd4  dx_cd5  flag                                                                                    
----------------------------------------------------                                                                                    
   1     291     2911    2915    2911    2915     1                                                                                     
   2     292     2923    2924    2924    2924     1                                                                                     
   3     F101    F102    F110    303     291      1                                                                                     
   4     Z345    F181    F191    Z235    Z786     0                                                                                     
   5     Z345    Z235    Z126    Z786    Z235     0                                                                                     
   6     305     304     303     303     2915     1                                                                                     
   7     F191    Z786    F161    F102    F101     0

Kind regards

Paul D.

 

Astounding
PROC Star

I tend to give answers that utilize what you have coded so far, and extend a bit further.  So ...

 

data want;
set have;
array dx_cd {25};
flag=0;
do k=1 to 25 until (flag=1);
   if dx_cd{k} in :('292','293','F10','30','F19') then flag=1;
end;
drop k;
run;

A few items to note:

 

  • By omitting a list of array elements, the array automatically uses the name of the array (dx_cd) with a numeric suffix (1 thru 25)
  • FLAG is set up as numeric.  Your original code creates it as character because of the quotes.
  • Any method you choose must be careful NOT to set FLAG when a diagnosis code has a missing value.
  • The UNTIL condition improves efficiency by halting the process early.  Once FLAG is 1, there is no need to check additional elements of the array.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 1266 views
  • 2 likes
  • 7 in conversation