Hello,
Have a question on how to write code a macro or an array for the following situation.
I have a dataset with a client ID and 25 Diagnosis codes, so 25 variables.
I would like to write code using if-then where if any of the 25 diagnosis codes meet a criterion it creates a new variable that flags that row as the criteria are met.
For example, I have the following code (note for simplicity I only put 5 diagnosis codes, but in reality, I would be dealing with 25)
data have ;
input client dx_cd1 $ dx_cd2 $ dx_cd3 $ dx_cd4 $ dx_cd5 $ ;
datalines ;
1 291 2911 2915 2911 2915
2 292 2923 2924 2924 2924
3 F101 F102 F110 303 291
4 Z345 F181 F191 Z235 Z786
5 Z345 Z235 Z126 Z786 Z235
6 305 304 303 303 2915 292
7 F191 Z786 F161 F102 F101
;
run ;
If any of the codes above have a diagnosis code staring with F or is number only (so the 292, 305 etc.) then I want to create a new column that flags that row as 1 else 0. So 0 would be if none of the codes started with an F or is a number. I the above example row 5 (client 5) would be the only one that would get a 0, the others would get a flag of 1. Even when they have codes that have a Z in them they also have codes that have F or a number.
One way would be to write code for each dx_cd,
e.g.
if dx_cd1 in :('292','293','F10','30','F19') OR dx_cd2 in :('292','293','F10','30','F19') OR dx_cd3 in :('292','293','F10','30','F19') OR dx_cd4 in :('292','293','F10','30','F19') OR dx_cd5 in :('292','293','F10','30','F19') OR then flag ='1'; else flag='0';
But I have 25 of these columns and the diagnosis codes are not just the five iterations I have above, they are way more.
Any suggestions on how to code efficiently would be appreciated!
Thank you!
Hi @sas_student1.
if substr(dxarr,1,3)='F10'
will yield flag=1 if any of the diagnosis codes start with F10. So, yes.
If you wanted to add specific codes that aren't picked up in this logic, I'd suggest adding something like this in the if statement:
or dxarr in ('code1', 'code2', 'code3')
(replacing code1, code2, and code3, of course). This would just add other cases that would end up being flagged!
Cheers,
Hi there. Thank you for your detailed post!
Try something like this out:
data have;
input client dx_cd1 $ dx_cd2 $ dx_cd3 $ dx_cd4 $ dx_cd5 $;
datalines;
1 291 2911 2915 2911 2915
2 292 2923 2924 2924 2924
3 F101 F102 F110 303 291
4 Z345 F181 F191 Z235 Z786
5 Z345 Z235 Z126 Z786 Z235
6 305 304 303 303 2915 292
7 F191 Z786 F161 F102 F101
8 313.1 Z235 Z126 Z786 Z235
;
run;
data want;
set have;
flag=0;
array dxarr dx_cd:;
do over dxarr;
if substr(dxarr,1,1)='F' or compress(dxarr,'.','sd')='' then
do;
flag=1;
leave;
end;
end;
run;
Basically the idea is that you search over your dx_cd vars and if any of them are in your list, you move onto the next observation.
The way compress is used here, I remove digits ('d'), spaces ('s') as well as '.' (as @ballardw pointed out, many ICD10 codes contain '.' -- notice that here client 8 receives flag=1). Since these are the only "flaggable" characters, we want to flag only those that come up empty (null) after removal.
Thanks @unison this is great!
And also addressing @ballardw point of considering the dot (.) in the code. Luckily the variables were cleaned and the dot was removed. But the code helps to know!!
I do have a follow-up question that I realized I didn't put in my original post.
Now the following code
if substr(dxarr,1,1)='F'
Will grab all the diagnosis that start with F and I am to guess that If I wanted to go even more into the code so that if I wanted to only flag those with F10 and not F17 then I could re-write the code as such:
if substr(dxarr,1,3)='F10'
also, say I wanted a set of certain numeric diagnoses included but not include other then I could expand on the above codes, right? replace the "=" with an "in" and write in my ranges?
Hi @sas_student1.
if substr(dxarr,1,3)='F10'
will yield flag=1 if any of the diagnosis codes start with F10. So, yes.
If you wanted to add specific codes that aren't picked up in this logic, I'd suggest adding something like this in the if statement:
or dxarr in ('code1', 'code2', 'code3')
(replacing code1, code2, and code3, of course). This would just add other cases that would end up being flagged!
Cheers,
Excellent! Thank you!
Transpose your data to a long format, and use retained flag variables in the following data step.
Agree, a good solution too! Thank you!
HI @sas_student1 If understand you correctly, it's rather straight forward and simple
data have ;
input client dx_cd1 $ dx_cd2 $ dx_cd3 $ dx_cd4 $ dx_cd5 $ ;
datalines ;
1 291 2911 2915 2911 2915
2 292 2923 2924 2924 2924
3 F101 F102 F110 303 291
4 Z345 F181 F191 Z235 Z786
5 Z345 Z235 Z126 Z786 Z235
6 305 304 303 303 2915 292
7 F191 Z786 F161 F102 F101
;
run ;
data want;
set have;
array d dx_cd1-dx_cd5;
length temp $32767;
temp=(cats(of d(*)));
Flag=0;
if notdigit(strip(temp))=0 or index(temp,"F") then flag=1;
drop temp;
run;
Are you going to have any of the diagnostics codes with periods in them such as 1.23 (not claiming this as valid but if these are ICD-10 some do have periods).
If so does such a value with a period still count "as number only"? If so we'll need a bit more logic than the simple NOTDIGIT function=0 to account for such periods.
You need neither a macro nor an array. A single character expression can give you what you want:
data have ;
input client (dx_cd1-dx_cd5) (:$) ;
datalines ;
1 291 2911 2915 2911 2915
2 292 2923 2924 2924 2924
3 F101 F102 F110 303 291
4 Z345 F181 F191 Z235 Z786
5 Z345 Z235 Z126 Z786 Z235
6 305 304 303 303 2915
7 F191 Z786 F161 F102 F101
;
run ;
data want ;
set have ;
flag = ^ findc (compress (tranwrd ("" || catx ("", of dx_cd:), " F", "")), , "kd") ;
run ;
Result:
client dx_cd1 dx_cd2 dx_cd3 dx_cd4 dx_cd5 flag ---------------------------------------------------- 1 291 2911 2915 2911 2915 1 2 292 2923 2924 2924 2924 1 3 F101 F102 F110 303 291 1 4 Z345 F181 F191 Z235 Z786 0 5 Z345 Z235 Z126 Z786 Z235 0 6 305 304 303 303 2915 1 7 F191 Z786 F161 F102 F101 0
Kind regards
Paul D.
I tend to give answers that utilize what you have coded so far, and extend a bit further. So ...
data want;
set have;
array dx_cd {25};
flag=0;
do k=1 to 25 until (flag=1);
if dx_cd{k} in :('292','293','F10','30','F19') then flag=1;
end;
drop k;
run;
A few items to note:
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.