I have a quite large dataset with many columns (500+).
Each row contains information about one individual and their diagnosiscode.
I would like to select multiple diagnosiscodes and at the same time exclude certain diagnosiscodes by using an array and put them in a group.
So for example I would like to create a variable for hypertension. So every individual that has one of the following diagnosiscode; 'C02', 'C03', 'C07','C08', 'C09' will get a 1 assigned for the variable hypertension else will get 0. But if they have diagnosiscode 'C02AC02' or 'C07AA07' they
will not get a 1 for hypertension (but if any of their other diagnosiscode is 'C02', 'C03', 'C07','C08', 'C09' they will get a 1).
I have been using an array, see code below, but I haven’t been able to figure out how to select multiple codes while excluding other codes at the same time.
data have;
input
id$ atc1$ atc2$ atc3$;
cards;
1 J01 J01XX05 C07
2 J01XX05 A10A C03C
3 C02 A10A A10B
4 C07AA07 C02 A10B
5 C07AA07 C02AC02 J01
;
run;
This is the code I have been using but I don’t get it to work. I have got it to work if I want to select one code and exclude another code (as the one use for J01_flag).
data want;
set have;
array code(500) $ atc1-atc500;
Hypertension=0;
J01_flag=0;
do i = 1 to 500 until (Hypertension=1 and J01_flag=1);
if code(i)=: 'C02' or 'C03' or 'C07' or 'C08' or 'C09' and code(i) ne 'C02AC02' or 'C07AA07' then Hypertension=1;
if code(i)=: ‘J01’ and code(i) ne ' J01XX05' then J01_flag=1;
end;
drop i;
run;
data want;
id$ atc1$ atc2$ atc3$ Hypertoni$ J01_flag$;
cards;
1 J01 J01XX05 C07 1 1
2 J01XX05 A10A C03C 1 0
3 C02 A10A A10B 1 0
4 C07AA07 C02 A10B 1 0
5 C07AA07 C02AC02 J01 0 1
;
run;
... View more