I have a quite large dataset with many columns (500+).
Each row contains information about one individual and their diagnosiscode.
I would like to select multiple diagnosiscodes and at the same time exclude certain diagnosiscodes by using an array and put them in a group.
So for example I would like to create a variable for hypertension. So every individual that has one of the following diagnosiscode; 'C02', 'C03', 'C07','C08', 'C09' will get a 1 assigned for the variable hypertension else will get 0. But if they have diagnosiscode 'C02AC02' or 'C07AA07' they
will not get a 1 for hypertension (but if any of their other diagnosiscode is 'C02', 'C03', 'C07','C08', 'C09' they will get a 1).
I have been using an array, see code below, but I haven’t been able to figure out how to select multiple codes while excluding other codes at the same time.
data have;
input
id$ atc1$ atc2$ atc3$;
cards;
1 J01 J01XX05 C07
2 J01XX05 A10A C03C
3 C02 A10A A10B
4 C07AA07 C02 A10B
5 C07AA07 C02AC02 J01
;
run;
This is the code I have been using but I don’t get it to work. I have got it to work if I want to select one code and exclude another code (as the one use for J01_flag).
data want;
set have;
array code(500) $ atc1-atc500;
Hypertension=0;
J01_flag=0;
do i = 1 to 500 until (Hypertension=1 and J01_flag=1);
if code(i)=: 'C02' or 'C03' or 'C07' or 'C08' or 'C09' and code(i) ne 'C02AC02' or 'C07AA07' then Hypertension=1;
if code(i)=: ‘J01’ and code(i) ne ' J01XX05' then J01_flag=1;
end;
drop i;
run;
data want;
id$ atc1$ atc2$ atc3$ Hypertoni$ J01_flag$;
cards;
1 J01 J01XX05 C07 1 1
2 J01XX05 A10A C03C 1 0
3 C02 A10A A10B 1 0
4 C07AA07 C02 A10B 1 0
5 C07AA07 C02AC02 J01 0 1
;
run;
Hi @Chris_LK_87
Does this code meet your needs?
data want;
set have;
array code(*) $ atc1-atc500;
Hypertension=0;
J01_flag=0;
do i=1 to dim(code);
if prxmatch('/(C02|C03|C07|C08|C09).*/',code(i)) and code(i) not in ('C02AC02','C07AA07') then Hypertension=1;
if prxmatch('/(J01).*/',code(i)) and code(i) ne 'J01XX05' then J01_flag=1;
end;
drop i;
run;
Best,
Hi @Chris_LK_87
Does this code meet your needs?
data want;
set have;
array code(*) $ atc1-atc500;
Hypertension=0;
J01_flag=0;
do i=1 to dim(code);
if prxmatch('/(C02|C03|C07|C08|C09).*/',code(i)) and code(i) not in ('C02AC02','C07AA07') then Hypertension=1;
if prxmatch('/(J01).*/',code(i)) and code(i) ne 'J01XX05' then J01_flag=1;
end;
drop i;
run;
Best,
Thanks for replying.
The code selects other diagnosiscodes that I am not looking for, for example everything that contains 'C02' like 'R03AC02'.
Is there a way around this?
This doesn't make any sense.
if code(i)=: 'C02' or 'C03' or 'C07' or 'C08' or 'C09'
You are combining character strings with the boolean operator OR.
Are you looking for the IN operator instead?
if code(i) in: ('C02' 'C03' 'C07' 'C08' 'C09')
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.