I have a quite large dataset with many columns (500+).
Each row contains information about one individual and their diagnosiscode.
I would like to select multiple diagnosiscodes and at the same time exclude certain diagnosiscodes by using an array and put them in a group.
So for example I would like to create a variable for hypertension. So every individual that has one of the following diagnosiscode; 'C02', 'C03', 'C07','C08', 'C09' will get a 1 assigned for the variable hypertension else will get 0. But if they have diagnosiscode 'C02AC02' or 'C07AA07' they
will not get a 1 for hypertension (but if any of their other diagnosiscode is 'C02', 'C03', 'C07','C08', 'C09' they will get a 1).
I have been using an array, see code below, but I haven’t been able to figure out how to select multiple codes while excluding other codes at the same time.
data have;
input
id$ atc1$ atc2$ atc3$;
cards;
1 J01 J01XX05 C07
2 J01XX05 A10A C03C
3 C02 A10A A10B
4 C07AA07 C02 A10B
5 C07AA07 C02AC02 J01
;
run;
This is the code I have been using but I don’t get it to work. I have got it to work if I want to select one code and exclude another code (as the one use for J01_flag).
data want;
set have;
array code(500) $ atc1-atc500;
Hypertension=0;
J01_flag=0;
do i = 1 to 500 until (Hypertension=1 and J01_flag=1);
if code(i)=: 'C02' or 'C03' or 'C07' or 'C08' or 'C09' and code(i) ne 'C02AC02' or 'C07AA07' then Hypertension=1;
if code(i)=: ‘J01’ and code(i) ne ' J01XX05' then J01_flag=1;
end;
drop i;
run;
data want;
id$ atc1$ atc2$ atc3$ Hypertoni$ J01_flag$;
cards;
1 J01 J01XX05 C07 1 1
2 J01XX05 A10A C03C 1 0
3 C02 A10A A10B 1 0
4 C07AA07 C02 A10B 1 0
5 C07AA07 C02AC02 J01 0 1
;
run;
Hi @Chris_LK_87
Does this code meet your needs?
data want;
set have;
array code(*) $ atc1-atc500;
Hypertension=0;
J01_flag=0;
do i=1 to dim(code);
if prxmatch('/(C02|C03|C07|C08|C09).*/',code(i)) and code(i) not in ('C02AC02','C07AA07') then Hypertension=1;
if prxmatch('/(J01).*/',code(i)) and code(i) ne 'J01XX05' then J01_flag=1;
end;
drop i;
run;
Best,
Hi @Chris_LK_87
Does this code meet your needs?
data want;
set have;
array code(*) $ atc1-atc500;
Hypertension=0;
J01_flag=0;
do i=1 to dim(code);
if prxmatch('/(C02|C03|C07|C08|C09).*/',code(i)) and code(i) not in ('C02AC02','C07AA07') then Hypertension=1;
if prxmatch('/(J01).*/',code(i)) and code(i) ne 'J01XX05' then J01_flag=1;
end;
drop i;
run;
Best,
Thanks for replying.
The code selects other diagnosiscodes that I am not looking for, for example everything that contains 'C02' like 'R03AC02'.
Is there a way around this?
This doesn't make any sense.
if code(i)=: 'C02' or 'C03' or 'C07' or 'C08' or 'C09'
You are combining character strings with the boolean operator OR.
Are you looking for the IN operator instead?
if code(i) in: ('C02' 'C03' 'C07' 'C08' 'C09')
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.