If I have a dataset like this: One patient can have multiple diagnosis codes, but it is formatted in a way that only one diagnosis code for one row.
Patient_ID | Diagnosis_code |
1 | 1 |
1 | 2 |
1 | 4 |
2 | 2 |
2 | 3 |
2 | 4 |
3 | 5 |
3 | 6 |
3 | 7 |
If I want to define a new variable called disease so that if a person's diagnosis codes contain either 2 or 3, then disease=1 otherwise disease=0. The result would be like this:
Patient_ID | Disease |
1 | 1 |
2 | 1 |
3 | 0 |
. How should I code it? Thank you.
Thanks @r_behata for posting the data in usable form.
@Boon_Noob Simple by-group processing does the job:
data want;
set have;
by Patient_ID;
retain Disease;
if first.Patient_ID then do;
Disease = 0;
end;
Disease = Disease or (Diagnosis_Code in ('2', '3'));
if last.Patient_ID then do;
output;
end;
drop Diagnosis_Code;
run;
data have;
input Patient_ID $ Diagnosis_code $;
infile cards delimiter='09'x;
cards;
1 1
1 2
1 4
2 2
2 3
2 4
3 5
3 6
3 7
run;
proc sql;
create table want
as
select distinct a.Patient_ID ,
case when b.Patient_ID ne ''
then 1
else 0 end as Disease
from have a
left join (select distinct
Patient_ID from have
where Diagnosis_code in ('2','3')
) b
on a.Patient_ID=b.Patient_ID;
quit;
Thanks @r_behata for posting the data in usable form.
@Boon_Noob Simple by-group processing does the job:
data want;
set have;
by Patient_ID;
retain Disease;
if first.Patient_ID then do;
Disease = 0;
end;
Disease = Disease or (Diagnosis_Code in ('2', '3'));
if last.Patient_ID then do;
output;
end;
drop Diagnosis_Code;
run;
A different method of BY processing, using a DO loop to work through a group:
data want;
disease = 0;
do until (last.patient_id);
set have;
by patient_id;
if diagnosis_code in ("2","3") then disease = 1;
end;
keep patient_id disease;
run;
data have;
input Patient_ID Diagnosis_code;
cards;
1 1
1 2
1 4
2 2
2 3
2 4
3 5
3 6
3 7
;
proc sql;
create table want as
select Patient_ID,max(Diagnosis_code in (2 3)) as Disease
from have
group by Patient_ID;
quit;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.