Dear Experts,
I need a patient count for the condition based on the cancer occurrence. I'll share the sample dataset and the expected result for your reference.
data have;
input ID Date mmddyy10. Inf Cancer;
cards;
123 05/05/2000 1 0
123 08/07/2001 0 1
123 06/07/2002 1 0
159 01/03/2001 1 1
159 02/08/2002 0 1
618 07/07/2005 0 0
618 05/03/2006 1 0
789 06/06/2000 1 0
789 04/02/2001 0 1
789 03/03/2002 1 0
789 03/03/2002 0 0
run;
I require 2 different outputs, based on 2 conditions, to get the patient IDs.
Kindly suggest code that gives the list of patient IDs at the end of the result.
Note: Only the first occurrence of Cancer matters, not the following occurrences.
This code gets your intended result; check whether it also fits more complicated input data:
data have;
input ID Date mmddyy10. Inf Cancer;
format date MMDDYY10.;
cards;
123 05/05/2000 1 0
123 08/07/2001 0 1
123 06/07/2002 1 0
159 01/03/2001 1 1
159 02/08/2002 0 1
618 07/07/2005 0 0
618 05/03/2006 1 0
789 06/06/2000 1 0
789 04/02/2001 0 1
789 03/03/2002 1 0
789 03/03/2002 0 0
;
proc sql;
/* IDs with an infection on or after the first cancer date */
create table after as
  select distinct a.id
  from
    have (where=(inf = 1)) a,
    (select id, min(date) as date from have (where=(cancer = 1)) group by id) b
  where a.id = b.id and a.date >= b.date;
/* IDs with an infection before the first cancer date */
create table before as
  select distinct a.id
  from
    have (where=(inf = 1)) a,
    (select id, min(date) as date from have (where=(cancer = 1)) group by id) b
  where a.id = b.id and a.date < b.date;
quit;
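Since the original question asked for a patient count, one possible follow-up is sketched below. It assumes the tables after and before created by the step above already exist; because both were built with select distinct, each holds one row per ID, so counting rows counts patients. The column names n_after and n_before are made up for this sketch.

```sas
proc sql;
  /* after and before hold one row per ID, so the row count is the patient count */
  select count(*) as n_after from after;
  select count(*) as n_before from before;
quit;
```

The two select statements print to the results window; adding a create table clause in front of either one would store the count in a dataset instead.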
data have;
input ID Date mmddyy10. Inf Cancer;
format date mmddyy10.;
cards;
123 05/05/2000 1 0
123 08/07/2001 0 1
123 06/07/2002 1 0
159 01/03/2001 1 1
159 02/08/2002 0 1
618 07/07/2005 0 0
618 05/03/2006 1 0
789 06/06/2000 1 0
789 04/02/2001 0 1
789 03/03/2002 1 0
789 03/03/2002 0 0
;
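proc sql;
/*Report 1 On/After*/
/* Keeps every row of each ID that has at least one cancer=1 record */
create table report1 as
  select *
  from have
  group by id
  having max(cancer=1);
/*Report 2 Before*/
/* Keeps every row of each ID whose first infection date is before its first cancer date */
create table report2 as
  select *
  from have
  group by id
  having min(ifn(inf=1,date,.)) < min(ifn(cancer=1,date,.)) ;
quit;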
Since your data appears to be sorted by ID and Date, a DATA step seems a lot easier, IMHO:
data have;
input ID Date mmddyy10. Inf Cancer;
format date mmddyy10.;
cards;
123 05/05/2000 1 0
123 08/07/2001 0 1
123 06/07/2002 1 0
159 01/03/2001 1 1
159 02/08/2002 0 1
618 07/07/2005 0 0
618 05/03/2006 1 0
789 06/06/2000 1 0
789 04/02/2001 0 1
789 03/03/2002 1 0
789 03/03/2002 0 0
;
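data report1 report2;
  /* Pass 1: find the first cancer date (_d1) and the first infection date (_d) for this ID */
  do _n_=1 by 1 until(last.id);
    set have;
    by id;
    if nmiss(_d,_d1)=0 then continue; /* both dates already found */
    if not _d1 and Cancer=1 then _d1=date;
    if not _d and inf=1 then _d=date;
  end;
  /* Pass 2: re-read the same rows and write them to the matching report */
  do _n_=1 to _n_;
    set have;
    if _d1 then output report1;
    if _d<_d1 then output report2;
  end;
  drop _:;
run;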
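report1 and report2 keep every row of the qualifying patients. If only the list of patient IDs is needed, as the question asks, one way to reduce them is sketched here. It assumes report1 and report2 exist as created above; the output names ids_report1 and ids_report2 are hypothetical.

```sas
/* One row per patient ID; ids_report1 and ids_report2 are hypothetical names */
proc sort data=report1(keep=id) out=ids_report1 nodupkey;
  by id;
run;

proc sort data=report2(keep=id) out=ids_report2 nodupkey;
  by id;
run;
```

The keep= option drops everything except ID before sorting, and nodupkey then leaves a single row per ID.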
Thank you for always stepping in to help when I need you most. @novinosrin @Kurt_Bremser, it really means a lot.