Hi Experts,
I have patient data that contains information about Diseased/NonDiseased. I 'll share the sample dataset and the expected result for your reference.
data have;
input ID Vdate mmddyy10. DSD NDSD;
format Vdate mmddyy10.;
cards;
123 05/05/2000 0 1
123 08/04/2010 0 1
123 06/12/2012 1 0
123 12/02/2015 0 1
789 11/07/1998 1 0
789 10/08/2000 0 1
789 10/08/2000 0 1
951 02/02/2010 0 1
752 08/12/2019 1 0
521 09/09/2005 0 1
521 07/02/2009 1 0
;
I need all obs of IDs that are Nondiseased(NDSD=1) on their first visit date and Diseased(DSD=1 i.e NDSD=0) later on any occurrence.
Expected result: keep all obs of 123, 521 IDs.
Thanks in advance!
data have;
input ID Vdate mmddyy10. DSD NDSD;
format Vdate mmddyy10.;
cards;
123 05/05/2000 0 1
123 08/04/2010 0 1
123 06/12/2012 1 0
123 12/02/2015 0 1
789 11/07/1998 1 0
789 10/08/2000 0 1
789 10/08/2000 0 1
951 02/02/2010 0 1
752 08/12/2019 1 0
521 09/09/2005 0 1
521 07/02/2009 1 0
;
data want (drop=_:);
set have;
by id notsorted;
if first.id=1 then _keep=ifn(ndsd=1,1,0);
else if _keep=1 and (dsd=1 and ndsd=0) then _keep=2;
retain _keep;
if last.id then do until (last.id);
set have;
by id notsorted;
if _keep=2 then output;
end;
run;
This reads each ID group twice. The first pass to determine the _KEEP variables based on your conditions. The second pass to honor the _KEEP variable (outputting only when _KEEP=2).
data have;
input ID Vdate mmddyy10. DSD NDSD;
format Vdate mmddyy10.;
cards;
123 05/05/2000 0 1
123 08/04/2010 0 1
123 06/12/2012 1 0
123 12/02/2015 0 1
789 11/07/1998 1 0
789 10/08/2000 0 1
789 10/08/2000 0 1
951 02/02/2010 0 1
752 08/12/2019 1 0
521 09/09/2005 0 1
521 07/02/2009 1 0
;
data want (drop=_:);
set have;
by id notsorted;
if first.id=1 then _keep=ifn(ndsd=1,1,0);
else if _keep=1 and (dsd=1 and ndsd=0) then _keep=2;
retain _keep;
if last.id then do until (last.id);
set have;
by id notsorted;
if _keep=2 then output;
end;
run;
This reads each ID group twice. The first pass to determine the _KEEP variables based on your conditions. The second pass to honor the _KEEP variable (outputting only when _KEEP=2).
I really appreciate your support. @mkeintz Thank you!
Hi @Sathish_jammy Nice to see the question has been answered. For what it's worth, I would like to share an application of boolean expression in Proc SQL that I learned from @SASKiwi and @PGStats a while ago. Hmm one is apparently burning in the heat down under while the other is freezing in the arctics. 🙂
data have;
input ID Vdate mmddyy10. DSD NDSD;
format Vdate mmddyy10.;
cards;
123 05/05/2000 0 1
123 08/04/2010 0 1
123 06/12/2012 1 0
123 12/02/2015 0 1
789 11/07/1998 1 0
789 10/08/2000 0 1
789 10/08/2000 0 1
951 02/02/2010 0 1
752 08/12/2019 1 0
521 09/09/2005 0 1
521 07/02/2009 1 0
;
proc sql;
create table want as
select *
from have
where ID in (select ID from have group by id having min(vdate)=vdate and ndsd and max(vdate*dsd)>min(vdate))
order by id,vdate;
quit;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.