Hello!
I have a dataset with sample types (blood and plasma) taken from patients on various dates.
What I want is: if a patient only had blood sample type taken on any given date, with no plasma sample, then I want to set up a flag, multi_samp=0. On the other hand, if on any given date, patient had both blood and plasma samples taken, then multi_samp=1. If patient only had plasma sample on any given date then multi_samp=1, regardless of whether there was a blood sample taken on that date.
data have;
input patid $ sn st $ date: mmddyy.;
format date mmddyy.;
datalines;
001 1002130 blood 07/01/2019
001 1002131 blood 08/25/2020
001 1002132 blood 08/25/2020
001 1002133 plasma 08/25/2020
002 2500252 plasma 12/02/2020
002 2500253 plasma 04/05/2019
002 2500254 blood 04/05/2019
002 2500255 blood 04/05/2019
002 2500256 blood 09/14/2019
003 1224535 blood 01/05/2021
003 1224536 blood 01/05/2021
003 1224537 plasma 01/05/2021
004 6602191 blood 09/08/2020
004 6602192 blood 09/08/2020
004 6602193 blood 11/02/2020
;
run;
data want;
input patid $ sn st $ date: mmddyy. multi_samp;
format date mmddyy.;
datalines;
001 1002130 blood 07/01/2019 0
001 1002131 blood 08/25/2020 1
001 1002132 blood 08/25/2020 1
001 1002133 plasma 08/25/2020 1
002 2500252 plasma 12/02/2020 1
002 2500253 plasma 04/05/2019 1
002 2500254 blood 04/05/2019 1
002 2500255 blood 04/05/2019 1
002 2500256 blood 09/14/2019 0
003 1224535 blood 01/05/2021 1
003 1224536 blood 01/05/2021 1
003 1224537 plasma 01/05/2021 1
004 6602191 blood 09/08/2020 0
004 6602192 blood 09/08/2020 0
004 6602193 blood 11/02/2020 0
;
run;
What I've done so far is below but this doesn't quite give me what I need and I've been stuck here for the past day:
proc sort data=have out=have2;
by patid st date; run;
data final;
set have2;
by patid st date;
if st ne "blood" then multi_samp=1;
if st="blood" then do;
if first.date and last.date then multi_samp=0;
else multi_samp=1;
end;
run;
proc print data=final; run;
If anybody has any tips or suggestions, I'd greatly appreciate it.
Thanks!
Your data appears to already be grouped by patid/date, even though it's not in ascending or descending order, so you don't have to sort the data. But you do need to read each patid/date group twice - once to find any "plasma" records, and once to output the data with the muti_samp dummy variable:
data have;
input patid $ sn st $ date: mmddyy.;
format date mmddyy.;
datalines;
001 1002130 blood 07/01/2019
001 1002131 blood 08/25/2020
001 1002132 blood 08/25/2020
001 1002133 plasma 08/25/2020
002 2500252 plasma 12/02/2020
002 2500253 plasma 04/05/2019
002 2500254 blood 04/05/2019
002 2500255 blood 04/05/2019
002 2500256 blood 09/14/2019
003 1224535 blood 01/05/2021
003 1224536 blood 01/05/2021
003 1224537 plasma 01/05/2021
004 6602191 blood 09/08/2020
004 6602192 blood 09/08/2020
004 6602193 blood 11/02/2020
run;
proc sort;
by patid date;
run;
data want;
do until (last.date);
set have ;
by patid date ;
if st='plasma' then multi_samp=1;
end;
if multi_samp^=1 then multi_samp=0;
do until (last.date);
set have ;
by patid date ;
output;
end;
run;
BTW, multi_samp is a bit misleading term, since there is nothing "multi" in having only a plasma sample.
Your data appears to already be grouped by patid/date, even though it's not in ascending or descending order, so you don't have to sort the data. But you do need to read each patid/date group twice - once to find any "plasma" records, and once to output the data with the muti_samp dummy variable:
data have;
input patid $ sn st $ date: mmddyy.;
format date mmddyy.;
datalines;
001 1002130 blood 07/01/2019
001 1002131 blood 08/25/2020
001 1002132 blood 08/25/2020
001 1002133 plasma 08/25/2020
002 2500252 plasma 12/02/2020
002 2500253 plasma 04/05/2019
002 2500254 blood 04/05/2019
002 2500255 blood 04/05/2019
002 2500256 blood 09/14/2019
003 1224535 blood 01/05/2021
003 1224536 blood 01/05/2021
003 1224537 plasma 01/05/2021
004 6602191 blood 09/08/2020
004 6602192 blood 09/08/2020
004 6602193 blood 11/02/2020
run;
proc sort;
by patid date;
run;
data want;
do until (last.date);
set have ;
by patid date ;
if st='plasma' then multi_samp=1;
end;
if multi_samp^=1 then multi_samp=0;
do until (last.date);
set have ;
by patid date ;
output;
end;
run;
BTW, multi_samp is a bit misleading term, since there is nothing "multi" in having only a plasma sample.
1. Paste your code using the appropriate icon
2. Why is your flag equal to zero for the first record of ID=1, and also for the last record of ID=2 ?
3. Assuming that's a mistake (please proofread your questions carefully before submitting, that's the least you can do), here's another way:
proc sql;
create table WANT as
select *, ( sum(ST ne 'BLOOD') > 0 ) as FLAG
from HAVE
group by PATID
order by PATID;
Three methods: SQL with remerge, double DO loop, or data step merge with itself:
data have;
input patid $ sn st $ date: mmddyy.;
format date mmddyy.;
datalines;
001 1002130 blood 07/01/2019
001 1002131 blood 08/25/2020
001 1002132 blood 08/25/2020
001 1002133 plasma 08/25/2020
002 2500252 plasma 12/02/2020
002 2500253 plasma 04/05/2019
002 2500254 blood 04/05/2019
002 2500255 blood 04/05/2019
002 2500256 blood 09/14/2019
003 1224535 blood 01/05/2021
003 1224536 blood 01/05/2021
003 1224537 plasma 01/05/2021
004 6602191 blood 09/08/2020
004 6602192 blood 09/08/2020
004 6602193 blood 11/02/2020
;
proc sql;
create table want1 as
select
a.*,
sum(a.st = "plasma") > 0 as flag
from have a
group by patid, date
;
quit;
proc sort data=have;
by patid date;
run;
data want2;
do until (last.date);
set have;
by patid date;
if st = "plasma" then flag = 1;
end;
flag = coalesce(flag,0);
do until (last.date);
set have;
by patid date;
output;
end;
run;
proc sort data=want2;
by patid st date;
run;
data want3;
merge
have
have (
in=f
keep=patid date st
rename=(st=_st)
where=(_st = "plasma")
)
;
by patid date;
flag = f;
drop _st;
run;
proc sort data=want3;
by patid st date;
run;
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.