Hello!
I have a dataset with sample types (blood and plasma) taken from patients on various dates.
What I want is: if a patient only had blood sample type taken on any given date, with no plasma sample, then I want to set up a flag, multi_samp=0. On the other hand, if on any given date, patient had both blood and plasma samples taken, then multi_samp=1. If patient only had plasma sample on any given date then multi_samp=1, regardless of whether there was a blood sample taken on that date.
data have;
input patid $ sn st $ date: mmddyy.;
format date mmddyy.;
datalines;
001 1002130 blood 07/01/2019
001 1002131 blood 08/25/2020
001 1002132 blood 08/25/2020
001 1002133 plasma 08/25/2020
002 2500252 plasma 12/02/2020
002 2500253 plasma 04/05/2019
002 2500254 blood 04/05/2019
002 2500255 blood 04/05/2019
002 2500256 blood 09/14/2019
003 1224535 blood 01/05/2021
003 1224536 blood 01/05/2021
003 1224537 plasma 01/05/2021
004 6602191 blood 09/08/2020
004 6602192 blood 09/08/2020
004 6602193 blood 11/02/2020
;
run;
data want;
input patid $ sn st $ date: mmddyy. multi_samp;
format date mmddyy.;
datalines;
001 1002130 blood 07/01/2019 0
001 1002131 blood 08/25/2020 1
001 1002132 blood 08/25/2020 1
001 1002133 plasma 08/25/2020 1
002 2500252 plasma 12/02/2020 1
002 2500253 plasma 04/05/2019 1
002 2500254 blood 04/05/2019 1
002 2500255 blood 04/05/2019 1
002 2500256 blood 09/14/2019 0
003 1224535 blood 01/05/2021 1
003 1224536 blood 01/05/2021 1
003 1224537 plasma 01/05/2021 1
004 6602191 blood 09/08/2020 0
004 6602192 blood 09/08/2020 0
004 6602193 blood 11/02/2020 0
;
run;
What I've done so far is below but this doesn't quite give me what I need and I've been stuck here for the past day:
proc sort data=have out=have2;
by patid st date; run;
data final;
set have2;
by patid st date;
if st ne "blood" then multi_samp=1;
if st="blood" then do;
if first.date and last.date then multi_samp=0;
else multi_samp=1;
end;
run;
proc print data=final; run;
If anybody has any tips or suggestions, I'd greatly appreciate it.
Thanks!
Your data appears to already be grouped by patid/date, even though it's not in ascending or descending order, so you don't have to sort the data. But you do need to read each patid/date group twice - once to find any "plasma" records, and once to output the data with the muti_samp dummy variable:
data have;
input patid $ sn st $ date: mmddyy.;
format date mmddyy.;
datalines;
001 1002130 blood 07/01/2019
001 1002131 blood 08/25/2020
001 1002132 blood 08/25/2020
001 1002133 plasma 08/25/2020
002 2500252 plasma 12/02/2020
002 2500253 plasma 04/05/2019
002 2500254 blood 04/05/2019
002 2500255 blood 04/05/2019
002 2500256 blood 09/14/2019
003 1224535 blood 01/05/2021
003 1224536 blood 01/05/2021
003 1224537 plasma 01/05/2021
004 6602191 blood 09/08/2020
004 6602192 blood 09/08/2020
004 6602193 blood 11/02/2020
run;
proc sort;
by patid date;
run;
data want;
do until (last.date);
set have ;
by patid date ;
if st='plasma' then multi_samp=1;
end;
if multi_samp^=1 then multi_samp=0;
do until (last.date);
set have ;
by patid date ;
output;
end;
run;
BTW, multi_samp is a bit misleading term, since there is nothing "multi" in having only a plasma sample.
Your data appears to already be grouped by patid/date, even though it's not in ascending or descending order, so you don't have to sort the data. But you do need to read each patid/date group twice - once to find any "plasma" records, and once to output the data with the muti_samp dummy variable:
data have;
input patid $ sn st $ date: mmddyy.;
format date mmddyy.;
datalines;
001 1002130 blood 07/01/2019
001 1002131 blood 08/25/2020
001 1002132 blood 08/25/2020
001 1002133 plasma 08/25/2020
002 2500252 plasma 12/02/2020
002 2500253 plasma 04/05/2019
002 2500254 blood 04/05/2019
002 2500255 blood 04/05/2019
002 2500256 blood 09/14/2019
003 1224535 blood 01/05/2021
003 1224536 blood 01/05/2021
003 1224537 plasma 01/05/2021
004 6602191 blood 09/08/2020
004 6602192 blood 09/08/2020
004 6602193 blood 11/02/2020
run;
proc sort;
by patid date;
run;
data want;
do until (last.date);
set have ;
by patid date ;
if st='plasma' then multi_samp=1;
end;
if multi_samp^=1 then multi_samp=0;
do until (last.date);
set have ;
by patid date ;
output;
end;
run;
BTW, multi_samp is a bit misleading term, since there is nothing "multi" in having only a plasma sample.
1. Paste your code using the appropriate icon
2. Why is your flag equal to zero for the first record of ID=1, and also for the last record of ID=2 ?
3. Assuming that's a mistake (please proofread your questions carefully before submitting, that's the least you can do), here's another way:
proc sql;
create table WANT as
select *, ( sum(ST ne 'BLOOD') > 0 ) as FLAG
from HAVE
group by PATID
order by PATID;
Three methods: SQL with remerge, double DO loop, or data step merge with itself:
data have;
input patid $ sn st $ date: mmddyy.;
format date mmddyy.;
datalines;
001 1002130 blood 07/01/2019
001 1002131 blood 08/25/2020
001 1002132 blood 08/25/2020
001 1002133 plasma 08/25/2020
002 2500252 plasma 12/02/2020
002 2500253 plasma 04/05/2019
002 2500254 blood 04/05/2019
002 2500255 blood 04/05/2019
002 2500256 blood 09/14/2019
003 1224535 blood 01/05/2021
003 1224536 blood 01/05/2021
003 1224537 plasma 01/05/2021
004 6602191 blood 09/08/2020
004 6602192 blood 09/08/2020
004 6602193 blood 11/02/2020
;
proc sql;
create table want1 as
select
a.*,
sum(a.st = "plasma") > 0 as flag
from have a
group by patid, date
;
quit;
proc sort data=have;
by patid date;
run;
data want2;
do until (last.date);
set have;
by patid date;
if st = "plasma" then flag = 1;
end;
flag = coalesce(flag,0);
do until (last.date);
set have;
by patid date;
output;
end;
run;
proc sort data=want2;
by patid st date;
run;
data want3;
merge
have
have (
in=f
keep=patid date st
rename=(st=_st)
where=(_st = "plasma")
)
;
by patid date;
flag = f;
drop _st;
run;
proc sort data=want3;
by patid st date;
run;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.