BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Merdock
Quartz | Level 8

Hello!

 

I have a dataset with  sample types (blood and plasma) taken from patients on various dates.

 

What I want is: if a patient only had blood sample type taken on any given date, with no plasma sample, then I want to set up a flag, multi_samp=0. On the other hand, if on any given date, patient had both blood and plasma samples taken, then multi_samp=1. If patient only had plasma sample on any given date then multi_samp=1, regardless of whether there was a blood sample taken on that date.

 

data have;
input patid $ sn st $ date: mmddyy.;
format date mmddyy.;
datalines;
001 1002130 blood 07/01/2019
001 1002131 blood 08/25/2020
001 1002132 blood 08/25/2020
001 1002133 plasma 08/25/2020
002 2500252 plasma 12/02/2020
002 2500253 plasma 04/05/2019
002 2500254 blood 04/05/2019
002 2500255 blood 04/05/2019
002 2500256 blood 09/14/2019
003 1224535 blood 01/05/2021
003 1224536 blood 01/05/2021
003 1224537 plasma 01/05/2021
004 6602191 blood 09/08/2020
004 6602192 blood 09/08/2020
004 6602193 blood 11/02/2020
;
run;

 

data want;
input patid $ sn st $ date: mmddyy. multi_samp;
format date mmddyy.;
datalines;
001 1002130 blood 07/01/2019 0
001 1002131 blood 08/25/2020 1
001 1002132 blood 08/25/2020 1
001 1002133 plasma 08/25/2020 1
002 2500252 plasma 12/02/2020 1
002 2500253 plasma 04/05/2019 1
002 2500254 blood 04/05/2019 1
002 2500255 blood 04/05/2019 1
002 2500256 blood 09/14/2019 0
003 1224535 blood 01/05/2021 1
003 1224536 blood 01/05/2021 1
003 1224537 plasma 01/05/2021 1
004 6602191 blood 09/08/2020 0
004 6602192 blood 09/08/2020 0
004 6602193 blood 11/02/2020 0
;
run;

 

What I've done so far is below but this doesn't quite give me what I need and I've been stuck here for the past day:

 

proc sort data=have out=have2;
by patid st date; run;
data final;
set have2;
by patid st date;
if st ne "blood" then multi_samp=1;

if st="blood" then do;
if first.date and last.date then multi_samp=0;
else multi_samp=1;
end;

run;
proc print data=final; run;

 

If anybody has any tips or suggestions, I'd greatly appreciate it.

 

Thanks!

 

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

Your data appears to already be grouped by patid/date, even though it's not in ascending or descending order, so you don't have to sort the data.   But you do need to read each patid/date group twice - once to find any "plasma" records, and once to output the data with the muti_samp dummy variable:

 

data have;
input patid $ sn st $ date: mmddyy.;
format date mmddyy.;
datalines;
001 1002130 blood 07/01/2019
001 1002131 blood 08/25/2020
001 1002132 blood 08/25/2020
001 1002133 plasma 08/25/2020
002 2500252 plasma 12/02/2020
002 2500253 plasma 04/05/2019
002 2500254 blood 04/05/2019
002 2500255 blood 04/05/2019
002 2500256 blood 09/14/2019
003 1224535 blood 01/05/2021
003 1224536 blood 01/05/2021
003 1224537 plasma 01/05/2021
004 6602191 blood 09/08/2020
004 6602192 blood 09/08/2020
004 6602193 blood 11/02/2020
run;
proc sort;
  by patid date;
run;
data want;
  do until (last.date);
    set have ;
    by patid date ;
    if st='plasma' then multi_samp=1;
  end;
  if multi_samp^=1 then multi_samp=0;
  do until (last.date);
    set have ;
    by patid date ;
    output;
  end;
run;

 

BTW, multi_samp is a bit misleading term, since there is nothing "multi" in having only a plasma sample.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

6 REPLIES 6
mkeintz
PROC Star

Your data appears to already be grouped by patid/date, even though it's not in ascending or descending order, so you don't have to sort the data.   But you do need to read each patid/date group twice - once to find any "plasma" records, and once to output the data with the muti_samp dummy variable:

 

data have;
input patid $ sn st $ date: mmddyy.;
format date mmddyy.;
datalines;
001 1002130 blood 07/01/2019
001 1002131 blood 08/25/2020
001 1002132 blood 08/25/2020
001 1002133 plasma 08/25/2020
002 2500252 plasma 12/02/2020
002 2500253 plasma 04/05/2019
002 2500254 blood 04/05/2019
002 2500255 blood 04/05/2019
002 2500256 blood 09/14/2019
003 1224535 blood 01/05/2021
003 1224536 blood 01/05/2021
003 1224537 plasma 01/05/2021
004 6602191 blood 09/08/2020
004 6602192 blood 09/08/2020
004 6602193 blood 11/02/2020
run;
proc sort;
  by patid date;
run;
data want;
  do until (last.date);
    set have ;
    by patid date ;
    if st='plasma' then multi_samp=1;
  end;
  if multi_samp^=1 then multi_samp=0;
  do until (last.date);
    set have ;
    by patid date ;
    output;
  end;
run;

 

BTW, multi_samp is a bit misleading term, since there is nothing "multi" in having only a plasma sample.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Merdock
Quartz | Level 8
Thank you very much, mkeintz, this works perfectly. Apologies for any confusion created with the use of the name "multi", I agree that it probably wasn't the best term for this purpose.
ChrisNZ
Tourmaline | Level 20

1. Paste your code using the appropriate icon 

2. Why is your flag equal to zero for the first record of ID=1, and also for the last record of ID=2 ?

3. Assuming that's a mistake (please proofread your questions carefully before submitting, that's the least you can do), here's another way:

proc sql; 
  create table WANT as 
  select *, ( sum(ST ne 'BLOOD') > 0 ) as FLAG
  from HAVE
  group by PATID
  order by PATID;

 

 

Merdock
Quartz | Level 8
Thanks, ChrisNZ.
1. I will use the appropriate icon to paste my code next time, thank you for pointing this out.
2. The flag is 0 for first record of ID=1 and for the last record of ID=2 because there are no plasma samples taken on those dates. And as I've clearly specified, the flag will be 0 for blood samples collected on any given day if there's no plasma sample collected on same day.
3. It's not a mistake, see #2 above.
Kurt_Bremser
Super User

Three methods: SQL with remerge, double DO loop, or data step merge with itself:

data have;
input patid $ sn st $ date: mmddyy.;
format date mmddyy.;
datalines;
001 1002130 blood 07/01/2019
001 1002131 blood 08/25/2020
001 1002132 blood 08/25/2020
001 1002133 plasma 08/25/2020
002 2500252 plasma 12/02/2020
002 2500253 plasma 04/05/2019
002 2500254 blood 04/05/2019
002 2500255 blood 04/05/2019
002 2500256 blood 09/14/2019
003 1224535 blood 01/05/2021
003 1224536 blood 01/05/2021
003 1224537 plasma 01/05/2021
004 6602191 blood 09/08/2020
004 6602192 blood 09/08/2020
004 6602193 blood 11/02/2020
;

proc sql;
create table want1 as
  select
    a.*,
    sum(a.st = "plasma") > 0 as flag
  from have a
  group by patid, date
;
quit;

proc sort data=have;
by patid date;
run;

data want2;
do until (last.date);
  set have;
  by patid date;
  if st = "plasma" then flag = 1;
end;
flag = coalesce(flag,0);
do until (last.date);
  set have;
  by patid date;
  output;
end;
run;
  
proc sort data=want2;
by patid st date;
run;

data want3;
merge
  have
  have (
    in=f
    keep=patid date st
    rename=(st=_st)
    where=(_st = "plasma")
  )
;
by patid date;
flag = f;
drop _st;
run;

proc sort data=want3;
by patid st date;
run;
Merdock
Quartz | Level 8
Thank you, KurtBremser!

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1501 views
  • 3 likes
  • 4 in conversation