BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Merdock
Quartz | Level 8

Hello!

 

I have a dataset with  sample types (blood and plasma) taken from patients on various dates.

 

What I want is: if a patient only had blood sample type taken on any given date, with no plasma sample, then I want to set up a flag, multi_samp=0. On the other hand, if on any given date, patient had both blood and plasma samples taken, then multi_samp=1. If patient only had plasma sample on any given date then multi_samp=1, regardless of whether there was a blood sample taken on that date.

 

data have;
input patid $ sn st $ date: mmddyy.;
format date mmddyy.;
datalines;
001 1002130 blood 07/01/2019
001 1002131 blood 08/25/2020
001 1002132 blood 08/25/2020
001 1002133 plasma 08/25/2020
002 2500252 plasma 12/02/2020
002 2500253 plasma 04/05/2019
002 2500254 blood 04/05/2019
002 2500255 blood 04/05/2019
002 2500256 blood 09/14/2019
003 1224535 blood 01/05/2021
003 1224536 blood 01/05/2021
003 1224537 plasma 01/05/2021
004 6602191 blood 09/08/2020
004 6602192 blood 09/08/2020
004 6602193 blood 11/02/2020
;
run;

 

data want;
input patid $ sn st $ date: mmddyy. multi_samp;
format date mmddyy.;
datalines;
001 1002130 blood 07/01/2019 0
001 1002131 blood 08/25/2020 1
001 1002132 blood 08/25/2020 1
001 1002133 plasma 08/25/2020 1
002 2500252 plasma 12/02/2020 1
002 2500253 plasma 04/05/2019 1
002 2500254 blood 04/05/2019 1
002 2500255 blood 04/05/2019 1
002 2500256 blood 09/14/2019 0
003 1224535 blood 01/05/2021 1
003 1224536 blood 01/05/2021 1
003 1224537 plasma 01/05/2021 1
004 6602191 blood 09/08/2020 0
004 6602192 blood 09/08/2020 0
004 6602193 blood 11/02/2020 0
;
run;

 

What I've done so far is below but this doesn't quite give me what I need and I've been stuck here for the past day:

 

proc sort data=have out=have2;
by patid st date; run;
data final;
set have2;
by patid st date;
if st ne "blood" then multi_samp=1;

if st="blood" then do;
if first.date and last.date then multi_samp=0;
else multi_samp=1;
end;

run;
proc print data=final; run;

 

If anybody has any tips or suggestions, I'd greatly appreciate it.

 

Thanks!

 

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

Your data appears to already be grouped by patid/date, even though it's not in ascending or descending order, so you don't have to sort the data.   But you do need to read each patid/date group twice - once to find any "plasma" records, and once to output the data with the muti_samp dummy variable:

 

data have;
input patid $ sn st $ date: mmddyy.;
format date mmddyy.;
datalines;
001 1002130 blood 07/01/2019
001 1002131 blood 08/25/2020
001 1002132 blood 08/25/2020
001 1002133 plasma 08/25/2020
002 2500252 plasma 12/02/2020
002 2500253 plasma 04/05/2019
002 2500254 blood 04/05/2019
002 2500255 blood 04/05/2019
002 2500256 blood 09/14/2019
003 1224535 blood 01/05/2021
003 1224536 blood 01/05/2021
003 1224537 plasma 01/05/2021
004 6602191 blood 09/08/2020
004 6602192 blood 09/08/2020
004 6602193 blood 11/02/2020
run;
proc sort;
  by patid date;
run;
data want;
  do until (last.date);
    set have ;
    by patid date ;
    if st='plasma' then multi_samp=1;
  end;
  if multi_samp^=1 then multi_samp=0;
  do until (last.date);
    set have ;
    by patid date ;
    output;
  end;
run;

 

BTW, multi_samp is a bit misleading term, since there is nothing "multi" in having only a plasma sample.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

6 REPLIES 6
mkeintz
PROC Star

Your data appears to already be grouped by patid/date, even though it's not in ascending or descending order, so you don't have to sort the data.   But you do need to read each patid/date group twice - once to find any "plasma" records, and once to output the data with the muti_samp dummy variable:

 

data have;
input patid $ sn st $ date: mmddyy.;
format date mmddyy.;
datalines;
001 1002130 blood 07/01/2019
001 1002131 blood 08/25/2020
001 1002132 blood 08/25/2020
001 1002133 plasma 08/25/2020
002 2500252 plasma 12/02/2020
002 2500253 plasma 04/05/2019
002 2500254 blood 04/05/2019
002 2500255 blood 04/05/2019
002 2500256 blood 09/14/2019
003 1224535 blood 01/05/2021
003 1224536 blood 01/05/2021
003 1224537 plasma 01/05/2021
004 6602191 blood 09/08/2020
004 6602192 blood 09/08/2020
004 6602193 blood 11/02/2020
run;
proc sort;
  by patid date;
run;
data want;
  do until (last.date);
    set have ;
    by patid date ;
    if st='plasma' then multi_samp=1;
  end;
  if multi_samp^=1 then multi_samp=0;
  do until (last.date);
    set have ;
    by patid date ;
    output;
  end;
run;

 

BTW, multi_samp is a bit misleading term, since there is nothing "multi" in having only a plasma sample.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Merdock
Quartz | Level 8
Thank you very much, mkeintz, this works perfectly. Apologies for any confusion created with the use of the name "multi", I agree that it probably wasn't the best term for this purpose.
ChrisNZ
Tourmaline | Level 20

1. Paste your code using the appropriate icon 

2. Why is your flag equal to zero for the first record of ID=1, and also for the last record of ID=2 ?

3. Assuming that's a mistake (please proofread your questions carefully before submitting, that's the least you can do), here's another way:

proc sql; 
  create table WANT as 
  select *, ( sum(ST ne 'BLOOD') > 0 ) as FLAG
  from HAVE
  group by PATID
  order by PATID;

 

 

Merdock
Quartz | Level 8
Thanks, ChrisNZ.
1. I will use the appropriate icon to paste my code next time, thank you for pointing this out.
2. The flag is 0 for first record of ID=1 and for the last record of ID=2 because there are no plasma samples taken on those dates. And as I've clearly specified, the flag will be 0 for blood samples collected on any given day if there's no plasma sample collected on same day.
3. It's not a mistake, see #2 above.
Kurt_Bremser
Super User

Three methods: SQL with remerge, double DO loop, or data step merge with itself:

data have;
input patid $ sn st $ date: mmddyy.;
format date mmddyy.;
datalines;
001 1002130 blood 07/01/2019
001 1002131 blood 08/25/2020
001 1002132 blood 08/25/2020
001 1002133 plasma 08/25/2020
002 2500252 plasma 12/02/2020
002 2500253 plasma 04/05/2019
002 2500254 blood 04/05/2019
002 2500255 blood 04/05/2019
002 2500256 blood 09/14/2019
003 1224535 blood 01/05/2021
003 1224536 blood 01/05/2021
003 1224537 plasma 01/05/2021
004 6602191 blood 09/08/2020
004 6602192 blood 09/08/2020
004 6602193 blood 11/02/2020
;

proc sql;
create table want1 as
  select
    a.*,
    sum(a.st = "plasma") > 0 as flag
  from have a
  group by patid, date
;
quit;

proc sort data=have;
by patid date;
run;

data want2;
do until (last.date);
  set have;
  by patid date;
  if st = "plasma" then flag = 1;
end;
flag = coalesce(flag,0);
do until (last.date);
  set have;
  by patid date;
  output;
end;
run;
  
proc sort data=want2;
by patid st date;
run;

data want3;
merge
  have
  have (
    in=f
    keep=patid date st
    rename=(st=_st)
    where=(_st = "plasma")
  )
;
by patid date;
flag = f;
drop _st;
run;

proc sort data=want3;
by patid st date;
run;
Merdock
Quartz | Level 8
Thank you, KurtBremser!

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1693 views
  • 3 likes
  • 4 in conversation