Solved: Count occurrences between two events ( negative tests occurred between...

sboktor · Posted 08-29-2020 12:08 PM

Greetings,

I have a table that has three variables ID, result, and date of the test. I would like to add a column to the dataset that marks negative results for the unique id that falls between two positive tests as 1 and the rest of the results as 0. Please see the example below:

input dataset:

ID	Result	date
4344	Positive	04/13/2020
4344	Positive	06/08/2020
4344	Negative	06/16/2020
4344	Negative	06/19/2020
4344	Positive	08/17/2020
5036	Positive	04/24/2020
5036	Positive	05/18/2020
5036	Positive	05/22/2020
5036	Positive	08/24/2020
5036	Negative	08/25/2020
20594	Negative	04/14/2020
20594	Positive	05/16/2020
20594	Positive	06/25/2020
20594	Positive	07/09/2020
20594	Negative	07/28/2020
20594	Negative	07/31/2020
20594	Negative	08/05/2020
20594	Negative	08/11/2020
20594	Positive	08/16/2020
20594	Negative	08/21/2020
20708	Positive	04/15/2020
20708	Negative	07/05/2020
20708	Positive	07/10/2020
20708	Positive	07/21/2020
20708	Positive	07/26/2020
20708	Negative	08/19/2020
25771	Positive	04/07/2020
25771	Positive	06/23/2020
25771	Negative	07/09/2020
25771	Negative	07/10/2020
25771	Positive	07/29/2020
25771	Negative	08/11/2020
25771	Negative	08/17/2020
25771	Negative	08/20/2020
27359	Positive	04/16/2020
27359	Positive	07/30/2020
42987	Positive	04/06/2020
42987	Positive	04/19/2020
42987	Positive	04/20/2020
42987	Positive	04/30/2020
42987	Negative	05/01/2020
42987	Negative	05/11/2020
42987	Positive	07/17/2020

output table should look similar to this table:

ID	Result	date	neg_count
4344	Positive	04/13/2020	0
4344	Positive	06/08/2020	0
4344	Negative	06/16/2020	1
4344	Negative	06/19/2020	1
4344	Positive	08/17/2020	0
5036	Positive	04/24/2020	0
5036	Positive	05/18/2020	0
5036	Positive	05/22/2020	0
5036	Positive	08/24/2020	0
5036	Negative	08/25/2020	0
20594	Negative	04/14/2020	0
20594	Positive	05/16/2020	0
20594	Positive	06/25/2020	0
20594	Positive	07/09/2020	0
20594	Negative	07/28/2020	1
20594	Negative	07/31/2020	1
20594	Negative	08/05/2020	1
20594	Negative	08/11/2020	1
20594	Positive	08/16/2020	0
20594	Negative	08/21/2020	0
20708	Positive	04/15/2020	0
20708	Negative	07/05/2020	1
20708	Positive	07/10/2020	0
20708	Positive	07/21/2020	0
20708	Positive	07/26/2020	0
20708	Negative	08/19/2020	0
25771	Positive	04/07/2020	0
25771	Positive	06/23/2020	0
25771	Negative	07/09/2020	1
25771	Negative	07/10/2020	1
25771	Positive	07/29/2020	0
25771	Negative	08/11/2020	0
25771	Negative	08/17/2020	0
25771	Negative	08/20/2020	0
27359	Positive	04/16/2020	0
27359	Positive	07/30/2020	0
42987	Positive	04/06/2020	0
42987	Positive	04/19/2020	0
42987	Positive	04/20/2020	0
42987	Positive	04/30/2020	0
42987	Negative	05/01/2020	1
42987	Negative	05/11/2020	1
42987	Positive	07/17/2020	0

Thank you.

mkeintz · Posted 08-29-2020 01:49 PM

Basically all you need to determine is whether a date with negative results falls between the first and last positive dates for a given id. So you can make a data set with ID and 2 variables (_first_pos_date and _last_pos_date), then merge it with the original. Then for any negative result test whether the date-in-hand falls between those two dates:

data have;
  input ID	Result:$8.	date :mmddyy10.;
  format date date9. ;
datalines;
4344	Positive	04/13/2020
4344	Positive	06/08/2020
4344	Negative	06/16/2020
4344	Negative	06/19/2020
4344	Positive	08/17/2020
5036	Positive	04/24/2020
5036	Positive	05/18/2020
5036	Positive	05/22/2020
5036	Positive	08/24/2020
5036	Negative	08/25/2020
20594	Negative	04/14/2020
20594	Positive	05/16/2020
20594	Positive	06/25/2020
20594	Positive	07/09/2020
20594	Negative	07/28/2020
20594	Negative	07/31/2020
20594	Negative	08/05/2020
20594	Negative	08/11/2020
20594	Positive	08/16/2020
20594	Negative	08/21/2020
20708	Positive	04/15/2020
20708	Negative	07/05/2020
20708	Positive	07/10/2020
20708	Positive	07/21/2020
20708	Positive	07/26/2020
20708	Negative	08/19/2020
25771	Positive	04/07/2020
25771	Positive	06/23/2020
25771	Negative	07/09/2020
25771	Negative	07/10/2020
25771	Positive	07/29/2020
25771	Negative	08/11/2020
25771	Negative	08/17/2020
25771	Negative	08/20/2020
27359	Positive	04/16/2020
27359	Positive	07/30/2020
42987	Positive	04/06/2020
42987	Positive	04/19/2020
42987	Positive	04/20/2020
42987	Positive	04/30/2020
42987	Negative	05/01/2020
42987	Negative	05/11/2020
42987	Positive	07/17/2020
;

data first_last_pos  (drop=result date) / view=first_last_pos;
  set have (where=(result='Positive'));
  by id;
  retain _first_pos_date;
  if first.id then _first_pos_date=date;
  if last.id;
  _last_pos_date=date;
run;

data want (drop=_:) ;
  merge have first_last_pos;
  by id;
  if result='Negative' and _first_pos_date<date<_last_pos_date then flag=1;
  else flag=0;
run;

This is a two-step solution. But note that the first step creates a data step view, not a data step file. That means the step won't write to disk, and won't activate until the view is called on later. No data went to disk for re-reading in the second step. Instead it was "piped" via an in-memory process from step 1 to step 2, minimizing extra disk activity.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

mkeintz · Posted 08-29-2020 01:49 PM

Basically all you need to determine is whether a date with negative results falls between the first and last positive dates for a given id. So you can make a data set with ID and 2 variables (_first_pos_date and _last_pos_date), then merge it with the original. Then for any negative result test whether the date-in-hand falls between those two dates:

data have;
  input ID	Result:$8.	date :mmddyy10.;
  format date date9. ;
datalines;
4344	Positive	04/13/2020
4344	Positive	06/08/2020
4344	Negative	06/16/2020
4344	Negative	06/19/2020
4344	Positive	08/17/2020
5036	Positive	04/24/2020
5036	Positive	05/18/2020
5036	Positive	05/22/2020
5036	Positive	08/24/2020
5036	Negative	08/25/2020
20594	Negative	04/14/2020
20594	Positive	05/16/2020
20594	Positive	06/25/2020
20594	Positive	07/09/2020
20594	Negative	07/28/2020
20594	Negative	07/31/2020
20594	Negative	08/05/2020
20594	Negative	08/11/2020
20594	Positive	08/16/2020
20594	Negative	08/21/2020
20708	Positive	04/15/2020
20708	Negative	07/05/2020
20708	Positive	07/10/2020
20708	Positive	07/21/2020
20708	Positive	07/26/2020
20708	Negative	08/19/2020
25771	Positive	04/07/2020
25771	Positive	06/23/2020
25771	Negative	07/09/2020
25771	Negative	07/10/2020
25771	Positive	07/29/2020
25771	Negative	08/11/2020
25771	Negative	08/17/2020
25771	Negative	08/20/2020
27359	Positive	04/16/2020
27359	Positive	07/30/2020
42987	Positive	04/06/2020
42987	Positive	04/19/2020
42987	Positive	04/20/2020
42987	Positive	04/30/2020
42987	Negative	05/01/2020
42987	Negative	05/11/2020
42987	Positive	07/17/2020
;

data first_last_pos  (drop=result date) / view=first_last_pos;
  set have (where=(result='Positive'));
  by id;
  retain _first_pos_date;
  if first.id then _first_pos_date=date;
  if last.id;
  _last_pos_date=date;
run;

data want (drop=_:) ;
  merge have first_last_pos;
  by id;
  if result='Negative' and _first_pos_date<date<_last_pos_date then flag=1;
  else flag=0;
run;

This is a two-step solution. But note that the first step creates a data step view, not a data step file. That means the step won't write to disk, and won't activate until the view is called on later. No data went to disk for re-reading in the second step. Instead it was "piped" via an in-memory process from step 1 to step 2, minimizing extra disk activity.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sboktor · Posted 08-29-2020 01:56 PM

Thank you for the prompt response mkeintz. your solution worked nicely.

Count occurrences between two events ( negative tests occurred between two positive results)

Re: Count occurrences between two events ( negative tests occurred between two positive results)

Re: Count occurrences between two events ( negative tests occurred between two positive results)

Re: Count occurrences between two events ( negative tests occurred between two positive results)

Count occurrences between two events ( negative tests occurred between two positive results)

Re: Count occurrences between two events ( negative tests occurred between two positive results)

Re: Count occurrences between two events ( negative tests occurred between two positive results)

Re: Count occurrences between two events ( negative tests occurred between two positive results)

Registration is open

SAS Training: Just a Click Away