Greetings,
I have a table with three variables: ID, result, and test date. For each unique ID, I would like to add a column to the dataset that marks negative results falling between two positive tests as 1 and all other results as 0. Please see the example below:
Input dataset:
ID | Result | date |
4344 | Positive | 04/13/2020 |
4344 | Positive | 06/08/2020 |
4344 | Negative | 06/16/2020 |
4344 | Negative | 06/19/2020 |
4344 | Positive | 08/17/2020 |
5036 | Positive | 04/24/2020 |
5036 | Positive | 05/18/2020 |
5036 | Positive | 05/22/2020 |
5036 | Positive | 08/24/2020 |
5036 | Negative | 08/25/2020 |
20594 | Negative | 04/14/2020 |
20594 | Positive | 05/16/2020 |
20594 | Positive | 06/25/2020 |
20594 | Positive | 07/09/2020 |
20594 | Negative | 07/28/2020 |
20594 | Negative | 07/31/2020 |
20594 | Negative | 08/05/2020 |
20594 | Negative | 08/11/2020 |
20594 | Positive | 08/16/2020 |
20594 | Negative | 08/21/2020 |
20708 | Positive | 04/15/2020 |
20708 | Negative | 07/05/2020 |
20708 | Positive | 07/10/2020 |
20708 | Positive | 07/21/2020 |
20708 | Positive | 07/26/2020 |
20708 | Negative | 08/19/2020 |
25771 | Positive | 04/07/2020 |
25771 | Positive | 06/23/2020 |
25771 | Negative | 07/09/2020 |
25771 | Negative | 07/10/2020 |
25771 | Positive | 07/29/2020 |
25771 | Negative | 08/11/2020 |
25771 | Negative | 08/17/2020 |
25771 | Negative | 08/20/2020 |
27359 | Positive | 04/16/2020 |
27359 | Positive | 07/30/2020 |
42987 | Positive | 04/06/2020 |
42987 | Positive | 04/19/2020 |
42987 | Positive | 04/20/2020 |
42987 | Positive | 04/30/2020 |
42987 | Negative | 05/01/2020 |
42987 | Negative | 05/11/2020 |
42987 | Positive | 07/17/2020 |
The output table should look similar to this:
ID | Result | date | neg_count |
4344 | Positive | 04/13/2020 | 0 |
4344 | Positive | 06/08/2020 | 0 |
4344 | Negative | 06/16/2020 | 1 |
4344 | Negative | 06/19/2020 | 1 |
4344 | Positive | 08/17/2020 | 0 |
5036 | Positive | 04/24/2020 | 0 |
5036 | Positive | 05/18/2020 | 0 |
5036 | Positive | 05/22/2020 | 0 |
5036 | Positive | 08/24/2020 | 0 |
5036 | Negative | 08/25/2020 | 0 |
20594 | Negative | 04/14/2020 | 0 |
20594 | Positive | 05/16/2020 | 0 |
20594 | Positive | 06/25/2020 | 0 |
20594 | Positive | 07/09/2020 | 0 |
20594 | Negative | 07/28/2020 | 1 |
20594 | Negative | 07/31/2020 | 1 |
20594 | Negative | 08/05/2020 | 1 |
20594 | Negative | 08/11/2020 | 1 |
20594 | Positive | 08/16/2020 | 0 |
20594 | Negative | 08/21/2020 | 0 |
20708 | Positive | 04/15/2020 | 0 |
20708 | Negative | 07/05/2020 | 1 |
20708 | Positive | 07/10/2020 | 0 |
20708 | Positive | 07/21/2020 | 0 |
20708 | Positive | 07/26/2020 | 0 |
20708 | Negative | 08/19/2020 | 0 |
25771 | Positive | 04/07/2020 | 0 |
25771 | Positive | 06/23/2020 | 0 |
25771 | Negative | 07/09/2020 | 1 |
25771 | Negative | 07/10/2020 | 1 |
25771 | Positive | 07/29/2020 | 0 |
25771 | Negative | 08/11/2020 | 0 |
25771 | Negative | 08/17/2020 | 0 |
25771 | Negative | 08/20/2020 | 0 |
27359 | Positive | 04/16/2020 | 0 |
27359 | Positive | 07/30/2020 | 0 |
42987 | Positive | 04/06/2020 | 0 |
42987 | Positive | 04/19/2020 | 0 |
42987 | Positive | 04/20/2020 | 0 |
42987 | Positive | 04/30/2020 | 0 |
42987 | Negative | 05/01/2020 | 1 |
42987 | Negative | 05/11/2020 | 1 |
42987 | Positive | 07/17/2020 | 0 |
Thank you.
Basically, all you need to determine is whether the date of a negative result falls between the first and last positive-test dates for a given ID. So you can make a data set with ID and two variables (_first_pos_date and _last_pos_date), merge it with the original, and then, for any negative result, test whether the date in hand falls between those two dates:
data have;
  input ID Result :$8. date :mmddyy10.;
  format date date9.;
datalines;
4344 Positive 04/13/2020
4344 Positive 06/08/2020
4344 Negative 06/16/2020
4344 Negative 06/19/2020
4344 Positive 08/17/2020
5036 Positive 04/24/2020
5036 Positive 05/18/2020
5036 Positive 05/22/2020
5036 Positive 08/24/2020
5036 Negative 08/25/2020
20594 Negative 04/14/2020
20594 Positive 05/16/2020
20594 Positive 06/25/2020
20594 Positive 07/09/2020
20594 Negative 07/28/2020
20594 Negative 07/31/2020
20594 Negative 08/05/2020
20594 Negative 08/11/2020
20594 Positive 08/16/2020
20594 Negative 08/21/2020
20708 Positive 04/15/2020
20708 Negative 07/05/2020
20708 Positive 07/10/2020
20708 Positive 07/21/2020
20708 Positive 07/26/2020
20708 Negative 08/19/2020
25771 Positive 04/07/2020
25771 Positive 06/23/2020
25771 Negative 07/09/2020
25771 Negative 07/10/2020
25771 Positive 07/29/2020
25771 Negative 08/11/2020
25771 Negative 08/17/2020
25771 Negative 08/20/2020
27359 Positive 04/16/2020
27359 Positive 07/30/2020
42987 Positive 04/06/2020
42987 Positive 04/19/2020
42987 Positive 04/20/2020
42987 Positive 04/30/2020
42987 Negative 05/01/2020
42987 Negative 05/11/2020
42987 Positive 07/17/2020
;
/* Build one record per ID holding the first and last positive-test dates */
data first_last_pos (drop=result date) / view=first_last_pos;
  set have (where=(result='Positive'));
  by id;
  retain _first_pos_date;
  if first.id then _first_pos_date=date;
  if last.id;            /* subsetting IF: keep only the last positive record per ID */
  _last_pos_date=date;
run;
/* Merge the date range back on and flag negatives that fall inside it */
data want (drop=_:);
  merge have first_last_pos;
  by id;
  if result='Negative' and _first_pos_date<date<_last_pos_date then neg_count=1;
  else neg_count=0;
run;
This is a two-step solution, but note that the first step creates a DATA step view, not a data file. That means the step doesn't write anything to disk and doesn't execute until the view is referenced later. So no data went to disk for re-reading in the second step; instead, it was "piped" via an in-memory process from step 1 to step 2, minimizing extra disk activity.
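If you want to confirm that first_last_pos really is a view, here's a quick check (my own sketch, not part of the solution above): PROC CONTENTS reports the member type, and the DATA step DESCRIBE statement writes the view's stored source code to the log.

/* Sketch: confirm the member type and inspect the stored view code */
proc contents data=first_last_pos;
run;                       /* Member Type shows VIEW */

data view=first_last_pos;
  describe;                /* prints the view's source code to the log */
run;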
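For comparison, here is a one-step PROC SQL sketch of the same logic (my own rough equivalent, not the code above). It relies on SAS evaluating a logical expression to 1 or 0, and the column names first_pos_date and last_pos_date are my own:

proc sql;
  create table want_sql as
  select h.*,
         ( h.result = 'Negative'
           and h.date > p.first_pos_date
           and h.date < p.last_pos_date ) as neg_count
  from have as h
  left join
       ( select id,
                min(date) as first_pos_date,
                max(date) as last_pos_date
         from have
         where result = 'Positive'
         group by id ) as p
  on h.id = p.id
  order by h.id, h.date;
quit;

The LEFT JOIN keeps IDs that never tested positive; for those, last_pos_date is missing, the date comparison fails, and neg_count stays 0.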
Thank you for the prompt response, mkeintz. Your solution worked nicely.