Greetings,
I have a table that has three variables ID, result, and date of the test. I would like to add a column to the dataset that marks negative results for the unique id that falls between two positive tests as 1 and the rest of the results as 0. Please see the example below:
input dataset:
| ID | Result | date |
| 4344 | Positive | 04/13/2020 |
| 4344 | Positive | 06/08/2020 |
| 4344 | Negative | 06/16/2020 |
| 4344 | Negative | 06/19/2020 |
| 4344 | Positive | 08/17/2020 |
| 5036 | Positive | 04/24/2020 |
| 5036 | Positive | 05/18/2020 |
| 5036 | Positive | 05/22/2020 |
| 5036 | Positive | 08/24/2020 |
| 5036 | Negative | 08/25/2020 |
| 20594 | Negative | 04/14/2020 |
| 20594 | Positive | 05/16/2020 |
| 20594 | Positive | 06/25/2020 |
| 20594 | Positive | 07/09/2020 |
| 20594 | Negative | 07/28/2020 |
| 20594 | Negative | 07/31/2020 |
| 20594 | Negative | 08/05/2020 |
| 20594 | Negative | 08/11/2020 |
| 20594 | Positive | 08/16/2020 |
| 20594 | Negative | 08/21/2020 |
| 20708 | Positive | 04/15/2020 |
| 20708 | Negative | 07/05/2020 |
| 20708 | Positive | 07/10/2020 |
| 20708 | Positive | 07/21/2020 |
| 20708 | Positive | 07/26/2020 |
| 20708 | Negative | 08/19/2020 |
| 25771 | Positive | 04/07/2020 |
| 25771 | Positive | 06/23/2020 |
| 25771 | Negative | 07/09/2020 |
| 25771 | Negative | 07/10/2020 |
| 25771 | Positive | 07/29/2020 |
| 25771 | Negative | 08/11/2020 |
| 25771 | Negative | 08/17/2020 |
| 25771 | Negative | 08/20/2020 |
| 27359 | Positive | 04/16/2020 |
| 27359 | Positive | 07/30/2020 |
| 42987 | Positive | 04/06/2020 |
| 42987 | Positive | 04/19/2020 |
| 42987 | Positive | 04/20/2020 |
| 42987 | Positive | 04/30/2020 |
| 42987 | Negative | 05/01/2020 |
| 42987 | Negative | 05/11/2020 |
| 42987 | Positive | 07/17/2020 |
output table should look similar to this table:
| ID | Result | date | neg_count |
| 4344 | Positive | 04/13/2020 | 0 |
| 4344 | Positive | 06/08/2020 | 0 |
| 4344 | Negative | 06/16/2020 | 1 |
| 4344 | Negative | 06/19/2020 | 1 |
| 4344 | Positive | 08/17/2020 | 0 |
| 5036 | Positive | 04/24/2020 | 0 |
| 5036 | Positive | 05/18/2020 | 0 |
| 5036 | Positive | 05/22/2020 | 0 |
| 5036 | Positive | 08/24/2020 | 0 |
| 5036 | Negative | 08/25/2020 | 0 |
| 20594 | Negative | 04/14/2020 | 0 |
| 20594 | Positive | 05/16/2020 | 0 |
| 20594 | Positive | 06/25/2020 | 0 |
| 20594 | Positive | 07/09/2020 | 0 |
| 20594 | Negative | 07/28/2020 | 1 |
| 20594 | Negative | 07/31/2020 | 1 |
| 20594 | Negative | 08/05/2020 | 1 |
| 20594 | Negative | 08/11/2020 | 1 |
| 20594 | Positive | 08/16/2020 | 0 |
| 20594 | Negative | 08/21/2020 | 0 |
| 20708 | Positive | 04/15/2020 | 0 |
| 20708 | Negative | 07/05/2020 | 1 |
| 20708 | Positive | 07/10/2020 | 0 |
| 20708 | Positive | 07/21/2020 | 0 |
| 20708 | Positive | 07/26/2020 | 0 |
| 20708 | Negative | 08/19/2020 | 0 |
| 25771 | Positive | 04/07/2020 | 0 |
| 25771 | Positive | 06/23/2020 | 0 |
| 25771 | Negative | 07/09/2020 | 1 |
| 25771 | Negative | 07/10/2020 | 1 |
| 25771 | Positive | 07/29/2020 | 0 |
| 25771 | Negative | 08/11/2020 | 0 |
| 25771 | Negative | 08/17/2020 | 0 |
| 25771 | Negative | 08/20/2020 | 0 |
| 27359 | Positive | 04/16/2020 | 0 |
| 27359 | Positive | 07/30/2020 | 0 |
| 42987 | Positive | 04/06/2020 | 0 |
| 42987 | Positive | 04/19/2020 | 0 |
| 42987 | Positive | 04/20/2020 | 0 |
| 42987 | Positive | 04/30/2020 | 0 |
| 42987 | Negative | 05/01/2020 | 1 |
| 42987 | Negative | 05/11/2020 | 1 |
| 42987 | Positive | 07/17/2020 | 0 |
Thank you.
Basically all you need to determine is whether a date with negative results falls between the first and last positive dates for a given id. So you can make a data set with ID and 2 variables (_first_pos_date and _last_pos_date), then merge it with the original. Then for any negative result test whether the date-in-hand falls between those two dates:
data have;
input ID Result:$8. date :mmddyy10.;
format date date9. ;
datalines;
4344 Positive 04/13/2020
4344 Positive 06/08/2020
4344 Negative 06/16/2020
4344 Negative 06/19/2020
4344 Positive 08/17/2020
5036 Positive 04/24/2020
5036 Positive 05/18/2020
5036 Positive 05/22/2020
5036 Positive 08/24/2020
5036 Negative 08/25/2020
20594 Negative 04/14/2020
20594 Positive 05/16/2020
20594 Positive 06/25/2020
20594 Positive 07/09/2020
20594 Negative 07/28/2020
20594 Negative 07/31/2020
20594 Negative 08/05/2020
20594 Negative 08/11/2020
20594 Positive 08/16/2020
20594 Negative 08/21/2020
20708 Positive 04/15/2020
20708 Negative 07/05/2020
20708 Positive 07/10/2020
20708 Positive 07/21/2020
20708 Positive 07/26/2020
20708 Negative 08/19/2020
25771 Positive 04/07/2020
25771 Positive 06/23/2020
25771 Negative 07/09/2020
25771 Negative 07/10/2020
25771 Positive 07/29/2020
25771 Negative 08/11/2020
25771 Negative 08/17/2020
25771 Negative 08/20/2020
27359 Positive 04/16/2020
27359 Positive 07/30/2020
42987 Positive 04/06/2020
42987 Positive 04/19/2020
42987 Positive 04/20/2020
42987 Positive 04/30/2020
42987 Negative 05/01/2020
42987 Negative 05/11/2020
42987 Positive 07/17/2020
;
data first_last_pos (drop=result date) / view=first_last_pos;
set have (where=(result='Positive'));
by id;
retain _first_pos_date;
if first.id then _first_pos_date=date;
if last.id;
_last_pos_date=date;
run;
data want (drop=_:) ;
merge have first_last_pos;
by id;
if result='Negative' and _first_pos_date<date<_last_pos_date then flag=1;
else flag=0;
run;
This is a two-step solution. But note that the first step creates a data step view, not a data step file. That means the step won't write to disk, and won't activate until the view is called on later. No data went to disk for re-reading in the second step. Instead it was "piped" via an in-memory process from step 1 to step 2, minimizing extra disk activity.
Basically all you need to determine is whether a date with negative results falls between the first and last positive dates for a given id. So you can make a data set with ID and 2 variables (_first_pos_date and _last_pos_date), then merge it with the original. Then for any negative result test whether the date-in-hand falls between those two dates:
data have;
input ID Result:$8. date :mmddyy10.;
format date date9. ;
datalines;
4344 Positive 04/13/2020
4344 Positive 06/08/2020
4344 Negative 06/16/2020
4344 Negative 06/19/2020
4344 Positive 08/17/2020
5036 Positive 04/24/2020
5036 Positive 05/18/2020
5036 Positive 05/22/2020
5036 Positive 08/24/2020
5036 Negative 08/25/2020
20594 Negative 04/14/2020
20594 Positive 05/16/2020
20594 Positive 06/25/2020
20594 Positive 07/09/2020
20594 Negative 07/28/2020
20594 Negative 07/31/2020
20594 Negative 08/05/2020
20594 Negative 08/11/2020
20594 Positive 08/16/2020
20594 Negative 08/21/2020
20708 Positive 04/15/2020
20708 Negative 07/05/2020
20708 Positive 07/10/2020
20708 Positive 07/21/2020
20708 Positive 07/26/2020
20708 Negative 08/19/2020
25771 Positive 04/07/2020
25771 Positive 06/23/2020
25771 Negative 07/09/2020
25771 Negative 07/10/2020
25771 Positive 07/29/2020
25771 Negative 08/11/2020
25771 Negative 08/17/2020
25771 Negative 08/20/2020
27359 Positive 04/16/2020
27359 Positive 07/30/2020
42987 Positive 04/06/2020
42987 Positive 04/19/2020
42987 Positive 04/20/2020
42987 Positive 04/30/2020
42987 Negative 05/01/2020
42987 Negative 05/11/2020
42987 Positive 07/17/2020
;
data first_last_pos (drop=result date) / view=first_last_pos;
set have (where=(result='Positive'));
by id;
retain _first_pos_date;
if first.id then _first_pos_date=date;
if last.id;
_last_pos_date=date;
run;
data want (drop=_:) ;
merge have first_last_pos;
by id;
if result='Negative' and _first_pos_date<date<_last_pos_date then flag=1;
else flag=0;
run;
This is a two-step solution. But note that the first step creates a data step view, not a data step file. That means the step won't write to disk, and won't activate until the view is called on later. No data went to disk for re-reading in the second step. Instead it was "piped" via an in-memory process from step 1 to step 2, minimizing extra disk activity.
Thank you for the prompt response mkeintz. your solution worked nicely.
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.