BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
sboktor
Calcite | Level 5

Greetings,

I have a table that has three variables ID, result, and date of the test. I would like to add a column to the dataset that marks negative results for the unique id that falls between two positive tests as 1 and the rest of the results as 0. Please see the example below:

input dataset:

IDResultdate
4344Positive04/13/2020
4344Positive06/08/2020
4344Negative06/16/2020
4344Negative06/19/2020
4344Positive08/17/2020
5036Positive04/24/2020
5036Positive05/18/2020
5036Positive05/22/2020
5036Positive08/24/2020
5036Negative08/25/2020
20594Negative04/14/2020
20594Positive05/16/2020
20594Positive06/25/2020
20594Positive07/09/2020
20594Negative07/28/2020
20594Negative07/31/2020
20594Negative08/05/2020
20594Negative08/11/2020
20594Positive08/16/2020
20594Negative08/21/2020
20708Positive04/15/2020
20708Negative07/05/2020
20708Positive07/10/2020
20708Positive07/21/2020
20708Positive07/26/2020
20708Negative08/19/2020
25771Positive04/07/2020
25771Positive06/23/2020
25771Negative07/09/2020
25771Negative07/10/2020
25771Positive07/29/2020
25771Negative08/11/2020
25771Negative08/17/2020
25771Negative08/20/2020
27359Positive04/16/2020
27359Positive07/30/2020
42987Positive04/06/2020
42987Positive04/19/2020
42987Positive04/20/2020
42987Positive04/30/2020
42987Negative05/01/2020
42987Negative05/11/2020
42987Positive07/17/2020

 

output table should look similar to this table:

IDResultdateneg_count
4344Positive04/13/20200
4344Positive06/08/20200
4344Negative06/16/20201
4344Negative06/19/20201
4344Positive08/17/20200
5036Positive04/24/20200
5036Positive05/18/20200
5036Positive05/22/20200
5036Positive08/24/20200
5036Negative08/25/20200
20594Negative04/14/20200
20594Positive05/16/20200
20594Positive06/25/20200
20594Positive07/09/20200
20594Negative07/28/20201
20594Negative07/31/20201
20594Negative08/05/20201
20594Negative08/11/20201
20594Positive08/16/20200
20594Negative08/21/20200
20708Positive04/15/20200
20708Negative07/05/20201
20708Positive07/10/20200
20708Positive07/21/20200
20708Positive07/26/20200
20708Negative08/19/20200
25771Positive04/07/20200
25771Positive06/23/20200
25771Negative07/09/20201
25771Negative07/10/20201
25771Positive07/29/20200
25771Negative08/11/20200
25771Negative08/17/20200
25771Negative08/20/20200
27359Positive04/16/20200
27359Positive07/30/20200
42987Positive04/06/20200
42987Positive04/19/20200
42987Positive04/20/20200
42987Positive04/30/20200
42987Negative05/01/20201
42987Negative05/11/20201
42987Positive07/17/20200

Thank you.

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

Basically all you need to determine is whether a date with negative results falls between the first and last positive dates for a given id.  So you can make a data set with ID and 2 variables (_first_pos_date and _last_pos_date), then merge it with the original.  Then for any negative result test whether the date-in-hand falls between those two dates:

 

data have;
  input ID	Result:$8.	date :mmddyy10.;
  format date date9. ;
datalines;
4344	Positive	04/13/2020
4344	Positive	06/08/2020
4344	Negative	06/16/2020
4344	Negative	06/19/2020
4344	Positive	08/17/2020
5036	Positive	04/24/2020
5036	Positive	05/18/2020
5036	Positive	05/22/2020
5036	Positive	08/24/2020
5036	Negative	08/25/2020
20594	Negative	04/14/2020
20594	Positive	05/16/2020
20594	Positive	06/25/2020
20594	Positive	07/09/2020
20594	Negative	07/28/2020
20594	Negative	07/31/2020
20594	Negative	08/05/2020
20594	Negative	08/11/2020
20594	Positive	08/16/2020
20594	Negative	08/21/2020
20708	Positive	04/15/2020
20708	Negative	07/05/2020
20708	Positive	07/10/2020
20708	Positive	07/21/2020
20708	Positive	07/26/2020
20708	Negative	08/19/2020
25771	Positive	04/07/2020
25771	Positive	06/23/2020
25771	Negative	07/09/2020
25771	Negative	07/10/2020
25771	Positive	07/29/2020
25771	Negative	08/11/2020
25771	Negative	08/17/2020
25771	Negative	08/20/2020
27359	Positive	04/16/2020
27359	Positive	07/30/2020
42987	Positive	04/06/2020
42987	Positive	04/19/2020
42987	Positive	04/20/2020
42987	Positive	04/30/2020
42987	Negative	05/01/2020
42987	Negative	05/11/2020
42987	Positive	07/17/2020
;

data first_last_pos  (drop=result date) / view=first_last_pos;
  set have (where=(result='Positive'));
  by id;
  retain _first_pos_date;
  if first.id then _first_pos_date=date;
  if last.id;
  _last_pos_date=date;
run;

data want (drop=_:) ;
  merge have first_last_pos;
  by id;
  if result='Negative' and _first_pos_date<date<_last_pos_date then flag=1;
  else flag=0;
run;

This is a two-step solution.  But note that the first step creates a data step view, not a data step file.  That means the step won't write to disk, and won't activate until the view is called on later.   No data went to disk for re-reading in the second step.  Instead it was "piped" via an in-memory process from step 1 to step 2, minimizing extra disk activity.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

2 REPLIES 2
mkeintz
PROC Star

Basically all you need to determine is whether a date with negative results falls between the first and last positive dates for a given id.  So you can make a data set with ID and 2 variables (_first_pos_date and _last_pos_date), then merge it with the original.  Then for any negative result test whether the date-in-hand falls between those two dates:

 

data have;
  input ID	Result:$8.	date :mmddyy10.;
  format date date9. ;
datalines;
4344	Positive	04/13/2020
4344	Positive	06/08/2020
4344	Negative	06/16/2020
4344	Negative	06/19/2020
4344	Positive	08/17/2020
5036	Positive	04/24/2020
5036	Positive	05/18/2020
5036	Positive	05/22/2020
5036	Positive	08/24/2020
5036	Negative	08/25/2020
20594	Negative	04/14/2020
20594	Positive	05/16/2020
20594	Positive	06/25/2020
20594	Positive	07/09/2020
20594	Negative	07/28/2020
20594	Negative	07/31/2020
20594	Negative	08/05/2020
20594	Negative	08/11/2020
20594	Positive	08/16/2020
20594	Negative	08/21/2020
20708	Positive	04/15/2020
20708	Negative	07/05/2020
20708	Positive	07/10/2020
20708	Positive	07/21/2020
20708	Positive	07/26/2020
20708	Negative	08/19/2020
25771	Positive	04/07/2020
25771	Positive	06/23/2020
25771	Negative	07/09/2020
25771	Negative	07/10/2020
25771	Positive	07/29/2020
25771	Negative	08/11/2020
25771	Negative	08/17/2020
25771	Negative	08/20/2020
27359	Positive	04/16/2020
27359	Positive	07/30/2020
42987	Positive	04/06/2020
42987	Positive	04/19/2020
42987	Positive	04/20/2020
42987	Positive	04/30/2020
42987	Negative	05/01/2020
42987	Negative	05/11/2020
42987	Positive	07/17/2020
;

data first_last_pos  (drop=result date) / view=first_last_pos;
  set have (where=(result='Positive'));
  by id;
  retain _first_pos_date;
  if first.id then _first_pos_date=date;
  if last.id;
  _last_pos_date=date;
run;

data want (drop=_:) ;
  merge have first_last_pos;
  by id;
  if result='Negative' and _first_pos_date<date<_last_pos_date then flag=1;
  else flag=0;
run;

This is a two-step solution.  But note that the first step creates a data step view, not a data step file.  That means the step won't write to disk, and won't activate until the view is called on later.   No data went to disk for re-reading in the second step.  Instead it was "piped" via an in-memory process from step 1 to step 2, minimizing extra disk activity.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
sboktor
Calcite | Level 5

Thank you for the prompt response mkeintz. your solution worked nicely.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 699 views
  • 0 likes
  • 2 in conversation