BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
sboktor
Calcite | Level 5

Greetings,

I have a table that has three variables ID, result, and date of the test. I would like to add a column to the dataset that marks negative results for the unique id that falls between two positive tests as 1 and the rest of the results as 0. Please see the example below:

input dataset:

IDResultdate
4344Positive04/13/2020
4344Positive06/08/2020
4344Negative06/16/2020
4344Negative06/19/2020
4344Positive08/17/2020
5036Positive04/24/2020
5036Positive05/18/2020
5036Positive05/22/2020
5036Positive08/24/2020
5036Negative08/25/2020
20594Negative04/14/2020
20594Positive05/16/2020
20594Positive06/25/2020
20594Positive07/09/2020
20594Negative07/28/2020
20594Negative07/31/2020
20594Negative08/05/2020
20594Negative08/11/2020
20594Positive08/16/2020
20594Negative08/21/2020
20708Positive04/15/2020
20708Negative07/05/2020
20708Positive07/10/2020
20708Positive07/21/2020
20708Positive07/26/2020
20708Negative08/19/2020
25771Positive04/07/2020
25771Positive06/23/2020
25771Negative07/09/2020
25771Negative07/10/2020
25771Positive07/29/2020
25771Negative08/11/2020
25771Negative08/17/2020
25771Negative08/20/2020
27359Positive04/16/2020
27359Positive07/30/2020
42987Positive04/06/2020
42987Positive04/19/2020
42987Positive04/20/2020
42987Positive04/30/2020
42987Negative05/01/2020
42987Negative05/11/2020
42987Positive07/17/2020

 

output table should look similar to this table:

IDResultdateneg_count
4344Positive04/13/20200
4344Positive06/08/20200
4344Negative06/16/20201
4344Negative06/19/20201
4344Positive08/17/20200
5036Positive04/24/20200
5036Positive05/18/20200
5036Positive05/22/20200
5036Positive08/24/20200
5036Negative08/25/20200
20594Negative04/14/20200
20594Positive05/16/20200
20594Positive06/25/20200
20594Positive07/09/20200
20594Negative07/28/20201
20594Negative07/31/20201
20594Negative08/05/20201
20594Negative08/11/20201
20594Positive08/16/20200
20594Negative08/21/20200
20708Positive04/15/20200
20708Negative07/05/20201
20708Positive07/10/20200
20708Positive07/21/20200
20708Positive07/26/20200
20708Negative08/19/20200
25771Positive04/07/20200
25771Positive06/23/20200
25771Negative07/09/20201
25771Negative07/10/20201
25771Positive07/29/20200
25771Negative08/11/20200
25771Negative08/17/20200
25771Negative08/20/20200
27359Positive04/16/20200
27359Positive07/30/20200
42987Positive04/06/20200
42987Positive04/19/20200
42987Positive04/20/20200
42987Positive04/30/20200
42987Negative05/01/20201
42987Negative05/11/20201
42987Positive07/17/20200

Thank you.

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

Basically all you need to determine is whether a date with negative results falls between the first and last positive dates for a given id.  So you can make a data set with ID and 2 variables (_first_pos_date and _last_pos_date), then merge it with the original.  Then for any negative result test whether the date-in-hand falls between those two dates:

 

data have;
  input ID	Result:$8.	date :mmddyy10.;
  format date date9. ;
datalines;
4344	Positive	04/13/2020
4344	Positive	06/08/2020
4344	Negative	06/16/2020
4344	Negative	06/19/2020
4344	Positive	08/17/2020
5036	Positive	04/24/2020
5036	Positive	05/18/2020
5036	Positive	05/22/2020
5036	Positive	08/24/2020
5036	Negative	08/25/2020
20594	Negative	04/14/2020
20594	Positive	05/16/2020
20594	Positive	06/25/2020
20594	Positive	07/09/2020
20594	Negative	07/28/2020
20594	Negative	07/31/2020
20594	Negative	08/05/2020
20594	Negative	08/11/2020
20594	Positive	08/16/2020
20594	Negative	08/21/2020
20708	Positive	04/15/2020
20708	Negative	07/05/2020
20708	Positive	07/10/2020
20708	Positive	07/21/2020
20708	Positive	07/26/2020
20708	Negative	08/19/2020
25771	Positive	04/07/2020
25771	Positive	06/23/2020
25771	Negative	07/09/2020
25771	Negative	07/10/2020
25771	Positive	07/29/2020
25771	Negative	08/11/2020
25771	Negative	08/17/2020
25771	Negative	08/20/2020
27359	Positive	04/16/2020
27359	Positive	07/30/2020
42987	Positive	04/06/2020
42987	Positive	04/19/2020
42987	Positive	04/20/2020
42987	Positive	04/30/2020
42987	Negative	05/01/2020
42987	Negative	05/11/2020
42987	Positive	07/17/2020
;

data first_last_pos  (drop=result date) / view=first_last_pos;
  set have (where=(result='Positive'));
  by id;
  retain _first_pos_date;
  if first.id then _first_pos_date=date;
  if last.id;
  _last_pos_date=date;
run;

data want (drop=_:) ;
  merge have first_last_pos;
  by id;
  if result='Negative' and _first_pos_date<date<_last_pos_date then flag=1;
  else flag=0;
run;

This is a two-step solution.  But note that the first step creates a data step view, not a data step file.  That means the step won't write to disk, and won't activate until the view is called on later.   No data went to disk for re-reading in the second step.  Instead it was "piped" via an in-memory process from step 1 to step 2, minimizing extra disk activity.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

2 REPLIES 2
mkeintz
PROC Star

Basically all you need to determine is whether a date with negative results falls between the first and last positive dates for a given id.  So you can make a data set with ID and 2 variables (_first_pos_date and _last_pos_date), then merge it with the original.  Then for any negative result test whether the date-in-hand falls between those two dates:

 

data have;
  input ID	Result:$8.	date :mmddyy10.;
  format date date9. ;
datalines;
4344	Positive	04/13/2020
4344	Positive	06/08/2020
4344	Negative	06/16/2020
4344	Negative	06/19/2020
4344	Positive	08/17/2020
5036	Positive	04/24/2020
5036	Positive	05/18/2020
5036	Positive	05/22/2020
5036	Positive	08/24/2020
5036	Negative	08/25/2020
20594	Negative	04/14/2020
20594	Positive	05/16/2020
20594	Positive	06/25/2020
20594	Positive	07/09/2020
20594	Negative	07/28/2020
20594	Negative	07/31/2020
20594	Negative	08/05/2020
20594	Negative	08/11/2020
20594	Positive	08/16/2020
20594	Negative	08/21/2020
20708	Positive	04/15/2020
20708	Negative	07/05/2020
20708	Positive	07/10/2020
20708	Positive	07/21/2020
20708	Positive	07/26/2020
20708	Negative	08/19/2020
25771	Positive	04/07/2020
25771	Positive	06/23/2020
25771	Negative	07/09/2020
25771	Negative	07/10/2020
25771	Positive	07/29/2020
25771	Negative	08/11/2020
25771	Negative	08/17/2020
25771	Negative	08/20/2020
27359	Positive	04/16/2020
27359	Positive	07/30/2020
42987	Positive	04/06/2020
42987	Positive	04/19/2020
42987	Positive	04/20/2020
42987	Positive	04/30/2020
42987	Negative	05/01/2020
42987	Negative	05/11/2020
42987	Positive	07/17/2020
;

data first_last_pos  (drop=result date) / view=first_last_pos;
  set have (where=(result='Positive'));
  by id;
  retain _first_pos_date;
  if first.id then _first_pos_date=date;
  if last.id;
  _last_pos_date=date;
run;

data want (drop=_:) ;
  merge have first_last_pos;
  by id;
  if result='Negative' and _first_pos_date<date<_last_pos_date then flag=1;
  else flag=0;
run;

This is a two-step solution.  But note that the first step creates a data step view, not a data step file.  That means the step won't write to disk, and won't activate until the view is called on later.   No data went to disk for re-reading in the second step.  Instead it was "piped" via an in-memory process from step 1 to step 2, minimizing extra disk activity.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
sboktor
Calcite | Level 5

Thank you for the prompt response mkeintz. your solution worked nicely.

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 752 views
  • 0 likes
  • 2 in conversation