I have a database where subjects have repeated measurements. These measurements are taken during specific time periods delineated by StartDate and EndDate1. DAYSBWN1 is the length of time of each period (EndDate1 – StartDate in each row). Sample code below:
data tempfile;
infile datalines truncover;
INPUT
StudyID$2.@+1 StartDate mmddyy10.@+1 EndDate1 mmddyy10.@+1 daysbwn1 ;
format Startdate EndDate1 mmddyy10.;
datalines;
01 02/01/2007 05/01/2008 455
02 06/02/2012 05/24/2013 356
03 05/07/2010 08/19/2010 104
04 06/16/2007 06/21/2008 371
05 04/28/2007 11/07/2008 559
06 03/16/2007 06/20/2007 96
07 04/18/2007 04/19/2007 1
08 05/11/2007 08/03/2007 84
09 08/01/2007 08/28/2007 27
09 02/21/2008 03/05/2008 13
10 02/06/2008 03/12/2010 765
11 01/31/2007 12/26/2008 695
11 02/11/2009 05/08/2009 86
11 05/19/2009 08/14/2009 87
11 09/22/2009 12/18/2009 87
11 01/09/2010 02/03/2010 25
11 02/23/2010 11/19/2010 269
11 12/01/2010 03/18/2011 107
11 05/10/2011 04/23/2013 714
11 08/03/2013 09/27/2013 55
;
run;
proc print data=tempfile; run;
I would like to write a code that creates two new variables:
DAYSBWN_CUM: The total number of time for all follow-up periods aka repeat rows (DAYSBWN1 added up for any rows that follow).
REPEAT_NUM: The total number of times the subject has a repeat row (REPEAT_NUM is > 1 only if the subject has more than one row).
The goal would be to get something that looks like this:
StudyID | StartDate | EndDate1 | daysbwn1 | DAYSBWN_CUM | REPEAT_NUM |
01 | 02/01/2007 | 05/01/2008 | 455 | 0 | 0 |
02 | 06/02/2012 | 05/24/2013 | 356 | 0 | 0 |
03 | 05/07/2010 | 08/19/2010 | 104 | 0 | 0 |
04 | 06/16/2007 | 06/21/2008 | 371 | 0 | 0 |
05 | 04/28/2007 | 11/07/2008 | 559 | 0 | 0 |
06 | 03/16/2007 | 06/20/2007 | 96 | 0 | 0 |
07 | 04/18/2007 | 04/19/2007 | 1 | 0 | 0 |
08 | 05/11/2007 | 08/03/2007 | 84 | 0 | 0 |
09 | 08/01/2007 | 08/28/2007 | 27 | 13 | 1 |
09 | 02/21/2008 | 03/05/2008 | 13 | 0 | 0 |
10 | 02/06/2008 | 03/12/2010 | 765 | 0 | 0 |
11 | 01/31/2007 | 12/26/2008 | 695 | 1430 | 8 |
11 | 02/11/2009 | 05/08/2009 | 86 | 1344 | 7 |
11 | 05/19/2009 | 08/14/2009 | 87 | 1257 | 6 |
11 | 09/22/2009 | 12/18/2009 | 87 | 1170 | 5 |
11 | 01/09/2010 | 02/03/2010 | 25 | 1145 | 4 |
11 | 02/23/2010 | 11/19/2010 | 269 | 876 | 3 |
11 | 12/01/2010 | 03/18/2011 | 107 | 769 | 2 |
11 | 05/10/2011 | 04/23/2013 | 714 | 55 | 1 |
11 | 08/03/2013 | 09/27/2013 | 55 | 0 | 0 |
If the coding would be significantly easier, I could work with something like this instead, because I could just subtract certain values from the rows to get my desired final product:
StudyID | StartDate | EndDate1 | daysbwn1 | DAYSBWN_CUM | REPEAT_NUM |
01 | 02/01/2007 | 05/01/2008 | 455 | 455 | 1 |
02 | 06/02/2012 | 05/24/2013 | 356 | 356 | 1 |
03 | 05/07/2010 | 08/19/2010 | 104 | 104 | 1 |
04 | 06/16/2007 | 06/21/2008 | 371 | 371 | 1 |
05 | 04/28/2007 | 11/07/2008 | 559 | 559 | 1 |
06 | 03/16/2007 | 06/20/2007 | 96 | 96 | 1 |
07 | 04/18/2007 | 04/19/2007 | 1 | 1 | 1 |
08 | 05/11/2007 | 08/03/2007 | 84 | 84 | 1 |
09 | 08/01/2007 | 08/28/2007 | 27 | 40 | 2 |
09 | 02/21/2008 | 03/05/2008 | 13 | 13 | 1 |
10 | 02/06/2008 | 03/12/2010 | 765 | 765 | 1 |
11 | 01/31/2007 | 12/26/2008 | 695 | 2125 | 9 |
11 | 02/11/2009 | 05/08/2009 | 86 | 1430 | 8 |
11 | 05/19/2009 | 08/14/2009 | 87 | 1344 | 7 |
11 | 09/22/2009 | 12/18/2009 | 87 | 1257 | 6 |
11 | 01/09/2010 | 02/03/2010 | 25 | 1170 | 5 |
11 | 02/23/2010 | 11/19/2010 | 269 | 1145 | 4 |
11 | 12/01/2010 | 03/18/2011 | 107 | 876 | 3 |
11 | 05/10/2011 | 04/23/2013 | 714 | 769 | 2 |
11 | 08/03/2013 | 09/27/2013 | 55 | 55 | 1 |
Any assistance would be greatly appreciated. Thank you in advance!
One approach:
data want;
daysbwn_cum=0;
repeat_num=0;
do until (last.id);
set have;
by id;
daysbwn_cum + daysbwn1;
repeat_num + 1;
end;
do until (last.id);
set have;
by id;
daysbwn_cum = daysbwn_cum - daysbwn1;
repeat_num = repeat_num - 1;
output;
end;
run;
The top loop gets the totals, then the bottom loop removes the current values and outputs the results.
One approach:
data want;
daysbwn_cum=0;
repeat_num=0;
do until (last.id);
set have;
by id;
daysbwn_cum + daysbwn1;
repeat_num + 1;
end;
do until (last.id);
set have;
by id;
daysbwn_cum = daysbwn_cum - daysbwn1;
repeat_num = repeat_num - 1;
output;
end;
run;
The top loop gets the totals, then the bottom loop removes the current values and outputs the results.
proc sort data=tempfile; by StudyID descending StartDate; run; data tempfile; set tempfile; by StudyID; if first.StudyID then do; DAYSBWN_CUM=0; REPEAT_NUM=0; output; DAYSBWN_CUM=daysbwn1; end; else do; REPEAT_NUM+1; output; DAYSBWN_CUM+daysbwn1; end; run; proc sort data=tempfile; by StudyID StartDate; run;
Art, CEO, AnalystFinder.com
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.