I have a database where subjects have repeated measurements. These measurements are taken during specific time periods delineated by StartDate and EndDate1. DAYSBWN1 is the length of time of each period (EndDate1 – StartDate in each row). Sample code below:
data tempfile;
infile datalines truncover;
INPUT
StudyID$2.@+1 StartDate mmddyy10.@+1 EndDate1 mmddyy10.@+1 daysbwn1 ;
format Startdate EndDate1 mmddyy10.;
datalines;
01 02/01/2007 05/01/2008 455
02 06/02/2012 05/24/2013 356
03 05/07/2010 08/19/2010 104
04 06/16/2007 06/21/2008 371
05 04/28/2007 11/07/2008 559
06 03/16/2007 06/20/2007 96
07 04/18/2007 04/19/2007 1
08 05/11/2007 08/03/2007 84
09 08/01/2007 08/28/2007 27
09 02/21/2008 03/05/2008 13
10 02/06/2008 03/12/2010 765
11 01/31/2007 12/26/2008 695
11 02/11/2009 05/08/2009 86
11 05/19/2009 08/14/2009 87
11 09/22/2009 12/18/2009 87
11 01/09/2010 02/03/2010 25
11 02/23/2010 11/19/2010 269
11 12/01/2010 03/18/2011 107
11 05/10/2011 04/23/2013 714
11 08/03/2013 09/27/2013 55
;
run;
proc print data=tempfile; run;
I would like to write a code that creates two new variables:
DAYSBWN_CUM: The total number of time for all follow-up periods aka repeat rows (DAYSBWN1 added up for any rows that follow).
REPEAT_NUM: The total number of times the subject has a repeat row (REPEAT_NUM is > 1 only if the subject has more than one row).
The goal would be to get something that looks like this:
StudyID | StartDate | EndDate1 | daysbwn1 | DAYSBWN_CUM | REPEAT_NUM |
01 | 02/01/2007 | 05/01/2008 | 455 | 0 | 0 |
02 | 06/02/2012 | 05/24/2013 | 356 | 0 | 0 |
03 | 05/07/2010 | 08/19/2010 | 104 | 0 | 0 |
04 | 06/16/2007 | 06/21/2008 | 371 | 0 | 0 |
05 | 04/28/2007 | 11/07/2008 | 559 | 0 | 0 |
06 | 03/16/2007 | 06/20/2007 | 96 | 0 | 0 |
07 | 04/18/2007 | 04/19/2007 | 1 | 0 | 0 |
08 | 05/11/2007 | 08/03/2007 | 84 | 0 | 0 |
09 | 08/01/2007 | 08/28/2007 | 27 | 13 | 1 |
09 | 02/21/2008 | 03/05/2008 | 13 | 0 | 0 |
10 | 02/06/2008 | 03/12/2010 | 765 | 0 | 0 |
11 | 01/31/2007 | 12/26/2008 | 695 | 1430 | 8 |
11 | 02/11/2009 | 05/08/2009 | 86 | 1344 | 7 |
11 | 05/19/2009 | 08/14/2009 | 87 | 1257 | 6 |
11 | 09/22/2009 | 12/18/2009 | 87 | 1170 | 5 |
11 | 01/09/2010 | 02/03/2010 | 25 | 1145 | 4 |
11 | 02/23/2010 | 11/19/2010 | 269 | 876 | 3 |
11 | 12/01/2010 | 03/18/2011 | 107 | 769 | 2 |
11 | 05/10/2011 | 04/23/2013 | 714 | 55 | 1 |
11 | 08/03/2013 | 09/27/2013 | 55 | 0 | 0 |
If the coding would be significantly easier, I could work with something like this instead, because I could just subtract certain values from the rows to get my desired final product:
StudyID | StartDate | EndDate1 | daysbwn1 | DAYSBWN_CUM | REPEAT_NUM |
01 | 02/01/2007 | 05/01/2008 | 455 | 455 | 1 |
02 | 06/02/2012 | 05/24/2013 | 356 | 356 | 1 |
03 | 05/07/2010 | 08/19/2010 | 104 | 104 | 1 |
04 | 06/16/2007 | 06/21/2008 | 371 | 371 | 1 |
05 | 04/28/2007 | 11/07/2008 | 559 | 559 | 1 |
06 | 03/16/2007 | 06/20/2007 | 96 | 96 | 1 |
07 | 04/18/2007 | 04/19/2007 | 1 | 1 | 1 |
08 | 05/11/2007 | 08/03/2007 | 84 | 84 | 1 |
09 | 08/01/2007 | 08/28/2007 | 27 | 40 | 2 |
09 | 02/21/2008 | 03/05/2008 | 13 | 13 | 1 |
10 | 02/06/2008 | 03/12/2010 | 765 | 765 | 1 |
11 | 01/31/2007 | 12/26/2008 | 695 | 2125 | 9 |
11 | 02/11/2009 | 05/08/2009 | 86 | 1430 | 8 |
11 | 05/19/2009 | 08/14/2009 | 87 | 1344 | 7 |
11 | 09/22/2009 | 12/18/2009 | 87 | 1257 | 6 |
11 | 01/09/2010 | 02/03/2010 | 25 | 1170 | 5 |
11 | 02/23/2010 | 11/19/2010 | 269 | 1145 | 4 |
11 | 12/01/2010 | 03/18/2011 | 107 | 876 | 3 |
11 | 05/10/2011 | 04/23/2013 | 714 | 769 | 2 |
11 | 08/03/2013 | 09/27/2013 | 55 | 55 | 1 |
Any assistance would be greatly appreciated. Thank you in advance!
One approach:
data want;
daysbwn_cum=0;
repeat_num=0;
do until (last.id);
set have;
by id;
daysbwn_cum + daysbwn1;
repeat_num + 1;
end;
do until (last.id);
set have;
by id;
daysbwn_cum = daysbwn_cum - daysbwn1;
repeat_num = repeat_num - 1;
output;
end;
run;
The top loop gets the totals, then the bottom loop removes the current values and outputs the results.
One approach:
data want;
daysbwn_cum=0;
repeat_num=0;
do until (last.id);
set have;
by id;
daysbwn_cum + daysbwn1;
repeat_num + 1;
end;
do until (last.id);
set have;
by id;
daysbwn_cum = daysbwn_cum - daysbwn1;
repeat_num = repeat_num - 1;
output;
end;
run;
The top loop gets the totals, then the bottom loop removes the current values and outputs the results.
proc sort data=tempfile; by StudyID descending StartDate; run; data tempfile; set tempfile; by StudyID; if first.StudyID then do; DAYSBWN_CUM=0; REPEAT_NUM=0; output; DAYSBWN_CUM=daysbwn1; end; else do; REPEAT_NUM+1; output; DAYSBWN_CUM+daysbwn1; end; run; proc sort data=tempfile; by StudyID StartDate; run;
Art, CEO, AnalystFinder.com
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.