Solved: How to get a cumulative sum of all subsequent rows

Kels123 · Posted 04-27-2017 03:26 PM

I have a database where subjects have repeated measurements. These measurements are taken during specific time periods delineated by StartDate and EndDate1. DAYSBWN1 is the length of time of each period (EndDate1 – StartDate in each row). Sample code below:

data tempfile;

infile datalines truncover;

INPUT

StudyID$2.@+1 StartDate mmddyy10.@+1 EndDate1 mmddyy10.@+1 daysbwn1 ;

format Startdate EndDate1 mmddyy10.;

datalines;

01 02/01/2007 05/01/2008 455

02 06/02/2012 05/24/2013 356

03 05/07/2010 08/19/2010 104

04 06/16/2007 06/21/2008 371

05 04/28/2007 11/07/2008 559

06 03/16/2007 06/20/2007 96

07 04/18/2007 04/19/2007 1

08 05/11/2007 08/03/2007 84

09 08/01/2007 08/28/2007 27

09 02/21/2008 03/05/2008 13

10 02/06/2008 03/12/2010 765

11 01/31/2007 12/26/2008 695

11 02/11/2009 05/08/2009 86

11 05/19/2009 08/14/2009 87

11 09/22/2009 12/18/2009 87

11 01/09/2010 02/03/2010 25

11 02/23/2010 11/19/2010 269

11 12/01/2010 03/18/2011 107

11 05/10/2011 04/23/2013 714

11 08/03/2013 09/27/2013 55

;

run;

proc print data=tempfile; run;

I would like to write a code that creates two new variables:

DAYSBWN_CUM: The total number of time for all follow-up periods aka repeat rows (DAYSBWN1 added up for any rows that follow).

REPEAT_NUM: The total number of times the subject has a repeat row (REPEAT_NUM is > 1 only if the subject has more than one row).

The goal would be to get something that looks like this:

StudyID	StartDate	EndDate1	daysbwn1	DAYSBWN_CUM	REPEAT_NUM
01	02/01/2007	05/01/2008	455	0	0
02	06/02/2012	05/24/2013	356	0	0
03	05/07/2010	08/19/2010	104	0	0
04	06/16/2007	06/21/2008	371	0	0
05	04/28/2007	11/07/2008	559	0	0
06	03/16/2007	06/20/2007	96	0	0
07	04/18/2007	04/19/2007	1	0	0
08	05/11/2007	08/03/2007	84	0	0
09	08/01/2007	08/28/2007	27	13	1
09	02/21/2008	03/05/2008	13	0	0
10	02/06/2008	03/12/2010	765	0	0
11	01/31/2007	12/26/2008	695	1430	8
11	02/11/2009	05/08/2009	86	1344	7
11	05/19/2009	08/14/2009	87	1257	6
11	09/22/2009	12/18/2009	87	1170	5
11	01/09/2010	02/03/2010	25	1145	4
11	02/23/2010	11/19/2010	269	876	3
11	12/01/2010	03/18/2011	107	769	2
11	05/10/2011	04/23/2013	714	55	1
11	08/03/2013	09/27/2013	55	0	0

If the coding would be significantly easier, I could work with something like this instead, because I could just subtract certain values from the rows to get my desired final product:

StudyID	StartDate	EndDate1	daysbwn1	DAYSBWN_CUM	REPEAT_NUM
01	02/01/2007	05/01/2008	455	455	1
02	06/02/2012	05/24/2013	356	356	1
03	05/07/2010	08/19/2010	104	104	1
04	06/16/2007	06/21/2008	371	371	1
05	04/28/2007	11/07/2008	559	559	1
06	03/16/2007	06/20/2007	96	96	1
07	04/18/2007	04/19/2007	1	1	1
08	05/11/2007	08/03/2007	84	84	1
09	08/01/2007	08/28/2007	27	40	2
09	02/21/2008	03/05/2008	13	13	1
10	02/06/2008	03/12/2010	765	765	1
11	01/31/2007	12/26/2008	695	2125	9
11	02/11/2009	05/08/2009	86	1430	8
11	05/19/2009	08/14/2009	87	1344	7
11	09/22/2009	12/18/2009	87	1257	6
11	01/09/2010	02/03/2010	25	1170	5
11	02/23/2010	11/19/2010	269	1145	4
11	12/01/2010	03/18/2011	107	876	3
11	05/10/2011	04/23/2013	714	769	2
11	08/03/2013	09/27/2013	55	55	1

Any assistance would be greatly appreciated. Thank you in advance!

Astounding · Posted 04-27-2017 03:47 PM

One approach:

data want;

daysbwn_cum=0;

repeat_num=0;

do until (last.id);

set have;

by id;

daysbwn_cum + daysbwn1;

repeat_num + 1;

end;

do until (last.id);

set have;

by id;

daysbwn_cum = daysbwn_cum - daysbwn1;

repeat_num = repeat_num - 1;

output;

end;

run;

The top loop gets the totals, then the bottom loop removes the current values and outputs the results.

View solution in original post

Astounding · Posted 04-27-2017 03:47 PM

One approach:

data want;

daysbwn_cum=0;

repeat_num=0;

do until (last.id);

set have;

by id;

daysbwn_cum + daysbwn1;

repeat_num + 1;

end;

do until (last.id);

set have;

by id;

daysbwn_cum = daysbwn_cum - daysbwn1;

repeat_num = repeat_num - 1;

output;

end;

run;

The top loop gets the totals, then the bottom loop removes the current values and outputs the results.

art297 · Posted 04-27-2017 03:49 PM

proc sort data=tempfile;
  by StudyID descending StartDate;
run;

data tempfile;
  set tempfile;
  by StudyID;
  if first.StudyID then do;
    DAYSBWN_CUM=0;
    REPEAT_NUM=0;
    output;
    DAYSBWN_CUM=daysbwn1;
  end;
  else do;
    REPEAT_NUM+1;
    output;
    DAYSBWN_CUM+daysbwn1;
  end;
run;
proc sort data=tempfile;
  by StudyID StartDate;
run;

Art, CEO, AnalystFinder.com

How to get a cumulative sum of all subsequent rows

Re: How to get a cumulative sum of all subsequent rows

Re: How to get a cumulative sum of all subsequent rows

Re: How to get a cumulative sum of all subsequent rows

How to get a cumulative sum of all subsequent rows

Re: How to get a cumulative sum of all subsequent rows

Re: How to get a cumulative sum of all subsequent rows

Re: How to get a cumulative sum of all subsequent rows

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away