DATA Step, Macro, Functions and more

How to get a cumulative sum of all subsequent rows

Accepted Solution Solved
Reply
Contributor
Posts: 51
Accepted Solution

How to get a cumulative sum of all subsequent rows

I have a database where subjects have repeated measurements. These measurements are taken during specific time periods delineated by StartDate and EndDate1. DAYSBWN1 is the length of time of each period (EndDate1 – StartDate in each row). Sample code below:

 

data tempfile;

infile datalines truncover;

INPUT

StudyID$2.@+1 StartDate mmddyy10.@+1 EndDate1 mmddyy10.@+1 daysbwn1 ; 

format Startdate EndDate1 mmddyy10.;

datalines;

01 02/01/2007 05/01/2008 455 

02 06/02/2012 05/24/2013 356 

03 05/07/2010 08/19/2010 104 

04 06/16/2007 06/21/2008 371 

05 04/28/2007 11/07/2008 559 

06 03/16/2007 06/20/2007 96 

07 04/18/2007 04/19/2007 1 

08 05/11/2007 08/03/2007 84 

09 08/01/2007 08/28/2007 27 

09 02/21/2008 03/05/2008 13 

10 02/06/2008 03/12/2010 765 

11 01/31/2007 12/26/2008 695 

11 02/11/2009 05/08/2009 86 

11 05/19/2009 08/14/2009 87 

11 09/22/2009 12/18/2009 87 

11 01/09/2010 02/03/2010 25 

11 02/23/2010 11/19/2010 269 

11 12/01/2010 03/18/2011 107 

11 05/10/2011 04/23/2013 714 

11 08/03/2013 09/27/2013 55 

;

run;

proc print data=tempfile; run;

 

 

I would like to write a code that creates two new variables:

DAYSBWN_CUM: The total number of time for all follow-up periods aka repeat rows (DAYSBWN1 added up for any rows that follow).

REPEAT_NUM: The total number of times the subject has a repeat row (REPEAT_NUM is > 1 only if the subject has more than one row).

 

The goal would be to get something that looks like this:

 

StudyID

StartDate

EndDate1

daysbwn1

DAYSBWN_CUM

REPEAT_NUM

01

02/01/2007

05/01/2008

455

0

0

02

06/02/2012

05/24/2013

356

0

0

03

05/07/2010

08/19/2010

104

0

0

04

06/16/2007

06/21/2008

371

0

0

05

04/28/2007

11/07/2008

559

0

0

06

03/16/2007

06/20/2007

96

0

0

07

04/18/2007

04/19/2007

1

0

0

08

05/11/2007

08/03/2007

84

0

0

09

08/01/2007

08/28/2007

27

13

1

09

02/21/2008

03/05/2008

13

0

0

10

02/06/2008

03/12/2010

765

0

0

11

01/31/2007

12/26/2008

695

1430

8

11

02/11/2009

05/08/2009

86

1344

7

11

05/19/2009

08/14/2009

87

1257

6

11

09/22/2009

12/18/2009

87

1170

5

11

01/09/2010

02/03/2010

25

1145

4

11

02/23/2010

11/19/2010

269

876

3

11

12/01/2010

03/18/2011

107

769

2

11

05/10/2011

04/23/2013

714

55

1

11

08/03/2013

09/27/2013

55

0

0

 

If the coding would be significantly easier, I could work with something like this instead, because I could just subtract certain values from the rows to get my desired final product:

 

StudyID

StartDate

EndDate1

daysbwn1

DAYSBWN_CUM

REPEAT_NUM

01

02/01/2007

05/01/2008

455

455

1

02

06/02/2012

05/24/2013

356

356

1

03

05/07/2010

08/19/2010

104

104

1

04

06/16/2007

06/21/2008

371

371

1

05

04/28/2007

11/07/2008

559

559

1

06

03/16/2007

06/20/2007

96

96

1

07

04/18/2007

04/19/2007

1

1

1

08

05/11/2007

08/03/2007

84

84

1

09

08/01/2007

08/28/2007

27

40

2

09

02/21/2008

03/05/2008

13

13

1

10

02/06/2008

03/12/2010

765

765

1

11

01/31/2007

12/26/2008

695

2125

9

11

02/11/2009

05/08/2009

86

1430

8

11

05/19/2009

08/14/2009

87

1344

7

11

09/22/2009

12/18/2009

87

1257

6

11

01/09/2010

02/03/2010

25

1170

5

11

02/23/2010

11/19/2010

269

1145

4

11

12/01/2010

03/18/2011

107

876

3

11

05/10/2011

04/23/2013

714

769

2

11

08/03/2013

09/27/2013

55

55

1

 

Any assistance would be greatly appreciated. Thank you in advance!


Accepted Solutions
Solution
‎05-02-2017 01:35 PM
Super User
Posts: 5,503

Re: How to get a cumulative sum of all subsequent rows

[ Edited ]

One approach:

 

data want;

daysbwn_cum=0;

repeat_num=0;

do until (last.id);

   set have;

   by id;

   daysbwn_cum + daysbwn1;

   repeat_num + 1;

end;

do until (last.id);

   set have;

   by id;

   daysbwn_cum = daysbwn_cum - daysbwn1;

   repeat_num = repeat_num - 1;

   output;

end;

run;

 

The top loop gets the totals, then the bottom loop removes the current values and outputs the results.

View solution in original post


All Replies
Solution
‎05-02-2017 01:35 PM
Super User
Posts: 5,503

Re: How to get a cumulative sum of all subsequent rows

[ Edited ]

One approach:

 

data want;

daysbwn_cum=0;

repeat_num=0;

do until (last.id);

   set have;

   by id;

   daysbwn_cum + daysbwn1;

   repeat_num + 1;

end;

do until (last.id);

   set have;

   by id;

   daysbwn_cum = daysbwn_cum - daysbwn1;

   repeat_num = repeat_num - 1;

   output;

end;

run;

 

The top loop gets the totals, then the bottom loop removes the current values and outputs the results.

PROC Star
Posts: 7,471

Re: How to get a cumulative sum of all subsequent rows

proc sort data=tempfile;
  by StudyID descending StartDate;
run;

data tempfile;
  set tempfile;
  by StudyID;
  if first.StudyID then do;
    DAYSBWN_CUM=0;
    REPEAT_NUM=0;
    output;
    DAYSBWN_CUM=daysbwn1;
  end;
  else do;
    REPEAT_NUM+1;
    output;
    DAYSBWN_CUM+daysbwn1;
  end;
run;
proc sort data=tempfile;
  by StudyID StartDate;
run;

Art, CEO, AnalystFinder.com

 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 2 replies
  • 183 views
  • 1 like
  • 3 in conversation