DATA Step, Macro, Functions and more

Summing data values between certain dates

Accepted Solution Solved
Reply
Contributor
Posts: 38
Accepted Solution

Summing data values between certain dates

Hi all, I have the following situation. I have two data sets (DS1 and DS2) that I want to use as a source to create a third data set (DS3). In short, the third data set (DS3) will be DS1 with an extra variable as the sum of variable X. To get the sum, you use the dates in DS1 and then sum any X variable between the previous date and current date defined in DS1. For example for A1 Visit 1, sum(X)={sum of X where date<=07/01/14}; for A1 Visit 2, sum(X)={sum of X where 07/01/14<date<=08/02/14}; etc. This code would have have to work across different subjects. I tried to use merge, first.Subject last.Subject, etc but this one stumps me. I'm not sure if a hash table would work here.

I am not proficient in proc sql so if there is a data step option would be preferred but an sql statement will do as well. Thank you for your help!

DS1

SubjectVisitDate
A1107/01/14
A1208/02/14
A1308/15/14
A2105/14/15
A2207/22/15

data ds1;

    input subid $ visit date mmddyy8.;

    datalines;

A1 1 07/01/14

A1 2 08/02/14

A1 3 08/15/14

A2 1 05/14/15

A2 2 07/22/15

;

DS2

SubjectDateX
A106/02/143
A107/01/140
A108/02/141
A108/10/1410
A108/15/142
A205/14/152
A205/23/153
A206/01/150
A207/22/157

data ds2;

    input subid $ date mmddyy8. x;

    datalines;

A1 06/02/14 3

A1 07/01/14 0

A1 08/02/14 1

A1 08/10/14 10

A1 08/15/14 2

A2 05/14/15 2

A2 05/23/15 3

A2 06/01/15 0

A2 07/22/15 7

;

DS3

SubjectVisitDateSum(X)
A1107/01/143+0=3
A1208/02/141
A1308/15/1410+2=12
A2105/14/152
A2207/22/153+0+7=10

Accepted Solutions
Solution
‎07-20-2015 06:37 PM
Super User
Posts: 5,083

Re: Summing data values between certain dates

Here's a fairly straightforward DATA step version:

data want (drop=X);

set ds2 (in=from_ds2) ds1 (in=from_ds1);

by subid date;

if first.subid then total_x=0;

if from_ds2 then total_x + x;

if from_ds1;

output;

total_x=0;

run;

There are a few sophisticated pieces here, but the code is pretty short and each piece should be comprehensible.

Good luck.

View solution in original post


All Replies
Respected Advisor
Posts: 3,124

Re: Summing data values between certain dates

I believe one can do it using data step point= or Hash, and it should be faster than the following proc sql approach, except that the latter is easier to code, let us know this is sufficient or we will move on to the data step approach:

data ds1;

     infile cards truncover dlm='09'x;

     input Subject$  Visit Date :mmddyy8.;

     cards;

A1   1    07/01/14

A1   2    08/02/14

A1   3    08/15/14

A2   1    05/14/15

A2   2    07/22/15

;

data ds2;

     infile cards truncover dlm='09'x;

     input Subject$  Date :mmddyy8.  X;

     cards;

A1   06/02/14   3

A1   07/01/14   0

A1   08/02/14   1

A1   08/10/14   10

A1   08/15/14   2

A2   05/14/15   2

A2   05/23/15   3

A2   06/01/15   0

A2   07/22/15   7

;

proc sql;

     create table ds3 as

           select *, (select sum(x) from ds2 where subject=a.subject and date <= a.date and

                date > coalesce((select date from ds1 where subject = a.subject and visit=a.visit-1),.)) as sumx

           from ds1 a

     ;

quit;

Contributor
Posts: 38

Re: Summing data values between certain dates

Thank you!!! This works as it should. That said, I humbly request a data step solution if possible. This would greatly help me understand how to do this for future data sets as well (my knowledge of sql is very low).

Solution
‎07-20-2015 06:37 PM
Super User
Posts: 5,083

Re: Summing data values between certain dates

Here's a fairly straightforward DATA step version:

data want (drop=X);

set ds2 (in=from_ds2) ds1 (in=from_ds1);

by subid date;

if first.subid then total_x=0;

if from_ds2 then total_x + x;

if from_ds1;

output;

total_x=0;

run;

There are a few sophisticated pieces here, but the code is pretty short and each piece should be comprehensible.

Good luck.

Respected Advisor
Posts: 3,124

Re: Summing data values between certain dates

nice! I think somehow you need to switch the table order in the 'set' statement or modify the downstream 'from_' variable to make it work.

Super User
Posts: 5,083

Re: Summing data values between certain dates

You caught it ... fixed now.

Respected Advisor
Posts: 3,124

Re: Summing data values between certain dates

Here is a data step approach,

data ds1;

     input subid $ visit date mmddyy8.;

     format date date9.;

     datalines;

A1 1 07/01/14

A1 2 08/02/14

A1 3 08/15/14

A2 1 05/14/15

A2 2 07/22/15

;

data ds2;

     input subid $ date mmddyy8. x;

     format date date9.;

     datalines;

A1 06/02/14 3

A1 07/01/14 0

A1 08/02/14 1

A1 08/10/14 10

A1 08/15/14 2

A2 05/14/15 2

A2 05/23/15 3

A2 06/01/15 0

A2 07/22/15 7

;

data want;

     if _n_=1 then

           do;

                dcl hash h(dataset:'ds2(rename=date=_date)', multidata:'y');

                h.definekey('subid');

     h.definedata(all:

                'y');

                h.definedone();

                call missing(_date, x);

           end;

     set ds1;

     by subid;

     pre_date=lag(date);

     call missing(sumx);

     if first.subid then

           do;

                call missing(pre_date);

           end;

     do rc=h.find() by 0 while (rc=0);

           if pre_date < _date <= date then

                sumx+x;

           rc=h.find_next();

     end;

     keep subid vist date sumx;

run;

Super User
Posts: 9,681

Re: Summing data values between certain dates

Code: Program

data ds1;
   infile cards truncover ;
   input Subject$  Visit Date :mmddyy8.;
   cards;
A1 1 07/01/14
A1 2 08/02/14
A1 3 08/15/14
A2 1 05/14/15
A2 2 07/22/15
;
data ds2;
   infile cards truncover ;
   input Subject$  Date :mmddyy8.  X;
   cards;
A1 06/02/14 3
A1 07/01/14 0
A1 08/02/14 1
A1 08/10/14 10
A1 08/15/14 2
A2 05/14/15 2
A2 05/23/15 3
A2 06/01/15 0
A2 07/22/15 7
;

data ds1;
set ds1;
by subject;
start_date=lag(date);
if first.subject then call missing(start_date);
format start_date date mmddyy10.;
run;
proc sql;
create table want as
  select a.Subject,a.visit,a.date,sum(x) as sum_x
   from ds1 as a left join ds2 as b
   on a.subject=b.subject and a.start_date <b.date<=a.date
   group by a.Subject,a.visit,a.date;
quit;

Contributor
Posts: 38

Re: Summing data values between certain dates

To all, thank you for the tremendous help and multiple solutions!

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 8 replies
  • 423 views
  • 11 likes
  • 4 in conversation