Solved: Re: Summing data values between certain dates

spirto · Posted 07-20-2015 04:03 PM

Hi all, I have the following situation. I have two data sets (DS1 and DS2) that I want to use as a source to create a third data set (DS3). In short, the third data set (DS3) will be DS1 with an extra variable as the sum of variable X. To get the sum, you use the dates in DS1 and then sum any X variable between the previous date and current date defined in DS1. For example for A1 Visit 1, sum(X)={sum of X where date<=07/01/14}; for A1 Visit 2, sum(X)={sum of X where 07/01/14<date<=08/02/14}; etc. This code would have have to work across different subjects. I tried to use merge, first.Subject last.Subject, etc but this one stumps me. I'm not sure if a hash table would work here.

I am not proficient in proc sql so if there is a data step option would be preferred but an sql statement will do as well. Thank you for your help!

DS1

Subject	Visit	Date
A1	1	07/01/14
A1	2	08/02/14
A1	3	08/15/14
A2	1	05/14/15
A2	2	07/22/15

data ds1;

input subid $ visit date mmddyy8.;

datalines;

A1 1 07/01/14

A1 2 08/02/14

A1 3 08/15/14

A2 1 05/14/15

A2 2 07/22/15

;

DS2

Subject	Date	X
A1	06/02/14	3
A1	07/01/14	0
A1	08/02/14	1
A1	08/10/14	10
A1	08/15/14	2
A2	05/14/15	2
A2	05/23/15	3
A2	06/01/15	0
A2	07/22/15	7

data ds2;

input subid $ date mmddyy8. x;

datalines;

A1 06/02/14 3

A1 07/01/14 0

A1 08/02/14 1

A1 08/10/14 10

A1 08/15/14 2

A2 05/14/15 2

A2 05/23/15 3

A2 06/01/15 0

A2 07/22/15 7

;

DS3

Subject	Visit	Date	Sum(X)
A1	1	07/01/14	3+0=3
A1	2	08/02/14	1
A1	3	08/15/14	10+2=12
A2	1	05/14/15	2
A2	2	07/22/15	3+0+7=10

Astounding · Posted 07-20-2015 06:37 PM

Here's a fairly straightforward DATA step version:

data want (drop=X);

set ds2 (in=from_ds2) ds1 (in=from_ds1);

by subid date;

if first.subid then total_x=0;

if from_ds2 then total_x + x;

if from_ds1;

output;

total_x=0;

run;

There are a few sophisticated pieces here, but the code is pretty short and each piece should be comprehensible.

Good luck.

View solution in original post

Haikuo · Posted 07-20-2015 04:49 PM

I believe one can do it using data step point= or Hash, and it should be faster than the following proc sql approach, except that the latter is easier to code, let us know this is sufficient or we will move on to the data step approach:

data ds1;

infile cards truncover dlm='09'x;

input Subject$ Visit Date :mmddyy8.;

cards;

A1 1 07/01/14

A1 2 08/02/14

A1 3 08/15/14

A2 1 05/14/15

A2 2 07/22/15

;

data ds2;

infile cards truncover dlm='09'x;

input Subject$ Date :mmddyy8. X;

cards;

A1 06/02/14 3

A1 07/01/14 0

A1 08/02/14 1

A1 08/10/14 10

A1 08/15/14 2

A2 05/14/15 2

A2 05/23/15 3

A2 06/01/15 0

A2 07/22/15 7

;

proc sql;

create table ds3 as

select *, (select sum(x) from ds2 where subject=a.subject and date <= a.date and

date > coalesce((select date from ds1 where subject = a.subject and visit=a.visit-1),.)) as sumx

from ds1 a

;

quit;

spirto · Posted 07-20-2015 05:03 PM

Thank you!!! This works as it should. That said, I humbly request a data step solution if possible. This would greatly help me understand how to do this for future data sets as well (my knowledge of sql is very low).

Astounding · Posted 07-20-2015 06:37 PM

Here's a fairly straightforward DATA step version:

data want (drop=X);

set ds2 (in=from_ds2) ds1 (in=from_ds1);

by subid date;

if first.subid then total_x=0;

if from_ds2 then total_x + x;

if from_ds1;

output;

total_x=0;

run;

There are a few sophisticated pieces here, but the code is pretty short and each piece should be comprehensible.

Good luck.

Haikuo · Posted 07-20-2015 06:59 PM

nice! I think somehow you need to switch the table order in the 'set' statement or modify the downstream 'from_' variable to make it work.

Astounding · Posted 07-20-2015 10:11 PM

You caught it ... fixed now.

Haikuo · Posted 07-20-2015 06:49 PM

Here is a data step approach,

data ds1;

input subid $ visit date mmddyy8.;

format date date9.;

datalines;

A1 1 07/01/14

A1 2 08/02/14

A1 3 08/15/14

A2 1 05/14/15

A2 2 07/22/15

;

data ds2;

input subid $ date mmddyy8. x;

format date date9.;

datalines;

A1 06/02/14 3

A1 07/01/14 0

A1 08/02/14 1

A1 08/10/14 10

A1 08/15/14 2

A2 05/14/15 2

A2 05/23/15 3

A2 06/01/15 0

A2 07/22/15 7

;

data want;

if _n_=1 then

do;

dcl hash h(dataset:'ds2(rename=date=_date)', multidata:'y');

h.definekey('subid');

h.definedata(all:

'y');

h.definedone();

call missing(_date, x);

end;

set ds1;

by subid;

pre_date=lag(date);

call missing(sumx);

if first.subid then

do;

call missing(pre_date);

end;

do rc=h.find() by 0 while (rc=0);

if pre_date < _date <= date then

sumx+x;

rc=h.find_next();

end;

keep subid vist date sumx;

run;

Ksharp · Posted 07-21-2015 09:34 AM

Code: Program

data ds1;
   infile cards truncover ;
   input Subject$  Visit Date :mmddyy8.;
   cards;
A1 1 07/01/14
A1 2 08/02/14
A1 3 08/15/14
A2 1 05/14/15
A2 2 07/22/15
;
data ds2;
   infile cards truncover ;
   input Subject$  Date :mmddyy8.  X;
   cards;
A1 06/02/14 3
A1 07/01/14 0
A1 08/02/14 1
A1 08/10/14 10
A1 08/15/14 2
A2 05/14/15 2
A2 05/23/15 3
A2 06/01/15 0
A2 07/22/15 7
;

data ds1;
 set ds1;
 by subject;
 start_date=lag(date);
 if first.subject then call missing(start_date);
 format start_date date mmddyy10.;
run;
proc sql;
 create table want as 
  select a.Subject,a.visit,a.date,sum(x) as sum_x
   from ds1 as a left join ds2 as b 
   on a.subject=b.subject and a.start_date <b.date<=a.date
   group by a.Subject,a.visit,a.date;
quit;

spirto · Posted 07-21-2015 10:09 AM

To all, thank you for the tremendous help and multiple solutions!

Code: Program

SAS Innovate 2025: Call for Content

Classroom Training Available!