BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
spirto
Obsidian | Level 7

Hi all, I have the following situation. I have two data sets (DS1 and DS2) that I want to use as a source to create a third data set (DS3). In short, the third data set (DS3) will be DS1 with an extra variable as the sum of variable X. To get the sum, you use the dates in DS1 and then sum any X variable between the previous date and current date defined in DS1. For example for A1 Visit 1, sum(X)={sum of X where date<=07/01/14}; for A1 Visit 2, sum(X)={sum of X where 07/01/14<date<=08/02/14}; etc. This code would have have to work across different subjects. I tried to use merge, first.Subject last.Subject, etc but this one stumps me. I'm not sure if a hash table would work here.

I am not proficient in proc sql so if there is a data step option would be preferred but an sql statement will do as well. Thank you for your help!

DS1

SubjectVisitDate
A1107/01/14
A1208/02/14
A1308/15/14
A2105/14/15
A2207/22/15

data ds1;

    input subid $ visit date mmddyy8.;

    datalines;

A1 1 07/01/14

A1 2 08/02/14

A1 3 08/15/14

A2 1 05/14/15

A2 2 07/22/15

;

DS2

SubjectDateX
A106/02/143
A107/01/140
A108/02/141
A108/10/1410
A108/15/142
A205/14/152
A205/23/153
A206/01/150
A207/22/157

data ds2;

    input subid $ date mmddyy8. x;

    datalines;

A1 06/02/14 3

A1 07/01/14 0

A1 08/02/14 1

A1 08/10/14 10

A1 08/15/14 2

A2 05/14/15 2

A2 05/23/15 3

A2 06/01/15 0

A2 07/22/15 7

;

DS3

SubjectVisitDateSum(X)
A1107/01/143+0=3
A1208/02/141
A1308/15/1410+2=12
A2105/14/152
A2207/22/153+0+7=10
1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

Here's a fairly straightforward DATA step version:

data want (drop=X);

set ds2 (in=from_ds2) ds1 (in=from_ds1);

by subid date;

if first.subid then total_x=0;

if from_ds2 then total_x + x;

if from_ds1;

output;

total_x=0;

run;

There are a few sophisticated pieces here, but the code is pretty short and each piece should be comprehensible.

Good luck.

View solution in original post

8 REPLIES 8
Haikuo
Onyx | Level 15

I believe one can do it using data step point= or Hash, and it should be faster than the following proc sql approach, except that the latter is easier to code, let us know this is sufficient or we will move on to the data step approach:

data ds1;

     infile cards truncover dlm='09'x;

     input Subject$  Visit Date :mmddyy8.;

     cards;

A1   1    07/01/14

A1   2    08/02/14

A1   3    08/15/14

A2   1    05/14/15

A2   2    07/22/15

;

data ds2;

     infile cards truncover dlm='09'x;

     input Subject$  Date :mmddyy8.  X;

     cards;

A1   06/02/14   3

A1   07/01/14   0

A1   08/02/14   1

A1   08/10/14   10

A1   08/15/14   2

A2   05/14/15   2

A2   05/23/15   3

A2   06/01/15   0

A2   07/22/15   7

;

proc sql;

     create table ds3 as

           select *, (select sum(x) from ds2 where subject=a.subject and date <= a.date and

                date > coalesce((select date from ds1 where subject = a.subject and visit=a.visit-1),.)) as sumx

           from ds1 a

     ;

quit;

spirto
Obsidian | Level 7

Thank you!!! This works as it should. That said, I humbly request a data step solution if possible. This would greatly help me understand how to do this for future data sets as well (my knowledge of sql is very low).

Astounding
PROC Star

Here's a fairly straightforward DATA step version:

data want (drop=X);

set ds2 (in=from_ds2) ds1 (in=from_ds1);

by subid date;

if first.subid then total_x=0;

if from_ds2 then total_x + x;

if from_ds1;

output;

total_x=0;

run;

There are a few sophisticated pieces here, but the code is pretty short and each piece should be comprehensible.

Good luck.

Haikuo
Onyx | Level 15

nice! I think somehow you need to switch the table order in the 'set' statement or modify the downstream 'from_' variable to make it work.

Astounding
PROC Star

You caught it ... fixed now.

Haikuo
Onyx | Level 15

Here is a data step approach,

data ds1;

     input subid $ visit date mmddyy8.;

     format date date9.;

     datalines;

A1 1 07/01/14

A1 2 08/02/14

A1 3 08/15/14

A2 1 05/14/15

A2 2 07/22/15

;

data ds2;

     input subid $ date mmddyy8. x;

     format date date9.;

     datalines;

A1 06/02/14 3

A1 07/01/14 0

A1 08/02/14 1

A1 08/10/14 10

A1 08/15/14 2

A2 05/14/15 2

A2 05/23/15 3

A2 06/01/15 0

A2 07/22/15 7

;

data want;

     if _n_=1 then

           do;

                dcl hash h(dataset:'ds2(rename=date=_date)', multidata:'y');

                h.definekey('subid');

     h.definedata(all:

                'y');

                h.definedone();

                call missing(_date, x);

           end;

     set ds1;

     by subid;

     pre_date=lag(date);

     call missing(sumx);

     if first.subid then

           do;

                call missing(pre_date);

           end;

     do rc=h.find() by 0 while (rc=0);

           if pre_date < _date <= date then

                sumx+x;

           rc=h.find_next();

     end;

     keep subid vist date sumx;

run;

Ksharp
Super User

Code: Program

data ds1;
   infile cards truncover ;
   input Subject$  Visit Date :mmddyy8.;
   cards;
A1 1 07/01/14
A1 2 08/02/14
A1 3 08/15/14
A2 1 05/14/15
A2 2 07/22/15
;
data ds2;
   infile cards truncover ;
   input Subject$  Date :mmddyy8.  X;
   cards;
A1 06/02/14 3
A1 07/01/14 0
A1 08/02/14 1
A1 08/10/14 10
A1 08/15/14 2
A2 05/14/15 2
A2 05/23/15 3
A2 06/01/15 0
A2 07/22/15 7
;

data ds1;
set ds1;
by subject;
start_date=lag(date);
if first.subject then call missing(start_date);
format start_date date mmddyy10.;
run;
proc sql;
create table want as
  select a.Subject,a.visit,a.date,sum(x) as sum_x
   from ds1 as a left join ds2 as b
   on a.subject=b.subject and a.start_date <b.date<=a.date
   group by a.Subject,a.visit,a.date;
quit;

spirto
Obsidian | Level 7

To all, thank you for the tremendous help and multiple solutions!

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 2368 views
  • 11 likes
  • 4 in conversation