☑ This topic is solved.
Zatere
Quartz | Level 8

Hi there,

I have a question, and I hope it makes sense.

I have two tables:

data have1;
input ID $ DT :yymmdd10.;
format DT yymmddd10.;
datalines;
A 2024-07-04
A 2024-07-25
;
run;

data have2;
input ID $ DT :yymmdd10. AMT;
format DT yymmddd10.;
datalines;
A 2024-07-03 10
A 2024-07-04 20
A 2024-07-04 30
A 2024-07-11 40
A 2024-07-11 50
A 2024-07-16 70
A 2024-07-16 80
A 2024-07-25 90
A 2024-07-26 100
;
run;

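I want to add two totals from the second table to each row of the first table: the sum of AMT for the same ID on or before DT, and the sum of AMT for the same ID after DT.

Essentially, I want the following (for 2024-07-04, that is 10 + 20 + 30 = 60 up to and including the date, and 40 + 50 + 70 + 80 + 90 + 100 = 430 after it):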

data want;
input ID $ DT :yymmdd10. Before_upto_DT After_DT;
format DT yymmddd10.;
datalines;
A 2024-07-04 60 430
A 2024-07-25 390 100
; 
run;

I tried to join them using PROC SQL, but that did not work.

Any help would be appreciated.

Thanks in advance.


11 REPLIES
Ksharp
Super User

Yes, you picked the right tool: PROC SQL works well for small data.

 

data have1;
input ID $ DT :yymmdd10.;
format DT yymmddd10.;
datalines;
A 2024-07-04
A 2024-07-25
;
run;

data have2;
input ID $ DT :yymmdd10. AMT;
format DT yymmddd10.;
datalines;
A 2024-07-03 10
A 2024-07-04 20
A 2024-07-04 30
A 2024-07-11 40
A 2024-07-11 50
A 2024-07-16 70
A 2024-07-16 80
A 2024-07-25 90
A 2024-07-26 100
;
run;

proc sql;
create table want as
select *,
 (select sum(AMT) from have2 where ID=a.ID and DT<=a.DT) as Before_upto_DT ,
 (select sum(AMT) from have2 where ID=a.ID and DT> a.DT) as After_DT
 from have1 as a;
quit;
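
A possible variation on the query above, offered only as a sketch: the two correlated subqueries can be folded into a single join with conditional sums. The table name want_join is made up for illustration. The query assumes each (ID, DT) pair occurs only once in have1; a repeated pair would be collapsed by the GROUP BY and its totals inflated. Rows in have1 whose ID has no match in have2 drop out because of the inner join, and a side with no matching dates yields 0 instead of a missing value. Whether this runs faster than the subquery form on large tables is something to time, not something to assume.

```sas
proc sql;
  create table want_join as
  select a.ID,
         a.DT,
         /* amounts dated on or before the have1 date */
         sum(case when b.DT <= a.DT then b.AMT else 0 end) as Before_upto_DT,
         /* amounts dated after the have1 date */
         sum(case when b.DT >  a.DT then b.AMT else 0 end) as After_DT
  from have1 as a
       inner join have2 as b
       on a.ID = b.ID
  /* assumption: (ID, DT) is unique in have1 */
  group by a.ID, a.DT;
quit;
```

On the sample data this should give the same two rows as the want table in the question.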
Zatere
Quartz | Level 8
Hi Ksharp, thanks for your reply. It does work. However, as you mentioned, this solution is suited to small data. Both of my tables are quite big, so it takes a long time to finish.
Ksharp
Super User

How big are these two tables?

You could try a hash table.

 

data have1;
input ID $ DT :yymmdd10.;
format DT yymmddd10.;
datalines;
A 2024-07-04
A 2024-07-25
;
run;

data have2;
input ID $ DT :yymmdd10. AMT;
format DT yymmddd10.;
datalines;
A 2024-07-03 10
A 2024-07-04 20
A 2024-07-04 30
A 2024-07-11 40
A 2024-07-11 50
A 2024-07-16 70
A 2024-07-16 80
A 2024-07-25 90
A 2024-07-26 100
;
run;

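data want;
 /* Load have2 once into a hash keyed by ID and DT; multidata keeps duplicate keys */
 if _n_=1 then do;
  if 0 then set have2;
  declare hash h(dataset:'have2',multidata:'y',hashexp:20);
  h.definekey('ID','DT');
  h.definedata('AMT');
  h.definedone();
 end;
set have1;

/* Look up every day in a window of roughly 100 years up to and including DT */
Before_upto_DT=0;
do i=DT-100*365 to DT;
 rc=h.find(key:ID,key:i);
 /* find_next walks the remaining rows that share the same ID and date */
 do while(rc=0);
   Before_upto_DT+AMT;
   rc=h.find_next(key:ID,key:i);
 end;
end;

/* Same lookup for roughly 100 years after DT */
After_DT=0;
do i=DT+1 to DT+100*365;
 rc=h.find(key:ID,key:i);
 do while(rc=0);
   After_DT+AMT;
   rc=h.find_next(key:ID,key:i);
 end;
end;

/* AMT is only a work variable filled by the hash lookups */
drop rc i AMT;
run;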

 

 

Kurt_Bremser
Super User

For large datasets, I recommend defining a date-indexed array which you load for every ID with the data from have2.

Then you work through the entries in have1 for that ID and calculate in DO loops.

How large are your datasets? Is there at least one entry in have1 for every ID in have2?

 

Zatere
Quartz | Level 8
Both tables are quite big; have2 in particular runs to many millions of rows. Do you perhaps have an example of this approach?
Zatere
Quartz | Level 8
Sorry about that. Yes, there is at least one entry in have1 for every ID in have2.
Kurt_Bremser
Super User

So we can try it with synchronized groups, using a date-indexed array:

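/* Array bounds as SAS date values, wide enough to cover every date in the data */
%let start = %sysfunc(inputn(1900-01-01,yymmdd10.));
%let end = %sysfunc(inputn(2999-12-31,yymmdd10.));

data want;
/* One slot per calendar day; each DATA step iteration handles one ID */
array vals {&start.:&end.} _temporary_;
/* Temporary arrays are retained, so reset them before loading the next ID */
do i = &start. to &end.;
  vals{i} = 0;
end;
/* Add up the have2 amounts of the current ID by day */
do until (last.id);
  set have2;
  by id;
  vals{dt} = vals{dt} + amt;
end;
/* For each have1 row of the same ID, sum the days up to DT and the days after it */
do until (last.id);
  set have1;
  by id;
  Before_upto_DT = 0;
  After_DT = 0;
  do i = &start. to dt;
    Before_upto_DT = Before_upto_DT + vals{i};
  end;
  do i = dt + 1 to &end.;
    After_DT =  After_DT + vals{i};
  end;
  output;
end;
drop i amt;
run;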
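
If an ID has many rows in have1, the two inner loops above rescan the whole array for every one of them. A hedged variation, not part of the code as posted: convert the daily totals into running totals once per ID, so that each have1 row needs only two array reads. The names want_cum and cum are made up for illustration, and the step relies on the same assumptions as the original (both tables sorted by ID, the same IDs in both, all dates between 1900-01-01 and 2999-12-31, no missing AMT).

```sas
%let start = %sysfunc(inputn(1900-01-01,yymmdd10.));
%let end   = %sysfunc(inputn(2999-12-31,yymmdd10.));

data want_cum;
  array cum {&start.:&end.} _temporary_;
  /* reset the retained array for the current ID */
  do i = &start. to &end.;
    cum{i} = 0;
  end;
  /* daily totals from have2 for the current ID */
  do until (last.id);
    set have2;
    by id;
    cum{dt} = cum{dt} + amt;
  end;
  /* turn daily totals into running totals: cum{d} = everything up to and including day d */
  do i = &start. + 1 to &end.;
    cum{i} = cum{i} + cum{i - 1};
  end;
  /* each have1 row of the same ID is now a direct lookup */
  do until (last.id);
    set have1;
    by id;
    Before_upto_DT = cum{dt};
    After_DT = cum{&end.} - cum{dt};   /* grand total minus the "up to DT" part */
    output;
  end;
  drop i amt;
run;
```

The reset and the running-total pass still touch every array element once per ID, so the gain, if any, comes from IDs with several rows in have1.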
Zatere
Quartz | Level 8
It works, thank you. Actually, there are IDs in have2 that are not in have1. For now I added a small step that creates have2b, which holds only the IDs that are also in have1. How can I modify your approach to handle this as well, please?
Kurt_Bremser
Super User (accepted solution)

Make a view:

data have2b / view=have2b;
merge
  have1 (in=h1 keep=id)
  have2
;
by id;
if h1;
run;

The view is executed during the "want" DATA step, so it should not affect overall performance much.
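
One thing that may be worth checking with this view: have1 can hold several rows per ID, and as far as I understand the match-merge rules, an ID with more rows in have1 than in have2 would make the merge repeat the last have2 row, which would inflate the sums. A cautious sketch that merges against one row per ID instead is shown here. The helper view name have1_ids is made up for illustration, and both tables are assumed to be sorted by ID.

```sas
/* hypothetical helper view: exactly one row per ID from have1 */
data have1_ids / view=have1_ids;
  set have1 (keep=id);
  by id;
  if first.id;
run;

/* have2 restricted to IDs that occur in have1; filter kept as in the post above */
data have2b / view=have2b;
  merge
    have1_ids (in=h1)
    have2
  ;
  by id;
  if h1;
run;
```

The "want" step would then read have2b in place of have2. IDs that occur only in have1 are not handled by this sketch and would need separate treatment before the array step.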

Zatere
Quartz | Level 8
I just learnt something new!
