Hi there,
I have a question and hopefully I will make sense.
I have two tables:
data have1;
input ID $ DT :yymmdd10.;
format DT yymmddd10.;
datalines;
A 2024-07-04
A 2024-07-25
;
run;
data have2;
input ID $ DT :yymmdd10. AMT;
format DT yymmddd10.;
datalines;
A 2024-07-03 10
A 2024-07-04 20
A 2024-07-04 30
A 2024-07-11 40
A 2024-07-11 50
A 2024-07-16 70
A 2024-07-16 80
A 2024-07-25 90
A 2024-07-26 100
;
run;
For each row of the first table, I want to add two totals of AMT from the second table (matched on ID): one over the rows dated on or before DT, and one over the rows dated after DT.
Essentially, I want the following:
data want;
input ID $ DT :yymmdd10. Before_upto_DT After_DT;
format DT yymmddd10.;
datalines;
A 2024-07-04 60 430
A 2024-07-25 390 100
;
run;
I tried to join them using PROC SQL, but I could not get it to work.
Any help would be appreciated.
Thanks in advance.
Yes, you picked the right tool (SQL) for small data.
data have1;
input ID $ DT :yymmdd10.;
format DT yymmddd10.;
datalines;
A 2024-07-04
A 2024-07-25
;
run;
data have2;
input ID $ DT :yymmdd10. AMT;
format DT yymmddd10.;
datalines;
A 2024-07-03 10
A 2024-07-04 20
A 2024-07-04 30
A 2024-07-11 40
A 2024-07-11 50
A 2024-07-16 70
A 2024-07-16 80
A 2024-07-25 90
A 2024-07-26 100
;
run;
proc sql;
  create table want as
  select *,
    (select sum(AMT) from have2 where ID=a.ID and DT<=a.DT) as Before_upto_DT,
    (select sum(AMT) from have2 where ID=a.ID and DT> a.DT) as After_DT
  from have1 as a;
quit;
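Since the question mentions a join attempt, here is a minimal sketch of the same result written as a join with conditional sums rather than correlated subqueries. The table name want_join and the aliases a and b are only illustrative, and it assumes have1 has no duplicate ID/DT pairs, because the GROUP BY would collapse them.

```sas
proc sql;
  create table want_join as
  select a.ID,
         a.DT,
         /* add AMT to the first total when the have2 date is on or before DT */
         sum(case when b.DT <= a.DT then b.AMT else 0 end) as Before_upto_DT,
         /* add AMT to the second total when the have2 date is after DT */
         sum(case when b.DT >  a.DT then b.AMT else 0 end) as After_DT
  from have1 as a
       left join have2 as b
       on a.ID = b.ID
  group by a.ID, a.DT;
quit;
```

One difference to be aware of: when a side has no matching rows, the subquery version returns a missing value, while this sketch returns 0 for a side that simply has no qualifying dates.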
How big are these two tables?
You could try a hash table.
data have1;
input ID $ DT :yymmdd10.;
format DT yymmddd10.;
datalines;
A 2024-07-04
A 2024-07-25
;
run;
data have2;
input ID $ DT :yymmdd10. AMT;
format DT yymmddd10.;
datalines;
A 2024-07-03 10
A 2024-07-04 20
A 2024-07-04 30
A 2024-07-11 40
A 2024-07-11 50
A 2024-07-16 70
A 2024-07-16 80
A 2024-07-25 90
A 2024-07-26 100
;
run;
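data want;
  /* Load have2 once; multidata:'y' keeps several AMT rows per ID/DT key */
  if _n_=1 then do;
    if 0 then set have2;
    declare hash h(dataset:'have2',multidata:'y',hashexp:20);
    h.definekey('ID','DT');
    h.definedata('AMT');
    h.definedone();
  end;
  set have1;
  /* Sum AMT for every date from about 100 years back up to and including DT */
  Before_upto_DT=0;
  do i=DT-100*365 to DT;
    rc=h.find(key:ID,key:i);
    do while(rc=0);
      Before_upto_DT+AMT;
      rc=h.find_next(key:ID,key:i);
    end;
  end;
  /* Sum AMT for every date after DT, up to about 100 years ahead */
  After_DT=0;
  do i=DT+1 to DT+100*365;
    rc=h.find(key:ID,key:i);
    do while(rc=0);
      After_DT+AMT;
      rc=h.find_next(key:ID,key:i);
    end;
  end;
  /* AMT is only a work variable here, so drop it along with rc and i */
  drop rc i AMT;
run;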
For large datasets, I recommend defining a date-indexed array which you load for every ID with the data from have2.
Then you work through the entries in have1 for that ID and calculate in DO loops.
How large are your datasets? Is there at least one entry in have1 for every ID in have2?
Please answer my second question. The answer is crucial to how the approach must be coded.
So we can try it with synchronized groups, using a date-indexed array:
/* Date range covered by the array, as SAS date values */
%let start = %sysfunc(inputn(1900-01-01,yymmdd10.));
%let end = %sysfunc(inputn(2999-12-31,yymmdd10.));
data want;
  array vals {&start.:&end.} _temporary_;
  /* Reset the array at the start of every ID group */
  do i = &start. to &end.;
    vals{i} = 0;
  end;
  /* First loop: accumulate the amounts of one ID by date */
  do until (last.id);
    set have2;
    by id;
    vals{dt} = vals{dt} + amt;
  end;
  /* Second loop: for each have1 row of the same ID, sum both sides of DT */
  do until (last.id);
    set have1;
    by id;
    Before_upto_DT = 0;
    After_DT = 0;
    do i = &start. to dt;
      Before_upto_DT = Before_upto_DT + vals{i};
    end;
    do i = dt + 1 to &end.;
      After_DT = After_DT + vals{i};
    end;
    output;
  end;
  drop i amt;
run;
Make a view that keeps only the have2 rows whose ID also appears in have1:
data have2b / view=have2b;
  merge
    have1 (in=h1 keep=id)
    have2
  ;
  by id;
  if h1;
run;
The view is executed during the "want" DATA step, so it should not affect the overall performance much.
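As a sketch of how the view plugs in, assuming the date-indexed array step above is otherwise unchanged, only the first SET statement needs to read have2b instead of have2:

```sas
%let start = %sysfunc(inputn(1900-01-01,yymmdd10.));
%let end = %sysfunc(inputn(2999-12-31,yymmdd10.));

data want;
  array vals {&start.:&end.} _temporary_;
  do i = &start. to &end.;
    vals{i} = 0;
  end;
  do until (last.id);
    set have2b;   /* the view: have2 rows restricted to IDs present in have1 */
    by id;
    vals{dt} = vals{dt} + amt;
  end;
  do until (last.id);
    set have1;
    by id;
    Before_upto_DT = 0;
    After_DT = 0;
    do i = &start. to dt;
      Before_upto_DT = Before_upto_DT + vals{i};
    end;
    do i = dt + 1 to &end.;
      After_DT = After_DT + vals{i};
    end;
    output;
  end;
  drop i amt;
run;
```

This sketch assumes that, within each ID, have1 never has more rows than have2. MERGE retains the last have2 values once have2 runs out inside a BY group, so extra have1 rows would repeat the last amount.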