Hi there,
I have a question and hopefully I will make sense.
I have two tables:
data have1;
input ID $ DT :yymmdd10.;
format DT yymmddd10.;
datalines;
A 2024-07-04
A 2024-07-25
;
run;
data have2;
input ID $ DT :yymmdd10. AMT;
format DT yymmddd10.;
datalines;
A 2024-07-03 10
A 2024-07-04 20
A 2024-07-04 30
A 2024-07-11 40
A 2024-07-11 50
A 2024-07-16 70
A 2024-07-16 80
A 2024-07-25 90
A 2024-07-26 100
;
run;
For each row in the first table, I want to add two totals of AMT from the second table (matched on ID): one over the rows dated on or before DT, and one over the rows dated after DT.
Essentially, I want the following:
data want;
input ID $ DT :yymmdd10. Before_upto_DT After_DT;
format DT yymmddd10.;
datalines;
A 2024-07-04 60 430
A 2024-07-25 390 100
;
run;
My attempt was to join them using PROC SQL but obviously this is not working.
Any help please.
Thanks in advance.
Yes, you picked the right tool: SQL works well for small data.
data have1;
input ID $ DT :yymmdd10.;
format DT yymmddd10.;
datalines;
A 2024-07-04
A 2024-07-25
;
run;
data have2;
input ID $ DT :yymmdd10. AMT;
format DT yymmddd10.;
datalines;
A 2024-07-03 10
A 2024-07-04 20
A 2024-07-04 30
A 2024-07-11 40
A 2024-07-11 50
A 2024-07-16 70
A 2024-07-16 80
A 2024-07-25 90
A 2024-07-26 100
;
run;
proc sql;
create table want as
select *,
(select sum(AMT) from have2 where ID=a.ID and DT<=a.DT) as Before_upto_DT ,
(select sum(AMT) from have2 where ID=a.ID and DT> a.DT) as After_DT
from have1 as a;
quit;
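As a variation on the correlated subqueries above, the same totals can probably also be computed with a single join and conditional aggregation. This is only a sketch against the sample tables have1 and have2. The output name want_join is made up for illustration, and the query assumes that each ID/DT pair occurs only once in have1 and that every ID in have1 has at least one row in have2. It has not been benchmarked against the subquery version.

```sas
proc sql;
  create table want_join as
  select a.ID,
         a.DT,
         /* count AMT only when the have2 date is on or before the have1 date */
         sum(case when b.DT <= a.DT then b.AMT else 0 end) as Before_upto_DT,
         /* count AMT only when the have2 date is after the have1 date */
         sum(case when b.DT >  a.DT then b.AMT else 0 end) as After_DT
  from have1 as a
       left join have2 as b
       on a.ID = b.ID
  group by a.ID, a.DT;
quit;
```

For the sample data this should give the same two rows as the requested result: 60 and 430 for 2024-07-04, 390 and 100 for 2024-07-25.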
How big are these two tables?
You could try a hash table:
data have1;
input ID $ DT :yymmdd10.;
format DT yymmddd10.;
datalines;
A 2024-07-04
A 2024-07-25
;
run;
data have2;
input ID $ DT :yymmdd10. AMT;
format DT yymmddd10.;
datalines;
A 2024-07-03 10
A 2024-07-04 20
A 2024-07-04 30
A 2024-07-11 40
A 2024-07-11 50
A 2024-07-16 70
A 2024-07-16 80
A 2024-07-25 90
A 2024-07-26 100
;
run;
data want;
if _n_=1 then do;
if 0 then set have2;
declare hash h(dataset:'have2',multidata:'y',hashexp:20);
h.definekey('ID','DT');
h.definedata('AMT');
h.definedone();
end;
set have1;
Before_upto_DT=0;
do i=DT-100*365 to DT;
rc=h.find(key:ID,key:i);
do while(rc=0);
Before_upto_DT+AMT;
rc=h.find_next(key:ID,key:i);
end;
end;
After_DT=0;
do i=DT+1 to DT+100*365;
rc=h.find(key:ID,key:i);
do while(rc=0);
After_DT+AMT;
rc=h.find_next(key:ID,key:i);
end;
end;
drop rc i AMT;
run;
For large datasets, I recommend defining a date-indexed array which you load for every ID with the data from have2.
Then you work through the entries in have1 for that ID and calculate in DO loops.
How large are your datasets? Is there at least one entry in have1 for every ID in have2?
Please answer my second question. The answer is crucial to how the approach must be coded.
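A quick way to answer that question is to list the IDs that occur in have2 but not in have1. The query below is a minimal sketch, assuming both tables are small enough for PROC SQL to handle comfortably. If it returns no rows, every ID in have2 has at least one entry in have1.

```sas
proc sql;
  /* IDs present in have2 but absent from have1 */
  select distinct ID
  from have2
  where ID not in (select ID from have1);
quit;
```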
So we can try it with synchronized groups, using a date-indexed array:
%let start = %sysfunc(inputn(1900-01-01,yymmdd10.));
%let end = %sysfunc(inputn(2999-12-31,yymmdd10.));
data want;
array vals {&start.:&end.} _temporary_;
do i = &start. to &end.;
vals{i} = 0;
end;
do until (last.id);
set have2;
by id;
vals{dt} = vals{dt} + amt;
end;
do until (last.id);
set have1;
by id;
Before_upto_DT = 0;
After_DT = 0;
do i = &start. to dt;
Before_upto_DT = Before_upto_DT + vals{i};
end;
do i = dt + 1 to &end.;
After_DT = After_DT + vals{i};
end;
output;
end;
drop i amt;
run;
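If have2 contains IDs that are missing from have1, make a view that keeps only the have2 rows whose ID also occurs in have1: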
data have2b / view=have2b;
merge
have1 (in=h1 keep=id)
have2
;
by id;
if h1;
run;
Read have2b instead of have2 in the "want" DATA step. The view is executed while that step runs, so it should not affect overall performance much.