Solved: Total sum before and after dates in a dataset

Zatere · Posted 07-26-2024 04:33 AM

Hi there,

I have a question and hopefully I will make sense.

I have two tables:

data have1;
input ID $ DT :yymmdd10.;
format DT yymmddd10.;
datalines;
A 2024-07-04
A 2024-07-25
;
run;

data have2;
input ID $ DT :yymmdd10. AMT;
format DT yymmddd10.;
datalines;
A 2024-07-03 10
A 2024-07-04 20
A 2024-07-04 30
A 2024-07-11 40
A 2024-07-11 50
A 2024-07-16 70
A 2024-07-16 80
A 2024-07-25 90
A 2024-07-26 100
;
run;

I want to add two columns to the first table: for each row, the total AMT from the second table (same ID) for dates on or before DT, and the total for dates after DT.

Essentially, I want the following:

data want;
input ID $ DT :yymmdd10. Before_upto_DT After_DT;
format DT yymmddd10.;
datalines;
A 2024-07-04 60 430
A 2024-07-25 390 100
; 
run;
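The first row can be checked by hand: on or before 2024-07-04 the amounts are 10 + 20 + 30 = 60, and after it they are 40 + 50 + 70 + 80 + 90 + 100 = 430. A minimal sketch that computes the same two sums for that single date, assuming the have2 table above (the date literal and the ID value are hard-coded purely for illustration):

```sas
/* Sanity check for one row of the expected output (ID A, 2024-07-04). */
proc sql;
  select sum(case when DT <= '04JUL2024'd then AMT else 0 end) as Before_upto_DT,
         sum(case when DT >  '04JUL2024'd then AMT else 0 end) as After_DT
  from have2
  where ID = 'A';
quit;
```

The two results add up to 490, the total of all AMT values for ID A, which is a quick consistency check for any row of the wanted table.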

I tried joining them with PROC SQL, but that is not working.

Any help please.

Thanks in advance.

Ksharp · Posted 07-26-2024 04:41 AM

Yes, you picked the right tool: SQL works well for small data.

proc sql;
create table want as
select *,
 (select sum(AMT) from have2 where ID=a.ID and DT<=a.DT) as Before_upto_DT ,
 (select sum(AMT) from have2 where ID=a.ID and DT> a.DT) as After_DT
 from have1 as a;
quit;

Zatere · Posted 07-26-2024 04:55 AM

Hi Ksharp, thanks for your reply. It does work. However, as you mentioned, this solution suits small data. Both of my tables are quite big, so it takes a long time to finish.

Ksharp · Posted 07-26-2024 05:37 AM

How big are these two tables?

You could try a hash table.

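data want;
 if _n_=1 then do;
  /* define ID, DT and AMT in the program data vector without reading any rows */
  if 0 then set have2;
  /* load have2 into memory; multidata allows several rows per (ID, DT) key */
  declare hash h(dataset:'have2',multidata:'y',hashexp:20);
  h.definekey('ID','DT');
  h.definedata('AMT');
  h.definedone();
 end;
 set have1;

 /* look up every date in the roughly 100 years up to and including DT */
 Before_upto_DT=0;
 do i=DT-100*365 to DT;
  rc=h.find(key:ID,key:i);
  do while(rc=0);
   Before_upto_DT+AMT;
   rc=h.find_next(key:ID,key:i);
  end;
 end;

 /* look up every date in the roughly 100 years after DT */
 After_DT=0;
 do i=DT+1 to DT+100*365;
  rc=h.find(key:ID,key:i);
  do while(rc=0);
   After_DT+AMT;
   rc=h.find_next(key:ID,key:i);
  end;
 end;

 /* AMT only holds the last value looked up, so it is dropped as well */
 drop rc i AMT;
run;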

Kurt_Bremser · Posted 07-26-2024 06:46 AM

For large datasets, I recommend defining a date-indexed array which you load for every ID with the data from have2.

Then you work through the entries in have1 for that ID and calculate in DO loops.

How large are your datasets? Is there at least one entry in have1 for every ID in have2?

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Zatere · Posted 07-26-2024 06:51 AM

Both tables are quite big; have2 in particular has many millions of rows. Do you perhaps have an example of this approach?

Kurt_Bremser · Posted 07-26-2024 10:33 AM

Please answer my second question. The answer is crucial to how the approach must be coded.

Zatere · Posted 07-26-2024 10:40 AM

Sorry about that. Yes, there is at least one entry in have1 for every ID in have2.

Kurt_Bremser · Posted 07-26-2024 12:14 PM

So we can try it with synchronized groups, using a date-indexed array:

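/* numeric SAS date values bounding the array index range */
%let start = %sysfunc(inputn(1900-01-01,yymmdd10.));
%let end = %sysfunc(inputn(2999-12-31,yymmdd10.));

data want;
/* one slot per calendar date, indexed directly by the SAS date value */
array vals {&start.:&end.} _temporary_;
/* clear the array before each ID group */
do i = &start. to &end.;
  vals{i} = 0;
end;
/* load all amounts of the current ID from have2 */
do until (last.id);
  set have2;
  by id;
  vals{dt} = vals{dt} + amt;
end;
/* process the have1 rows of the same ID */
do until (last.id);
  set have1;
  by id;
  Before_upto_DT = 0;
  After_DT = 0;
  do i = &start. to dt;
    Before_upto_DT = Before_upto_DT + vals{i};
  end;
  do i = dt + 1 to &end.;
    After_DT =  After_DT + vals{i};
  end;
  output;
end;
drop i amt;
run;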
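A possible refinement of the step above, offered as a sketch under assumptions rather than something posted in this thread: it remembers the lowest and highest date loaded for the current ID, so only that range is scanned and reset, and it derives After_DT as the ID's total minus Before_upto_DT. It assumes both tables are sorted by ID and contain the same set of IDs, that DT and AMT are never missing, and that all dates fall between 1900 and 2999. The names want2, lo, hi and total are introduced here for illustration only.

```sas
%let start = %sysfunc(inputn(1900-01-01,yymmdd10.));
%let end   = %sysfunc(inputn(2999-12-31,yymmdd10.));

data want2;
  array vals {&start.:&end.} _temporary_;
  /* zero the whole array once; temporary arrays keep values across iterations */
  if _n_ = 1 then do i = &start. to &end.;
    vals{i} = 0;
  end;
  total = 0;
  lo = &end.;    /* lowest date seen for this ID  */
  hi = &start.;  /* highest date seen for this ID */
  /* load the amounts of the current ID and track its date range */
  do until (last.id);
    set have2;
    by id;
    vals{dt} = vals{dt} + amt;
    total = total + amt;
    lo = min(lo, dt);
    hi = max(hi, dt);
  end;
  /* scan only the populated range; everything else is still zero */
  do until (last.id);
    set have1;
    by id;
    Before_upto_DT = 0;
    do i = lo to min(dt, hi);
      Before_upto_DT = Before_upto_DT + vals{i};
    end;
    After_DT = total - Before_upto_DT;
    output;
  end;
  /* clear only the slots this ID touched, ready for the next ID */
  do i = lo to hi;
    vals{i} = 0;
  end;
  drop i amt total lo hi;
run;
```

With the sample data this should give the same two rows as the wanted table. For non-integer amounts, the subtraction can differ from a direct sum by floating-point rounding, so rounding the results may be worthwhile.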

Zatere · Posted 07-26-2024 12:52 PM

It is working, thank you. Actually, there are IDs in have2 that are not in have1. For now I added a small step that creates have2b, which contains only the IDs that are in have1. How can I modify your approach to handle this as well, please?

Kurt_Bremser · Posted 07-26-2024 01:09 PM

Make a view:

data have2b / view=have2b;
merge
  have1 (in=h1 keep=id)
  have2
;
by id;
if h1;
run;

The view is executed during the "want" DATA step (which then reads have2b instead of have2), so it should not affect overall performance much.
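One thing to watch if have1 can hold several rows per ID: when a BY group has more rows in have1 than in have2, a match-merge outputs as many rows as the longer side and carries the last have2 values forward, so amounts could be counted more than once. A hedged variant, assuming have2 is sorted by ID, merges against a de-duplicated ID list instead. The dataset name ids is introduced here for illustration only.

```sas
/* one row per ID that occurs in have1 */
proc sort data=have1 (keep=id) out=ids nodupkey;
  by id;
run;

/* keep only have2 rows whose ID also occurs in have1 */
data have2b / view=have2b;
  merge
    ids   (in=h1)
    have2 (in=h2)
  ;
  by id;
  if h1 and h2;
run;
```

Requiring h2 as well avoids rows with a missing DT for IDs that exist only in have1, which would otherwise fall outside the array bounds in the "want" step. The synchronized loops in that step still expect both inputs to contain the same IDs, so IDs found only in have1 would need to be filtered out of have1 too.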

Zatere · Posted 07-26-2024 01:15 PM

I just learnt something new!
