@Ronein wrote:
9 lines
a,03Jun2022,101
a,06Jun2022,101
b,02Jun2022,201
b,03Jun2022,201
b,07Jun2022,203
c,03Jun2022,301
c,06Jun2022,302
c,07Jun2022,303
c,08Jun2022,304
But in real life will have 520 lines???
Why should it? It's just a "fake" dataset created to illustrate the function of the code presented in the post.
@Ronein wrote:
Honestly ,I hate hash table. Is there another way?
HASH tables are not really that hard, they are use "foreign" to the normal SAS way of working.
The advantage they have is that you can index into them by things like CUSTOMER_ID that is probably not a simple integer.
In this case if the set of loans is small enough to fit in memory then you can find the obligations you are looking for with a single pass through the CSV file without having to create a giant SAS dataset.
The basic idea is to read a line from the CSV file and then use the HASH() object to decide if you need to keep that obligation amount. When you get to the end of the data you can then write the HASH() object back to a dataset.
You can use your list of business days in two ways. One you can use it to help you figure out which date is the previous business day (or the previous previous business day). Two you can use it to tell you which CSV files you need to read. I assume you can figure out the name of the CSV file based on the date.
So let's create some an example LOANS and BDAYS dataset and also a series of CSV files. Click the SPOILER tag to see the code.
data loans;
input customer_id $ date :yymmdd. amount;
format date yymmdd10.;
datalines;
a 2022-06-06 100
b 2022-06-07 200
c 2022-06-08 300
;
data bdays;
input date :yymmdd.;
format date yymmdd10.;
datalines;
2022-06-02
2022-06-03
2022-06-06
2022-06-07
2022-06-08
;
data obligations;
input customer_id $ date :yymmdd. obligation;
format date yymmdd10.;
datalines;
b 2022-06-02 201
a 2022-06-03 101
b 2022-06-03 201
c 2022-06-03 301
a 2022-06-06 101
c 2022-06-06 302
b 2022-06-07 203
c 2022-06-07 303
c 2022-06-08 304
;
%let path=%sysfunc(pathname(work));
proc sort data=obligations;
by date customer_id ;
run;
data _null_;
set obligations end=eof;
filename=cats("&path/balance_",put(date,yymmddn8.),'.csv');
file csv filevar=filename dsd ;
put customer_id date obligation ;
if _n_=1 then call symputx('start',date);
if eof then call symputx('end',date);
run;
Now let's see how to use those datasets and files to make what you want.
First we need to know the path where to find the CSV files. Also since this code is going to use a temporary array to figure out how to map from current business day to the next business day it will help to have a lower and upper bound on the range of dates.
%let start=%sysfunc(mdy(02,06,2022));
%let end=%sysfunc(mdy(08,06,2022));
%let path=%sysfunc(pathname(work));
Now here is the data step to read the obligations files and collect upto three obligation values per loan. The value on the loan date, on the previous business date and also the previous previous business date.
First it will load the business days into a temporary array that map a date to the next business date.
Second it will load the loan date into a hash that stores the customer, date, amount, and the three obligations.
Third it will read in the list of dates and for each date read in the corresponding CSV file.
For each record in the CSV file it checks if that customer+date matches a loan. If so it stores the obligation in the hash.
Then if checks if the next business is a loan date for this customer and if so it stores the obligation. And repeats the look back (ahead) once more.
Finally at the end it writes the hash to a dataset.
data _null_;
if 0 then set bdays loans ;
if _n_=1 then do;
array nday [&start:&end] _temporary_;
do while(not eof);
set bdays end=eof;
lag_date=lag(date);
if not missing(lag_date) then nday[lag_date]=date;
end;
declare hash h(ordered:'yes');
h.definekey('customer_id','date');
h.definedata('customer_id','date','amount','owe1','owe2','owe3');
h.definedone();
do while(not eof1);
set loans end=eof1;
h.add();
end;
end;
set bdays end=eof2;
filename = cats("&path/balance_",put(date,yymmddn8.),'.csv');
infile csv filevar=filename dsd truncover /*firstobs=2*/ end=eof3;
do while(not eof3);
input customer_id date :yymmdd. obligation ;
date=date;
if 0=h.find() then do; owe1=obligation; rc=h.replace(); end;
date=nday[date];
if not missing(date) then do;
if 0=h.find() then do; owe2=obligation; rc=h.replace(); end;
date=nday[date];
if not missing(date) then do;
if 0=h.find() then do; owe3=obligation; rc=h.replace(); end;
end;
end;
end;
if eof3 then rc=h.output(dataset:'loan_obligations');
run;
Results:
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.