Hi,
I am new to using macros and sas, so any help is much appreciated.
I have two tables.
one table lists accounts, start date and end date.
Accounts
ACCT | START_DATE | END_DATE |
1234 | 15JAN2020 | 15MAY2020 |
6789 | 27MAR2020 | 27MAY2020 |
Another table is a typical daily table for each account with a date and balance. There is a row for each account, on each date.
Data
ACCT | BAL | ID_DATE |
1234 | 7.00 | 15JAN2020 |
1234 | 6.59 | 16JAN2020 |
1234 | 7.56 | 17JAN2020 |
6789 | 43.54 | 27MAR2020 |
6789 | 45.66 | 28MAR2020 |
What i am trying to do is take the accounts data and for each account loop through the data able from between the start and end date to bring back the balance.
I only want to bring back the balance on a monthly basis between the start and end date from when start date starts from. so I end up something like this for example.
ACCT | BAL | ID_DATE |
1234 | 7.00 | 15JAN2020 |
1234 | 15.00 | 15FEB2020 |
1234 | 32.33 | 15MAR2020 |
1234 | 43.24 | 15APR2020 |
1234 | 12.43 | 15MAY2020 |
6789 | 43.54 | 27MAR2020 |
6789 | 43.52 | 27MAR2020 |
6789 | 23.42 | 27APR2020 |
Im not sure if i am making it over complicated by using loops or can do it using sql and some built in SAS functions.
any help very much appreciated.
data have1;
input ACCT (START_DATE END_DATE) (:date9.);
format START_DATE END_DATE date9.;
cards;
1234 15-Jan-20 15-May-20
6789 27-Mar-20 27-May-20
;
data have2;
input ACCT BAL ID_DATE :date9.;
format id_date date9.;
cards;
1234 7 15-Jan-20
1234 6.59 16-Jan-20
1234 7.56 17-Jan-20
6789 43.54 27-Mar-20
6789 45.66 28-Mar-20
;
data want ;
if _n_=1 then do;
dcl hash H () ;
h.definekey ("ID_DATE") ;
h.definedata ("bal","ID_DATE") ;
h.definedone () ;
end;
do until(last.acct);
set have2;
by acct;
h.add();
end;
set have1;
do until(START_DATE>END_DATE);
if h.find(key:START_DATE) ne 0 then call missing(id_date,bal);
output;
START_DATE=intnx('mon',START_DATE,1,'s');
end;
h.clear();
keep acct id_date bal START_DATE;
rename START_DATE=date;
run;
This assumes the datasets are sorted just like your example
No loop or something like that needed, only a SQL join:
proc sql;
create table want as
select b.*
from accounts a left join data b
on a.acct = b.acct and b.id_date between a.start_date and a.end_date
;
quit;
(untested, posted from my tablet)
Add an additional condition to the ON clause:
on a.acct = b.acct and b.id_date between a.start_date and a.end_date and day(b.id_date) = day(a.start_date)
SQL does not have a looping process to create multiple rows from a single row. DATA step does.
Create a DATA step view that outputs rows for each mid-month point from START to END. Then join the view with the DATA on view's dates
Example:
data accounts; input acct start_date: date9. end_date: date9.; format start_date end_date date9.; datalines; 1234 15JAN2020 15MAY2020 6789 27MAR2020 27MAY2020 ; data dates_v / view=dates_v; set accounts; do index = 0 by 1 until (date >= end_date or index >= 1000); date = intnx('month', start_date, index, 'SAME'); output; end; format date date9.; keep acct date; run;
Then
create table want as select data.* from dates_v as V left join data on data.acct = V.acct and data.id_date = V.date
data have1;
input ACCT (START_DATE END_DATE) (:date9.);
format START_DATE END_DATE date9.;
cards;
1234 15-Jan-20 15-May-20
6789 27-Mar-20 27-May-20
;
data have2;
input ACCT BAL ID_DATE :date9.;
format id_date date9.;
cards;
1234 7 15-Jan-20
1234 6.59 16-Jan-20
1234 7.56 17-Jan-20
6789 43.54 27-Mar-20
6789 45.66 28-Mar-20
;
data want ;
if _n_=1 then do;
dcl hash H () ;
h.definekey ("ID_DATE") ;
h.definedata ("bal","ID_DATE") ;
h.definedone () ;
end;
do until(last.acct);
set have2;
by acct;
h.add();
end;
set have1;
do until(START_DATE>END_DATE);
if h.find(key:START_DATE) ne 0 then call missing(id_date,bal);
output;
START_DATE=intnx('mon',START_DATE,1,'s');
end;
h.clear();
keep acct id_date bal START_DATE;
rename START_DATE=date;
run;
This assumes the datasets are sorted just like your example
Something like this?
proc sql;
create table want as select data.*
from data join accounts
on data.ID_DATE between accounts.START_DATE and accounts.END_DATE
and day(data.ID_DATE)=day(accounts.START_DATE);
quit;
Thanks for all the solutions, I am just trying them out to find which works most efficiently for my dataset. The dataset is quite large.
thanks
Ninder
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.