Hi,
I am new to using macros and sas, so any help is much appreciated.
I have two tables.
one table lists accounts, start date and end date.
Accounts
ACCT | START_DATE | END_DATE |
1234 | 15JAN2020 | 15MAY2020 |
6789 | 27MAR2020 | 27MAY2020 |
Another table is a typical daily table for each account with a date and balance. There is a row for each account, on each date.
Data
ACCT | BAL | ID_DATE |
1234 | 7.00 | 15JAN2020 |
1234 | 6.59 | 16JAN2020 |
1234 | 7.56 | 17JAN2020 |
6789 | 43.54 | 27MAR2020 |
6789 | 45.66 | 28MAR2020 |
What i am trying to do is take the accounts data and for each account loop through the data able from between the start and end date to bring back the balance.
I only want to bring back the balance on a monthly basis between the start and end date from when start date starts from. so I end up something like this for example.
ACCT | BAL | ID_DATE |
1234 | 7.00 | 15JAN2020 |
1234 | 15.00 | 15FEB2020 |
1234 | 32.33 | 15MAR2020 |
1234 | 43.24 | 15APR2020 |
1234 | 12.43 | 15MAY2020 |
6789 | 43.54 | 27MAR2020 |
6789 | 43.52 | 27MAR2020 |
6789 | 23.42 | 27APR2020 |
Im not sure if i am making it over complicated by using loops or can do it using sql and some built in SAS functions.
any help very much appreciated.
data have1;
input ACCT (START_DATE END_DATE) (:date9.);
format START_DATE END_DATE date9.;
cards;
1234 15-Jan-20 15-May-20
6789 27-Mar-20 27-May-20
;
data have2;
input ACCT BAL ID_DATE :date9.;
format id_date date9.;
cards;
1234 7 15-Jan-20
1234 6.59 16-Jan-20
1234 7.56 17-Jan-20
6789 43.54 27-Mar-20
6789 45.66 28-Mar-20
;
data want ;
if _n_=1 then do;
dcl hash H () ;
h.definekey ("ID_DATE") ;
h.definedata ("bal","ID_DATE") ;
h.definedone () ;
end;
do until(last.acct);
set have2;
by acct;
h.add();
end;
set have1;
do until(START_DATE>END_DATE);
if h.find(key:START_DATE) ne 0 then call missing(id_date,bal);
output;
START_DATE=intnx('mon',START_DATE,1,'s');
end;
h.clear();
keep acct id_date bal START_DATE;
rename START_DATE=date;
run;
This assumes the datasets are sorted just like your example
No loop or something like that needed, only a SQL join:
proc sql;
create table want as
select b.*
from accounts a left join data b
on a.acct = b.acct and b.id_date between a.start_date and a.end_date
;
quit;
(untested, posted from my tablet)
Add an additional condition to the ON clause:
on a.acct = b.acct and b.id_date between a.start_date and a.end_date and day(b.id_date) = day(a.start_date)
SQL does not have a looping process to create multiple rows from a single row. DATA step does.
Create a DATA step view that outputs rows for each mid-month point from START to END. Then join the view with the DATA on view's dates
Example:
data accounts; input acct start_date: date9. end_date: date9.; format start_date end_date date9.; datalines; 1234 15JAN2020 15MAY2020 6789 27MAR2020 27MAY2020 ; data dates_v / view=dates_v; set accounts; do index = 0 by 1 until (date >= end_date or index >= 1000); date = intnx('month', start_date, index, 'SAME'); output; end; format date date9.; keep acct date; run;
Then
create table want as select data.* from dates_v as V left join data on data.acct = V.acct and data.id_date = V.date
data have1;
input ACCT (START_DATE END_DATE) (:date9.);
format START_DATE END_DATE date9.;
cards;
1234 15-Jan-20 15-May-20
6789 27-Mar-20 27-May-20
;
data have2;
input ACCT BAL ID_DATE :date9.;
format id_date date9.;
cards;
1234 7 15-Jan-20
1234 6.59 16-Jan-20
1234 7.56 17-Jan-20
6789 43.54 27-Mar-20
6789 45.66 28-Mar-20
;
data want ;
if _n_=1 then do;
dcl hash H () ;
h.definekey ("ID_DATE") ;
h.definedata ("bal","ID_DATE") ;
h.definedone () ;
end;
do until(last.acct);
set have2;
by acct;
h.add();
end;
set have1;
do until(START_DATE>END_DATE);
if h.find(key:START_DATE) ne 0 then call missing(id_date,bal);
output;
START_DATE=intnx('mon',START_DATE,1,'s');
end;
h.clear();
keep acct id_date bal START_DATE;
rename START_DATE=date;
run;
This assumes the datasets are sorted just like your example
Something like this?
proc sql;
create table want as select data.*
from data join accounts
on data.ID_DATE between accounts.START_DATE and accounts.END_DATE
and day(data.ID_DATE)=day(accounts.START_DATE);
quit;
Thanks for all the solutions, I am just trying them out to find which works most efficiently for my dataset. The dataset is quite large.
thanks
Ninder
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.