HI, I want to find out September monthly average balance of each customer from below table.
Here table has record only when customers previous balance is changed.
Input dataset | ||
cust_id | date | balance |
A | 01/09/2018 | 1000 |
A | 08/09/2018 | 500 |
A | 20/09/2018 | 2500 |
A | 27/09/2018 | 50000 |
B | 27/08/2018 | 1000 |
B | 15/09/2018 | 5000 |
one way is, I can create another intermediate data set to fill in between dates like below and can take average balance by simple query - sel cust_id, Avg(balance) from table 2 where date between '01sep2018'd and '30sep2018'd group by cust_id.
As customer size is in Billion and due to space constraint, i couldn't create such an intermediate step. can anyone help me to find out out average balance for each customer without intermediate step.
Intermediate dataset | ||
cust_id | date | balance |
A | 01/09/2018 | 1000 |
A | 02/09/2018 | 1000 |
A | 03/09/2018 | 1000 |
A | 04/09/2018 | 1000 |
A | 05/09/2018 | 1000 |
A | 06/09/2018 | 1000 |
A | 07/09/2018 | 1000 |
A | 08/09/2018 | 500 |
A | 09/09/2018 | 500 |
A | 10/09/2018 | 500 |
A | 11/09/2018 | 500 |
A | 12/09/2018 | 500 |
A | 13/09/2018 | 500 |
A | 14/09/2018 | 500 |
A | 15/09/2018 | 500 |
A | 16/09/2018 | 500 |
A | 17/09/2018 | 500 |
A | 18/09/2018 | 500 |
A | 19/09/2018 | 500 |
A | 20/09/2018 | 2500 |
A | 21/09/2018 | 2500 |
A | 22/09/2018 | 2500 |
A | 23/09/2018 | 2500 |
A | 24/09/2018 | 2500 |
A | 25/09/2018 | 2500 |
A | 26/09/2018 | 2500 |
A | 27/09/2018 | 50000 |
A | 28/09/2018 | 50000 |
A | 29/09/2018 | 50000 |
A | 30/09/2018 | 50000 |
Thanks in advance
Maybe something like this?
data want;
set have;
by cust_id;
if not last.cust_id then do;
_N_=_N_+1;
set have(keep=date rename=(date=next_date)) point=_N_;
days=next_date-max(date,'01sep2018'd);
sum+days*balance;
end;
else do;
days='01oct2018'd-max(date,'01sep2018'd);
sum+days*balance;
agv_balance=sum/30;
output;
sum=0;
end;
drop next_date days sum;
run;
Which can be generalized to any period using macro variables:
%let start_date='01sep2018'd;
%let end_date='01oct2018'd; /* up to, but not including */
data want;
set have;
by cust_id;
if not last.cust_id then do;
_N_=_N_+1;
set have(keep=date rename=(date=next_date)) point=_N_;
days=next_date-max(date,&start_date);
sum+days*balance;
end;
else do;
days=&end_date-max(date,&start_date);
sum+days*balance;
agv_balance=sum/(&end_date-&start_date);
output;
sum=0;
end;
drop next_date days sum;
run;
But you may have a problem if your data has more dates than necessary (e.g. 2 dates before 01sep2018, or dates after september). Here is a way to solve that:
%let start_date='01sep2018'd;
%let end_date='01oct2018'd; /* up to, but not including */
data want;
set have;
by cust_id;
if not last.cust_id then do;
_N_=_N_+1;
set have(keep=date rename=(date=next_date)) point=_N_;
if &start_date<=next_date<&end_date;
days=next_date-max(date,&start_date);
sum+days*balance;
end;
else do;
if date<&end_date then do;
days=&end_date-max(date,&start_date);
sum+days*balance;
end;
agv_balance=sum/(&end_date-&start_date);
output;
sum=0;
end;
drop next_date days sum;
run;
Post test data in the form of a datastep in future, not as text or attachments:
data have; informat date ddmmyy10.; input cust_id $ date balance; datalines; A 01/09/2018 1000 A 08/09/2018 500 A 20/09/2018 2500 A 27/09/2018 50000 ; run; data want (keep=cust_id average); set have; by cust_id; retain overall days lstdt lstamt; if first.cust_id then do; overall=balance; days=1; lstdt=date; lstamt=balance; end; else if last.cust_id then do; do i=lstdt+1 to date; overall=overall+lstamt; days=days+1; end; do i=date+1 to intnx('month',date,0,"e"); overall=overall+balance; days=days+1; end; average=balance/days; output; end; else do; do i=lstdt+1 to date; overall=overall+lstamt; days=days+1; end; lstdt=date; lstamt=balance; end; run;
Hi, Many thanks. It really helps.
On my example, customer B has different scenario. can you please help to including B's scenario as well.
I want to find out September month average balance which is 3000 for B.
data have;
informat date ddmmyy10.;
input cust_id $ date balance;
datalines;
B 27/08/2018 1000
B 15/09/2018 5000
;
run;
I am not sure I follow. B only has one record in September, so it really doesn't matter how many days, the average will always be 5000 not 3000. I have coded it to do days anyways. Or perhaps there is some other logic not mentioned?
data have; informat date ddmmyy10.; input cust_id $ date balance; datalines; A 01/09/2018 1000 A 08/09/2018 500 A 20/09/2018 2500 A 27/09/2018 50000 B 27/08/2018 1000 B 15/09/2018 5000 ; run; data want (keep=cust_id average); set have (where=(month(date)=9)); by cust_id; retain overall days lstdt lstamt; if first.cust_id and last.cust_id then do; average=(balance*(date-intnx('month',date,0,"e")))/(date-intnx('month',date,0,"e")); output; end; else if first.cust_id then do; overall=balance; days=1; lstdt=date; lstamt=balance; end; else if last.cust_id then do; do i=lstdt+1 to date; overall=overall+lstamt; days=days+1; end; do i=date+1 to intnx('month',date,0,"e"); overall=overall+balance; days=days+1; end; average=balance/days; output; end; else do; do i=lstdt+1 to date; overall=overall+lstamt; days=days+1; end; lstdt=date; lstamt=balance; end; run;
@aranganayagi wrote:
Hi, Many thanks. It really helps.
On my example, customer B has different scenario. can you please help to including B's scenario as well.
I want to find out September month average balance which is 3000 for B.
data have;
informat date ddmmyy10.;
input cust_id $ date balance;
datalines;
B 27/08/2018 1000
B 15/09/2018 5000
;
run;
I guess you want to carry over from a previous month, if such exists:
data have;
input cust_id :$1. date :ddmmyy10. balance;
format date ddmmyy10.;
cards;
A 01/09/2018 1000
A 08/09/2018 500
A 20/09/2018 2500
A 27/09/2018 50000
B 27/08/2018 1000
B 15/09/2018 5000
;
run;
data int;
set have;
set
have (
firstobs=2
keep=cust_id date balance
rename=(cust_id=nextcust date=nextdate balance=nextbal)
)
have (obs=1 drop=_all_)
;
if nextcust = cust_id
then do;
if intck('month',date, nextdate) > 0
then do;
weight = intnx('month',date,0,'e') - date + 1;
weighted = balance * weight;
output;
date = intnx('month',date,0,'e');
do while (date < intnx('month',nextdate,-1,'e'));
weight = day(date);
weighted = balance * weight;
output;
date = intnx('month',date,1,'e');
end;
date = intnx('month',nextdate,0,'b');
end;
weight = nextdate - date;
weighted = balance * weight;
output;
end;
else do; /* "fill up" the last month */
weight = intnx('month',date,0,'e') - date + 1;
weighted = balance * weight;
output;
end;
drop nextcust nextdate nextbal;
run;
proc sql;
create table want as
select
cust_id,
year(date) as year,
month(date) as month,
sum(weighted) / sum(weight) as avg_balance
from int
where calculated year = year(today()) and calculated month = 9
group by cust_id, calculated year, calculated month;
quit;
Maybe something like this?
data want;
set have;
by cust_id;
if not last.cust_id then do;
_N_=_N_+1;
set have(keep=date rename=(date=next_date)) point=_N_;
days=next_date-max(date,'01sep2018'd);
sum+days*balance;
end;
else do;
days='01oct2018'd-max(date,'01sep2018'd);
sum+days*balance;
agv_balance=sum/30;
output;
sum=0;
end;
drop next_date days sum;
run;
Which can be generalized to any period using macro variables:
%let start_date='01sep2018'd;
%let end_date='01oct2018'd; /* up to, but not including */
data want;
set have;
by cust_id;
if not last.cust_id then do;
_N_=_N_+1;
set have(keep=date rename=(date=next_date)) point=_N_;
days=next_date-max(date,&start_date);
sum+days*balance;
end;
else do;
days=&end_date-max(date,&start_date);
sum+days*balance;
agv_balance=sum/(&end_date-&start_date);
output;
sum=0;
end;
drop next_date days sum;
run;
But you may have a problem if your data has more dates than necessary (e.g. 2 dates before 01sep2018, or dates after september). Here is a way to solve that:
%let start_date='01sep2018'd;
%let end_date='01oct2018'd; /* up to, but not including */
data want;
set have;
by cust_id;
if not last.cust_id then do;
_N_=_N_+1;
set have(keep=date rename=(date=next_date)) point=_N_;
if &start_date<=next_date<&end_date;
days=next_date-max(date,&start_date);
sum+days*balance;
end;
else do;
if date<&end_date then do;
days=&end_date-max(date,&start_date);
sum+days*balance;
end;
agv_balance=sum/(&end_date-&start_date);
output;
sum=0;
end;
drop next_date days sum;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.