HI, I want to find out September monthly average balance of each customer from below table.
Here table has record only when customers previous balance is changed.
Input dataset | ||
cust_id | date | balance |
A | 01/09/2018 | 1000 |
A | 08/09/2018 | 500 |
A | 20/09/2018 | 2500 |
A | 27/09/2018 | 50000 |
B | 27/08/2018 | 1000 |
B | 15/09/2018 | 5000 |
one way is, I can create another intermediate data set to fill in between dates like below and can take average balance by simple query - sel cust_id, Avg(balance) from table 2 where date between '01sep2018'd and '30sep2018'd group by cust_id.
As customer size is in Billion and due to space constraint, i couldn't create such an intermediate step. can anyone help me to find out out average balance for each customer without intermediate step.
Intermediate dataset | ||
cust_id | date | balance |
A | 01/09/2018 | 1000 |
A | 02/09/2018 | 1000 |
A | 03/09/2018 | 1000 |
A | 04/09/2018 | 1000 |
A | 05/09/2018 | 1000 |
A | 06/09/2018 | 1000 |
A | 07/09/2018 | 1000 |
A | 08/09/2018 | 500 |
A | 09/09/2018 | 500 |
A | 10/09/2018 | 500 |
A | 11/09/2018 | 500 |
A | 12/09/2018 | 500 |
A | 13/09/2018 | 500 |
A | 14/09/2018 | 500 |
A | 15/09/2018 | 500 |
A | 16/09/2018 | 500 |
A | 17/09/2018 | 500 |
A | 18/09/2018 | 500 |
A | 19/09/2018 | 500 |
A | 20/09/2018 | 2500 |
A | 21/09/2018 | 2500 |
A | 22/09/2018 | 2500 |
A | 23/09/2018 | 2500 |
A | 24/09/2018 | 2500 |
A | 25/09/2018 | 2500 |
A | 26/09/2018 | 2500 |
A | 27/09/2018 | 50000 |
A | 28/09/2018 | 50000 |
A | 29/09/2018 | 50000 |
A | 30/09/2018 | 50000 |
Thanks in advance
Maybe something like this?
data want;
set have;
by cust_id;
if not last.cust_id then do;
_N_=_N_+1;
set have(keep=date rename=(date=next_date)) point=_N_;
days=next_date-max(date,'01sep2018'd);
sum+days*balance;
end;
else do;
days='01oct2018'd-max(date,'01sep2018'd);
sum+days*balance;
agv_balance=sum/30;
output;
sum=0;
end;
drop next_date days sum;
run;
Which can be generalized to any period using macro variables:
%let start_date='01sep2018'd;
%let end_date='01oct2018'd; /* up to, but not including */
data want;
set have;
by cust_id;
if not last.cust_id then do;
_N_=_N_+1;
set have(keep=date rename=(date=next_date)) point=_N_;
days=next_date-max(date,&start_date);
sum+days*balance;
end;
else do;
days=&end_date-max(date,&start_date);
sum+days*balance;
agv_balance=sum/(&end_date-&start_date);
output;
sum=0;
end;
drop next_date days sum;
run;
But you may have a problem if your data has more dates than necessary (e.g. 2 dates before 01sep2018, or dates after september). Here is a way to solve that:
%let start_date='01sep2018'd;
%let end_date='01oct2018'd; /* up to, but not including */
data want;
set have;
by cust_id;
if not last.cust_id then do;
_N_=_N_+1;
set have(keep=date rename=(date=next_date)) point=_N_;
if &start_date<=next_date<&end_date;
days=next_date-max(date,&start_date);
sum+days*balance;
end;
else do;
if date<&end_date then do;
days=&end_date-max(date,&start_date);
sum+days*balance;
end;
agv_balance=sum/(&end_date-&start_date);
output;
sum=0;
end;
drop next_date days sum;
run;
Post test data in the form of a datastep in future, not as text or attachments:
data have; informat date ddmmyy10.; input cust_id $ date balance; datalines; A 01/09/2018 1000 A 08/09/2018 500 A 20/09/2018 2500 A 27/09/2018 50000 ; run; data want (keep=cust_id average); set have; by cust_id; retain overall days lstdt lstamt; if first.cust_id then do; overall=balance; days=1; lstdt=date; lstamt=balance; end; else if last.cust_id then do; do i=lstdt+1 to date; overall=overall+lstamt; days=days+1; end; do i=date+1 to intnx('month',date,0,"e"); overall=overall+balance; days=days+1; end; average=balance/days; output; end; else do; do i=lstdt+1 to date; overall=overall+lstamt; days=days+1; end; lstdt=date; lstamt=balance; end; run;
Hi, Many thanks. It really helps.
On my example, customer B has different scenario. can you please help to including B's scenario as well.
I want to find out September month average balance which is 3000 for B.
data have;
informat date ddmmyy10.;
input cust_id $ date balance;
datalines;
B 27/08/2018 1000
B 15/09/2018 5000
;
run;
I am not sure I follow. B only has one record in September, so it really doesn't matter how many days, the average will always be 5000 not 3000. I have coded it to do days anyways. Or perhaps there is some other logic not mentioned?
data have; informat date ddmmyy10.; input cust_id $ date balance; datalines; A 01/09/2018 1000 A 08/09/2018 500 A 20/09/2018 2500 A 27/09/2018 50000 B 27/08/2018 1000 B 15/09/2018 5000 ; run; data want (keep=cust_id average); set have (where=(month(date)=9)); by cust_id; retain overall days lstdt lstamt; if first.cust_id and last.cust_id then do; average=(balance*(date-intnx('month',date,0,"e")))/(date-intnx('month',date,0,"e")); output; end; else if first.cust_id then do; overall=balance; days=1; lstdt=date; lstamt=balance; end; else if last.cust_id then do; do i=lstdt+1 to date; overall=overall+lstamt; days=days+1; end; do i=date+1 to intnx('month',date,0,"e"); overall=overall+balance; days=days+1; end; average=balance/days; output; end; else do; do i=lstdt+1 to date; overall=overall+lstamt; days=days+1; end; lstdt=date; lstamt=balance; end; run;
@aranganayagi wrote:
Hi, Many thanks. It really helps.
On my example, customer B has different scenario. can you please help to including B's scenario as well.
I want to find out September month average balance which is 3000 for B.
data have;
informat date ddmmyy10.;
input cust_id $ date balance;
datalines;
B 27/08/2018 1000
B 15/09/2018 5000
;
run;
I guess you want to carry over from a previous month, if such exists:
data have;
input cust_id :$1. date :ddmmyy10. balance;
format date ddmmyy10.;
cards;
A 01/09/2018 1000
A 08/09/2018 500
A 20/09/2018 2500
A 27/09/2018 50000
B 27/08/2018 1000
B 15/09/2018 5000
;
run;
data int;
set have;
set
have (
firstobs=2
keep=cust_id date balance
rename=(cust_id=nextcust date=nextdate balance=nextbal)
)
have (obs=1 drop=_all_)
;
if nextcust = cust_id
then do;
if intck('month',date, nextdate) > 0
then do;
weight = intnx('month',date,0,'e') - date + 1;
weighted = balance * weight;
output;
date = intnx('month',date,0,'e');
do while (date < intnx('month',nextdate,-1,'e'));
weight = day(date);
weighted = balance * weight;
output;
date = intnx('month',date,1,'e');
end;
date = intnx('month',nextdate,0,'b');
end;
weight = nextdate - date;
weighted = balance * weight;
output;
end;
else do; /* "fill up" the last month */
weight = intnx('month',date,0,'e') - date + 1;
weighted = balance * weight;
output;
end;
drop nextcust nextdate nextbal;
run;
proc sql;
create table want as
select
cust_id,
year(date) as year,
month(date) as month,
sum(weighted) / sum(weight) as avg_balance
from int
where calculated year = year(today()) and calculated month = 9
group by cust_id, calculated year, calculated month;
quit;
Maybe something like this?
data want;
set have;
by cust_id;
if not last.cust_id then do;
_N_=_N_+1;
set have(keep=date rename=(date=next_date)) point=_N_;
days=next_date-max(date,'01sep2018'd);
sum+days*balance;
end;
else do;
days='01oct2018'd-max(date,'01sep2018'd);
sum+days*balance;
agv_balance=sum/30;
output;
sum=0;
end;
drop next_date days sum;
run;
Which can be generalized to any period using macro variables:
%let start_date='01sep2018'd;
%let end_date='01oct2018'd; /* up to, but not including */
data want;
set have;
by cust_id;
if not last.cust_id then do;
_N_=_N_+1;
set have(keep=date rename=(date=next_date)) point=_N_;
days=next_date-max(date,&start_date);
sum+days*balance;
end;
else do;
days=&end_date-max(date,&start_date);
sum+days*balance;
agv_balance=sum/(&end_date-&start_date);
output;
sum=0;
end;
drop next_date days sum;
run;
But you may have a problem if your data has more dates than necessary (e.g. 2 dates before 01sep2018, or dates after september). Here is a way to solve that:
%let start_date='01sep2018'd;
%let end_date='01oct2018'd; /* up to, but not including */
data want;
set have;
by cust_id;
if not last.cust_id then do;
_N_=_N_+1;
set have(keep=date rename=(date=next_date)) point=_N_;
if &start_date<=next_date<&end_date;
days=next_date-max(date,&start_date);
sum+days*balance;
end;
else do;
if date<&end_date then do;
days=&end_date-max(date,&start_date);
sum+days*balance;
end;
agv_balance=sum/(&end_date-&start_date);
output;
sum=0;
end;
drop next_date days sum;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.