BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
aranganayagi
Obsidian | Level 7

HI, I want to find out September monthly average balance of each customer from below table.

Here table has record only when customers previous balance is changed.

 

 

Input dataset 
cust_iddatebalance
A01/09/20181000
A08/09/2018500
A20/09/20182500
A27/09/201850000
B27/08/20181000
B15/09/20185000

one way is, I can create another intermediate data set to fill in between dates like below and can take average balance by simple query - sel cust_id, Avg(balance) from table 2 where date between '01sep2018'd and '30sep2018'd group by cust_id.

 

As customer size is in Billion and due to space constraint, i couldn't create such an intermediate step. can anyone help me to find out out average balance for each customer without intermediate step. 

Intermediate dataset
cust_iddatebalance
A01/09/20181000
A02/09/20181000
A03/09/20181000
A04/09/20181000
A05/09/20181000
A06/09/20181000
A07/09/20181000
A08/09/2018500
A09/09/2018500
A10/09/2018500
A11/09/2018500
A12/09/2018500
A13/09/2018500
A14/09/2018500
A15/09/2018500
A16/09/2018500
A17/09/2018500
A18/09/2018500
A19/09/2018500
A20/09/20182500
A21/09/20182500
A22/09/20182500
A23/09/20182500
A24/09/20182500
A25/09/20182500
A26/09/20182500
A27/09/201850000
A28/09/201850000
A29/09/201850000
A30/09/201850000

 

 

 

 

 

Thanks in advance

1 ACCEPTED SOLUTION

Accepted Solutions
s_lassen
Meteorite | Level 14

Maybe something like this?

data want;
  set have;
  by cust_id;
  if not last.cust_id then do;
    _N_=_N_+1;
    set have(keep=date rename=(date=next_date)) point=_N_;
    days=next_date-max(date,'01sep2018'd);
    sum+days*balance;
    end;
  else do;
    days='01oct2018'd-max(date,'01sep2018'd);
    sum+days*balance;
    agv_balance=sum/30;
    output;
    sum=0;
    end;
  drop next_date days sum;
run;

Which can be generalized to any period using macro variables:

%let start_date='01sep2018'd;
%let end_date='01oct2018'd; /* up to, but not including */
data want;
  set have;
  by cust_id;
  if not last.cust_id then do;
    _N_=_N_+1;
    set have(keep=date rename=(date=next_date)) point=_N_;
    days=next_date-max(date,&start_date);
    sum+days*balance;
    end;
  else do;
    days=&end_date-max(date,&start_date);
    sum+days*balance;
    agv_balance=sum/(&end_date-&start_date);
    output;
    sum=0;
    end;
  drop next_date days sum;
run;

But you may have a problem if your data has more dates than necessary (e.g. 2 dates before 01sep2018, or dates after september). Here is a way to solve that:

%let start_date='01sep2018'd;
%let end_date='01oct2018'd; /* up to, but not including */
data want;
  set have;
  by cust_id;
  if not last.cust_id then do;
    _N_=_N_+1;
    set have(keep=date rename=(date=next_date)) point=_N_;
    if &start_date<=next_date<&end_date;
    days=next_date-max(date,&start_date);
    sum+days*balance;
    end;
  else do;
    if date<&end_date then do;
      days=&end_date-max(date,&start_date);
      sum+days*balance;
      end; 
    agv_balance=sum/(&end_date-&start_date);
    output;
    sum=0;
    end;
  drop next_date days sum;
run;

 

View solution in original post

6 REPLIES 6
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Post test data in the form of a datastep in future, not as text or attachments:

data have;
  informat date ddmmyy10.;
  input cust_id $ date balance;
datalines;
A   01/09/2018   1000
A   08/09/2018   500
A   20/09/2018   2500
A   27/09/2018   50000
;
run;

data want (keep=cust_id average);
  set have;
  by cust_id;
  retain overall days lstdt lstamt;
  if first.cust_id then do;
    overall=balance;
    days=1;
    lstdt=date;
    lstamt=balance;
  end;
  else if last.cust_id then do;
    do i=lstdt+1 to date;
      overall=overall+lstamt;
      days=days+1;
    end;
    do i=date+1 to intnx('month',date,0,"e");
      overall=overall+balance;
      days=days+1;
    end;
    average=balance/days;
    output;
  end;
  else do;
    do i=lstdt+1 to date;
      overall=overall+lstamt;
      days=days+1;
    end;
    lstdt=date;
    lstamt=balance;
  end;
run;
aranganayagi
Obsidian | Level 7

Hi, Many thanks. It really helps. 

 

On my example, customer B has different scenario.  can you please help to including B's scenario as well.

I want to find out September month average balance which is 3000 for B.

data have;

informat date ddmmyy10.;

input cust_id $ date balance;

datalines;

B 27/08/2018 1000

B 15/09/2018 5000

;

run;

 

 

RW9
Diamond | Level 26 RW9
Diamond | Level 26

I am not sure I follow.  B only has one record in September, so it really doesn't matter how many days, the average will always be 5000 not 3000.  I have coded it to do days anyways.  Or perhaps there is some other logic not mentioned?

data have;
  informat date ddmmyy10.;
  input cust_id $ date balance;
datalines;
A   01/09/2018   1000
A   08/09/2018   500
A   20/09/2018   2500
A   27/09/2018   50000
B   27/08/2018   1000
B   15/09/2018   5000
;
run;

data want (keep=cust_id average);
  set have (where=(month(date)=9));
  by cust_id;
  retain overall days lstdt lstamt;
  if first.cust_id and last.cust_id then do;
    average=(balance*(date-intnx('month',date,0,"e")))/(date-intnx('month',date,0,"e"));
    output;
  end;
  else if first.cust_id then do;
    overall=balance;
    days=1;
    lstdt=date;
    lstamt=balance;
  end;
  else if last.cust_id then do;
    do i=lstdt+1 to date;
      overall=overall+lstamt;
      days=days+1;
    end;
    do i=date+1 to intnx('month',date,0,"e");
      overall=overall+balance;
      days=days+1;
    end;
    average=balance/days;
    output;
  end;
  else do;
    do i=lstdt+1 to date;
      overall=overall+lstamt;
      days=days+1;
    end;
    lstdt=date;
    lstamt=balance;
  end;
run;
Kurt_Bremser
Super User

@aranganayagi wrote:

Hi, Many thanks. It really helps. 

 

On my example, customer B has different scenario.  can you please help to including B's scenario as well.

I want to find out September month average balance which is 3000 for B.

data have;

informat date ddmmyy10.;

input cust_id $ date balance;

datalines;

B 27/08/2018 1000

B 15/09/2018 5000

;

run;

 

 


I guess you want to carry over from a previous month, if such exists:

data have;
input cust_id :$1. date :ddmmyy10. balance;
format date ddmmyy10.;
cards;
A 01/09/2018 1000
A 08/09/2018 500
A 20/09/2018 2500
A 27/09/2018 50000
B 27/08/2018 1000
B 15/09/2018 5000
;
run;

data int;
set have;
set
  have (
    firstobs=2
    keep=cust_id date balance
    rename=(cust_id=nextcust date=nextdate balance=nextbal)
  )
  have (obs=1 drop=_all_)
;
if nextcust = cust_id
then do;
  if intck('month',date, nextdate) > 0
  then do;
    weight = intnx('month',date,0,'e') - date + 1;
    weighted = balance * weight;
    output;
    date = intnx('month',date,0,'e');
    do while (date < intnx('month',nextdate,-1,'e'));
      weight = day(date);
      weighted = balance * weight;
      output;
      date = intnx('month',date,1,'e');
    end;
    date = intnx('month',nextdate,0,'b');
  end;
  weight = nextdate - date;
  weighted = balance * weight;
  output;
end;
else do; /* "fill up" the last month */
  weight = intnx('month',date,0,'e') - date + 1;
  weighted = balance * weight;
  output;
end;
drop nextcust nextdate nextbal;
run; 

proc sql;
create table want as
select
  cust_id,
  year(date) as year,
  month(date) as month,
  sum(weighted) / sum(weight) as avg_balance
from int
where calculated year = year(today()) and calculated month = 9
group by cust_id, calculated year, calculated month;
quit;
s_lassen
Meteorite | Level 14

Maybe something like this?

data want;
  set have;
  by cust_id;
  if not last.cust_id then do;
    _N_=_N_+1;
    set have(keep=date rename=(date=next_date)) point=_N_;
    days=next_date-max(date,'01sep2018'd);
    sum+days*balance;
    end;
  else do;
    days='01oct2018'd-max(date,'01sep2018'd);
    sum+days*balance;
    agv_balance=sum/30;
    output;
    sum=0;
    end;
  drop next_date days sum;
run;

Which can be generalized to any period using macro variables:

%let start_date='01sep2018'd;
%let end_date='01oct2018'd; /* up to, but not including */
data want;
  set have;
  by cust_id;
  if not last.cust_id then do;
    _N_=_N_+1;
    set have(keep=date rename=(date=next_date)) point=_N_;
    days=next_date-max(date,&start_date);
    sum+days*balance;
    end;
  else do;
    days=&end_date-max(date,&start_date);
    sum+days*balance;
    agv_balance=sum/(&end_date-&start_date);
    output;
    sum=0;
    end;
  drop next_date days sum;
run;

But you may have a problem if your data has more dates than necessary (e.g. 2 dates before 01sep2018, or dates after september). Here is a way to solve that:

%let start_date='01sep2018'd;
%let end_date='01oct2018'd; /* up to, but not including */
data want;
  set have;
  by cust_id;
  if not last.cust_id then do;
    _N_=_N_+1;
    set have(keep=date rename=(date=next_date)) point=_N_;
    if &start_date<=next_date<&end_date;
    days=next_date-max(date,&start_date);
    sum+days*balance;
    end;
  else do;
    if date<&end_date then do;
      days=&end_date-max(date,&start_date);
      sum+days*balance;
      end; 
    agv_balance=sum/(&end_date-&start_date);
    output;
    sum=0;
    end;
  drop next_date days sum;
run;

 

aranganayagi
Obsidian | Level 7
Thank you so much for all the replies. It solves the problem.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 1712 views
  • 0 likes
  • 4 in conversation