BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
P_S_
Obsidian | Level 7

Hi,

I have an interesting problem that I need to solve and need some help with.

Let's say I have 2 account numbers (100 and 200) that spreads across several months. Account 100 goes from month 1 to 6 and account 200 goes from month 1 to 4. Each account has balance (bal) associated to it in a certain month. The dataset example is shown below.

data test;

input account $ months bal;

datalines;

100 1 870

100 2 1740

100 3 2610

100 4 3480

100 5 4350

100 6 5220

200 1 1000

200 2 1500

200 3 1700

200 4 1200

;

run;

accountmonthsbal
1001870
10021740
10032610
10043480
10054350
10065220
20011000
20021500
20031700
20041200

Now for each account and for each balance, I need to spead the 'bal' amount based on the vector below.

VECTOR0.20.30.10.250.15

Note that the vector sums up to 1 (100%).

Example: for account 100, the month 1 bal, which is 870 has to be spread out by applying the vector..so it will be 0.2*870 = 174, 0.3*870=261, 0.1*870 = 87, 0.25*870 = 217.5 and 0.15*870 = 130.5. Please note that the sum has to equal to the bal for that particular month when we break it down (174+261+87+217.5+130.5 = 870).

This has to be repeaded for each month and for each loan.

The final dataset will look some thing like this (this is only for account 100 but would also need to do the same for account 200 in the dataset).

accountORIGINAL DATABALANCE SPREAD OUT ACCORDING THE THE VECTOR PARAMETER
100monthsbalbal_1bal_2bal_3bal_4bal_5bal_6TOTAL
1001870174174
10021740261348609
10032610875225221131
10043480217.51747836961870.5
10054350130.543526110448702740.5
100652200261652.5348130510443610.5
100700391.587043515663262.5
10080005221087.55222131.5
10090000652.513051957.5
1001000000783783

Notice that the total in 'TOTAL' column is equal to the total in 'bal' column (18270), which it should, since we are just spreading out the bal.

Any ideas/suggestions for solving this will be of great help. I appreciate all your inputs.

Thank you,

1 ACCEPTED SOLUTION

Accepted Solutions
data_null__
Jade | Level 19

This method does a bit of flipin' and floppin' but otherwise it is mostly hands free.  You should be able to change VECTOR the scoring data and not have to touch the rest of the code.  Assuming you want to change the scoring data.  You can decide if it suites you.

data test;
input account $ months bal;
datalines;
100 1 870
100 2 1740
100 3 2610
100 4 3480
100 5 4350
100 6 5220
200 1 1000
200 2 1500
200 3 1700
200 4 1200
;
run;

data VECTOR;
   input _type_ $ _name_ $ bal;
   datalines;
SCORE  VBAL1     0.2
SCORE  VBAL2     0.3
SCORE  VBAL3     0.1
SCORE  VBAL4     0.25
SCORE  VBAL5     0.15
;;;;
   run;
proc print;
  
run;
proc score data=test score=VECTOR out=scored;
   var bal;
   run;
proc print;
  
run;
proc transpose data=scored out=s2;
   by account months bal;
   var VBAL:;
   run;
data s2;
   set s2;
   by account months bal;
   if first.months then i=months;
   else i + 1;
  
run;
proc sort;
  
by account i bal;
   run;
proc transpose data=s2 out=s3(rename=(i=months) drop=_name_) prefix=nbal;
   by account i;
   id months;
   var col1;
   run;
data want;
   merge test s3;
   by account months;
   total = sum(of nbal:);
  
run;
proc print;
  
by account;
   id account;
   sum bal total;
   run;

7-25-2014 6-09-14 AM.png

View solution in original post

13 REPLIES 13
jakarman
Barite | Level 11

You are mentioning array processing (within rows) I would think on using lag processing.
Are you incrementing month's? I mean counting above 12 if necessary then the number in bal-1 is the relative monthnumber. What is the maximum value?  

---->-- ja karman --<-----
P_S_
Obsidian | Level 7

Hi Jaap,

Yes we will need to increment the month. Like for account 100, in the intial dataset, it has month 1 - 6. The month 1 balance is 870. So we spread out this balance based on index values. Please note the vector has 5 values. So we will need to spread out month1 balance over 5 months (month 1 - month 5)

( month 1 balance = 174

month 2 balance = 261

month 3 balance = 87

month 4 balance = 217.5

and month 5 balance = 130.5 ..all of this is captured in variable bal_1)

now we need to do the same thing for month 2 balance which is 1740. Spread this balance over 5 months based on vector and capture in bal_2.

So for month 6 balance, which is 5220, we need to spread this out again over 5 months based on index. So that will be month 6, 7, 8, 9 and 10.

So, yes if needed the months has to be increased. For the prototype we can use, each month balance will be spread over 5 months (based on the vector).

Hope this makes sense.

Solution doesnt have to be array based. It could be using any other logic.

Thank you,

ballardw
Super User

Do you have IML licensed? This looks more like a matrix operation that would work with IML best.

P_S_
Obsidian | Level 7

Yes I have access to IML. but i have not used it before.

jakarman
Barite | Level 11

P.S. your question is making sense. I see some time-series analyses where the decomposition is a model to correct for delayed effects so the signals should become more clear and trustworthy. Ballard did a proposals to go IML. Some matrix processing is also possible in PROC DS2. The datastep SAS data is also possible.  I would like to know which direction it should go. Using UE all three options are there. ETS and OR are missing that is how it is although your solution could also be in that area.

Do you expect more data processing reporting using a SAS dataset or would it move in pure scientific using matrices without a table like sql awareness. Can you make a guess on that?

Your question would be nice question to give it a bite.

---->-- ja karman --<-----
P_S_
Obsidian | Level 7

Hi Jaap, This would be a processing using a SAS dataset. Thus a solution that uses data step components, arrays, macros etc would be an ideal solution. Thank you.

data_null__
Jade | Level 19

This method does a bit of flipin' and floppin' but otherwise it is mostly hands free.  You should be able to change VECTOR the scoring data and not have to touch the rest of the code.  Assuming you want to change the scoring data.  You can decide if it suites you.

data test;
input account $ months bal;
datalines;
100 1 870
100 2 1740
100 3 2610
100 4 3480
100 5 4350
100 6 5220
200 1 1000
200 2 1500
200 3 1700
200 4 1200
;
run;

data VECTOR;
   input _type_ $ _name_ $ bal;
   datalines;
SCORE  VBAL1     0.2
SCORE  VBAL2     0.3
SCORE  VBAL3     0.1
SCORE  VBAL4     0.25
SCORE  VBAL5     0.15
;;;;
   run;
proc print;
  
run;
proc score data=test score=VECTOR out=scored;
   var bal;
   run;
proc print;
  
run;
proc transpose data=scored out=s2;
   by account months bal;
   var VBAL:;
   run;
data s2;
   set s2;
   by account months bal;
   if first.months then i=months;
   else i + 1;
  
run;
proc sort;
  
by account i bal;
   run;
proc transpose data=s2 out=s3(rename=(i=months) drop=_name_) prefix=nbal;
   by account i;
   id months;
   var col1;
   run;
data want;
   merge test s3;
   by account months;
   total = sum(of nbal:);
  
run;
proc print;
  
by account;
   id account;
   sum bal total;
   run;

7-25-2014 6-09-14 AM.png
FriedEgg
SAS Employee

The more manual way:

data foo;

input account $ month bal;

cards;

100 1 870

100 2 1740

100 3 2610

100 4 3480

100 5 4350

100 6 5220

200 1 1000

200 2 1500

200 3 1700

200 4 1200

;

run;

data bar;

array v[5]   _temporary_ (0.2 0.3 0.1 0.25 0.15);

array b[6,11] _temporary_;

do until (last.account);

   set foo;

   by account;

   b[month,11]=bal;

   do _i_=1 to 5;

      b[month,_i_+month-1]=bal*v[_i_];

   end;

end;

array x[6]   bal_1-bal_6;

do month=1 to 10;

   do _j_=1 to 6;

      x[_j_]=b[_j_,month];

   end;

   if month<=6 then bal=b[month,11];

   else call missing(bal);

   total=sum(of x

  • );
  •    output;

    end;

    call missing(of b

  • , of x
  • );
  • drop _:;

    run;

                

    accountmonthbalbal_1bal_2bal_3bal_4bal_5bal_6total
    100 1 870174.0  .  .   .  .   . 174
    100 21740261.0348  .   .  .   . 609
    100 32610 87.0522522.0  .  .   .1131
    100 43480217.5174783.0 696  .   .1870.5
    100 54350130.5435261.01044 870.0  .2740.5
    100 65220  . 261652.5 3481305.010443610.5
    100 7.  .   .391.5 870 435.015663262.5
    100 8.  .   .  . 5221087.5 5222131.5
    100 9.  .   .  .   . 652.513051957.5
    10010.  .   .  .   .  . 783 783
    200 11000200.0  .  .   .  .   . 200
    200 21500300.0300  .   .  .   . 600
    200 31700100.0450340.0  .  .   . 890
    200 41200250.0150510.0 240  .   .1150
    200 5  .150.0375170.0 360  .   .1055
    200 6  .  . 225425.0 120  .   . 770
    200 7  .  .   .255.0 300  .   . 555
    200 8  .  .   .  . 180  .   . 180
    200 9  .  .   .  .   .  .   .  .
    20010  .  .   .  .   .  .   .  .
    data_null__
    Jade | Level 19

    You know me, I never like to pass up an opportunity to use PROC SCORE and especially PROC TRANSPOSE.:smileymischief:

    jakarman
    Barite | Level 11

    data foo; /* test data */
    input account $ month bal;
    cards;
    100 1 870
    100 2 1740
    100 3 2610
    100 4 3480
    100 5 4350
    100 6 5220
    200 1 1000
    200 2 1500
    200 3 1700
    200 4 1200
    ;
    run;

    /* Remarks:  */

    /* 1/ the intermediate lagged results are ordered a bit different. The same values can be found looking diagonal  */

    /* the ballg-n columns are having the meaning of a calculated value of that previous n-th period                            */

    /* 2/ instead of the dow loop approach (retaining) and full matrix all is row oriented where the lag is used getting previous values */
    /* 3/ coding the calculation as a routing it can be called twice or more.    Filling gaps in periods is possible */ 

    data bar; /* assumption oredered by account  month , month no stepped intervals */
       array v[5]   _temporary_ (0.2 0.3 0.1 0.25 0.15);
       array b ballg1-ballg5 ;
       set foo ;
       by account;
      
       link calculate ;
       if last.account then do _i_=1 to 4;  /* roll out queues to be empty additional 5-1 writes */
         bal = . ; month = month + 1 ;
         link calculate ;
       end;
       return; 
     
    calculate:  /* the lag function syntax is requiring a macrotized approach */
       b[1}=bal*v[1] ;
       b[2] =lag1(bal)*v[2];
       b[3] =lag2(bal)*v[3];
       b[4] =lag3(bal)*v[4];
       b[5] =lag4(bal)*v[5];
       total=sum(of ballg:) ;
       output;
     
    return;
    run;

    ---->-- ja karman --<-----
    jdmarino
    Fluorite | Level 6

    For the record: I'm impressed with the quality of the help you have received.  Props to , @data_null_; , .

    Ksharp
    Super User

    How about :

    data foo;
    input account $ month bal;
    cards;
    100 1 870
    100 2 1740
    100 3 2610
    100 4 3480
    100 5 4350
    100 6 5220
    200 1 1000
    200 2 1500
    200 3 1700
    200 4 1200
    ;
    run;
     
     
    data bar(drop=bal i);
     set foo;
     by  account ;
     array v{5}   _temporary_ (0.2 0.3 0.1 0.25 0.15);
     do i=1 to month-1 ;
      b=.;output;
     end;
     do i=1 to 5;
      b=bal*v{i};output;
     end;
    run;
    proc sql;
     select distinct catt('bar(where=(account="',account,'" and month=',month,') rename=(b=b_',month,'))') into : list separated by ' ' from foo;
    quit;
    data want(drop=month);
     merge &list ;
     by account;
     output;
     call missing(of _all_);
    run;
    
    

    Xia Keshan

    P_S_
    Obsidian | Level 7

    Thank you all for your great help, ideas and inputs. This is a great user community.

    hackathon24-white-horiz.png

    The 2025 SAS Hackathon has begun!

    It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

    Latest Updates

    How to Concatenate Values

    Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

    Find more tutorials on the SAS Users YouTube channel.

    SAS Training: Just a Click Away

     Ready to level-up your skills? Choose your own adventure.

    Browse our catalog!

    Discussion stats
    • 13 replies
    • 4490 views
    • 7 likes
    • 7 in conversation