Hi,
I have an interesting problem that I need to solve and need some help with.
Let's say I have 2 account numbers (100 and 200) that spreads across several months. Account 100 goes from month 1 to 6 and account 200 goes from month 1 to 4. Each account has balance (bal) associated to it in a certain month. The dataset example is shown below.
data test;
input account $ months bal;
datalines;
100 1 870
100 2 1740
100 3 2610
100 4 3480
100 5 4350
100 6 5220
200 1 1000
200 2 1500
200 3 1700
200 4 1200
;
run;
account | months | bal |
100 | 1 | 870 |
100 | 2 | 1740 |
100 | 3 | 2610 |
100 | 4 | 3480 |
100 | 5 | 4350 |
100 | 6 | 5220 |
200 | 1 | 1000 |
200 | 2 | 1500 |
200 | 3 | 1700 |
200 | 4 | 1200 |
Now for each account and for each balance, I need to spead the 'bal' amount based on the vector below.
VECTOR | 0.2 | 0.3 | 0.1 | 0.25 | 0.15 |
Note that the vector sums up to 1 (100%).
Example: for account 100, the month 1 bal, which is 870 has to be spread out by applying the vector..so it will be 0.2*870 = 174, 0.3*870=261, 0.1*870 = 87, 0.25*870 = 217.5 and 0.15*870 = 130.5. Please note that the sum has to equal to the bal for that particular month when we break it down (174+261+87+217.5+130.5 = 870).
This has to be repeaded for each month and for each loan.
The final dataset will look some thing like this (this is only for account 100 but would also need to do the same for account 200 in the dataset).
account | ORIGINAL DATA | BALANCE SPREAD OUT ACCORDING THE THE VECTOR PARAMETER | |||||||
100 | months | bal | bal_1 | bal_2 | bal_3 | bal_4 | bal_5 | bal_6 | TOTAL |
100 | 1 | 870 | 174 | 174 | |||||
100 | 2 | 1740 | 261 | 348 | 609 | ||||
100 | 3 | 2610 | 87 | 522 | 522 | 1131 | |||
100 | 4 | 3480 | 217.5 | 174 | 783 | 696 | 1870.5 | ||
100 | 5 | 4350 | 130.5 | 435 | 261 | 1044 | 870 | 2740.5 | |
100 | 6 | 5220 | 0 | 261 | 652.5 | 348 | 1305 | 1044 | 3610.5 |
100 | 7 | 0 | 0 | 391.5 | 870 | 435 | 1566 | 3262.5 | |
100 | 8 | 0 | 0 | 0 | 522 | 1087.5 | 522 | 2131.5 | |
100 | 9 | 0 | 0 | 0 | 0 | 652.5 | 1305 | 1957.5 | |
100 | 10 | 0 | 0 | 0 | 0 | 0 | 783 | 783 |
Notice that the total in 'TOTAL' column is equal to the total in 'bal' column (18270), which it should, since we are just spreading out the bal.
Any ideas/suggestions for solving this will be of great help. I appreciate all your inputs.
Thank you,
This method does a bit of flipin' and floppin' but otherwise it is mostly hands free. You should be able to change VECTOR the scoring data and not have to touch the rest of the code. Assuming you want to change the scoring data. You can decide if it suites you.
You are mentioning array processing (within rows) I would think on using lag processing.
Are you incrementing month's? I mean counting above 12 if necessary then the number in bal-1 is the relative monthnumber. What is the maximum value?
Hi Jaap,
Yes we will need to increment the month. Like for account 100, in the intial dataset, it has month 1 - 6. The month 1 balance is 870. So we spread out this balance based on index values. Please note the vector has 5 values. So we will need to spread out month1 balance over 5 months (month 1 - month 5)
( month 1 balance = 174
month 2 balance = 261
month 3 balance = 87
month 4 balance = 217.5
and month 5 balance = 130.5 ..all of this is captured in variable bal_1)
now we need to do the same thing for month 2 balance which is 1740. Spread this balance over 5 months based on vector and capture in bal_2.
So for month 6 balance, which is 5220, we need to spread this out again over 5 months based on index. So that will be month 6, 7, 8, 9 and 10.
So, yes if needed the months has to be increased. For the prototype we can use, each month balance will be spread over 5 months (based on the vector).
Hope this makes sense.
Solution doesnt have to be array based. It could be using any other logic.
Thank you,
Do you have IML licensed? This looks more like a matrix operation that would work with IML best.
Yes I have access to IML. but i have not used it before.
P.S. your question is making sense. I see some time-series analyses where the decomposition is a model to correct for delayed effects so the signals should become more clear and trustworthy. Ballard did a proposals to go IML. Some matrix processing is also possible in PROC DS2. The datastep SAS data is also possible. I would like to know which direction it should go. Using UE all three options are there. ETS and OR are missing that is how it is although your solution could also be in that area.
Do you expect more data processing reporting using a SAS dataset or would it move in pure scientific using matrices without a table like sql awareness. Can you make a guess on that?
Your question would be nice question to give it a bite.
Hi Jaap, This would be a processing using a SAS dataset. Thus a solution that uses data step components, arrays, macros etc would be an ideal solution. Thank you.
This method does a bit of flipin' and floppin' but otherwise it is mostly hands free. You should be able to change VECTOR the scoring data and not have to touch the rest of the code. Assuming you want to change the scoring data. You can decide if it suites you.
The more manual way:
data foo;
input account $ month bal;
cards;
100 1 870
100 2 1740
100 3 2610
100 4 3480
100 5 4350
100 6 5220
200 1 1000
200 2 1500
200 3 1700
200 4 1200
;
run;
data bar;
array v[5] _temporary_ (0.2 0.3 0.1 0.25 0.15);
array b[6,11] _temporary_;
do until (last.account);
set foo;
by account;
b[month,11]=bal;
do _i_=1 to 5;
b[month,_i_+month-1]=bal*v[_i_];
end;
end;
array x[6] bal_1-bal_6;
do month=1 to 10;
do _j_=1 to 6;
x[_j_]=b[_j_,month];
end;
if month<=6 then bal=b[month,11];
else call missing(bal);
total=sum(of x
output;
end;
call missing(of b
drop _:;
run;
account | month | bal | bal_1 | bal_2 | bal_3 | bal_4 | bal_5 | bal_6 | total |
100 | 1 | 870 | 174.0 | . | . | . | . | . | 174 |
100 | 2 | 1740 | 261.0 | 348 | . | . | . | . | 609 |
100 | 3 | 2610 | 87.0 | 522 | 522.0 | . | . | . | 1131 |
100 | 4 | 3480 | 217.5 | 174 | 783.0 | 696 | . | . | 1870.5 |
100 | 5 | 4350 | 130.5 | 435 | 261.0 | 1044 | 870.0 | . | 2740.5 |
100 | 6 | 5220 | . | 261 | 652.5 | 348 | 1305.0 | 1044 | 3610.5 |
100 | 7 | . | . | . | 391.5 | 870 | 435.0 | 1566 | 3262.5 |
100 | 8 | . | . | . | . | 522 | 1087.5 | 522 | 2131.5 |
100 | 9 | . | . | . | . | . | 652.5 | 1305 | 1957.5 |
100 | 10 | . | . | . | . | . | . | 783 | 783 |
200 | 1 | 1000 | 200.0 | . | . | . | . | . | 200 |
200 | 2 | 1500 | 300.0 | 300 | . | . | . | . | 600 |
200 | 3 | 1700 | 100.0 | 450 | 340.0 | . | . | . | 890 |
200 | 4 | 1200 | 250.0 | 150 | 510.0 | 240 | . | . | 1150 |
200 | 5 | . | 150.0 | 375 | 170.0 | 360 | . | . | 1055 |
200 | 6 | . | . | 225 | 425.0 | 120 | . | . | 770 |
200 | 7 | . | . | . | 255.0 | 300 | . | . | 555 |
200 | 8 | . | . | . | . | 180 | . | . | 180 |
200 | 9 | . | . | . | . | . | . | . | . |
200 | 10 | . | . | . | . | . | . | . | . |
You know me, I never like to pass up an opportunity to use PROC SCORE and especially PROC TRANSPOSE.:smileymischief:
data foo; /* test data */
input account $ month bal;
cards;
100 1 870
100 2 1740
100 3 2610
100 4 3480
100 5 4350
100 6 5220
200 1 1000
200 2 1500
200 3 1700
200 4 1200
;
run;
/* Remarks: */
/* 1/ the intermediate lagged results are ordered a bit different. The same values can be found looking diagonal */
/* the ballg-n columns are having the meaning of a calculated value of that previous n-th period */
/* 2/ instead of the dow loop approach (retaining) and full matrix all is row oriented where the lag is used getting previous values */
/* 3/ coding the calculation as a routing it can be called twice or more. Filling gaps in periods is possible */
data bar; /* assumption oredered by account month , month no stepped intervals */
array v[5] _temporary_ (0.2 0.3 0.1 0.25 0.15);
array b ballg1-ballg5 ;
set foo ;
by account;
link calculate ;
if last.account then do _i_=1 to 4; /* roll out queues to be empty additional 5-1 writes */
bal = . ; month = month + 1 ;
link calculate ;
end;
return;
calculate: /* the lag function syntax is requiring a macrotized approach */
b[1}=bal*v[1] ;
b[2] =lag1(bal)*v[2];
b[3] =lag2(bal)*v[3];
b[4] =lag3(bal)*v[4];
b[5] =lag4(bal)*v[5];
total=sum(of ballg:) ;
output;
return;
run;
How about :
data foo; input account $ month bal; cards; 100 1 870 100 2 1740 100 3 2610 100 4 3480 100 5 4350 100 6 5220 200 1 1000 200 2 1500 200 3 1700 200 4 1200 ; run; data bar(drop=bal i); set foo; by account ; array v{5} _temporary_ (0.2 0.3 0.1 0.25 0.15); do i=1 to month-1 ; b=.;output; end; do i=1 to 5; b=bal*v{i};output; end; run; proc sql; select distinct catt('bar(where=(account="',account,'" and month=',month,') rename=(b=b_',month,'))') into : list separated by ' ' from foo; quit; data want(drop=month); merge &list ; by account; output; call missing(of _all_); run;
Xia Keshan
Thank you all for your great help, ideas and inputs. This is a great user community.
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.