Solved
Contributor
Posts: 42

# Data Step / Arrays help needed !!!

Hi,

I have an interesting problem that I need to solve and need some help with.

Let's say I have 2 account numbers (100 and 200) that spreads across several months. Account 100 goes from month 1 to 6 and account 200 goes from month 1 to 4. Each account has balance (bal) associated to it in a certain month. The dataset example is shown below.

data test;

input account \$ months bal;

datalines;

100 1 870

100 2 1740

100 3 2610

100 4 3480

100 5 4350

100 6 5220

200 1 1000

200 2 1500

200 3 1700

200 4 1200

;

run;

 account months bal 100 1 870 100 2 1740 100 3 2610 100 4 3480 100 5 4350 100 6 5220 200 1 1000 200 2 1500 200 3 1700 200 4 1200

Now for each account and for each balance, I need to spead the 'bal' amount based on the vector below.

 VECTOR 0.2 0.3 0.1 0.25 0.15

Note that the vector sums up to 1 (100%).

Example: for account 100, the month 1 bal, which is 870 has to be spread out by applying the vector..so it will be 0.2*870 = 174, 0.3*870=261, 0.1*870 = 87, 0.25*870 = 217.5 and 0.15*870 = 130.5. Please note that the sum has to equal to the bal for that particular month when we break it down (174+261+87+217.5+130.5 = 870).

This has to be repeaded for each month and for each loan.

The final dataset will look some thing like this (this is only for account 100 but would also need to do the same for account 200 in the dataset).

 account ORIGINAL DATA BALANCE SPREAD OUT ACCORDING THE THE VECTOR PARAMETER 100 months bal bal_1 bal_2 bal_3 bal_4 bal_5 bal_6 TOTAL 100 1 870 174 174 100 2 1740 261 348 609 100 3 2610 87 522 522 1131 100 4 3480 217.5 174 783 696 1870.5 100 5 4350 130.5 435 261 1044 870 2740.5 100 6 5220 0 261 652.5 348 1305 1044 3610.5 100 7 0 0 391.5 870 435 1566 3262.5 100 8 0 0 0 522 1087.5 522 2131.5 100 9 0 0 0 0 652.5 1305 1957.5 100 10 0 0 0 0 0 783 783

Notice that the total in 'TOTAL' column is equal to the total in 'bal' column (18270), which it should, since we are just spreading out the bal.

Any ideas/suggestions for solving this will be of great help. I appreciate all your inputs.

Thank you,

Accepted Solutions
Solution
‎07-24-2014 04:19 PM
Posts: 3,852

## Re: Data Step / Arrays help needed !!!

This method does a bit of flipin' and floppin' but otherwise it is mostly hands free.  You should be able to change VECTOR the scoring data and not have to touch the rest of the code.  Assuming you want to change the scoring data.  You can decide if it suites you.

data test;
input account \$ months bal;
datalines;
100 1 870
100 2 1740
100 3 2610
100 4 3480
100 5 4350
100 6 5220
200 1 1000
200 2 1500
200 3 1700
200 4 1200
;
run;

data VECTOR;
input _type_ \$ _name_ \$ bal;
datalines;
SCORE  VBAL1     0.2
SCORE  VBAL2     0.3
SCORE  VBAL3     0.1
SCORE  VBAL4     0.25
SCORE  VBAL5     0.15
;;;;
run;
proc print;

run;
proc score data=test score=VECTOR out=scored;
var bal;
run;
proc print;

run;
proc transpose data=scored out=s2;
by account months bal;
var VBAL:;
run;
data s2;
set s2;
by account months bal;
if first.months then i=months;
else i + 1;

run;
proc sort;

by account i bal;
run;
proc transpose data=s2 out=s3(rename=(i=months) drop=_name_) prefix=nbal;
by account i;
id months;
var col1;
run;
data want;
merge test s3;
by account months;
total = sum(of nbal;

run;
proc print;

by account;
id account;
sum bal total;
run;

All Replies
Posts: 3,215

## Re: Data Step / Arrays help needed !!!

You are mentioning array processing (within rows) I would think on using lag processing.
Are you incrementing month's? I mean counting above 12 if necessary then the number in bal-1 is the relative monthnumber. What is the maximum value?

---->-- ja karman --<-----
Contributor
Posts: 42

## Re: Data Step / Arrays help needed !!!

Hi Jaap,

Yes we will need to increment the month. Like for account 100, in the intial dataset, it has month 1 - 6. The month 1 balance is 870. So we spread out this balance based on index values. Please note the vector has 5 values. So we will need to spread out month1 balance over 5 months (month 1 - month 5)

( month 1 balance = 174

month 2 balance = 261

month 3 balance = 87

month 4 balance = 217.5

and month 5 balance = 130.5 ..all of this is captured in variable bal_1)

now we need to do the same thing for month 2 balance which is 1740. Spread this balance over 5 months based on vector and capture in bal_2.

So for month 6 balance, which is 5220, we need to spread this out again over 5 months based on index. So that will be month 6, 7, 8, 9 and 10.

So, yes if needed the months has to be increased. For the prototype we can use, each month balance will be spread over 5 months (based on the vector).

Hope this makes sense.

Solution doesnt have to be array based. It could be using any other logic.

Thank you,

Super User
Posts: 13,583

## Re: Data Step / Arrays help needed !!!

Do you have IML licensed? This looks more like a matrix operation that would work with IML best.

Contributor
Posts: 42

## Re: Data Step / Arrays help needed !!!

Yes I have access to IML. but i have not used it before.

Posts: 3,215

## Re: Data Step / Arrays help needed !!!

P.S. your question is making sense. I see some time-series analyses where the decomposition is a model to correct for delayed effects so the signals should become more clear and trustworthy. Ballard did a proposals to go IML. Some matrix processing is also possible in PROC DS2. The datastep SAS data is also possible.  I would like to know which direction it should go. Using UE all three options are there. ETS and OR are missing that is how it is although your solution could also be in that area.

Do you expect more data processing reporting using a SAS dataset or would it move in pure scientific using matrices without a table like sql awareness. Can you make a guess on that?

Your question would be nice question to give it a bite.

---->-- ja karman --<-----
Contributor
Posts: 42

## Re: Data Step / Arrays help needed !!!

Hi Jaap, This would be a processing using a SAS dataset. Thus a solution that uses data step components, arrays, macros etc would be an ideal solution. Thank you.

Solution
‎07-24-2014 04:19 PM
Posts: 3,852

## Re: Data Step / Arrays help needed !!!

This method does a bit of flipin' and floppin' but otherwise it is mostly hands free.  You should be able to change VECTOR the scoring data and not have to touch the rest of the code.  Assuming you want to change the scoring data.  You can decide if it suites you.

data test;
input account \$ months bal;
datalines;
100 1 870
100 2 1740
100 3 2610
100 4 3480
100 5 4350
100 6 5220
200 1 1000
200 2 1500
200 3 1700
200 4 1200
;
run;

data VECTOR;
input _type_ \$ _name_ \$ bal;
datalines;
SCORE  VBAL1     0.2
SCORE  VBAL2     0.3
SCORE  VBAL3     0.1
SCORE  VBAL4     0.25
SCORE  VBAL5     0.15
;;;;
run;
proc print;

run;
proc score data=test score=VECTOR out=scored;
var bal;
run;
proc print;

run;
proc transpose data=scored out=s2;
by account months bal;
var VBAL:;
run;
data s2;
set s2;
by account months bal;
if first.months then i=months;
else i + 1;

run;
proc sort;

by account i bal;
run;
proc transpose data=s2 out=s3(rename=(i=months) drop=_name_) prefix=nbal;
by account i;
id months;
var col1;
run;
data want;
merge test s3;
by account months;
total = sum(of nbal;

run;
proc print;

by account;
id account;
sum bal total;
run;

Posts: 1,318

## Re: Data Step / Arrays help needed !!!

Posted in reply to data_null__

The more manual way:

data foo;

input account \$ month bal;

cards;

100 1 870

100 2 1740

100 3 2610

100 4 3480

100 5 4350

100 6 5220

200 1 1000

200 2 1500

200 3 1700

200 4 1200

;

run;

data bar;

array v[5]   _temporary_ (0.2 0.3 0.1 0.25 0.15);

array b[6,11] _temporary_;

do until (last.account);

set foo;

by account;

b[month,11]=bal;

do _i_=1 to 5;

b[month,_i_+month-1]=bal*v[_i_];

end;

end;

array x[6]   bal_1-bal_6;

do month=1 to 10;

do _j_=1 to 6;

x[_j_]=b[_j_,month];

end;

if month<=6 then bal=b[month,11];

else call missing(bal);

total=sum(of x

• );
•    output;

end;

call missing(of b

• , of x
• );
• drop _:;

run;

 account month bal bal_1 bal_2 bal_3 bal_4 bal_5 bal_6 total 100 1 870 174.0 . . . . . 174 100 2 1740 261.0 348 . . . . 609 100 3 2610 87.0 522 522.0 . . . 1131 100 4 3480 217.5 174 783.0 696 . . 1870.5 100 5 4350 130.5 435 261.0 1044 870.0 . 2740.5 100 6 5220 . 261 652.5 348 1305.0 1044 3610.5 100 7 . . . 391.5 870 435.0 1566 3262.5 100 8 . . . . 522 1087.5 522 2131.5 100 9 . . . . . 652.5 1305 1957.5 100 10 . . . . . . 783 783 200 1 1000 200.0 . . . . . 200 200 2 1500 300.0 300 . . . . 600 200 3 1700 100.0 450 340.0 . . . 890 200 4 1200 250.0 150 510.0 240 . . 1150 200 5 . 150.0 375 170.0 360 . . 1055 200 6 . . 225 425.0 120 . . 770 200 7 . . . 255.0 300 . . 555 200 8 . . . . 180 . . 180 200 9 . . . . . . . . 200 10 . . . . . . . .
Posts: 3,852

## Re: Data Step / Arrays help needed !!!

You know me, I never like to pass up an opportunity to use PROC SCORE and especially PROC TRANSPOSE.:smileymischief:

Posts: 3,215

## Re: Data Step / Arrays help needed !!!

data foo; /* test data */
input account \$ month bal;
cards;
100 1 870
100 2 1740
100 3 2610
100 4 3480
100 5 4350
100 6 5220
200 1 1000
200 2 1500
200 3 1700
200 4 1200
;
run;

/* Remarks:  */

/* 1/ the intermediate lagged results are ordered a bit different. The same values can be found looking diagonal  */

/* the ballg-n columns are having the meaning of a calculated value of that previous n-th period                            */

/* 2/ instead of the dow loop approach (retaining) and full matrix all is row oriented where the lag is used getting previous values */
/* 3/ coding the calculation as a routing it can be called twice or more.    Filling gaps in periods is possible */

data bar; /* assumption oredered by account  month , month no stepped intervals */
array v[5]   _temporary_ (0.2 0.3 0.1 0.25 0.15);
array b ballg1-ballg5 ;
set foo ;
by account;

if last.account then do _i_=1 to 4;  /* roll out queues to be empty additional 5-1 writes */
bal = . ; month = month + 1 ;
end;
return;

calculate:  /* the lag function syntax is requiring a macrotized approach */
b[1}=bal*v[1] ;
b[2] =lag1(bal)*v[2];
b[3] =lag2(bal)*v[3];
b[4] =lag3(bal)*v[4];
b[5] =lag4(bal)*v[5];
total=sum(of ballg ;
output;

return;
run;

---->-- ja karman --<-----
Frequent Contributor
Posts: 78

## Re: Data Step / Arrays help needed !!!

For the record: I'm impressed with the quality of the help you have received.  Props to , @data_null_; , .

Super User
Posts: 10,787

## Re: Data Step / Arrays help needed !!!

```data foo;
input account \$ month bal;
cards;
100 1 870
100 2 1740
100 3 2610
100 4 3480
100 5 4350
100 6 5220
200 1 1000
200 2 1500
200 3 1700
200 4 1200
;
run;

data bar(drop=bal i);
set foo;
by  account ;
array v{5}   _temporary_ (0.2 0.3 0.1 0.25 0.15);
do i=1 to month-1 ;
b=.;output;
end;
do i=1 to 5;
b=bal*v{i};output;
end;
run;
proc sql;
select distinct catt('bar(where=(account="',account,'" and month=',month,') rename=(b=b_',month,'))') into : list separated by ' ' from foo;
quit;
data want(drop=month);
merge &list ;
by account;
output;
call missing(of _all_);
run;

```

Xia Keshan

Contributor
Posts: 42

## Re: Data Step / Arrays help needed !!!

Thank you all for your great help, ideas and inputs. This is a great user community.

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
• 13 replies
• 508 views
• 7 likes
• 7 in conversation