DATA Step, Macro, Functions and more

Data Step / Arrays help needed !!!

Accepted Solution Solved
Reply
Contributor
Posts: 42
Accepted Solution

Data Step / Arrays help needed !!!

Hi,

I have an interesting problem that I need to solve and need some help with.

Let's say I have 2 account numbers (100 and 200) that spreads across several months. Account 100 goes from month 1 to 6 and account 200 goes from month 1 to 4. Each account has balance (bal) associated to it in a certain month. The dataset example is shown below.

data test;

input account $ months bal;

datalines;

100 1 870

100 2 1740

100 3 2610

100 4 3480

100 5 4350

100 6 5220

200 1 1000

200 2 1500

200 3 1700

200 4 1200

;

run;

accountmonthsbal
1001870
10021740
10032610
10043480
10054350
10065220
20011000
20021500
20031700
20041200

Now for each account and for each balance, I need to spead the 'bal' amount based on the vector below.

VECTOR0.20.30.10.250.15

Note that the vector sums up to 1 (100%).

Example: for account 100, the month 1 bal, which is 870 has to be spread out by applying the vector..so it will be 0.2*870 = 174, 0.3*870=261, 0.1*870 = 87, 0.25*870 = 217.5 and 0.15*870 = 130.5. Please note that the sum has to equal to the bal for that particular month when we break it down (174+261+87+217.5+130.5 = 870).

This has to be repeaded for each month and for each loan.

The final dataset will look some thing like this (this is only for account 100 but would also need to do the same for account 200 in the dataset).

accountORIGINAL DATABALANCE SPREAD OUT ACCORDING THE THE VECTOR PARAMETER
100monthsbalbal_1bal_2bal_3bal_4bal_5bal_6TOTAL
1001870174174
10021740261348609
10032610875225221131
10043480217.51747836961870.5
10054350130.543526110448702740.5
100652200261652.5348130510443610.5
100700391.587043515663262.5
10080005221087.55222131.5
10090000652.513051957.5
1001000000783783

Notice that the total in 'TOTAL' column is equal to the total in 'bal' column (18270), which it should, since we are just spreading out the bal.

Any ideas/suggestions for solving this will be of great help. I appreciate all your inputs.

Thank you,


Accepted Solutions
Solution
‎07-24-2014 04:19 PM
Respected Advisor
Posts: 3,777

Re: Data Step / Arrays help needed !!!

This method does a bit of flipin' and floppin' but otherwise it is mostly hands free.  You should be able to change VECTOR the scoring data and not have to touch the rest of the code.  Assuming you want to change the scoring data.  You can decide if it suites you.

data test;
input account $ months bal;
datalines;
100 1 870
100 2 1740
100 3 2610
100 4 3480
100 5 4350
100 6 5220
200 1 1000
200 2 1500
200 3 1700
200 4 1200
;
run;

data VECTOR;
   input _type_ $ _name_ $ bal;
   datalines;
SCORE  VBAL1     0.2
SCORE  VBAL2     0.3
SCORE  VBAL3     0.1
SCORE  VBAL4     0.25
SCORE  VBAL5     0.15
;;;;
   run;
proc print;
  
run;
proc score data=test score=VECTOR out=scored;
   var bal;
   run;
proc print;
  
run;
proc transpose data=scored out=s2;
   by account months bal;
   var VBAL:;
   run;
data s2;
   set s2;
   by account months bal;
   if first.months then i=months;
   else i + 1;
  
run;
proc sort;
  
by account i bal;
   run;
proc transpose data=s2 out=s3(rename=(i=months) drop=_name_) prefix=nbal;
   by account i;
   id months;
   var col1;
   run;
data want;
   merge test s3;
   by account months;
   total = sum(of nbalSmiley Happy;
  
run;
proc print;
  
by account;
   id account;
   sum bal total;
   run;

7-25-2014 6-09-14 AM.png

View solution in original post


All Replies
Valued Guide
Posts: 3,208

Re: Data Step / Arrays help needed !!!

You are mentioning array processing (within rows) I would think on using lag processing.
Are you incrementing month's? I mean counting above 12 if necessary then the number in bal-1 is the relative monthnumber. What is the maximum value?  

---->-- ja karman --<-----
Contributor
Posts: 42

Re: Data Step / Arrays help needed !!!

Hi Jaap,

Yes we will need to increment the month. Like for account 100, in the intial dataset, it has month 1 - 6. The month 1 balance is 870. So we spread out this balance based on index values. Please note the vector has 5 values. So we will need to spread out month1 balance over 5 months (month 1 - month 5)

( month 1 balance = 174

month 2 balance = 261

month 3 balance = 87

month 4 balance = 217.5

and month 5 balance = 130.5 ..all of this is captured in variable bal_1)

now we need to do the same thing for month 2 balance which is 1740. Spread this balance over 5 months based on vector and capture in bal_2.

So for month 6 balance, which is 5220, we need to spread this out again over 5 months based on index. So that will be month 6, 7, 8, 9 and 10.

So, yes if needed the months has to be increased. For the prototype we can use, each month balance will be spread over 5 months (based on the vector).

Hope this makes sense.

Solution doesnt have to be array based. It could be using any other logic.

Thank you,

Super User
Posts: 10,550

Re: Data Step / Arrays help needed !!!

Do you have IML licensed? This looks more like a matrix operation that would work with IML best.

Contributor
Posts: 42

Re: Data Step / Arrays help needed !!!

Yes I have access to IML. but i have not used it before.

Valued Guide
Posts: 3,208

Re: Data Step / Arrays help needed !!!

P.S. your question is making sense. I see some time-series analyses where the decomposition is a model to correct for delayed effects so the signals should become more clear and trustworthy. Ballard did a proposals to go IML. Some matrix processing is also possible in PROC DS2. The datastep SAS data is also possible.  I would like to know which direction it should go. Using UE all three options are there. ETS and OR are missing that is how it is although your solution could also be in that area.

Do you expect more data processing reporting using a SAS dataset or would it move in pure scientific using matrices without a table like sql awareness. Can you make a guess on that?

Your question would be nice question to give it a bite.

---->-- ja karman --<-----
Contributor
Posts: 42

Re: Data Step / Arrays help needed !!!

Hi Jaap, This would be a processing using a SAS dataset. Thus a solution that uses data step components, arrays, macros etc would be an ideal solution. Thank you.

Solution
‎07-24-2014 04:19 PM
Respected Advisor
Posts: 3,777

Re: Data Step / Arrays help needed !!!

This method does a bit of flipin' and floppin' but otherwise it is mostly hands free.  You should be able to change VECTOR the scoring data and not have to touch the rest of the code.  Assuming you want to change the scoring data.  You can decide if it suites you.

data test;
input account $ months bal;
datalines;
100 1 870
100 2 1740
100 3 2610
100 4 3480
100 5 4350
100 6 5220
200 1 1000
200 2 1500
200 3 1700
200 4 1200
;
run;

data VECTOR;
   input _type_ $ _name_ $ bal;
   datalines;
SCORE  VBAL1     0.2
SCORE  VBAL2     0.3
SCORE  VBAL3     0.1
SCORE  VBAL4     0.25
SCORE  VBAL5     0.15
;;;;
   run;
proc print;
  
run;
proc score data=test score=VECTOR out=scored;
   var bal;
   run;
proc print;
  
run;
proc transpose data=scored out=s2;
   by account months bal;
   var VBAL:;
   run;
data s2;
   set s2;
   by account months bal;
   if first.months then i=months;
   else i + 1;
  
run;
proc sort;
  
by account i bal;
   run;
proc transpose data=s2 out=s3(rename=(i=months) drop=_name_) prefix=nbal;
   by account i;
   id months;
   var col1;
   run;
data want;
   merge test s3;
   by account months;
   total = sum(of nbalSmiley Happy;
  
run;
proc print;
  
by account;
   id account;
   sum bal total;
   run;

7-25-2014 6-09-14 AM.png
Trusted Advisor
Posts: 1,300

Re: Data Step / Arrays help needed !!!

The more manual way:

data foo;

input account $ month bal;

cards;

100 1 870

100 2 1740

100 3 2610

100 4 3480

100 5 4350

100 6 5220

200 1 1000

200 2 1500

200 3 1700

200 4 1200

;

run;

data bar;

array v[5]   _temporary_ (0.2 0.3 0.1 0.25 0.15);

array b[6,11] _temporary_;

do until (last.account);

   set foo;

   by account;

   b[month,11]=bal;

   do _i_=1 to 5;

      b[month,_i_+month-1]=bal*v[_i_];

   end;

end;

array x[6]   bal_1-bal_6;

do month=1 to 10;

   do _j_=1 to 6;

      x[_j_]=b[_j_,month];

   end;

   if month<=6 then bal=b[month,11];

   else call missing(bal);

   total=sum(of x

  • );
  •    output;

    end;

    call missing(of b

  • , of x
  • );
  • drop _:;

    run;

                

    accountmonthbalbal_1bal_2bal_3bal_4bal_5bal_6total
    100 1 870174.0  .  .   .  .   . 174
    100 21740261.0348  .   .  .   . 609
    100 32610 87.0522522.0  .  .   .1131
    100 43480217.5174783.0 696  .   .1870.5
    100 54350130.5435261.01044 870.0  .2740.5
    100 65220  . 261652.5 3481305.010443610.5
    100 7.  .   .391.5 870 435.015663262.5
    100 8.  .   .  . 5221087.5 5222131.5
    100 9.  .   .  .   . 652.513051957.5
    10010.  .   .  .   .  . 783 783
    200 11000200.0  .  .   .  .   . 200
    200 21500300.0300  .   .  .   . 600
    200 31700100.0450340.0  .  .   . 890
    200 41200250.0150510.0 240  .   .1150
    200 5  .150.0375170.0 360  .   .1055
    200 6  .  . 225425.0 120  .   . 770
    200 7  .  .   .255.0 300  .   . 555
    200 8  .  .   .  . 180  .   . 180
    200 9  .  .   .  .   .  .   .  .
    20010  .  .   .  .   .  .   .  .
    Respected Advisor
    Posts: 3,777

    Re: Data Step / Arrays help needed !!!

    You know me, I never like to pass up an opportunity to use PROC SCORE and especially PROC TRANSPOSE.:smileymischief:

    Valued Guide
    Posts: 3,208

    Re: Data Step / Arrays help needed !!!

    data foo; /* test data */
    input account $ month bal;
    cards;
    100 1 870
    100 2 1740
    100 3 2610
    100 4 3480
    100 5 4350
    100 6 5220
    200 1 1000
    200 2 1500
    200 3 1700
    200 4 1200
    ;
    run;

    /* Remarks:  */

    /* 1/ the intermediate lagged results are ordered a bit different. The same values can be found looking diagonal  */

    /* the ballg-n columns are having the meaning of a calculated value of that previous n-th period                            */

    /* 2/ instead of the dow loop approach (retaining) and full matrix all is row oriented where the lag is used getting previous values */
    /* 3/ coding the calculation as a routing it can be called twice or more.    Filling gaps in periods is possible */ 

    data bar; /* assumption oredered by account  month , month no stepped intervals */
       array v[5]   _temporary_ (0.2 0.3 0.1 0.25 0.15);
       array b ballg1-ballg5 ;
       set foo ;
       by account;
      
       link calculate ;
       if last.account then do _i_=1 to 4;  /* roll out queues to be empty additional 5-1 writes */
         bal = . ; month = month + 1 ;
         link calculate ;
       end;
       return; 
     
    calculate:  /* the lag function syntax is requiring a macrotized approach */
       b[1}=bal*v[1] ;
       b[2] =lag1(bal)*v[2];
       b[3] =lag2(bal)*v[3];
       b[4] =lag3(bal)*v[4];
       b[5] =lag4(bal)*v[5];
       total=sum(of ballgSmiley Happy ;
       output;
     
    return;
    run;

    ---->-- ja karman --<-----
    Attachment
    Frequent Contributor
    Posts: 78

    Re: Data Step / Arrays help needed !!!

    For the record: I'm impressed with the quality of the help you have received.  Props to , @data_null_; , .

    Super User
    Posts: 9,691

    Re: Data Step / Arrays help needed !!!

    How about :

    data foo;
    input account $ month bal;
    cards;
    100 1 870
    100 2 1740
    100 3 2610
    100 4 3480
    100 5 4350
    100 6 5220
    200 1 1000
    200 2 1500
    200 3 1700
    200 4 1200
    ;
    run;
     
     
    data bar(drop=bal i);
     set foo;
     by  account ;
     array v{5}   _temporary_ (0.2 0.3 0.1 0.25 0.15);
     do i=1 to month-1 ;
      b=.;output;
     end;
     do i=1 to 5;
      b=bal*v{i};output;
     end;
    run;
    proc sql;
     select distinct catt('bar(where=(account="',account,'" and month=',month,') rename=(b=b_',month,'))') into : list separated by ' ' from foo;
    quit;
    data want(drop=month);
     merge &list ;
     by account;
     output;
     call missing(of _all_);
    run;
    
    

    Xia Keshan

    Contributor
    Posts: 42

    Re: Data Step / Arrays help needed !!!

    Thank you all for your great help, ideas and inputs. This is a great user community.

    🔒 This topic is solved and locked.

    Need further help from the community? Please ask a new question.

    Discussion stats
    • 13 replies
    • 458 views
    • 7 likes
    • 7 in conversation