Re: Working with large datasets by group

Report Inappropriate Content · Posted 03-07-2018 02:37 AM

I am working with a big dataset and as it stands, due to my recent introduction to sas as a language, I've been working by row as I'm unsure how to work across rows. I have seen a lot of people advising me to work by group, I hope that someone could give me some guidance on how I would do this with an example - That way I can mould and apply the result to all my work in this project:

Supplement: The big data set looks like:

Segment       2017        Payment 1      Payment 2      Payment 3
1                   90            30                    30                   30
2                   60            20                    30                   10
3                   15            5                      5                     5
4   etc.
5
6
7
8

The segments may be the same but then are sorted by asset class, fixed or floating and then performing or non-performing

The below table is contructed by Segment 1

Initial Volume (2017) = 90
Payment1 (2018) = 30
Payment2 (2019) = 30
Payment3 (2020) = 30

My table looks as follows:
*********************Table 1 ***************************
Payments are made on a (0.33,0.33,0.33) basis
17/17 is the initial amount outstanding.
***********************************************************
Year                 17        18        19        20
Vintage

17                     90         30       30        30

18                      0            0       10        10
  
19                      0            0        0          13.3

20                      0            0         0         0

**************************Assumption*******************

When the above instrument matures, it must be replaced so new issuance is equal to (Vintage/Year) (essentially summing down the rows into the diagonals)

18/18      19/19    20/20
30 ,       40       53.3

**************************Table 2************************
Total volume is equal to the maturing + new Issuance
************************************************************

Year                 17        18        19        20
Vintage

17                     90         60       30        0

18                      0          30       20        10
  
19                      0            0       40        26.67

20                      0            0         0        53.33


My issue is that the next row depends on the previous row, how do I work with big data in order to create these tables without binning the segments into individual asset classes and then by row creating the vintages 17--20

in this sense by group functions probably cannot complete this work?

I'm looking into arrays to do this as this would make sense but people keep telling me by group work is the best way computationally.

Kurt_Bremser · Posted 03-07-2018 03:06 AM

I have already given you an example in a previous question of yours for how to post example data in a data step. Please post example data in a data step with datalines so we can easily recreate it on our machines for testing.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Report Inappropriate Content · Posted 03-07-2018 03:30 AM

Hi Kurt, I apologise, I didn't see your message.

The data i'm working with is simply:

data have;

input Asset $ Segment $ fixedorfloating $ Performing $ Gross PaymentPeriod1 PaymentPeriod2 PaymentPeriod3;

datalines;

Asset  LoanType 1   Fixed Performing 90 30 30 30
Asset  LoanType 1   Fixed Defaulted 60 20 20 20
Asset  LoanType 1   Floating Performing 30 10 10 10
Asset  LoanType 1   Floating Defaulted 15 5 5 5
Asset  LoanType 2   fixed Performing 90 30 30 30
Asset  LoanType 2   fixed Defaulted 60 20 20 20
Asset  LoanType 2   floating Performing 15 5 5 5 
Asset  LoanType 2   floating Defaulted 90 30 30 30
Asset  LoanType 3   fixed Performing 60 15 35 10
Asset  LoanType 3   fixed Defaulted 10 3 4 3
Asset  LoanType 3   floating Performing 100 25 50 25
Asset  LoanType 3   floating Defaulted 25 5 15 5

Liability Type 1   Fixed Performing 10 3 4 3
Liability Type 1   Fixed Performing 20 3 10 7
Liability Type 1   Floating Performing 15 5 5 5
Liability Type 1   Floating Performing 100 25 50 25
Liability Type 2   fixed Performing 50 20 20 10
Liability Type 2   fixed Performing 20 10 5 5
Liability Type 2   floating Performing 15 5 5 5
Liability Type 2   floating Performing 15 5 5 5 

;
run;

That list is a made-up , short version of the big data i'm working with.

Kurt_Bremser · Posted 03-07-2018 03:53 AM

I had to slightly correct your data step to accomodate the blanks in Segment:

data have;
infile datalines4 dlm=';';
input Asset $ Segment :$10. fixedorfloating $ Performing :$10. Gross PaymentPeriod1 PaymentPeriod2 PaymentPeriod3;
datalines4;
Asset;LoanType 1;Fixed;Performing;90;30;30;30
Asset;LoanType 1;Fixed;Defaulted;60;20;20;20
Asset;LoanType 1;Floating;Performing;30;10;10;10
Asset;LoanType 1;Floating;Defaulted;15;5;5;5
Asset;LoanType 2;Fixed;Performing;90;30;30;30
Asset;LoanType 2;Fixed;Defaulted;60;20;20;20
Asset;LoanType 2;Floating;Performing;15;5;5;5
Asset;LoanType 2;Floating;Defaulted;90;30;30;30
Asset;LoanType 3;Fixed;Performing;60;15;35;10
Asset;LoanType 3;Fixed;Defaulted;10;3;4;3
Asset;LoanType 3;Floating;Performing;100;25;50;25
Asset;LoanType 3;Floating;Defaulted;25;5;15;5
Liability;Type 1;Fixed;Performing;10;3;4;3
Liability;Type 1;Fixed;Performing;20;3;10;7
Liability;Type 1;Floating;Performing;15;5;5;5
Liability;Type 1;Floating;Performing;100;25;50;25
Liability;Type 2;Fixed;Performing;50;20;20;10
Liability;Type 2;Fixed;Performing;20;10;5;5
Liability;Type 2;Floating;Performing;15;5;5;5
Liability;Type 2;Floating;Performing;15;5;5;5
;;;;
run;

From this, I take it that your further examples are based on the first observation.

When you posted this:

Initial Volume (2017) = 90
Payment1 (2018) = 30
Payment2 (2019) = 30
Payment3 (2020) = 30

My table looks as follows:
*********************Table 1 ***************************
Payments are made on a (0.33,0.33,0.33) basis
17/17 is the initial amount outstanding.
***********************************************************
Year                 17        18        19        20
Vintage

17                   90        30        30        30

18                    0         0        10        10
  
19                    0         0         0        13.3

20                    0         0         0         0

which formulas are used for filling this matrix?

eg why has row 18/column 19 the value 10?

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Report Inappropriate Content · Posted 03-07-2018 04:15 AM

Hi, again, apologies, I thought I put the 1 together.

So for the maturing volumes table the methodology is as follows:

I have enclosed theoretical re-issuance of a debt in brackets, this is essentially pulling down from row [I-1] as you have to replace all in or out flow that is brought in or paid out

Year           17          18         19        20
vintage

17             (90)         30         30        30


18             N/A        (30)        10        10       (10)


19            N/A         N/A        (40)      13.3    (13.3)      (13.3)


20          N/A           N/A       N/A       (53.3)   (53.3/3)   (53.3/3)

So every period a third is paid off the starting value for the year.

Imagine every row to represent something originating that year, everything that has been paid in the past, must be replaced.

Kurt_Bremser · Posted 03-07-2018 04:20 AM

But where do you get the re-issues from (where can I find these values in your example dataset)?

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Report Inappropriate Content · Posted 03-07-2018 04:26 AM

They are to be calculated from the previous row, this is why my inefficient coding is limited in SAS.

Kurt_Bremser · Posted 03-07-2018 04:33 AM

@89974114 wrote:

They are to be calculated from the previous row, this is why my inefficient coding is limited in SAS.

But HOW do you calculate them from WHICH previous row? As far as I can see it, this part of the example deals purely with data from the first observation in your example dataset.

Please do not make me pry information bit-by-bit out of your nose.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Report Inappropriate Content · Posted 03-07-2018 04:39 AM

the brackets are the sum from the rows above

if 30 is paid out in row17/column18 then 30 is issued in row18/column18

if 30 is paid out in row17/column 20 and

if 10 is paid out in row18/column 20

then 40 is issued in row 20/ column 20

Kurt_Bremser · Posted 03-07-2018 07:33 AM

Based on the data created by my adapted data step, this will give you the data:

data want;
set have;
array vars {4} Gross PaymentPeriod1 PaymentPeriod2 PaymentPeriod3;
vintage = 2017;
output;
gross = 0;
do i = 1 to 3;
  vars{1} = vars{1} + vars{2};
  vintage + 1;
  do i1 = 2 to 4;
    vars{i1} = vars{1} / 3;
  end;
  output;
end;
drop i i1;
run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Report Inappropriate Content · Posted 03-07-2018 10:39 AM

I like the build for the additional vintages. Unfortunately, the model needs some more constraints to create the correct table. I think some minor adjustments should achieve this, I've been trying to figure it out but i'm stuck and it has been a long day.

Using your code I'm trying to incorporate the payment vector

[1, 1/3 , 1/3 , 1/3 ]

This is found by

Payment vector(i)=Paymentperiod(i) / Paymentperiod(0);

it won't always be a third but this is a good way to sub it in

also, the first vintage is correct so thank you for the code and help

the resulting lines aren't quite correct, the second row gives 18 instead of 10 and so on

i'll continue trying to correct this tomorrow once I've had a break.

ballardw · Posted 03-07-2018 11:56 AM

@89974114 wrote:

I like the build for the additional vintages. Unfortunately, the model needs some more constraints to create the correct table. I think some minor adjustments should achieve this, I've been trying to figure it out but i'm stuck and it has been a long day.

Since your example data actually has no dates I am wondering where "vintage" information comes from.

Things might workout a tad better if instead of 3 periods on a row in the data you have one reo per payment and some sort of date of payment - as an actual date, not 1, 2, 3 but something like 01Jan2018 or even month and year that could be treated as first (or last) of month as needed.

SAS Formats and/or date functions would allow summaries by calendar quarter, calendar year and with some creativity other intervals.

Report Inappropriate Content · Posted 03-07-2018 12:03 PM

I made the data generic to allow for a build of any kind. However, If we take the current day to match the payment period with payments being made or replaced you can assume paymentperiod1 = 2018, paymentperiod2 = 2019, paymentperiod3 = 2020.

Now, the vintages can be thought of as the current year, while the payment period can be thought of as payments required in future and hence, the build of this table.

                   2017        2018        2019        2020

2017          (+90)          -30          -30           -30

2018         N/A            (+30)         -10          -10

2019         N/A            N/A          (+40)       -13.3

2020         NA             NA             Na        (+53.3)

So if the vintage is 2018 (imagine, as it is now 2018) you look forward and you had to replace the payment of -30 last year, then you are expected to pay -10 next year and -10 the following year

again imagine it's 2018, you put NA for 2017 BC its in the past and nonsensical.

hope you can help me on my quest.

Report Inappropriate Content · Posted 03-07-2018 11:28 AM

I'm starting to think a data step could be more practical. Maybe i'm just tired.

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away