I am working with a big dataset and as it stands, due to my recent introduction to sas as a language, I've been working by row as I'm unsure how to work across rows. I have seen a lot of people advising me to work by group, I hope that someone could give me some guidance on how I would do this with an example - That way I can mould and apply the result to all my work in this project:
Supplement: The big data set looks like:
Segment 2017 Payment 1 Payment 2 Payment 3
1 90 30 30 30
2 60 20 30 10
3 15 5 5 5
4 etc.
5
6
7
8
The segments may be the same but then are sorted by asset class, fixed or floating and then performing or non-performing
The below table is contructed by Segment 1
Initial Volume (2017) = 90
Payment1 (2018) = 30
Payment2 (2019) = 30
Payment3 (2020) = 30
My table looks as follows:
*********************Table 1 ***************************
Payments are made on a (0.33,0.33,0.33) basis
17/17 is the initial amount outstanding.
***********************************************************
Year 17 18 19 20
Vintage
17 90 30 30 30
18 0 0 10 10
19 0 0 0 13.3
20 0 0 0 0
**************************Assumption*******************
When the above instrument matures, it must be replaced so new issuance is equal to (Vintage/Year) (essentially summing down the rows into the diagonals)
18/18 19/19 20/20
30 , 40 53.3
**************************Table 2************************
Total volume is equal to the maturing + new Issuance
************************************************************
Year 17 18 19 20
Vintage
17 90 60 30 0
18 0 30 20 10
19 0 0 40 26.67
20 0 0 0 53.33
My issue is that the next row depends on the previous row, how do I work with big data in order to create these tables without binning the segments into individual asset classes and then by row creating the vintages 17--20
in this sense by group functions probably cannot complete this work?
I'm looking into arrays to do this as this would make sense but people keep telling me by group work is the best way computationally.
I have already given you an example in a previous question of yours for how to post example data in a data step. Please post example data in a data step with datalines so we can easily recreate it on our machines for testing.
Hi Kurt, I apologise, I didn't see your message.
The data i'm working with is simply:
data have;
input Asset $ Segment $ fixedorfloating $ Performing $ Gross PaymentPeriod1 PaymentPeriod2 PaymentPeriod3;
datalines;
Asset LoanType 1 Fixed Performing 90 30 30 30
Asset LoanType 1 Fixed Defaulted 60 20 20 20
Asset LoanType 1 Floating Performing 30 10 10 10
Asset LoanType 1 Floating Defaulted 15 5 5 5
Asset LoanType 2 fixed Performing 90 30 30 30
Asset LoanType 2 fixed Defaulted 60 20 20 20
Asset LoanType 2 floating Performing 15 5 5 5
Asset LoanType 2 floating Defaulted 90 30 30 30
Asset LoanType 3 fixed Performing 60 15 35 10
Asset LoanType 3 fixed Defaulted 10 3 4 3
Asset LoanType 3 floating Performing 100 25 50 25
Asset LoanType 3 floating Defaulted 25 5 15 5
Liability Type 1 Fixed Performing 10 3 4 3
Liability Type 1 Fixed Performing 20 3 10 7
Liability Type 1 Floating Performing 15 5 5 5
Liability Type 1 Floating Performing 100 25 50 25
Liability Type 2 fixed Performing 50 20 20 10
Liability Type 2 fixed Performing 20 10 5 5
Liability Type 2 floating Performing 15 5 5 5
Liability Type 2 floating Performing 15 5 5 5
;
run;
That list is a made-up , short version of the big data i'm working with.
I had to slightly correct your data step to accomodate the blanks in Segment:
data have;
infile datalines4 dlm=';';
input Asset $ Segment :$10. fixedorfloating $ Performing :$10. Gross PaymentPeriod1 PaymentPeriod2 PaymentPeriod3;
datalines4;
Asset;LoanType 1;Fixed;Performing;90;30;30;30
Asset;LoanType 1;Fixed;Defaulted;60;20;20;20
Asset;LoanType 1;Floating;Performing;30;10;10;10
Asset;LoanType 1;Floating;Defaulted;15;5;5;5
Asset;LoanType 2;Fixed;Performing;90;30;30;30
Asset;LoanType 2;Fixed;Defaulted;60;20;20;20
Asset;LoanType 2;Floating;Performing;15;5;5;5
Asset;LoanType 2;Floating;Defaulted;90;30;30;30
Asset;LoanType 3;Fixed;Performing;60;15;35;10
Asset;LoanType 3;Fixed;Defaulted;10;3;4;3
Asset;LoanType 3;Floating;Performing;100;25;50;25
Asset;LoanType 3;Floating;Defaulted;25;5;15;5
Liability;Type 1;Fixed;Performing;10;3;4;3
Liability;Type 1;Fixed;Performing;20;3;10;7
Liability;Type 1;Floating;Performing;15;5;5;5
Liability;Type 1;Floating;Performing;100;25;50;25
Liability;Type 2;Fixed;Performing;50;20;20;10
Liability;Type 2;Fixed;Performing;20;10;5;5
Liability;Type 2;Floating;Performing;15;5;5;5
Liability;Type 2;Floating;Performing;15;5;5;5
;;;;
run;
From this, I take it that your further examples are based on the first observation.
When you posted this:
Initial Volume (2017) = 90 Payment1 (2018) = 30 Payment2 (2019) = 30 Payment3 (2020) = 30 My table looks as follows: *********************Table 1 *************************** Payments are made on a (0.33,0.33,0.33) basis 17/17 is the initial amount outstanding. *********************************************************** Year 17 18 19 20 Vintage 17 90 30 30 30 18 0 0 10 10 19 0 0 0 13.3 20 0 0 0 0
which formulas are used for filling this matrix?
eg why has row 18/column 19 the value 10?
Hi, again, apologies, I thought I put the 1 together.
So for the maturing volumes table the methodology is as follows:
I have enclosed theoretical re-issuance of a debt in brackets, this is essentially pulling down from row [I-1] as you have to replace all in or out flow that is brought in or paid out
Year 17 18 19 20
vintage
17 (90) 30 30 30
18 N/A (30) 10 10 (10)
19 N/A N/A (40) 13.3 (13.3) (13.3)
20 N/A N/A N/A (53.3) (53.3/3) (53.3/3)
So every period a third is paid off the starting value for the year.
Imagine every row to represent something originating that year, everything that has been paid in the past, must be replaced.
But where do you get the re-issues from (where can I find these values in your example dataset)?
They are to be calculated from the previous row, this is why my inefficient coding is limited in SAS.
@89974114 wrote:
They are to be calculated from the previous row, this is why my inefficient coding is limited in SAS.
But HOW do you calculate them from WHICH previous row? As far as I can see it, this part of the example deals purely with data from the first observation in your example dataset.
Please do not make me pry information bit-by-bit out of your nose.
the brackets are the sum from the rows above
if 30 is paid out in row17/column18 then 30 is issued in row18/column18
if 30 is paid out in row17/column 20 and
if 10 is paid out in row18/column 20
then 40 is issued in row 20/ column 20
Based on the data created by my adapted data step, this will give you the data:
data want;
set have;
array vars {4} Gross PaymentPeriod1 PaymentPeriod2 PaymentPeriod3;
vintage = 2017;
output;
gross = 0;
do i = 1 to 3;
vars{1} = vars{1} + vars{2};
vintage + 1;
do i1 = 2 to 4;
vars{i1} = vars{1} / 3;
end;
output;
end;
drop i i1;
run;
I like the build for the additional vintages. Unfortunately, the model needs some more constraints to create the correct table. I think some minor adjustments should achieve this, I've been trying to figure it out but i'm stuck and it has been a long day.
Using your code I'm trying to incorporate the payment vector
[1, 1/3 , 1/3 , 1/3 ]
This is found by
Payment vector(i)=Paymentperiod(i) / Paymentperiod(0);
it won't always be a third but this is a good way to sub it in
also, the first vintage is correct so thank you for the code and help
the resulting lines aren't quite correct, the second row gives 18 instead of 10 and so on
i'll continue trying to correct this tomorrow once I've had a break.
@89974114 wrote:
I like the build for the additional vintages. Unfortunately, the model needs some more constraints to create the correct table. I think some minor adjustments should achieve this, I've been trying to figure it out but i'm stuck and it has been a long day.
Since your example data actually has no dates I am wondering where "vintage" information comes from.
Things might workout a tad better if instead of 3 periods on a row in the data you have one reo per payment and some sort of date of payment - as an actual date, not 1, 2, 3 but something like 01Jan2018 or even month and year that could be treated as first (or last) of month as needed.
SAS Formats and/or date functions would allow summaries by calendar quarter, calendar year and with some creativity other intervals.
I made the data generic to allow for a build of any kind. However, If we take the current day to match the payment period with payments being made or replaced you can assume paymentperiod1 = 2018, paymentperiod2 = 2019, paymentperiod3 = 2020.
Now, the vintages can be thought of as the current year, while the payment period can be thought of as payments required in future and hence, the build of this table.
2017 2018 2019 2020
2017 (+90) -30 -30 -30
2018 N/A (+30) -10 -10
2019 N/A N/A (+40) -13.3
2020 NA NA Na (+53.3)
So if the vintage is 2018 (imagine, as it is now 2018) you look forward and you had to replace the payment of -30 last year, then you are expected to pay -10 next year and -10 the following year
again imagine it's 2018, you put NA for 2017 BC its in the past and nonsensical.
hope you can help me on my quest.
I'm starting to think a data step could be more practical. Maybe i'm just tired.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.