turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- General Programming
- /
- Working with large datasets by group

Topic Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

03-07-2018 02:37 AM - edited 03-07-2018 02:50 AM

I am working with a big dataset and as it stands, due to my recent introduction to sas as a language, I've been working by row as I'm unsure how to work across rows. I have seen a lot of people advising me to work by group, I hope that someone could give me some guidance on how I would do this with an example - That way I can mould and apply the result to all my work in this project:

Supplement: The big data set looks like:

```
Segment 2017 Payment 1 Payment 2 Payment 3
1 90 30 30 30
2 60 20 30 10
3 15 5 5 5
4 etc.
5
6
7
8
```

The segments may be the same but then are sorted by asset class, fixed or floating and then performing or non-performing

The below table is contructed by Segment 1

```
Initial Volume (2017) = 90
Payment1 (2018) = 30
Payment2 (2019) = 30
Payment3 (2020) = 30
My table looks as follows:
*********************Table 1 ***************************
Payments are made on a (0.33,0.33,0.33) basis
17/17 is the initial amount outstanding.
***********************************************************
Year 17 18 19 20
Vintage
17 90 30 30 30
18 0 0 10 10
19 0 0 0 13.3
20 0 0 0 0
**************************Assumption*******************
When the above instrument matures, it must be replaced so new issuance is equal to (Vintage/Year) (essentially summing down the rows into the diagonals)
18/18 19/19 20/20
30 , 40 53.3
**************************Table 2************************
Total volume is equal to the maturing + new Issuance
************************************************************
Year 17 18 19 20
Vintage
17 90 60 30 0
18 0 30 20 10
19 0 0 40 26.67
20 0 0 0 53.33
My issue is that the next row depends on the previous row, how do I work with big data in order to create these tables without binning the segments into individual asset classes and then by row creating the vintages 17--20
in this sense by group functions probably cannot complete this work?
I'm looking into arrays to do this as this would make sense but people keep telling me by group work is the best way computationally.
```

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to 89974114

03-07-2018 03:06 AM

I have already given you an example in a previous question of yours for how to post example data in a data step. Please post example data in a data step with datalines so we can easily recreate it on our machines for testing.

---------------------------------------------------------------------------------------------

Maxims of Maximally Efficient SAS Programmers

How to convert datasets to data steps

How to post code

Maxims of Maximally Efficient SAS Programmers

How to convert datasets to data steps

How to post code

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to KurtBremser

03-07-2018 03:30 AM - edited 03-07-2018 03:30 AM

Hi Kurt, I apologise, I didn't see your message.

The data i'm working with is simply:

```
data have;
input Asset $ Segment $ fixedorfloating $ Performing $ Gross PaymentPeriod1 PaymentPeriod2 PaymentPeriod3;
datalines;
Asset LoanType 1 Fixed Performing 90 30 30 30
Asset LoanType 1 Fixed Defaulted 60 20 20 20
Asset LoanType 1 Floating Performing 30 10 10 10
Asset LoanType 1 Floating Defaulted 15 5 5 5
Asset LoanType 2 fixed Performing 90 30 30 30
Asset LoanType 2 fixed Defaulted 60 20 20 20
Asset LoanType 2 floating Performing 15 5 5 5
Asset LoanType 2 floating Defaulted 90 30 30 30
Asset LoanType 3 fixed Performing 60 15 35 10
Asset LoanType 3 fixed Defaulted 10 3 4 3
Asset LoanType 3 floating Performing 100 25 50 25
Asset LoanType 3 floating Defaulted 25 5 15 5
Liability Type 1 Fixed Performing 10 3 4 3
Liability Type 1 Fixed Performing 20 3 10 7
Liability Type 1 Floating Performing 15 5 5 5
Liability Type 1 Floating Performing 100 25 50 25
Liability Type 2 fixed Performing 50 20 20 10
Liability Type 2 fixed Performing 20 10 5 5
Liability Type 2 floating Performing 15 5 5 5
Liability Type 2 floating Performing 15 5 5 5
;
run;
```

That list is a made-up , short version of the big data i'm working with.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to 89974114

03-07-2018 03:53 AM

I had to slightly correct your data step to accomodate the blanks in Segment:

```
data have;
infile datalines4 dlm=';';
input Asset $ Segment :$10. fixedorfloating $ Performing :$10. Gross PaymentPeriod1 PaymentPeriod2 PaymentPeriod3;
datalines4;
Asset;LoanType 1;Fixed;Performing;90;30;30;30
Asset;LoanType 1;Fixed;Defaulted;60;20;20;20
Asset;LoanType 1;Floating;Performing;30;10;10;10
Asset;LoanType 1;Floating;Defaulted;15;5;5;5
Asset;LoanType 2;Fixed;Performing;90;30;30;30
Asset;LoanType 2;Fixed;Defaulted;60;20;20;20
Asset;LoanType 2;Floating;Performing;15;5;5;5
Asset;LoanType 2;Floating;Defaulted;90;30;30;30
Asset;LoanType 3;Fixed;Performing;60;15;35;10
Asset;LoanType 3;Fixed;Defaulted;10;3;4;3
Asset;LoanType 3;Floating;Performing;100;25;50;25
Asset;LoanType 3;Floating;Defaulted;25;5;15;5
Liability;Type 1;Fixed;Performing;10;3;4;3
Liability;Type 1;Fixed;Performing;20;3;10;7
Liability;Type 1;Floating;Performing;15;5;5;5
Liability;Type 1;Floating;Performing;100;25;50;25
Liability;Type 2;Fixed;Performing;50;20;20;10
Liability;Type 2;Fixed;Performing;20;10;5;5
Liability;Type 2;Floating;Performing;15;5;5;5
Liability;Type 2;Floating;Performing;15;5;5;5
;;;;
run;
```

From this, I take it that your further examples are based on the first observation.

When you posted this:

Initial Volume (2017) = 90 Payment1 (2018) = 30 Payment2 (2019) = 30 Payment3 (2020) = 30 My table looks as follows: *********************Table 1 *************************** Payments are made on a (0.33,0.33,0.33) basis 17/17 is the initial amount outstanding. *********************************************************** Year 17 18 19 20 Vintage 17 90 30 30 30 18 0 0 10 10 19 0 0 0 13.3 20 0 0 0 0

which formulas are used for filling this matrix?

eg why has row 18/column 19 the value 10?

---------------------------------------------------------------------------------------------

Maxims of Maximally Efficient SAS Programmers

How to convert datasets to data steps

How to post code

Maxims of Maximally Efficient SAS Programmers

How to convert datasets to data steps

How to post code

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to KurtBremser

03-07-2018 04:15 AM

Hi, again, apologies, I thought I put the 1 together.

So for the maturing volumes table the methodology is as follows:

I have enclosed theoretical re-issuance of a debt in brackets, this is essentially pulling down from row [I-1] as you have to replace all in or out flow that is brought in or paid out

```
Year 17 18 19 20
vintage
17 (90) 30 30 30
18 N/A (30) 10 10 (10)
19 N/A N/A (40) 13.3 (13.3) (13.3)
20 N/A N/A N/A (53.3) (53.3/3) (53.3/3)
```

So every period a third is paid off the starting value for the year.

Imagine every row to represent something originating that year, everything that has been paid in the past, must be replaced.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to 89974114

03-07-2018 04:20 AM

But where do you get the re-issues from (where can I find these values in your example dataset)?

---------------------------------------------------------------------------------------------

Maxims of Maximally Efficient SAS Programmers

How to convert datasets to data steps

How to post code

Maxims of Maximally Efficient SAS Programmers

How to convert datasets to data steps

How to post code

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to KurtBremser

03-07-2018 04:26 AM

They are to be calculated from the previous row, this is why my inefficient coding is limited in SAS.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to 89974114

03-07-2018 04:33 AM

89974114 wrote:

But HOW do you calculate them from WHICH previous row? As far as I can see it, this part of the example deals purely with data from the first observation in your example dataset.

Please do not make me pry information bit-by-bit out of your nose.

Maxims of Maximally Efficient SAS Programmers

How to convert datasets to data steps

How to post code

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to KurtBremser

03-07-2018 04:39 AM

the brackets are the sum from the rows above

if 30 is paid out in row17/column18 then 30 is issued in row18/column18

if 30 is paid out in row17/column 20 and

if 10 is paid out in row18/column 20

then 40 is issued in row 20/ column 20

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to 89974114

03-07-2018 07:33 AM

Based on the data created by my adapted data step, this will give you the data:

```
data want;
set have;
array vars {4} Gross PaymentPeriod1 PaymentPeriod2 PaymentPeriod3;
vintage = 2017;
output;
gross = 0;
do i = 1 to 3;
vars{1} = vars{1} + vars{2};
vintage + 1;
do i1 = 2 to 4;
vars{i1} = vars{1} / 3;
end;
output;
end;
drop i i1;
run;
```

Maxims of Maximally Efficient SAS Programmers

How to convert datasets to data steps

How to post code

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to KurtBremser

03-07-2018 10:39 AM

I like the build for the additional vintages. Unfortunately, the model needs some more constraints to create the correct table. I think some minor adjustments should achieve this, I've been trying to figure it out but i'm stuck and it has been a long day.

Using your code I'm trying to incorporate the payment vector

```
[1, 1/3 , 1/3 , 1/3 ]
This is found by
Payment vector(i)=Paymentperiod(i) / Paymentperiod(0);
it won't always be a third but this is a good way to sub it in
also, the first vintage is correct so thank you for the code and help
the resulting lines aren't quite correct, the second row gives 18 instead of 10 and so on
i'll continue trying to correct this tomorrow once I've had a break.
```

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to 89974114

03-07-2018 11:56 AM

89974114 wrote:

I like the build for the additional vintages. Unfortunately, the model needs some more constraints to create the correct table. I think some minor adjustments should achieve this, I've been trying to figure it out but i'm stuck and it has been a long day.

Since your example data actually has no dates I am wondering where "vintage" information comes from.

Things might workout a tad better if instead of 3 periods on a row in the data you have one reo per payment and some sort of date of payment - as an actual date, not 1, 2, 3 but something like 01Jan2018 or even month and year that could be treated as first (or last) of month as needed.

SAS Formats and/or date functions would allow summaries by calendar quarter, calendar year and with some creativity other intervals.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to ballardw

03-07-2018 12:03 PM

I made the data generic to allow for a build of any kind. However, If we take the current day to match the payment period with payments being made or replaced you can assume paymentperiod1 = 2018, paymentperiod2 = 2019, paymentperiod3 = 2020.

Now, the vintages can be thought of as the current year, while the payment period can be thought of as payments required in future and hence, the build of this table.

```
2017 2018 2019 2020
2017 (+90) -30 -30 -30
2018 N/A (+30) -10 -10
2019 N/A N/A (+40) -13.3
2020 NA NA Na (+53.3)
```

So if the vintage is 2018 (imagine, as it is now 2018) you look forward and you had to replace the payment of -30 last year, then you are expected to pay -10 next year and -10 the following year

again imagine it's 2018, you put NA for 2017 BC its in the past and nonsensical.

hope you can help me on my quest.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to KurtBremser

03-07-2018 11:28 AM

I'm starting to think a data step could be more practical. Maybe i'm just tired.