Hello all,
I've been working on a project for a couple weeks now and am unsure of where to go from here. This will be my first post, although I have been browsing this great community forum for so long, always learning.
I have a HUGE dataset currently that looks something like this:
Year | Month | Pmt |
2017 | 1 | 34.77685505 |
2017 | 1 | 47.85238688 |
2017 | 1 | 11.15835 |
2017 | 2 | 44.31531843 |
2017 | 2 | 1.044526792 |
2017 | 3 | 66.05520417 |
2017 | 4 | 62.55252575 |
2017 | 5 | 86.27913541 |
2017 | 6 | 89.09913235 |
2017 | 9 | 37.12795924 |
2017 | 14 | 29.66068787 |
2018 | 1 | 93.88662208 |
2018 | 1 | 84.94683944 |
2018 | 3 | 94.91144317 |
This is only a very very small subset of the data.
My goal is to: Create a triangle matrix with incremental data. It should basically look like this:
Month | |||||||||||||
3 | 6 | 9 | 12 | 15 | 18 | 21 | 24 | 27 | 30 | 33 | 36 | 39 | |
Year | |||||||||||||
2017 | 205.2026413 | 237.9307935 | 37.12795924 | 0 | 29.66068787 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2018 | 273.7449047 | 0 |
AGAIN, this is a much smaller portion of the dataset. The years range from 1940-2018, and the quarters range from 3-600.
What my method is right now:
1. I create more than 100 tables with incremental data. For example I will create table q0003 which is a table that looks like this:
2017 205.2026413
2018 273.7449047
It just tells me incremental data from the 00th month to the 03th month for each year.
And then q0306 would be
2017 237.9307935
2018 0
2. I join all these >100 tables giving me an incremental triangle.
This process seems extremely long and very unintuitive. Also not really able to be replicated when dataset is updated. Is there any better method you could provide me with?
Any questions, just ask.
Thanks everyone.
What is month 15, 18, 21, etc.??
How is the 205.2026413 for 2017 computed?
Month 15,18,21, etc. are the number of months after the year has started.
So Month 15 in the 2017 row would be any payments where the year = 2017 and the 12<Month<= 15.
The 205.2026413 for 2017 is computed by taking the summation of pmt where year = 2017 and 0< Month <= 3. Which would be the first 6 rows of Pmt.
So month 15 of 2017 is March 2018?
Yes, BUT it's only for payments that came in during Month 15, or March 2018, for invoices created in 2017.
So that's why the pmt <= 3 and Year=2018 are not included in the final table under month 15 of 2017.
Does this make more sense?
How would one know what year the invoice is, if it is not in your original data as shown?
The "Year" column in my original dataset (the first dataset) is the Year of the Invoice. The "Month" column is the month that the invoice is paid.
For example, if we have
Year Month Pmt
2014 34 100
That is a $100 payment in October 2016 for an invoice created 2014.
Thanks for the explanation.
This is UNTESTED CODE
data have2;
set have;
month=floor((month-1)/3);
run;
proc report data=have2;
columns year month,pmt;
define year/group;
define month/across;
define pmt/analysis sum;
run;
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.