I have a table with more than 3 million records. I need to plot a line graph with the number of records in each year, but I only have a start date and an end date. From this I calculate the interval of years between these dates. When I do this, my table grows to over 35 million rows.
I would like to keep the original number of lines, so I placed the ranges within a json in a new column, for example
+----+-------+---------------------------+
| ID | IntervalYears |
+----+-------+---------------------------+
| 1 | {"IntervalYears": [2010, 2011, 2012, 2013]} |
| 2 | {"IntervalYears": [2012, 2013, 2014, 2015]} |
+----+-------+---------------------------+
So, I would like to know if it is possible to plot these quantities per year, but using this column in json, resulting in a graph like this, for example
One way I think might work is to create a new aggregated table containing year and count of number of observations.
You might also add some dimensions if you filter on these in the report.
I am not sure where JSON helps.
You don't say why the number of observations is important so hard to address any detail there.
It might be that processing in a somewhat different order, such as getting a count of the common start/ end pairs before doing the expansion that caused the 35 million rows would be appropriate.
Suppose you COUNT data by start end pairs and get a result similar to this:
start end count 2012 2013 103 2012 2014 145 2012 2015 154 2012 2016 234 2013 2014 98 2013 2015 133 2013 2016 199 2014 2015 55 2014 2016 101 2015 2016 88
When you transform that to get single year keep the count with it so it looks like:
Obs count year 1 103 2012 2 103 2013 3 145 2012 4 145 2013 5 145 2014 6 154 2012 7 154 2013 8 154 2014 9 154 2015 10 234 2012
The example data would have 30 rows of data. Which would still be much smaller than just transorming the 103 rows of 2012 to 2013.
Summarize that data to total the count by the year and there are only 5 rows of data to actually plot with counts in the 622 to 1119 range (for my given example).
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
See how to use one filter for multiple data sources by mapping your data from SAS’ Alexandria McCall.
Find more tutorials on the SAS Users YouTube channel.