BookmarkSubscribeRSS Feed
Bruna_SAS
Calcite | Level 5

I have a table with more than 3 million records. I need to plot a line graph with the number of records in each year, but I only have a start date and an end date. From this I calculate the interval of years between these dates. When I do this, my table grows to over 35 million rows.

I would like to keep the original number of lines, so I placed the ranges within a json in a new column, for example

+----+-------+---------------------------+
| ID  | IntervalYears |
+----+-------+---------------------------+
| 1 | {"IntervalYears": [2010, 2011, 2012, 2013]} |
| 2 | {"IntervalYears": [2012, 2013, 2014, 2015]} |
+----+-------+---------------------------+

 

So, I would like to know if it is possible to plot these quantities per year, but using this column in json, resulting in a graph like this, for example

Bruna_SAS_0-1719841870749.png

 

2 REPLIES 2
FredrikE
Rhodochrosite | Level 12

One way I think might work is to create a new aggregated table containing year and count of number of observations.

You might also add some dimensions if you filter on these in the report.

 

ballardw
Super User

I am not sure where JSON helps.

You don't say why the number of observations is important so hard to address any detail there.

 

It might be that processing in a somewhat different order, such as getting a count of the common start/ end pairs before doing the expansion that caused the 35 million rows would be appropriate.

 

Suppose you COUNT data by start end pairs and get a result similar to this:

start end count
2012 2013 103
2012 2014 145
2012 2015 154
2012 2016 234
2013 2014 98
2013 2015 133
2013 2016 199
2014 2015 55
2014 2016 101
2015 2016 88

When you transform that to get single year keep the count with it so it looks like:

Obs count year 
1 103 2012 
2 103 2013 
3 145 2012 
4 145 2013 
5 145 2014 
6 154 2012 
7 154 2013 
8 154 2014 
9 154 2015 
10 234 2012 

The  example data would have 30 rows of data. Which would still be much smaller than just transorming the 103 rows of 2012 to 2013.

Summarize that data to total the count by the year and there are only 5 rows of data to actually plot with counts in the 622 to 1119 range (for my given example).

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

Tips for filtering data sources in SAS Visual Analytics

See how to use one filter for multiple data sources by mapping your data from SAS’ Alexandria McCall.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 2 replies
  • 691 views
  • 0 likes
  • 3 in conversation