To improve efficiency, I am updating some old code that previously took 14 hours to run. I'm trying to use a Proc Summary with a Class statement instead of sorting the dataset first, followed by a Proc Summary. When I take a small sample of the dataset this Proc Summary works fine, but with the entire dataset it ran for 10 hours before I halted it. Based on the past code I would only expect about a 2-3 hour run time.
When I interrupted the Proc Summary, this was the run time:
NOTE: PROCEDURE SUMMARY used:
real time 8:07:47.26
cpu time 19:56.10
This is really a question for Tech Support. They are in the best position to look at your stored process code and give you some ideas of how to improve performance -- there may be a problem in the data, in the configuration of the servers used or the network or anyplace in between.
I think you're going the wrong way with this change. Your subject mentions "memory issue", which suggests you know there is a memory problem. I would turn on step monitoring with [ Options FullSTimer; ] which will give you memory usage as well.
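As a minimal sketch of that monitoring, the option can be switched on just before the step in question and off again afterwards. The comment in the middle is only a placeholder for your own Proc Summary step, not code from your program.

```sas
/* Ask SAS to write detailed resource statistics, including memory, */
/* to the log for every step that follows.                          */
options fullstimer;

/* ... run the Proc Summary step you want to measure here ... */

/* Switch back to the shorter real time / cpu time notes. */
options nofullstimer;
```

With the option on, compare the memory reported for the small sample against what the full run reports before it stalls.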
When you summarise using a BY statement, the data are processed in a linear sequence and the summary for each stratum is written out as soon as that BY group ends. If you use a CLASS statement on unsorted data, SAS has to hold the statistics for every class level in memory, and none of them are written out until the whole input table has been read. Memory usage is very much higher, and for large data sets it can cause the step to fail when memory is exhausted.
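To make the difference concrete, here is a rough sketch of the two forms side by side. The table name work.big_table, the grouping variables region and product, and the analysis variable sales are all made up for illustration; substitute your own.

```sas
/* CLASS form: no sort required, but the statistics for every     */
/* region/product combination stay in memory until the input has  */
/* been read to the end.                                          */
proc summary data=work.big_table nway;
    class region product;
    var sales;
    output out=work.summary_class sum=;
run;

/* BY form: the input must already be in BY-variable order. */
proc sort data=work.big_table out=work.big_sorted;
    by region product;
run;

/* Each BY group is summarised and written out when the group     */
/* ends, so only one group's statistics are held at a time.       */
proc summary data=work.big_sorted;
    by region product;
    var sales;
    output out=work.summary_by sum=;
run;
```

Both forms should give the same totals per region/product combination; what differs is when the results are written and how much has to be held in memory along the way.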
So, I would plan to stay with the BY statement, but look further back in the code to see whether the source data could be generated in sorted order, which would save the separate sort step. Sorting is expensive, but skipping the sort before summarising a large table can be even more expensive.
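One possible way to do that, assuming the source table is built by an earlier extract step, is to ask for the ordering in that step so the table arrives ready for BY processing. This is only a sketch: work.source_table, region, product and sales are hypothetical names, and the ORDER BY still costs a sort. It just folds the ordering into a step that already reads the data, rather than adding another pass.

```sas
/* Build the working table in BY-variable order as part of the */
/* extract, instead of running a separate Proc Sort later.     */
proc sql;
    create table work.big_sorted as
    select region, product, sales
    from work.source_table
    order by region, product;
quit;

/* The summary can then use BY directly. */
proc summary data=work.big_sorted;
    by region product;
    var sales;
    output out=work.summary_by sum=;
run;
```

Whether this beats a plain Proc Sort depends on where the data come from, so time both with FullSTimer on before committing to either.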