Desktop productivity for business analysts and programmers

A quicker way to summarise datasets?

Posts: 0

I currently use the Summary Tables node to summarise any data I have, which works fine with smaller datasets.

However, this takes several hours with larger datasets.

Would it be quicker to use proc sql or proc summary/means in a code node?

Community Manager
Posts: 2,697

Re: A quicker way to summarise datasets?


You can use the Query Builder node to create summary tables with aggregates/grouping. That uses PROC SQL.

Then you can use another task (List Data, List Report) to create a report, if you want.
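For reference, the kind of PROC SQL step such a grouped query corresponds to might look like the sketch below. It is written by hand rather than copied from Query Builder output, and SASHELP.CARS plus the output names (work.cars_sql_summary, n_rows, total_invoice, avg_msrp) are stand-ins chosen only for illustration.

```sas
/* Hand-written sketch of a grouped aggregate query.            */
/* SASHELP.CARS and all output names are illustrative only.     */
proc sql;
  create table work.cars_sql_summary as
  select make,
         type,
         count(*)     as n_rows,         /* rows per group         */
         sum(invoice) as total_invoice,  /* group total            */
         avg(msrp)    as avg_msrp        /* group mean             */
  from sashelp.cars
  group by make, type;
quit;
```

Writing the result to a table rather than to a report keeps the summarising step separate from the reporting step.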

Esteemed Advisor
Posts: 5,198

Re: A quicker way to summarise datasets?

In my experience, the quickest way to summarize data with SAS is PROC SUMMARY/MEANS with CLASS.
I can't see any EG tasks that use them, though.
My first guess would be that TABULATE (the Summary Tables task) uses the same algorithm, but I'm not sure.
Apart from the different grouping algorithms used by SQL GROUP BY and SUMMARY CLASS, another difference is the ability to specify ID columns.
When a set of columns forms a hierarchy, there is no need to build summary groups for all of them, just for the level with the finest granularity.
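A minimal sketch of the CLASS plus ID approach, assuming SASHELP.CARS as sample data, where each MAKE belongs to a single ORIGIN (so ORIGIN sits above MAKE in a hierarchy). The output dataset and column names are made up for the example.

```sas
/* CLASS builds the groups; ID carries a higher-level column    */
/* along without adding it to the grouping.                     */
proc summary data=sashelp.cars nway;   /* NWAY: only the full make*type level */
  class make type;
  id origin;                           /* by default the maximum value in each group is kept */
  var msrp invoice;
  output out=work.cars_class_summary (drop=_type_ rename=(_freq_=n_rows))
         sum(invoice)=total_invoice
         mean(msrp)=avg_msrp;
run;
```

Without NWAY, the output also contains the overall total and the one-way summaries, which can be useful but makes the result larger.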

If your data is already sorted, PROC SUMMARY with BY is the most efficient way to summarize.
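The BY variant could look like the sketch below. The PROC SORT step is there only to make the example self-contained; if the table is already sorted by the grouping columns it would be skipped. The dataset names are illustrative.

```sas
/* Only needed when the input is not already in BY order. */
proc sort data=sashelp.cars out=work.cars_sorted;
  by make type;
run;

/* BY-group processing: one group is summarised at a time, */
/* in the order the rows arrive.                           */
proc summary data=work.cars_sorted;
  by make type;
  var msrp invoice;
  output out=work.cars_by_summary
         sum(invoice)=total_invoice
         mean(msrp)=avg_msrp;
run;
```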

Data never sleeps
Posts: 8,721

Re: A quicker way to summarise datasets?

I believe the Summary Tables task uses PROC TABULATE, and the Summary Statistics task or wizard uses PROC MEANS (which is essentially the same procedure as PROC SUMMARY).

To benchmark, you could run the query method and the Summary Statistics method on the same data and compare how long each takes.
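One way to make such a comparison in a code node is to turn on the FULLSTIMER system option and read the real time and CPU time that the log reports for each step. This is a rough sketch with SASHELP.CARS as placeholder data and made-up output names; a realistic test would use the large table in question.

```sas
options fullstimer;   /* detailed timing notes in the log */

/* Method 1: PROC SQL with GROUP BY */
proc sql;
  create table work.bench_sql as
  select make, type, avg(msrp) as avg_msrp
  from sashelp.cars
  group by make, type;
quit;

/* Method 2: PROC MEANS with CLASS, no printed output */
proc means data=sashelp.cars noprint nway;
  class make type;
  var msrp;
  output out=work.bench_means mean=avg_msrp;
run;

options nofullstimer; /* back to the shorter timing notes */
```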

Valued Guide
Posts: 2,174

Re: A quicker way to summarise datasets?

If TABULATE is doing the summary, make sure the LISTING destination is not open!
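In a code node, closing the LISTING destination around a TABULATE step might look like the following sketch. The table layout, SASHELP.CARS and the OUT= dataset name are assumptions for illustration; whether LISTING is open at all depends on your EG results settings.

```sas
ods listing close;    /* stop writing plain-text listing output */

proc tabulate data=sashelp.cars out=work.tab_summary;
  class make type;
  var msrp;
  table make*type, msrp*(n mean);
run;

ods listing;          /* optional: reopen only if later steps rely on it */
```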