Help using Base SAS procedures

PROC FREQ for a large data set

Occasional Contributor
Posts: 15

Hi, I have a huge data set (915K records) and I want to run PROC FREQ on just one field, which is 90 bytes long. I seem to be running out of space for some reason. Can anyone offer any help or suggestions on how I might get around this?

NOTE: The SAS System stopped processing this step because of insufficient memory.
I tried writing the results to another data set with the OUT= option, but still no joy.

Thanks so much in advance.
Super User
Posts: 9,662

Re: PROC FREQ for a large data set

You can try creating an index on this variable.
I am not sure it will work. It is just a suggestion.
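
As a rough sketch of what that might look like: the data set name work.big is hypothetical, and the variable name Subsidiary is borrowed from the code posted later in this thread. The idea, which is an assumption rather than something tested here, is that an index lets a BY statement read the data in order without a separate sort, so PROC FREQ only has to hold one value at a time. Reading a whole data set through an index can be slow.

```sas
/* Create a simple index on the variable to be counted.      */
/* work.big and Subsidiary are placeholder names.            */
proc datasets library=work nolist;
  modify big;
  index create Subsidiary;
quit;

/* BY-group processing can use the index instead of a sort.  */
/* Each BY group holds a single value, so the in-memory      */
/* table stays small.                                        */
proc freq data=work.big noprint;
  by Subsidiary;
  tables Subsidiary / out=work.counts;
run;
```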

Occasional Contributor
Posts: 15

Re: PROC FREQ for a large data set

Thanks. Can you please explain further? I am not sure I know how to do that.

Frequent Contributor
Posts: 139

Re: PROC FREQ for a large data set

There are a number of options that you could try.

1) Sort the data set by your variable of interest, then use PROC SUMMARY with just a BY statement and no VAR statement. PROC SUMMARY will write a variable called _FREQ_ to the output data set.

2) Use PROC SQL with a GROUP BY clause and see what happens.
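
A minimal sketch of both options, assuming a hypothetical input data set work.big and a variable named Subsidiary (the name used in the code later in this thread). Substitute your own names.

```sas
/* Option 1: sort, then count each BY group with PROC SUMMARY. */
/* KEEP= drops every other variable so the sort is cheaper.    */
proc sort data=work.big(keep=Subsidiary) out=work.sorted;
  by Subsidiary;
run;

proc summary data=work.sorted;
  by Subsidiary;
  /* No VAR statement: the output has one row per BY group,    */
  /* with the count in the automatic variable _FREQ_.          */
  output out=work.counts(drop=_type_);
run;

/* Option 2: let PROC SQL do the grouping and counting. */
proc sql;
  create table work.counts_sql as
  select Subsidiary, count(*) as Frequency
  from work.big
  group by Subsidiary;
quit;
```

Both steps write one row per distinct value, so the two output data sets should agree apart from the name of the count column.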

Trusted Advisor
Posts: 2,113

Re: PROC FREQ for a large data set

Darryl's response should work.

The reason that you were running out of space is that the memory needs of FREQ are a function of the number of DISTINCT values of the variable times its length (the reference manual should have the exact formula). With that many observations and a text field, you could have a lot of distinct values.
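
To get a feel for the numbers, here is a back-of-the-envelope sketch. It assumes the worst case, where every one of the 915K records holds a different 90-byte value, and it ignores whatever per-value overhead FREQ adds, so treat it as a rough estimate of the raw key storage only and not the exact formula.

```sas
/* Show the memory limit for the current SAS session. */
proc options option=memsize;
run;

/* Worst-case storage for the distinct values alone:   */
/* number of records times the length of the variable. */
data _null_;
  n_obs = 915000;   /* record count from the question  */
  len   = 90;       /* length of the field in bytes    */
  mb    = n_obs * len / (1024 * 1024);
  put 'Worst-case key storage (MB): ' mb 8.1;
run;
```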

The SORT and SUMMARY approach is disk-space dependent. SQL is a mix, as it tries to hold as much as it can in memory and then relies on disk. Both will take longer than FREQ would have if you had enough memory.

Doc Muhlbaier
Occasional Contributor
Posts: 15

Re: PROC FREQ for a large data set

Thank you both, Darryl and Duke. I will try your suggestions.

Much appreciated.
Respected Advisor
Posts: 3,777

Re: PROC FREQ for a large data set

This is basically the same idea as already suggested.

I don't know how long it would take to sort the data. You might want to sort the data in groups, then combine and count. Be sure to keep only the variable that needs counting. That should save a lot of time if the data set has lots of variables.

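/* HAVE is a placeholder: the input data set name is missing from the original post. */
/* Sort the data in two chunks, keeping only the variable to be counted. */
proc sort data=have(keep=Subsidiary firstobs=1 obs=200) out=bin1;
  by Subsidiary;
run;
proc sort data=have(keep=Subsidiary firstobs=201 obs=max) out=bin2;
  by Subsidiary;
run;

/* Interleave the sorted chunks and count the rows in each BY group. */
data freq;
  do Frequency=1 by 1 until(last.Subsidiary);
    set bin1 bin2;
    by Subsidiary;
  end;
  CumulativeFrequency + Frequency;
run;
proc print data=freq;
run;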