- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Posted 03-09-2011 10:05 PM
(5173 views)
Hi I have a huge data set 915K records for which I want to do a PROC FREQ on just one field which is 90 bytes long. I seem to be running out space for some reason. Can any offer any help/suggestions on how I might get around this.
NOTE: The SAS System stopped processing this step because of insufficient memory.
I am trying to output this to another dataset with a OUT statement still no joy.
thanks so much in advance.
NOTE: The SAS System stopped processing this step because of insufficient memory.
I am trying to output this to another dataset with a OUT statement still no joy.
thanks so much in advance.
6 REPLIES 6
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi.
You can try to create an index for this vaiable.
I am not sure it can work.Just a suggestion.
Ksharp
You can try to create an index for this vaiable.
I am not sure it can work.Just a suggestion.
Ksharp
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
thanks. Can you please explain further. Not sure if I know how to do that.
thanks.
thanks.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
There are a number of options that you could try.
1) sort the data set by your variable of interest. then use PROC SUMMARY with just a by statement and no var statement. PROC SUMMARY will output a field called _freq_ in the output dataset.
2) use proc sql with a group by and see what happens.
Darryl
1) sort the data set by your variable of interest. then use PROC SUMMARY with just a by statement and no var statement. PROC SUMMARY will output a field called _freq_ in the output dataset.
2) use proc sql with a group by and see what happens.
Darryl
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Darryl's response should work.
The reason that you were running out of space is that the memory needs of FREQ are a function of the number of DISTINCT values of the variable times it's length (the reference manual should have the exact formula). With that many observations and a text field, you'll could have lots.
The SORTing and SUMMARY are disk space dependent. SQL is a mix, as it tries to put as much in memory as it can and then relies on disk. Both will take longer than FREQ would have done if you had enough memory.
Doc Muhlbaier
Duke
The reason that you were running out of space is that the memory needs of FREQ are a function of the number of DISTINCT values of the variable times it's length (the reference manual should have the exact formula). With that many observations and a text field, you'll could have lots.
The SORTing and SUMMARY are disk space dependent. SQL is a mix, as it tries to put as much in memory as it can and then relies on disk. Both will take longer than FREQ would have done if you had enough memory.
Doc Muhlbaier
Duke
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
thank you both Darryl and Duke. Will try your suggestions.
Much appreciated.
Much appreciated.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
This is basically the same idea as suggest already.
I don't know how long it would take to sort the data. You might want to do sort the data in groups then combine and count. Be sure to keep only the variable that needs counting. Should save a lot ot time if the data set has lots of variables.
[pre]
proc sort data=sashelp.shoes(keep=Subsidiary firstobs=1 obs=200) out=bin1;
by Subsidiary;
run;
proc sort data=sashelp.shoes(keep=Subsidiary firstobs=201 obs=max) out=bin2;
by Subsidiary;
run;
data freq;
do Frequency=1 by 1 until(last.Subsidiary);
set bin1 bin2;
by Subsidiary;
end;
CumulativeFrequency + Frequency;
run;
proc print;
run;
[/pre]
I don't know how long it would take to sort the data. You might want to do sort the data in groups then combine and count. Be sure to keep only the variable that needs counting. Should save a lot ot time if the data set has lots of variables.
[pre]
proc sort data=sashelp.shoes(keep=Subsidiary firstobs=1 obs=200) out=bin1;
by Subsidiary;
run;
proc sort data=sashelp.shoes(keep=Subsidiary firstobs=201 obs=max) out=bin2;
by Subsidiary;
run;
data freq;
do Frequency=1 by 1 until(last.Subsidiary);
set bin1 bin2;
by Subsidiary;
end;
CumulativeFrequency + Frequency;
run;
proc print;
run;
[/pre]