Proc Freq Out of memory issue

Reply
Contributor
Posts: 60

Proc Freq Out of memory issue

Hi,

I am facing issue with proc freq .request you to please help me out.

1.while running Proc freq on 57millions records.Its throwing an error of memory issue. I am not creating output in HTML / ODS. output is created in sasdata set and only generating  nlevels (distinct value)Note that I can't use proc sql as there is some issue with distinct cout. How to mitigate memory issue.

2.Can I stop count and cummulative percent calculation.Only I need distinct value?

Thanks

Super User
Posts: 11,343

Re: Proc Freq Out of memory issue

Posted in reply to PravinMishra

Post the code you're using so far.

Contributor
Posts: 60

Re: Proc Freq Out of memory issue

Hi,

Please find the code which I am using to pull the nlevels of records. variable are both in character and numeric.

proc freq data=table_name (keep=col_name) noprint nlevels;

table col_name/out=freqout(keep=col_name) norow nocol ;

run;

I am using norow nocol to avoid count and cum percent getting calculated but it's not helping me out.

Super User
Posts: 11,343

Re: Proc Freq Out of memory issue

Posted in reply to PravinMishra

You have a couple of syntax issues:

the output dataset for a single variable would have count and percent it would not have levels. NOROW and NOCOL suppress percentages of a cross tab, so they are doing nothing with your data, use NOCOUNT and NOPERCENT.

Levels information is displayed in listing or ods output. If you want the result in a dataset then you use ODS to get that.

Drop the nlevels and see if that works.

Then run a second proc freq on your output data set:

proc freq data=freqout;

tables count;

run;

the total CUMULATIVE frequency will be how many original unique values were in the data set.

I suspect you may have had similar syntax errors in your SQL approach.

Proc sql;

     select count(*)

     from (select distinct col_name from table_name);

quit;

Super User
Posts: 19,822

Re: Proc Freq Out of memory issue

The output wouldn't have count or percent, only the distinct values and the Nlevels won't do anything besides take up processing time, removing it does speed it up significantly as well.

Depends on what the OP wants in the final results.

Contributor
Posts: 60

Re: Proc Freq Out of memory issue

Hi ,

I have used the same syntax for proc sql but due to precision the output are different in proc freq vs proc sql distinct count.I can't use proc sql.

I understand that NOROW and NOCOL is not going to help me.I was just trying to stop calcualtion happening in cumalative and count.

I don't think NOCOUNT options are there in proc freq.

Thanks for your suggestion using ODS.

Super User
Posts: 19,822

Re: Proc Freq Out of memory issue

Posted in reply to PravinMishra

57 million records is a lot and may take a while but not run out of memory. Have you tried the memsize option? Are you running on a server or a desktop?

Can you sort the data set without running out of memory?

Super User
Posts: 7,831

Re: Proc Freq Out of memory issue

Posted in reply to PravinMishra

proc sort data=table_name (keep=col_name) out=temp1 nodupkey;

by col_name;

run;

/* work.temp1 has one line per distinct value */

/* get the dataset size from the table metadata, quicker than a count(*) */

data freqout (keep=col_name);

set sashelp.vtable (rename=(nobs=col_name) where=(libname='WORK' and memname='TEMP1'))

run;

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Contributor
Posts: 60

Re: Proc Freq Out of memory issue

Posted in reply to KurtBremser

Hi Kurt,

Thanks for your response and suggestions.

I have already tried proc sort with nodupkey and proc sql.but count are not matching with one of my report.

Due to which I can't use it.

Thanks

Super User
Posts: 7,831

Re: Proc Freq Out of memory issue

Posted in reply to PravinMishra

Then your report is obviously broken.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Super User
Posts: 19,822

Re: Proc Freq Out of memory issue

Posted in reply to KurtBremser

50827 - When you use the COUNT(Distinct X) function in the SQL procedure, incorrect counts might be ...

How was your report calculated?

You never mentioned if you were running on a server or desktop?

Super User
Posts: 7,831

Re: Proc Freq Out of memory issue

Ha! Just (right this moment) had one of these over here.

Another "why I don't use SQL in SAS, unless absolutely necessary".

The proc SQL in question took 46 minutes real time, while the data step solution took 6 seconds(!!!). And the SQL dropped 1 observation one of the counts on top of that.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Contributor
Posts: 60

Re: Proc Freq Out of memory issue

Hi Reeza,

Apologize for not mentioning it.I am running it on server.

Thankss

Super User
Posts: 10,041

Re: Proc Freq Out of memory issue

Posted in reply to PravinMishra

Give some sample data and the output to explain your question more .

and one more important thing is if your dataset have been sorted ?

Xia Keshan

Trusted Advisor
Posts: 3,214

Re: Proc Freq Out of memory issue

Posted in reply to PravinMishra

I tested with a 21M observations faked dataset on UE.

There is no problem with that. If you have a lot of different values all what is counted needs to be in memory.

Sorting/ordering  on the variable and than doing count with by processing will limit memory usage.

27M is still small unless you have big/long variables.     

What are you system limits resources versions and environment?

44         data carsm (drop=i);

45          set sashelp.cars;

46           do i=1 to 50000;  output;

47           end;

48          run;

NOTE: There were 428 observations read from the data set SASHELP.CARS.

NOTE: The data set WORK.CARSM has 21400000 observations and 15 variables.

NOTE: DATA statement used (Total process time):

       real time           24.44 seconds

       cpu time            9.43 seconds

     

49         options fullstimer;

50         proc freq data=carsm ;

51          table make / noprint norow nocol out=cart ;

52         run;

NOTE: There were 21400000 observations read from the data set WORK.CARSM.

NOTE: The data set WORK.CART has 38 observations and 3 variables.

NOTE: PROCEDURE FREQ used (Total process time):

       real time           3.02 seconds

       user cpu time       1.99 seconds

       system cpu time     0.48 seconds

       memory              677.43k

       OS Memory           23188.00k

       Timestamp           08-10-2014 05:02:12 AM

       Step Count                        6  Switch Count  34

       Page Faults                       3

       Page Reclaims                     242

       Page Swaps                        0

       Voluntary Context Switches        94

       Involuntary Context Switches      61

       Block Input Operations            672

       Block Output Operations           280

---->-- ja karman --<-----
Ask a Question
Discussion stats
  • 19 replies
  • 1397 views
  • 0 likes
  • 6 in conversation