BookmarkSubscribeRSS Feed
PravinMishra
Quartz | Level 8

Hi,

I am facing issue with proc freq .request you to please help me out.

1.while running Proc freq on 57millions records.Its throwing an error of memory issue. I am not creating output in HTML / ODS. output is created in sasdata set and only generating  nlevels (distinct value)Note that I can't use proc sql as there is some issue with distinct cout. How to mitigate memory issue.

2.Can I stop count and cummulative percent calculation.Only I need distinct value?

Thanks

19 REPLIES 19
ballardw
Super User

Post the code you're using so far.

PravinMishra
Quartz | Level 8

Hi,

Please find the code which I am using to pull the nlevels of records. variable are both in character and numeric.

proc freq data=table_name (keep=col_name) noprint nlevels;

table col_name/out=freqout(keep=col_name) norow nocol ;

run;

I am using norow nocol to avoid count and cum percent getting calculated but it's not helping me out.

ballardw
Super User

You have a couple of syntax issues:

the output dataset for a single variable would have count and percent it would not have levels. NOROW and NOCOL suppress percentages of a cross tab, so they are doing nothing with your data, use NOCOUNT and NOPERCENT.

Levels information is displayed in listing or ods output. If you want the result in a dataset then you use ODS to get that.

Drop the nlevels and see if that works.

Then run a second proc freq on your output data set:

proc freq data=freqout;

tables count;

run;

the total CUMULATIVE frequency will be how many original unique values were in the data set.

I suspect you may have had similar syntax errors in your SQL approach.

Proc sql;

     select count(*)

     from (select distinct col_name from table_name);

quit;

Reeza
Super User

The output wouldn't have count or percent, only the distinct values and the Nlevels won't do anything besides take up processing time, removing it does speed it up significantly as well.

Depends on what the OP wants in the final results.

PravinMishra
Quartz | Level 8

Hi ,

I have used the same syntax for proc sql but due to precision the output are different in proc freq vs proc sql distinct count.I can't use proc sql.

I understand that NOROW and NOCOL is not going to help me.I was just trying to stop calcualtion happening in cumalative and count.

I don't think NOCOUNT options are there in proc freq.

Thanks for your suggestion using ODS.

Reeza
Super User

57 million records is a lot and may take a while but not run out of memory. Have you tried the memsize option? Are you running on a server or a desktop?

Can you sort the data set without running out of memory?

Kurt_Bremser
Super User

proc sort data=table_name (keep=col_name) out=temp1 nodupkey;

by col_name;

run;

/* work.temp1 has one line per distinct value */

/* get the dataset size from the table metadata, quicker than a count(*) */

data freqout (keep=col_name);

set sashelp.vtable (rename=(nobs=col_name) where=(libname='WORK' and memname='TEMP1'))

run;

PravinMishra
Quartz | Level 8

Hi Kurt,

Thanks for your response and suggestions.

I have already tried proc sort with nodupkey and proc sql.but count are not matching with one of my report.

Due to which I can't use it.

Thanks

Reeza
Super User

50827 - When you use the COUNT(Distinct X) function in the SQL procedure, incorrect counts might be ...

How was your report calculated?

You never mentioned if you were running on a server or desktop?

Kurt_Bremser
Super User

Ha! Just (right this moment) had one of these over here.

Another "why I don't use SQL in SAS, unless absolutely necessary".

The proc SQL in question took 46 minutes real time, while the data step solution took 6 seconds(!!!). And the SQL dropped 1 observation one of the counts on top of that.

PravinMishra
Quartz | Level 8

Hi Reeza,

Apologize for not mentioning it.I am running it on server.

Thankss

Ksharp
Super User

Give some sample data and the output to explain your question more .

and one more important thing is if your dataset have been sorted ?

Xia Keshan

jakarman
Barite | Level 11

I tested with a 21M observations faked dataset on UE.

There is no problem with that. If you have a lot of different values all what is counted needs to be in memory.

Sorting/ordering  on the variable and than doing count with by processing will limit memory usage.

27M is still small unless you have big/long variables.     

What are you system limits resources versions and environment?

44         data carsm (drop=i);

45          set sashelp.cars;

46           do i=1 to 50000;  output;

47           end;

48          run;

NOTE: There were 428 observations read from the data set SASHELP.CARS.

NOTE: The data set WORK.CARSM has 21400000 observations and 15 variables.

NOTE: DATA statement used (Total process time):

       real time           24.44 seconds

       cpu time            9.43 seconds

     

49         options fullstimer;

50         proc freq data=carsm ;

51          table make / noprint norow nocol out=cart ;

52         run;

NOTE: There were 21400000 observations read from the data set WORK.CARSM.

NOTE: The data set WORK.CART has 38 observations and 3 variables.

NOTE: PROCEDURE FREQ used (Total process time):

       real time           3.02 seconds

       user cpu time       1.99 seconds

       system cpu time     0.48 seconds

       memory              677.43k

       OS Memory           23188.00k

       Timestamp           08-10-2014 05:02:12 AM

       Step Count                        6  Switch Count  34

       Page Faults                       3

       Page Reclaims                     242

       Page Swaps                        0

       Voluntary Context Switches        94

       Involuntary Context Switches      61

       Block Input Operations            672

       Block Output Operations           280

---->-- ja karman --<-----

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 19 replies
  • 3200 views
  • 0 likes
  • 6 in conversation