10-06-2014 06:48 PM
I am facing issue with proc freq .request you to please help me out.
1.while running Proc freq on 57millions records.Its throwing an error of memory issue. I am not creating output in HTML / ODS. output is created in sasdata set and only generating nlevels (distinct value)Note that I can't use proc sql as there is some issue with distinct cout. How to mitigate memory issue.
2.Can I stop count and cummulative percent calculation.Only I need distinct value?
10-06-2014 07:02 PM
Please find the code which I am using to pull the nlevels of records. variable are both in character and numeric.
proc freq data=table_name (keep=col_name) noprint nlevels;
table col_name/out=freqout(keep=col_name) norow nocol ;
I am using norow nocol to avoid count and cum percent getting calculated but it's not helping me out.
10-06-2014 07:40 PM
You have a couple of syntax issues:
the output dataset for a single variable would have count and percent it would not have levels. NOROW and NOCOL suppress percentages of a cross tab, so they are doing nothing with your data, use NOCOUNT and NOPERCENT.
Levels information is displayed in listing or ods output. If you want the result in a dataset then you use ODS to get that.
Drop the nlevels and see if that works.
Then run a second proc freq on your output data set:
proc freq data=freqout;
the total CUMULATIVE frequency will be how many original unique values were in the data set.
I suspect you may have had similar syntax errors in your SQL approach.
from (select distinct col_name from table_name);
10-06-2014 08:46 PM
The output wouldn't have count or percent, only the distinct values and the Nlevels won't do anything besides take up processing time, removing it does speed it up significantly as well.
Depends on what the OP wants in the final results.
10-07-2014 11:17 AM
I have used the same syntax for proc sql but due to precision the output are different in proc freq vs proc sql distinct count.I can't use proc sql.
I understand that NOROW and NOCOL is not going to help me.I was just trying to stop calcualtion happening in cumalative and count.
I don't think NOCOUNT options are there in proc freq.
Thanks for your suggestion using ODS.
10-06-2014 07:12 PM
57 million records is a lot and may take a while but not run out of memory. Have you tried the memsize option? Are you running on a server or a desktop?
Can you sort the data set without running out of memory?
10-07-2014 04:06 AM
proc sort data=table_name (keep=col_name) out=temp1 nodupkey;
/* work.temp1 has one line per distinct value */
/* get the dataset size from the table metadata, quicker than a count(*) */
data freqout (keep=col_name);
set sashelp.vtable (rename=(nobs=col_name) where=(libname='WORK' and memname='TEMP1'))
10-07-2014 11:28 AM
Thanks for your response and suggestions.
I have already tried proc sort with nodupkey and proc sql.but count are not matching with one of my report.
Due to which I can't use it.
10-08-2014 02:44 AM
Then your report is obviously broken.
10-08-2014 09:39 AM
How was your report calculated?
You never mentioned if you were running on a server or desktop?
10-08-2014 09:51 AM
Ha! Just (right this moment) had one of these over here.
Another "why I don't use SQL in SAS, unless absolutely necessary".
The proc SQL in question took 46 minutes real time, while the data step solution took 6 seconds(!!!). And the SQL dropped 1 observation one of the counts on top of that.
10-07-2014 10:40 AM
Give some sample data and the output to explain your question more .
and one more important thing is if your dataset have been sorted ?
10-08-2014 05:00 AM
I tested with a 21M observations faked dataset on UE.
There is no problem with that. If you have a lot of different values all what is counted needs to be in memory.
Sorting/ordering on the variable and than doing count with by processing will limit memory usage.
27M is still small unless you have big/long variables.
What are you system limits resources versions and environment?
44 data carsm (drop=i);
45 set sashelp.cars;
46 do i=1 to 50000; output;
NOTE: There were 428 observations read from the data set SASHELP.CARS.
NOTE: The data set WORK.CARSM has 21400000 observations and 15 variables.
NOTE: DATA statement used (Total process time):
real time 24.44 seconds
cpu time 9.43 seconds
49 options fullstimer;
50 proc freq data=carsm ;
51 table make / noprint norow nocol out=cart ;
NOTE: There were 21400000 observations read from the data set WORK.CARSM.
NOTE: The data set WORK.CART has 38 observations and 3 variables.
NOTE: PROCEDURE FREQ used (Total process time):
real time 3.02 seconds
user cpu time 1.99 seconds
system cpu time 0.48 seconds
OS Memory 23188.00k
Timestamp 08-10-2014 05:02:12 AM
Step Count 6 Switch Count 34
Page Faults 3
Page Reclaims 242
Page Swaps 0
Voluntary Context Switches 94
Involuntary Context Switches 61
Block Input Operations 672
Block Output Operations 280