Hi,
I am facing issue with proc freq .request you to please help me out.
1.while running Proc freq on 57millions records.Its throwing an error of memory issue. I am not creating output in HTML / ODS. output is created in sasdata set and only generating nlevels (distinct value)Note that I can't use proc sql as there is some issue with distinct cout. How to mitigate memory issue.
2.Can I stop count and cummulative percent calculation.Only I need distinct value?
Thanks
Post the code you're using so far.
Hi,
Please find the code which I am using to pull the nlevels of records. variable are both in character and numeric.
proc freq data=table_name (keep=col_name) noprint nlevels;
table col_name/out=freqout(keep=col_name) norow nocol ;
run;
I am using norow nocol to avoid count and cum percent getting calculated but it's not helping me out.
You have a couple of syntax issues:
the output dataset for a single variable would have count and percent it would not have levels. NOROW and NOCOL suppress percentages of a cross tab, so they are doing nothing with your data, use NOCOUNT and NOPERCENT.
Levels information is displayed in listing or ods output. If you want the result in a dataset then you use ODS to get that.
Drop the nlevels and see if that works.
Then run a second proc freq on your output data set:
proc freq data=freqout;
tables count;
run;
the total CUMULATIVE frequency will be how many original unique values were in the data set.
I suspect you may have had similar syntax errors in your SQL approach.
Proc sql;
select count(*)
from (select distinct col_name from table_name);
quit;
The output wouldn't have count or percent, only the distinct values and the Nlevels won't do anything besides take up processing time, removing it does speed it up significantly as well.
Depends on what the OP wants in the final results.
Hi ,
I have used the same syntax for proc sql but due to precision the output are different in proc freq vs proc sql distinct count.I can't use proc sql.
I understand that NOROW and NOCOL is not going to help me.I was just trying to stop calcualtion happening in cumalative and count.
I don't think NOCOUNT options are there in proc freq.
Thanks for your suggestion using ODS.
57 million records is a lot and may take a while but not run out of memory. Have you tried the memsize option? Are you running on a server or a desktop?
Can you sort the data set without running out of memory?
proc sort data=table_name (keep=col_name) out=temp1 nodupkey;
by col_name;
run;
/* work.temp1 has one line per distinct value */
/* get the dataset size from the table metadata, quicker than a count(*) */
data freqout (keep=col_name);
set sashelp.vtable (rename=(nobs=col_name) where=(libname='WORK' and memname='TEMP1'))
run;
Hi Kurt,
Thanks for your response and suggestions.
I have already tried proc sort with nodupkey and proc sql.but count are not matching with one of my report.
Due to which I can't use it.
Thanks
How was your report calculated?
You never mentioned if you were running on a server or desktop?
Ha! Just (right this moment) had one of these over here.
Another "why I don't use SQL in SAS, unless absolutely necessary".
The proc SQL in question took 46 minutes real time, while the data step solution took 6 seconds(!!!). And the SQL dropped 1 observation one of the counts on top of that.
Hi Reeza,
Apologize for not mentioning it.I am running it on server.
Thankss
Give some sample data and the output to explain your question more .
and one more important thing is if your dataset have been sorted ?
Xia Keshan
I tested with a 21M observations faked dataset on UE.
There is no problem with that. If you have a lot of different values all what is counted needs to be in memory.
Sorting/ordering on the variable and than doing count with by processing will limit memory usage.
27M is still small unless you have big/long variables.
What are you system limits resources versions and environment?
44 data carsm (drop=i);
45 set sashelp.cars;
46 do i=1 to 50000; output;
47 end;
48 run;
NOTE: There were 428 observations read from the data set SASHELP.CARS.
NOTE: The data set WORK.CARSM has 21400000 observations and 15 variables.
NOTE: DATA statement used (Total process time):
real time 24.44 seconds
cpu time 9.43 seconds
49 options fullstimer;
50 proc freq data=carsm ;
51 table make / noprint norow nocol out=cart ;
52 run;
NOTE: There were 21400000 observations read from the data set WORK.CARSM.
NOTE: The data set WORK.CART has 38 observations and 3 variables.
NOTE: PROCEDURE FREQ used (Total process time):
real time 3.02 seconds
user cpu time 1.99 seconds
system cpu time 0.48 seconds
memory 677.43k
OS Memory 23188.00k
Timestamp 08-10-2014 05:02:12 AM
Step Count 6 Switch Count 34
Page Faults 3
Page Reclaims 242
Page Swaps 0
Voluntary Context Switches 94
Involuntary Context Switches 61
Block Input Operations 672
Block Output Operations 280
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.