DATA Step, Macro, Functions and more

Simple proc sql count function taking >12 hours...

Accepted Solution Solved
Reply
Contributor
Posts: 44
Accepted Solution

Simple proc sql count function taking >12 hours...

[ Edited ]

I'm running a very simple code on a 4GB dataset, but it's taken over 24 hours and the output file is still at 1 KB, and SAS is still giving me the hour-glass. The log doesn't show any error -- it just doesn't output anything while hour-glassing.

 

Does anyone have a hunch as to what's going on?

 

proc sql;

create table temp.count as
select count (distinct enrolid) as ndistinct, count (enrolid) as n
from have; quit;

Accepted Solutions
Solution
‎01-01-2018 08:31 AM
Super User
Posts: 10,604

Re: Simple proc sql count function taking >12 hours...

[ Edited ]
proc sql;

create table temp.count1 as
select count (distinct enrolid) as ndistinct  from have;

create table temp.count2 as
select  n(enrolid) as n from have;

quit;

Or could use Hash Table and Data step to save some time. Here is an example for sashelp.class .

data class;
 set sashelp.class;
 if _n_=10 then call missing(name);
 if _n_  in (1 2) then name='Judy';
run;

data _null_;
if _n_=1 then do;
 if 0 then set 	class ;
 declare hash h();
 h.definekey('name');
 h.definedone();
end;
 set class end=last;
 if not missing(name) then do;n+1; h.ref();end;

 if last then do;
   ndistinct=h.num_items;
   putlog 'ndistinct=' ndistinct '  n=' n;
 end;
run;

 

View solution in original post


All Replies
Super User
Posts: 5,824

Re: Simple proc sql count function taking >12 hours...

Well your query is somewhat resource consuming, but I suspect that you have limited memory resources available for your SAS session. Check your MEMSIZE and SORTSIZE settings. Also, look in the OS for metrics connected your SAS session vs available resources.
Data never sleeps
Contributor
Posts: 44

Re: Simple proc sql count function taking >12 hours...

[ Edited ]

I'm not sure but the following is what outputted after I used "proc options group=memory; run;" <-- if this is the correct way to check.

 

Is there anyway to make the above query simpler? I guess I could get rid of "count (enrolid)" b/c that's just the number of observations.

 

Group=MEMORY
 SORTSIZE=1073741824
                   Specifies the amount of memory that is available to the SORT procedure.
 SUMSIZE=0         Specifies a limit on the amount of memory that is available for data
                   summarization procedures when class variables are active.
 MAXMEMQUERY=0     Specifies the maximum amount of memory that is allocated for procedures.
 MEMBLKSZ=16777216 Specifies the memory block size for Windows memory-based libraries.
 MEMMAXSZ=2147483648
                   Specifies the maximum amount of memory to allocate for using memory-based
                   libraries.
 LOADMEMSIZE=0     Specifies a suggested amount of memory that is needed for executable
                   programs loaded by SAS.
 MEMSIZE=2147483648
                   Specifies the limit on the amount of virtual memory that can be used during
                   a SAS session.
 REALMEMSIZE=0     Specifies the amount of real memory SAS can expect to allocate

 

Super User
Super User
Posts: 7,844

Re: Simple proc sql count function taking >12 hours...

Sorting a 4Gbyte file can take a long time.

Is the dataset already sorted by ENROLID?  If so you might get better results using code like this.

data temp.count ;
 do until(eof);
   set have(keep=enrolid) end=eof ;
   by enrolid ;
   where not missing(enrolid);
   ndistinct + first.enrolid ;
   n+1;
 end;
 keep ndistinct n ;
run;

 

Super User
Posts: 22,818

Re: Simple proc sql count function taking >12 hours...

How long does a PROC FREQ take?

That can be used to get the same information. 

 

https://github.com/statgeek/SAS-Tutorials/blob/master/count_distinct_by_group.sas

 

 

Solution
‎01-01-2018 08:31 AM
Super User
Posts: 10,604

Re: Simple proc sql count function taking >12 hours...

[ Edited ]
proc sql;

create table temp.count1 as
select count (distinct enrolid) as ndistinct  from have;

create table temp.count2 as
select  n(enrolid) as n from have;

quit;

Or could use Hash Table and Data step to save some time. Here is an example for sashelp.class .

data class;
 set sashelp.class;
 if _n_=10 then call missing(name);
 if _n_  in (1 2) then name='Judy';
run;

data _null_;
if _n_=1 then do;
 if 0 then set 	class ;
 declare hash h();
 h.definekey('name');
 h.definedone();
end;
 set class end=last;
 if not missing(name) then do;n+1; h.ref();end;

 if last then do;
   ndistinct=h.num_items;
   putlog 'ndistinct=' ndistinct '  n=' n;
 end;
run;

 

Super User
Posts: 9,547

Re: Simple proc sql count function taking >12 hours...

SQL is prone to such things (bad performance when doing implicit sorting of large datasets).

Try the following:

proc sort
  data=have (keep=enrolid)
  out=temp
;
by enrolid;
run;

data temp.count (keep=ndistinct n);
set temp end=done;
by enrolid;
retain
  ndistinct 0
  n 0
;
n + 1;
if first.enrolid then ndistinct + 1;
if done;
run;
---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 307 views
  • 4 likes
  • 6 in conversation