I'm running a very simple code on a 4GB dataset, but it's taken over 24 hours and the output file is still at 1 KB, and SAS is still giving me the hour-glass. The log doesn't show any error -- it just doesn't output anything while hour-glassing.
Does anyone have a hunch as to what's going on?
proc sql;
create table temp.count as
select count (distinct enrolid) as ndistinct, count (enrolid) as n
from have;
quit;
proc sql;
create table temp.count1 as
select count (distinct enrolid) as ndistinct from have;
create table temp.count2 as
select n(enrolid) as n from have;
quit;
Or could use Hash Table and Data step to save some time. Here is an example for sashelp.class .
data class;
set sashelp.class;
if _n_=10 then call missing(name);
if _n_ in (1 2) then name='Judy';
run;
data _null_;
if _n_=1 then do;
if 0 then set class ;
declare hash h();
h.definekey('name');
h.definedone();
end;
set class end=last;
if not missing(name) then do;n+1; h.ref();end;
if last then do;
ndistinct=h.num_items;
putlog 'ndistinct=' ndistinct ' n=' n;
end;
run;
I'm not sure but the following is what outputted after I used "proc options group=memory; run;" <-- if this is the correct way to check.
Is there anyway to make the above query simpler? I guess I could get rid of "count (enrolid)" b/c that's just the number of observations.
Group=MEMORY
SORTSIZE=1073741824
Specifies the amount of memory that is available to the SORT procedure.
SUMSIZE=0 Specifies a limit on the amount of memory that is available for data
summarization procedures when class variables are active.
MAXMEMQUERY=0 Specifies the maximum amount of memory that is allocated for procedures.
MEMBLKSZ=16777216 Specifies the memory block size for Windows memory-based libraries.
MEMMAXSZ=2147483648
Specifies the maximum amount of memory to allocate for using memory-based
libraries.
LOADMEMSIZE=0 Specifies a suggested amount of memory that is needed for executable
programs loaded by SAS.
MEMSIZE=2147483648
Specifies the limit on the amount of virtual memory that can be used during
a SAS session.
REALMEMSIZE=0 Specifies the amount of real memory SAS can expect to allocate
Sorting a 4Gbyte file can take a long time.
Is the dataset already sorted by ENROLID? If so you might get better results using code like this.
data temp.count ;
do until(eof);
set have(keep=enrolid) end=eof ;
by enrolid ;
where not missing(enrolid);
ndistinct + first.enrolid ;
n+1;
end;
keep ndistinct n ;
run;
How long does a PROC FREQ take?
That can be used to get the same information.
https://github.com/statgeek/SAS-Tutorials/blob/master/count_distinct_by_group.sas
proc sql;
create table temp.count1 as
select count (distinct enrolid) as ndistinct from have;
create table temp.count2 as
select n(enrolid) as n from have;
quit;
Or could use Hash Table and Data step to save some time. Here is an example for sashelp.class .
data class;
set sashelp.class;
if _n_=10 then call missing(name);
if _n_ in (1 2) then name='Judy';
run;
data _null_;
if _n_=1 then do;
if 0 then set class ;
declare hash h();
h.definekey('name');
h.definedone();
end;
set class end=last;
if not missing(name) then do;n+1; h.ref();end;
if last then do;
ndistinct=h.num_items;
putlog 'ndistinct=' ndistinct ' n=' n;
end;
run;
SQL is prone to such things (bad performance when doing implicit sorting of large datasets).
Try the following:
proc sort
data=have (keep=enrolid)
out=temp
;
by enrolid;
run;
data temp.count (keep=ndistinct n);
set temp end=done;
by enrolid;
retain
ndistinct 0
n 0
;
n + 1;
if first.enrolid then ndistinct + 1;
if done;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.