I'm running a very simple code on a 4GB dataset, but it's taken over 24 hours and the output file is still at 1 KB, and SAS is still giving me the hour-glass. The log doesn't show any error -- it just doesn't output anything while hour-glassing.
Does anyone have a hunch as to what's going on?
proc sql;
create table temp.count as
select count (distinct enrolid) as ndistinct, count (enrolid) as n
from have;
quit;
proc sql;
create table temp.count1 as
select count (distinct enrolid) as ndistinct from have;
create table temp.count2 as
select n(enrolid) as n from have;
quit;
Or could use Hash Table and Data step to save some time. Here is an example for sashelp.class .
data class;
set sashelp.class;
if _n_=10 then call missing(name);
if _n_ in (1 2) then name='Judy';
run;
data _null_;
if _n_=1 then do;
if 0 then set class ;
declare hash h();
h.definekey('name');
h.definedone();
end;
set class end=last;
if not missing(name) then do;n+1; h.ref();end;
if last then do;
ndistinct=h.num_items;
putlog 'ndistinct=' ndistinct ' n=' n;
end;
run;
I'm not sure but the following is what outputted after I used "proc options group=memory; run;" <-- if this is the correct way to check.
Is there anyway to make the above query simpler? I guess I could get rid of "count (enrolid)" b/c that's just the number of observations.
Group=MEMORY
SORTSIZE=1073741824
Specifies the amount of memory that is available to the SORT procedure.
SUMSIZE=0 Specifies a limit on the amount of memory that is available for data
summarization procedures when class variables are active.
MAXMEMQUERY=0 Specifies the maximum amount of memory that is allocated for procedures.
MEMBLKSZ=16777216 Specifies the memory block size for Windows memory-based libraries.
MEMMAXSZ=2147483648
Specifies the maximum amount of memory to allocate for using memory-based
libraries.
LOADMEMSIZE=0 Specifies a suggested amount of memory that is needed for executable
programs loaded by SAS.
MEMSIZE=2147483648
Specifies the limit on the amount of virtual memory that can be used during
a SAS session.
REALMEMSIZE=0 Specifies the amount of real memory SAS can expect to allocate
Sorting a 4Gbyte file can take a long time.
Is the dataset already sorted by ENROLID? If so you might get better results using code like this.
data temp.count ;
do until(eof);
set have(keep=enrolid) end=eof ;
by enrolid ;
where not missing(enrolid);
ndistinct + first.enrolid ;
n+1;
end;
keep ndistinct n ;
run;
How long does a PROC FREQ take?
That can be used to get the same information.
https://github.com/statgeek/SAS-Tutorials/blob/master/count_distinct_by_group.sas
proc sql;
create table temp.count1 as
select count (distinct enrolid) as ndistinct from have;
create table temp.count2 as
select n(enrolid) as n from have;
quit;
Or could use Hash Table and Data step to save some time. Here is an example for sashelp.class .
data class;
set sashelp.class;
if _n_=10 then call missing(name);
if _n_ in (1 2) then name='Judy';
run;
data _null_;
if _n_=1 then do;
if 0 then set class ;
declare hash h();
h.definekey('name');
h.definedone();
end;
set class end=last;
if not missing(name) then do;n+1; h.ref();end;
if last then do;
ndistinct=h.num_items;
putlog 'ndistinct=' ndistinct ' n=' n;
end;
run;
SQL is prone to such things (bad performance when doing implicit sorting of large datasets).
Try the following:
proc sort
data=have (keep=enrolid)
out=temp
;
by enrolid;
run;
data temp.count (keep=ndistinct n);
set temp end=done;
by enrolid;
retain
ndistinct 0
n 0
;
n + 1;
if first.enrolid then ndistinct + 1;
if done;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.