BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
cdubs
Quartz | Level 8

I'm running a very simple code on a 4GB dataset, but it's taken over 24 hours and the output file is still at 1 KB, and SAS is still giving me the hour-glass. The log doesn't show any error -- it just doesn't output anything while hour-glassing.

 

Does anyone have a hunch as to what's going on?

 

proc sql;

create table temp.count as
select count (distinct enrolid) as ndistinct, count (enrolid) as n
from have; quit;
1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
proc sql;

create table temp.count1 as
select count (distinct enrolid) as ndistinct  from have;

create table temp.count2 as
select  n(enrolid) as n from have;

quit;

Or could use Hash Table and Data step to save some time. Here is an example for sashelp.class .

data class;
 set sashelp.class;
 if _n_=10 then call missing(name);
 if _n_  in (1 2) then name='Judy';
run;

data _null_;
if _n_=1 then do;
 if 0 then set 	class ;
 declare hash h();
 h.definekey('name');
 h.definedone();
end;
 set class end=last;
 if not missing(name) then do;n+1; h.ref();end;

 if last then do;
   ndistinct=h.num_items;
   putlog 'ndistinct=' ndistinct '  n=' n;
 end;
run;

 

View solution in original post

6 REPLIES 6
LinusH
Tourmaline | Level 20
Well your query is somewhat resource consuming, but I suspect that you have limited memory resources available for your SAS session. Check your MEMSIZE and SORTSIZE settings. Also, look in the OS for metrics connected your SAS session vs available resources.
Data never sleeps
cdubs
Quartz | Level 8

I'm not sure but the following is what outputted after I used "proc options group=memory; run;" <-- if this is the correct way to check.

 

Is there anyway to make the above query simpler? I guess I could get rid of "count (enrolid)" b/c that's just the number of observations.

 

Group=MEMORY
 SORTSIZE=1073741824
                   Specifies the amount of memory that is available to the SORT procedure.
 SUMSIZE=0         Specifies a limit on the amount of memory that is available for data
                   summarization procedures when class variables are active.
 MAXMEMQUERY=0     Specifies the maximum amount of memory that is allocated for procedures.
 MEMBLKSZ=16777216 Specifies the memory block size for Windows memory-based libraries.
 MEMMAXSZ=2147483648
                   Specifies the maximum amount of memory to allocate for using memory-based
                   libraries.
 LOADMEMSIZE=0     Specifies a suggested amount of memory that is needed for executable
                   programs loaded by SAS.
 MEMSIZE=2147483648
                   Specifies the limit on the amount of virtual memory that can be used during
                   a SAS session.
 REALMEMSIZE=0     Specifies the amount of real memory SAS can expect to allocate

 

Tom
Super User Tom
Super User

Sorting a 4Gbyte file can take a long time.

Is the dataset already sorted by ENROLID?  If so you might get better results using code like this.

data temp.count ;
 do until(eof);
   set have(keep=enrolid) end=eof ;
   by enrolid ;
   where not missing(enrolid);
   ndistinct + first.enrolid ;
   n+1;
 end;
 keep ndistinct n ;
run;

 

Reeza
Super User

How long does a PROC FREQ take?

That can be used to get the same information. 

 

https://github.com/statgeek/SAS-Tutorials/blob/master/count_distinct_by_group.sas

 

 

Ksharp
Super User
proc sql;

create table temp.count1 as
select count (distinct enrolid) as ndistinct  from have;

create table temp.count2 as
select  n(enrolid) as n from have;

quit;

Or could use Hash Table and Data step to save some time. Here is an example for sashelp.class .

data class;
 set sashelp.class;
 if _n_=10 then call missing(name);
 if _n_  in (1 2) then name='Judy';
run;

data _null_;
if _n_=1 then do;
 if 0 then set 	class ;
 declare hash h();
 h.definekey('name');
 h.definedone();
end;
 set class end=last;
 if not missing(name) then do;n+1; h.ref();end;

 if last then do;
   ndistinct=h.num_items;
   putlog 'ndistinct=' ndistinct '  n=' n;
 end;
run;

 

Kurt_Bremser
Super User

SQL is prone to such things (bad performance when doing implicit sorting of large datasets).

Try the following:

proc sort
  data=have (keep=enrolid)
  out=temp
;
by enrolid;
run;

data temp.count (keep=ndistinct n);
set temp end=done;
by enrolid;
retain
  ndistinct 0
  n 0
;
n + 1;
if first.enrolid then ndistinct + 1;
if done;
run;

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 3003 views
  • 4 likes
  • 6 in conversation