
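Hi SAS developers, I searched for existing reports of this issue but did not find any related posts. In the SAS 9.4 log below, PROC SQL took much longer than PROC SORT on the same data: counting the distinct Pat_ID values in the unsorted table took 6:37 of real time in PROC SQL, while PROC SORT with NODUPKEY on the same table finished in 2:11. The PROC SQL CPU time is almost identical to its real time, whereas the PROC SORT CPU time (4:38) is about twice its real time, so it seems that PROC SQL does not use threading or multiple CPUs when counting distinct values. I hope this can be improved in PROC SQL. Thanks!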

 

82 proc sql noprint;
83 select count(unique(Pat_ID)) into : Count_1 from UnsortExtraVar;
84 %put Count_1=&Count_1;
Count_1=38879956
85 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 6:37.73
cpu time 6:37.78

86
87
88 proc sort data= UnsortExtraVar out=SortedExtraVar nodupkey;
89 by Pat_ID;
90 run;

NOTE: There were 440979767 observations read from the data set WORK.UNSORTEXTRAVAR.
NOTE: 402099811 observations with duplicate key values were deleted.
NOTE: The data set WORK.SORTEDEXTRAVAR has 38879956 observations and 8 variables.
NOTE: PROCEDURE SORT used (Total process time):
real time 2:11.29
cpu time 4:38.69

91 proc sql noprint;
92 select count(unique(Pat_ID)) into : Count_2 from SortedExtraVar;
93 %put Count_2=&Count_2;
Count_2=38879956
94 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 55.10 seconds
cpu time 55.11 seconds
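
As a possible workaround suggested by the timings above, the distinct count can be taken from a deduplicated copy produced by PROC SORT instead of from `count(unique(...))`. This is only a sketch and has not been timed on the data above: `Distinct_IDs` and `Count_3` are hypothetical names, and the `where=` filter is an assumption added so that missing Pat_ID values are excluded, as they are by `count(unique(Pat_ID))`.

```sas
/* Sketch only: let PROC SORT (which can run multithreaded) remove the
   duplicate keys. Only Pat_ID is kept so less data has to be sorted.
   Distinct_IDs is a hypothetical output name. */
proc sort data=UnsortExtraVar(keep=Pat_ID where=(not missing(Pat_ID)))
          out=Distinct_IDs nodupkey threads;
  by Pat_ID;
run;

/* The row count of the deduplicated table is the number of distinct
   non-missing Pat_ID values. Count_3 is a hypothetical macro variable. */
proc sql noprint;
  select count(*) into :Count_3 trimmed from Distinct_IDs;
quit;

%put Count_3=&Count_3;
```

Whether this is actually faster depends on the THREADS and CPUCOUNT system settings and on available sort memory, so it is worth comparing the log timings on your own system.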