I don't quite understand why a PROC SORT applied to a large table performs worse when the same dataset is compressed. The table in question is about 29 GB (305819687 observations and 16 variables, 13 of which are used in the BY statement), and applying COMPRESS reduces it to only 26 GB. My first doubt is that COMPRESS usually reduces the occupied space much more than that.

I would like to optimize this PROC SORT, but I haven't had good results: not with COMPRESS on the table, not with the BUFSIZE= and BUFNO= options (I tried all possible combinations of them), and not with the TAGSORT option (a sketch of the combined options appears after the logs below). I don't think the SORTSIZE option changes the situation either, because it is already set to 32 GB, which is larger than the table itself.

Here is the log of the original sort:
NOTE: There were 305819687 observations read from the data set ODS_VT.T_F_PTFVT_RISERVA_AWARDS.
NOTE: The data set WORK.ETLS_SORTEDXREF has 305819687 observations and 13 variables.
NOTE: PROCEDURE SORT used (Total process time):
      real time           8:29.88
      user cpu time       2:35.53
      system cpu time     53.34 seconds
      memory              31708967.57k
      OS Memory           31729708.00k
      Timestamp           04/05/2023 08:18:18 AM
      Step Count          15  Switch Count  659

And here is the log after running PROC SORT on the compressed table with the TAGSORT option:
NOTE: Tagsort reads each observation of the input data set twice.
NOTE: The data set WORK.ETLS_SORTEDXREF has 305819687 observations and 13 variables.
NOTE: Compressing data set WORK.ETLS_SORTEDXREF decreased size by 7.66 percent.
      Compressed is 311022 pages; un-compressed would require 336806 pages.
NOTE: PROCEDURE SORT used (Total process time):
      real time           12:35.03
      user cpu time       6:48.25
      system cpu time     1:22.71
      memory              20403259.93k
      OS Memory           20426884.00k
      Timestamp           05/04/2023 08:35:33 AM
      Step Count          19  Switch Count  1205
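For reference, here is a minimal sketch of the kind of step and options described above, not my exact code: the library, dataset, and output names come from the logs, but the BY variables are placeholders (the real step uses 13 of the 16 variables) and the buffer values are just examples of the settings I experimented with.

/* Sketch only: BY variables and buffer values are placeholders */
options sortsize=32G      /* sort memory ceiling, as in my session          */
        bufno=10          /* example: number of I/O buffers                 */
        bufsize=128k;     /* example: page size for output data sets        */

proc sort data=ods_vt.t_f_ptfvt_riserva_awards
          out=work.etls_sortedxref (compress=yes)  /* compressed output      */
          tagsort;                                 /* sort keys/RIDs, then re-read the input */
   by key_var1 key_var2 key_var3;                  /* placeholder for the 13 BY variables    */
run;

As the note in the second log says, TAGSORT reads every observation of the input twice, which seems consistent with the lower memory use (about 20 GB vs. 31 GB) but the longer real time (12:35 vs. 8:29) that I observe.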