In general, if a large table is reduced a little by COMPRESS, the PROC SORT applied to it does not improve much, even with the TAGSORT or PRESORTED options. At this point what should be done to reduce the processing time of a proc sort?
this is the result respectively without COMPRESS and with COMPRESS applied to table tab1
proc sort data = tab1
out = tab_out(keep=cod_source cod_compagnia
cod_contratto num_posizione cod_garanzia cod_tipo_riserva dt_effetto_riserva ts_inizio_validita ts_fine_validita
imp_riserva
cod_divisa dt_carico cod_tipo_agg_riserva) TAGSORT;
by cod_source cod_compagnia cod_contratto num_posizione cod_garanzia cod_tipo_riserva dt_effetto_riserva
ts_fine_validita;
run;
NOTE: Tagsort reads each observation of the input data set twice.
NOTE: The data set tab_out has 301087776 observations and 13 variables.
NOTE: PROCEDURE SORT ha utilizzato (tempo totale di elaborazione):
real time 7:19.48
user cpu time 4:35.45
system cpu time 44.12 seconds
memory 20087284.06k
OS Memory 20111168.00k
Timestamp 21/04/2023 02:51:28 p.
Step Count 17 Switch Count 693
proc sort data = tab1 out = tab_out(keep=cod_source cod_compagnia
cod_contratto num_posizione cod_garanzia cod_tipo_riserva dt_effetto_riserva ts_inizio_validita ts_fine_validita
imp_riserva cod_divisa dt_carico cod_tipo_agg_riserva);
by cod_source cod_compagnia cod_contratto num_posizione cod_garanzia cod_tipo_riserva dt_effetto_riserva
ts_fine_validita;
run;
NOTE: There were 301087776 observations read from the data set tab1.
NOTE: The data set tab_out has 301087776 observations and 13 variables.
NOTE: PROCEDURE SORT ha utilizzato (tempo totale di elaborazione):
real time 8:14.81
user cpu time 2:47.42
system cpu time 1:03.64
memory 31707434.82k
OS Memory 31729708.00k
Timestamp 21/04/2023 03:03:25 p.
Step Count 19 Switch Count 613
Just from mechanics perspective,
The smaller the record length --> the more records can fit into the buffer (Memory) --> the less times data needs to fetched from Disk into Memory --> the faster processing is finished 😉
So, as you can see, it all starts "Smaller Record Length". Here is a good SGF paper that gives you strategies/techniques/methods to reduce the Record length beside Compression. Twenty Ways to Run Your SAS® Program Faster and Use Less Space
I personally follow these points in my code
5. Use the LENGTH command to define the length of character and numeric variables. This can achieve a significant reduction in the space used by the program.
6. Numeric variables in SAS data sets have a default length of 8. If the values of the numeric variable are all integers, you can reduce the space by using the following table.
Length in Bytes |
Largest Integer Represented Exactly |
Exponential Notation |
Significant Digits Retained |
---|---|---|---|
3 |
8,192 |
213 |
3 |
4 |
2,097,152 |
221 |
6 |
5 |
536,870,912 |
229 |
8 |
6 |
137,438,953,472 |
237 |
11 |
7 |
35,184,372,088,832 |
245 |
13 |
8 |
9,007,199,254,740,992 |
253 |
15 |
The third column refers to the absolute value of the number. Calculate the largest value of the numeric variable, check to make sure all values are integers by comparing the variable’s value to the value calculated with the ROUND function, and then, if the variables are all integers, use the table below to determine the smallest length required. More details can be found in https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/hostwin/n04ccixfia6l2pn1f8szvttqg3hm.htm
7. Sometimes character variables imported into SAS from other systems, like Oracle or Excel, have very large lengths. You can use the following procedure to get the shortest possible length for your character variable, although you might want to allow room for growth:
a. Use the LENGTH function to calculate the actual length of the variable in each observation in the data set.
b. Use the MAX option in PROC SUMMARY to get the largest value of the length.
c. Use the LENGTH statement to shorten the length of the character variable to the maximum length.
8. Switch variables from numeric to character if they are integers and range in value from -9 to 99. The minimum length for numeric variables is 3, so you can save space if the variable can fit into one or two characters.
9. Switch variables from character to numeric if they are all integers and occupy more than 3 bytes.
For example, the number 1234 would occupy 4 bytes as a character variable but item 6 above shows it would only occupy 3 bytes as a numeric variable.
Hope this helps
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.