BookmarkSubscribeRSS Feed
mariopellegrini
Pyrite | Level 9

In general, if a large table is reduced a little by COMPRESS, the PROC SORT applied to it does not improve much, even with the TAGSORT or PRESORTED options. At this point what should be done to reduce the processing time of a proc sort?

 

this is the result respectively without COMPRESS and with COMPRESS applied to table tab1

 


proc sort data = tab1
out = tab_out(keep=cod_source cod_compagnia
cod_contratto num_posizione cod_garanzia cod_tipo_riserva dt_effetto_riserva ts_inizio_validita ts_fine_validita
imp_riserva
cod_divisa dt_carico cod_tipo_agg_riserva) TAGSORT;
by cod_source cod_compagnia cod_contratto num_posizione cod_garanzia cod_tipo_riserva dt_effetto_riserva
ts_fine_validita;
run;

NOTE: Tagsort reads each observation of the input data set twice.
NOTE: The data set tab_out has 301087776 observations and 13 variables.
NOTE: PROCEDURE SORT ha utilizzato (tempo totale di elaborazione):
real time 7:19.48
user cpu time 4:35.45
system cpu time 44.12 seconds
memory 20087284.06k
OS Memory 20111168.00k
Timestamp 21/04/2023 02:51:28 p.
Step Count 17 Switch Count 693

 

proc sort data = tab1 out = tab_out(keep=cod_source cod_compagnia
cod_contratto num_posizione cod_garanzia cod_tipo_riserva dt_effetto_riserva ts_inizio_validita ts_fine_validita
imp_riserva cod_divisa dt_carico cod_tipo_agg_riserva);
by cod_source cod_compagnia cod_contratto num_posizione cod_garanzia cod_tipo_riserva dt_effetto_riserva
ts_fine_validita;
run;

NOTE: There were 301087776 observations read from the data set tab1.
NOTE: The data set tab_out has 301087776 observations and 13 variables.
NOTE: PROCEDURE SORT ha utilizzato (tempo totale di elaborazione):
real time 8:14.81
user cpu time 2:47.42
system cpu time 1:03.64
memory 31707434.82k
OS Memory 31729708.00k
Timestamp 21/04/2023 03:03:25 p.
Step Count 19 Switch Count 613

 

 

 

4 REPLIES 4
Oligolas
Barite | Level 11

Hi,

follow this guide

________________________

- Cheers -

Astounding
PROC Star
Two items stick out like a sore thumb. First, your program sorts all the variables, then outputs the KEEP= subset. Apply KEEP= to the incoming data rather than the output. Second, TAGSORT doesn't help. If anything, it increases the time required. You can see the note on the log telling you that SAS had to process the data twice. That's because of TAGSORT. As a general rule, eliminate TAGSORT unless you are sorting so many variables that you can't find the memory that PROC SORT needs in order to complete.

Added: notice that the link provided by the previous poster agrees that TAGSORT often drastically increases the time that PROC SORT takes.
AhmedAl_Attar
Rhodochrosite | Level 12

Hi @mariopellegrini 

Just from mechanics perspective,

The smaller the record length --> the more records can fit into the buffer (Memory) --> the less times data needs to fetched from Disk into Memory --> the faster processing is finished 😉

 

So, as you can see, it all starts "Smaller Record Length". Here is a good SGF paper that gives you strategies/techniques/methods to reduce the Record length beside Compression.  Twenty Ways to Run Your SAS® Program Faster and Use Less Space

 

I personally follow these points in my code

 

5. Use the LENGTH command to define the length of character and numeric variables. This can achieve a significant reduction in the space used by the program.

 

6. Numeric variables in SAS data sets have a default length of 8. If the values of the numeric variable are all integers, you can reduce the space by using the following table.

 

Length in Bytes

Largest Integer Represented Exactly

Exponential Notation

Significant Digits Retained

3

8,192

213

3

4

2,097,152

221

6

5

536,870,912

229

8

6

137,438,953,472

237

11

7

35,184,372,088,832

245

13

8

9,007,199,254,740,992

253

15

 

The third column refers to the absolute value of the number. Calculate the largest value of the numeric variable, check to make sure all values are integers by comparing the variable’s value to the value calculated with the ROUND function, and then, if the variables are all integers, use the table below to determine the smallest length required. More details can be found in https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/hostwin/n04ccixfia6l2pn1f8szvttqg3hm.htm

 

7. Sometimes character variables imported into SAS from other systems, like Oracle or Excel, have very large lengths. You can use the following procedure to get the shortest possible length for your character variable, although you might want to allow room for growth:

a. Use the LENGTH function to calculate the actual length of the variable in each observation in the data set.
b. Use the MAX option in PROC SUMMARY to get the largest value of the length.
c. Use the LENGTH statement to shorten the length of the character variable to the maximum length.

 

8. Switch variables from numeric to character if they are integers and range in value from -9 to 99. The minimum length for numeric variables is 3, so you can save space if the variable can fit into one or two characters.

 

9. Switch variables from character to numeric if they are all integers and occupy more than 3 bytes.

For example, the number 1234 would occupy 4 bytes as a character variable but item 6 above shows it would only occupy 3 bytes as a numeric variable.

 

Hope this helps

Kurt_Bremser
Super User
  • Move the KEEP= option to the input dataset
  • Use TAGSORT only if you have a heavily COMPRESSed, large dataset
  • Use COMPRESS=YES on the output dataset if it will contain large character variables which are mostly empty

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 430 views
  • 4 likes
  • 5 in conversation