Hi, Guys
I'm very confused about the MIN and MAX values of the SORTSIZE system option. Here is what SAS Help and Documentation says:
Syntax
-SORTSIZE n | nK | nM | nG | hexX | MIN | MAX
SORTSIZE= n | nK | nM | nG | hexX | MIN | MAX
Required Arguments
n | nK | nM | nG specifies the amount of memory in multiples of 1; 1,024 (kilobytes); 1,048,576 (megabytes); and 1,073,741,824 (gigabytes) respectively. You can specify decimal values for the number of kilobytes, megabytes, or gigabytes. For example, a value of 8 specifies 8 bytes, a value of .782k specifies 801 bytes, and a value of 3m specifies 3,145,728 bytes.
hexX specifies the amount of memory as a hexadecimal value. You must specify the value beginning with a number (0–9), followed by an X. For example, the value 2dx sets the amount of memory to 45 bytes.
MIN specifies the minimum amount of memory available.
MAX specifies the maximum amount of memory available.
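A minimal sketch for checking how SAS stores each value form, assuming SORTSIZE is not restricted at your site and can be changed with an OPTIONS statement. According to the documentation excerpt above, 3m, .782k and 2dx correspond to 3,145,728, 801 and 45 bytes, so you can compare those figures with what the log shows on your host. In this thread, MIN printed as 0 and MAX printed as MAX.

```sas
/* Set SORTSIZE with several of the documented value forms and */
/* print what GETOPTION reports after each one.                */
options sortsize=3m;
%put 3m gives SORTSIZE=%sysfunc(getoption(sortsize));

options sortsize=.782k;
%put .782k gives SORTSIZE=%sysfunc(getoption(sortsize));

options sortsize=2dx;
%put 2dx gives SORTSIZE=%sysfunc(getoption(sortsize));

options sortsize=min;
%put MIN gives SORTSIZE=%sysfunc(getoption(sortsize));

options sortsize=max;
%put MAX gives SORTSIZE=%sysfunc(getoption(sortsize));
```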
I've run some tests. If I set SORTSIZE=MIN, the actual value is 0. So I ran a SORT procedure on a large data set to see whether the SAS process would use memory to sort the data. It did use memory, just as it does when SORTSIZE=MAX.
So now I'm very confused. What's the difference between MIN and MAX?
/* sample data*/
data test;
length id $ 500;
set sashelp.cars;
do i=1 to 5000;
id=put(uniform(i)*1000+_n_,z6.);
output;
end;
run;
option fullstimer;
/* Set SORTSIZE=MIN*/
option sortsize=MIN;
/* Get the value of SORTSIZE*/
%put SORTSIZE is: %sysfunc(getoption(sortsize));
/* Sort the data*/
proc sort data=test out=test1;
by id;
run;
/* Set SORTSIZE=MAX*/
option sortsize=MAX;
/* Get the value of SORTSIZE*/
%put SORTSIZE is: %sysfunc(getoption(sortsize));
/* Sort the data*/
proc sort data=test out=test2;
by id;
run;
/*Set SORTSIZE=300M*/
option sortsize=300M;
/* Get the value of SORTSIZE*/
%put SORTSIZE is: %sysfunc(getoption(sortsize));
/* Sort the data*/
proc sort data=test out=test3;
by id;
run;
Here is the log. The memory used with SORTSIZE=MIN (1686892.54k) is exactly the same as with SORTSIZE=MAX.
43
44
45 %put %sysfunc(getoption(memsize));
2147483648
46
47 /* sample data*/
48 data test;
49 length id $ 500;
50 set sashelp.cars;
51 do i=1 to 5000;
52 id=put(uniform(i)*1000+_n_,z6.);
53 output;
54 end;
55 run;
NOTE: There were 428 observations read from the data set SASHELP.CARS.
NOTE: The data set WORK.TEST has 2140000 observations and 17 variables.
NOTE: DATA statement used (Total process time):
real time 3.18 seconds
user cpu time 1.09 seconds
system cpu time 0.45 seconds
memory 511.43k
OS Memory 10468.00k
Timestamp 06/28/2017 10:43:37 PM
Step Count 5 Switch Count 0
56
57 option fullstimer;
58
59 /* Set SORTSIZE=MIN*/
60 option sortsize=MIN;
61 /* Get the value of SORTSIZE*/
62 %put SORTSIZE is: %sysfunc(getoption(sortsize));
SORTSIZE is: 0
63 /* Sort the data*/
64 proc sort data=test out=test1;
65 by id;
66 run;
NOTE: There were 2140000 observations read from the data set WORK.TEST.
NOTE: The data set WORK.TEST1 has 2140000 observations and 17 variables.
NOTE: PROCEDURE SORT used (Total process time):
real time 18.01 seconds
user cpu time 5.36 seconds
system cpu time 4.68 seconds
memory 1686892.54k
OS Memory 1696752.00k
Timestamp 06/28/2017 10:43:55 PM
Step Count 6 Switch Count 0
67
68 /* Set SORTSIZE=MAX*/
69 option sortsize=MAX;
70 /* Get the value of SORTSIZE*/
71 %put SORTSIZE is: %sysfunc(getoption(sortsize));
SORTSIZE is: MAX
72 /* Sort the data*/
73 proc sort data=test out=test2;
74 by id;
75 run;
NOTE: There were 2140000 observations read from the data set WORK.TEST.
NOTE: The data set WORK.TEST2 has 2140000 observations and 17 variables.
NOTE: PROCEDURE SORT used (Total process time):
real time 12.06 seconds
user cpu time 4.82 seconds
system cpu time 3.51 seconds
memory 1686892.54k
OS Memory 1696752.00k
Timestamp 06/28/2017 10:44:07 PM
Step Count 7 Switch Count 0
76
77 /*Set SORTSIZE=300M*/
78 option sortsize=300M;
79 /* Get the value of SORTSIZE*/
80 %put SORTSIZE is: %sysfunc(getoption(sortsize));
SORTSIZE is: 314572800
81 /* Sort the data*/
82 proc sort data=test out=test3;
83 by id;
84 run;
NOTE: There were 2140000 observations read from the data set WORK.TEST.
NOTE: The data set WORK.TEST3 has 2140000 observations and 17 variables.
NOTE: PROCEDURE SORT used (Total process time):
real time 29.62 seconds
user cpu time 5.39 seconds
system cpu time 3.79 seconds
memory 312110.68k
OS Memory 321204.00k
Timestamp 06/28/2017 10:44:37 PM
Step Count 8 Switch Count 0
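To see whether each of these sorts actually ran in memory or spilled to a utility file, one option is to rerun them with MSGLEVEL=I and the PROC SORT DETAILS option, the same settings used in a later test in this thread. A sketch, assuming the WORK.TEST data set from the code above still exists:

```sas
/* Ask SAS for informational notes about sort execution choices. */
options fullstimer msglevel=i;

/* Sort with SORTSIZE=MIN and report sort details. */
options sortsize=min;
proc sort data=test out=test1 details;
  by id;
run;

/* Sort with SORTSIZE=MAX and report sort details. */
options sortsize=max;
proc sort data=test out=test2 details;
  by id;
run;
```

When PROC SORT falls back to an external sort, the log includes notes such as "NOTE: Utility file required." (as in the later log in this thread), which makes the MIN and MAX runs easier to compare than the memory figures alone.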
Also, if SORTSIZE is greater than MEMSIZE, is it silently reduced to MEMSIZE? (Just to confirm: from my test, it is.)
> If the number of SORTSIZE is greater than the number of MEMSIZE, Is it silently reduced to the number of MEMSIZE?
No, it isn't. And your test is invalid: your MEMSIZE is 2GB and you set SORTSIZE to 300MB.
Excerpt from the book linked in my signature, which includes a whole chapter on proc sort and its parameters:
- The value for SORTSIZE should not be greater than the amount of available memory. If SAS were to perform an internal sort while the OS is paging its memory calls, the performance would be much worse than doing an external sort.
Also about memory, still from the book: note that MEMSIZE is *not* the amount of available memory.
You can get that amount with:
%put Free RAM = %sysfunc(putn(%sysfunc(getoption(xmrlmem))/1024**2,comma10.))MB;
See diagram below:
Thanks, Chris! That's very helpful. I will buy the book to study.
About MIN and MAX, I just want to know the difference, because in my test results they seem to behave the same.
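One way to dig into the MIN/MAX question is to ask SAS to describe the option itself. The PROC OPTIONS step below uses DEFINE (description, type and valid range) and VALUE (current value and how it was set). The exact wording of the output depends on the SAS release and host, so treat this as a sketch.

```sas
/* Set SORTSIZE=MIN, then print the option's definition and value. */
options sortsize=min;
proc options option=sortsize define value;
run;
```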
And the other question:
> If the number of SORTSIZE is greater than the number of MEMSIZE, Is it silently reduced to the number of MEMSIZE?
It's not? I ran a test, and the log showed that it is. I forgot to paste the code and log yesterday. I set MEMSIZE to 256M and SORTSIZE to 1G in sasv9.cfg, the configuration file. The SORT procedure could only use memory below 256M, even though SORTSIZE was 1G.
17 %put MEMSIZE = %sysfunc(putn(%sysfunc(getoption(memsize))/1024,comma10.))Kb;
MEMSIZE = 262,144Kb
18 %put SORTSIZE = %sysfunc(putn(%sysfunc(getoption(sortsize))/1024,comma10.))Kb;
SORTSIZE = 1,048,576Kb
19
20 option fullstimer msglevel=I;
21 proc sort data=test out=test3 details;
22 by id;
23 run;
NOTE: Utility file required.
NOTE: Utility file 1 page size is 262144 bytes.
NOTE: There were 2140000 observations read from the data set WORK.TEST.
NOTE: Utility file 1 contains 2140000 records and 13 sorted runs.
NOTE: Utility file 1 contains 5491 pages for a total of 1405696.00 KB.
NOTE: SAS threaded sort was used.
NOTE: The data set WORK.TEST3 has 2140000 observations and 17 variables.
NOTE: PROCEDURE SORT used (Total process time):
real time 4.87 seconds
user cpu time 4.34 seconds
system cpu time 2.76 seconds
memory 206253.09k
OS Memory 215364.00k
Timestamp 06/29/2017 08:30:43 PM
Step Count 3 Switch Count 4
As in the diagram you gave, SORTSIZE is part of MEMSIZE. So even if SORTSIZE is set greater than MEMSIZE, the maximum amount of memory the SORT procedure can use depends on MEMSIZE. Is that true?
1- Agreed, value MIN seems of no practical value
2- Is it silently reduced to the number of MEMSIZE?
Yes it is as shown in the diagram. Apologies I read your question backwards.
Note that REALMEMSIZE is the value that matters.
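A small sketch for printing the memory-related settings side by side so they can be compared directly. XMRLMEM is the same option used in the free-RAM %PUT earlier in this thread. GETOPTION returns each value as text, so numeric settings appear in bytes while SORTSIZE may print as MAX (or 0 for MIN) rather than a byte count.

```sas
/* Print the memory-related options as GETOPTION reports them. */
%put MEMSIZE     = %sysfunc(getoption(memsize));
%put REALMEMSIZE = %sysfunc(getoption(realmemsize));
%put SORTSIZE    = %sysfunc(getoption(sortsize));
%put XMRLMEM     = %sysfunc(getoption(xmrlmem));
```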
Thanks a lot!
About the MIN value: what would you ever use it for?
If you are simply interested in proc sort's internals, the same chapter explains them. I won't post that here as it's too long, but you can see the different phases yourself by running the code below.
Excerpt:
As an exercise, we sort a table and gradually reduce the amount of memory available to PROC SORT in order to degrade its performance. As PROC SORT goes through different algorithms to accommodate its execution environment, it gives information about its execution choices.
Here is the code:
option fullstimer msglevel=I cpucount=2 compress=no;
data TMP; do I=1 to 1e6; output; end; run;
*Test1; proc sort data=TMP out=TMP1 sortsize=90000k details; by I; run;
*Test2; proc sort data=TMP out=TMP1 sortsize= 9000k details; by I; run;
*Test3; proc sort data=TMP out=TMP1 sortsize= 2300k details; by I; run;
*Test4; proc sort data=TMP out=TMP1 sortsize= 900k details; by I; run;
*Test5; proc sort data=TMP out=TMP1 sortsize= 90k details; by I; run;
*Test6; proc sort data=TMP(obs=99) sortsize=90000k details; by I; run;
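Tests 1 to 5 differ only in their SORTSIZE= value, so they can also be driven from a list with a small macro loop. This is a hypothetical convenience wrapper (the macro name SORT_TESTS is made up for illustration), assuming the TMP data set created above exists and the same OPTIONS statement has been run:

```sas
/* Run PROC SORT once per SORTSIZE= value in the SIZES list. */
%macro sort_tests(sizes=90000k 9000k 2300k 900k 90k);
  %local i size;
  %do i = 1 %to %sysfunc(countw(&sizes, %str( )));
    %let size = %scan(&sizes, &i, %str( ));
    %put NOTE: Sorting TMP with SORTSIZE=&size;
    proc sort data=TMP out=TMP1 sortsize=&size details;
      by I;
    run;
  %end;
%mend sort_tests;

%sort_tests()
```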