Slash
Quartz | Level 8

Hi, Guys

 

I'm very confused about the MIN and MAX values of the SORTSIZE system option. Here is the relevant excerpt from the SAS Help and Documentation.

 

Syntax
-SORTSIZE n | nK | nM | nG | hexX | MIN | MAX
SORTSIZE= n | nK | nM | nG | hexX | MIN | MAX


Required Arguments

n | nK | nM | nG specifies the amount of memory in multiples of 1; 1,024 (kilobytes); 1,048,576 (megabytes); and 1,073,741,824 (gigabytes) respectively. You can specify decimal values for the number of kilobytes, megabytes, or gigabytes. For example, a value of 8 specifies 8 bytes, a value of .782k specifies 801 bytes, and a value of 3m specifies 3,145,728 bytes.

 

hexX specifies the amount of memory as a hexadecimal value. You must specify the value beginning with a number (0–9), followed by an X. For example, the value 2dx sets the amount of memory to 45 bytes.

 

MIN specifies the minimum amount of memory available.

 

MAX specifies the maximum amount of memory available.
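As a quick check of the arithmetic in the excerpt above, here is a minimal sketch using only macro functions. The labels in the %PUT text are mine, and the hexadecimal conversion assumes the HEX informat reads the digits the same way the option does.

```sas
/* n | nK | nM | nG: multiply by 1, 1024, 1024**2, 1024**3 */
%put 3M    = %sysevalf(3 * 1024**2) bytes;

/* Decimal values are allowed: .782 * 1024 = 800.768,           */
/* which the documentation reports as 801 bytes                 */
%put .782K = %sysevalf(.782 * 1024) bytes;

/* hexX: hexadecimal 2D is 45 in decimal */
%put 2DX   = %sysfunc(inputn(2d, hex2.)) bytes;
```

None of this explains MIN and MAX, which are the only two values that are not simple byte counts.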

 

I've run some tests. With SORTSIZE=MIN, GETOPTION reports the value as 0. So I ran PROC SORT on a large data set to see whether the SAS process would still use memory for the sort. It did, and it used just as much as with SORTSIZE=MAX.

 

So now I'm confused: what is the difference between MIN and MAX?

 


/* sample data*/
data test;
	length id $ 500;
	set sashelp.cars;
	do i=1 to 5000;
		id=put(uniform(i)*1000+_n_,z6.);
		output;
	end;
run;

option fullstimer;

/* Set SORTSIZE=MIN*/
option sortsize=MIN;
/* Get the value of SORTSIZE*/
%put SORTSIZE is: %sysfunc(getoption(sortsize));
/* Sort the data*/
proc sort data=test out=test1;
	by id;
run;

/* Set SORTSIZE=MAX*/
option sortsize=MAX;
/* Get the value of SORTSIZE*/
%put SORTSIZE is: %sysfunc(getoption(sortsize));
/* Sort the data*/
proc sort data=test out=test2;
	by id;
run;

/*Set SORTSIZE=300M*/
option sortsize=300M;
/* Get the value of SORTSIZE*/
%put SORTSIZE is: %sysfunc(getoption(sortsize));
/* Sort the data*/
proc sort data=test out=test3;
	by id;
run;

Here is the log. The memory used with SORTSIZE=MIN is exactly the same as with SORTSIZE=MAX (1686892.54k in both cases).

43
44
45   %put %sysfunc(getoption(memsize));
2147483648
46
47   /* sample data*/
48   data test;
49       length id $ 500;
50       set sashelp.cars;
51       do i=1 to 5000;
52           id=put(uniform(i)*1000+_n_,z6.);
53           output;
54       end;
55   run;

NOTE: There were 428 observations read from the data set SASHELP.CARS.
NOTE: The data set WORK.TEST has 2140000 observations and 17 variables.
NOTE: DATA statement used (Total process time):
      real time           3.18 seconds
      user cpu time       1.09 seconds
      system cpu time     0.45 seconds
      memory              511.43k
      OS Memory           10468.00k
      Timestamp           06/28/2017 10:43:37 PM
      Step Count                        5  Switch Count  0


56
57   option fullstimer;
58
59   /* Set SORTSIZE=MIN*/
60   option sortsize=MIN;
61   /* Get the value of SORTSIZE*/
62   %put SORTSIZE is: %sysfunc(getoption(sortsize));
SORTSIZE is: 0
63   /* Sort the data*/
64   proc sort data=test out=test1;
65       by id;
66   run;

NOTE: There were 2140000 observations read from the data set WORK.TEST.
NOTE: The data set WORK.TEST1 has 2140000 observations and 17 variables.
NOTE: PROCEDURE SORT used (Total process time):
      real time           18.01 seconds
      user cpu time       5.36 seconds
      system cpu time     4.68 seconds
      memory              1686892.54k
      OS Memory           1696752.00k
      Timestamp           06/28/2017 10:43:55 PM
      Step Count                        6  Switch Count  0


67
68   /* Set SORTSIZE=MAX*/
69   option sortsize=MAX;
70   /* Get the value of SORTSIZE*/
71   %put SORTSIZE is: %sysfunc(getoption(sortsize));
SORTSIZE is: MAX
72   /* Sort the data*/
73   proc sort data=test out=test2;
74       by id;
75   run;

NOTE: There were 2140000 observations read from the data set WORK.TEST.
NOTE: The data set WORK.TEST2 has 2140000 observations and 17 variables.
NOTE: PROCEDURE SORT used (Total process time):
      real time           12.06 seconds
      user cpu time       4.82 seconds
      system cpu time     3.51 seconds
      memory              1686892.54k
      OS Memory           1696752.00k
      Timestamp           06/28/2017 10:44:07 PM
      Step Count                        7  Switch Count  0


76
77   /*Set SORTSIZE=300M*/
78   option sortsize=300M;
79   /* Get the value of SORTSIZE*/
80   %put SORTSIZE is: %sysfunc(getoption(sortsize));
SORTSIZE is: 314572800
81   /* Sort the data*/
82   proc sort data=test out=test3;
83       by id;
84   run;

NOTE: There were 2140000 observations read from the data set WORK.TEST.
NOTE: The data set WORK.TEST3 has 2140000 observations and 17 variables.
NOTE: PROCEDURE SORT used (Total process time):
      real time           29.62 seconds
      user cpu time       5.39 seconds
      system cpu time     3.79 seconds
      memory              312110.68k
      OS Memory           321204.00k
      Timestamp           06/28/2017 10:44:37 PM
      Step Count                        8  Switch Count  0


Also, if SORTSIZE is greater than MEMSIZE, is it silently reduced to MEMSIZE? (Just to confirm; my test suggests that it is.)

ChrisNZ
Tourmaline | Level 20

> If the number of SORTSIZE is greater than the number of MEMSIZE, Is it silently reduced to the number of MEMSIZE?

 

No, it isn't. And your test is invalid: your MEMSIZE is 2 GB and you set SORTSIZE to 300 MB.

 

An excerpt from the book linked in my signature, which includes a whole chapter on PROC SORT and its parameters:

- The value for SORTSIZE should not be greater than the amount of available memory. If SAS were to perform an internal sort while the OS is paging its memory calls, the performance would be much worse than doing an external sort.

 

About memory, and still from the book: note that MEMSIZE is *not* the amount of available memory.

You can get the amount of available memory with:

%put Free RAM = %sysfunc(putn(%sysfunc(getoption(xmrlmem))/1024**2,comma10.))MB;

See diagram below:

aaa1.PNG
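Following the advice above, one way to keep SORTSIZE below the free memory is to derive it from XMRLMEM. This is only a sketch: the 50% fraction and the macro variable names free_bytes and sort_bytes are assumptions for illustration, not a recommendation from the book.

```sas
/* Free memory in bytes, as reported by the XMRLMEM option used above */
%let free_bytes = %sysfunc(getoption(xmrlmem));

/* Assumption: cap the sort at half of the free memory (arbitrary fraction) */
%let sort_bytes = %sysevalf(&free_bytes / 2, floor);

/* Apply the cap and confirm what SAS recorded */
options sortsize=&sort_bytes;
%put SORTSIZE is now: %sysfunc(getoption(sortsize));
```

The free memory changes as the session runs, so the value would need to be recomputed shortly before the sort it is meant to protect.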

 

 

 

 

Slash
Quartz | Level 8

Thanks, Chris! That's very helpful. I will buy the book to study.

 

About MIN and MAX, I just want to know the difference, because in my test results there seems to be none.

 

And the other question:

> If the number of SORTSIZE is greater than the number of MEMSIZE, Is it silently reduced to the number of MEMSIZE?

 

It isn't? I ran a test, and the log suggests that it is. I forgot to paste the code and log yesterday. I set MEMSIZE to 256M and SORTSIZE to 1G in the configuration file, sasv9.cfg. PROC SORT could only use less than 256M of memory, even though SORTSIZE was 1G.

 

 

17   %put MEMSIZE = %sysfunc(putn(%sysfunc(getoption(memsize))/1024,comma10.))Kb;
MEMSIZE =    262,144Kb
18   %put SORTSIZE = %sysfunc(putn(%sysfunc(getoption(sortsize))/1024,comma10.))Kb;
SORTSIZE =  1,048,576Kb

19
20   option fullstimer msglevel=I;
21   proc sort data=test out=test3 details;
22       by id;
23   run;

NOTE: Utility file required.
NOTE: Utility file 1 page size is 262144 bytes.
NOTE: There were 2140000 observations read from the data set WORK.TEST.
NOTE: Utility file 1 contains 2140000 records and 13 sorted runs.
NOTE: Utility file 1 contains 5491 pages for a total of 1405696.00 KB.
NOTE: SAS threaded sort was used.
NOTE: The data set WORK.TEST3 has 2140000 observations and 17 variables.
NOTE: PROCEDURE SORT used (Total process time):
      real time           4.87 seconds
      user cpu time       4.34 seconds
      system cpu time     2.76 seconds
      memory              206253.09k
      OS Memory           215364.00k
      Timestamp           06/29/2017 08:30:43 PM
      Step Count                        3  Switch Count  4

 

As in the diagram you gave, SORTSIZE is part of MEMSIZE. Even if SORTSIZE is set higher than MEMSIZE, the maximum amount of memory PROC SORT can use depends on MEMSIZE. Is that true?
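For anyone repeating the test, here is a minimal sketch that compares the two settings before sorting. The macro name check_sortsize is made up for illustration, and I'm assuming GETOPTION returns either a byte count or the text MAX for SORTSIZE, as in the logs above.

```sas
%macro check_sortsize;
    %local mem sort;
    %let mem  = %sysfunc(getoption(memsize));
    %let sort = %sysfunc(getoption(sortsize));

    /* SORTSIZE=MAX is reported as text, so it cannot be compared numerically */
    %if %upcase(&sort) = MAX %then
        %put SORTSIZE is MAX and MEMSIZE is &mem bytes.;
    /* Assumption: MEMSIZE is reported as a plain byte count */
    %else %if %sysevalf(&sort > &mem) %then
        %put SORTSIZE (&sort) is greater than MEMSIZE (&mem).;
    %else
        %put SORTSIZE (&sort) is within MEMSIZE (&mem).;
%mend check_sortsize;

%check_sortsize
```

With the settings in my test (MEMSIZE 256M, SORTSIZE 1G) this should take the "greater than" branch. It only reports the configured values; the FULLSTIMER memory figures still have to be read from the PROC SORT log.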

ChrisNZ
Tourmaline | Level 20

1- Agreed, the value MIN seems to be of no practical use.

2- Is it silently reduced to the number of MEMSIZE?
  Yes, it is, as shown in the diagram. Apologies, I read your question backwards.

  Note that REALMEMSIZE is the value that matters.

 

ChrisNZ
Tourmaline | Level 20

About the MIN value: why would you ever use it?

 

If you are simply interested in PROC SORT's internals, you may like the explanations in the same chapter. I won't post them here as they're too long, but you can see the different phases yourself by running the code below.

 

Excerpt:

As an exercise, we sort a table and gradually reduce the amount of memory available to PROC SORT in order to degrade its performance. As PROC SORT goes through different algorithms to accommodate its execution environment, it gives information about its execution choices.

Here is the code:

option fullstimer msglevel=I cpucount=2 compress=no;
data TMP; do I=1 to 1e6; output; end; run;

*Test1; proc sort data=TMP out=TMP1 sortsize=90000k details; by I; run;
*Test2; proc sort data=TMP out=TMP1 sortsize= 9000k details; by I; run;
*Test3; proc sort data=TMP out=TMP1 sortsize= 2300k details; by I; run;
*Test4; proc sort data=TMP out=TMP1 sortsize=  900k details; by I; run;
*Test5; proc sort data=TMP out=TMP1 sortsize=   90k details; by I; run;
*Test6; proc sort data=TMP(obs=99)  sortsize=90000k details; by I; run;
 

 

 
