novinosrin
Tourmaline | Level 20

What is the limit on how many elements (variables) a SAS array can hold?

1. If it is subject to memory, what is that limit?

2. Why are temporary arrays the fastest, i.e. what makes them so incredibly fast?

3. Does processing efficiency differ between implicit and explicit arrays? The SAS docs don't encourage implicit arrays, but why?

4. Does processing efficiency differ between retained and non-retained arrays? If yes, why?

5. Leading on from question 4, should I assume processing efficiency takes a hit when an array combines retained and non-retained, compile-time and execution-time variables?

 

My initial guess: perhaps the physical addresses are contiguous in easily grouped arrays? Either way, my glass is not even close to half full 😞

 

Any clear (I mean really clear) docs, please? It's only fair that I don't take up your valuable time, so some pointers to docs that I can research myself will suffice.

More than the what, I am keen to understand the why.

 

Best regards

 

Accepted Solutions
s_lassen
Meteorite | Level 14

My two cents:

 

Temporary arrays are faster because they are actual arrays, that is, contiguous blocks of memory. Variable arrays may also be in contiguous memory, but as they do not have to be (and character variable arrays can have elements of different lengths), SAS creates an address table to look the elements up. So, if you refer to a variable array, SAS first uses the offset to find an entry in the address table, and then uses that entry to find the variable. If you use a _TEMPORARY_ array, the elements are found simply by calculating the offset from the base address.
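One way to check the lookup cost described above is a rough timing comparison. The following is only a sketch: the array names v and t, the macro variable n, and the loop counts are made up for illustration, and the timings will vary by system and SAS release.

```sas
/* Illustrative micro-benchmark; names and sizes are assumptions, not from the thread. */
%let n = 1000;

data _null_;
   array v[&n] v1-v&n;          /* variable array: every element is a named variable */
   array t[&n] _temporary_;     /* temporary array: unnamed elements                 */

   /* Time repeated writes to the variable array */
   start = datetime();
   do rep = 1 to 10000;
      do i = 1 to &n;
         v[i] = i;
      end;
   end;
   var_secs = datetime() - start;

   /* Time the same writes to the temporary array */
   start = datetime();
   do rep = 1 to 10000;
      do i = 1 to &n;
         t[i] = i;
      end;
   end;
   tmp_secs = datetime() - start;

   put var_secs= tmp_secs=;
run;
```

If the difference is too small to see, raising the repetition count should make it easier to measure.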

 

Implicit array references (do over) are not encouraged, because SAS Institute intends to do away with them (it has been an undocumented feature for years, now). So if you use them, you risk that your program will have to be rewritten when the next version of SAS comes out. Performance-wise, DO OVER is just the same as explicit array processing (a temporary variable named _I_ is created in the background and used to loop through the array).
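As a small illustration of the point that DO OVER is just a loop over a hidden index, the sketch below fills one array with DO OVER and another with an explicit DO loop. The array and variable names are invented for the example.

```sas
data _null_;
   /* Implicit array: no dimension, default index variable _I_ */
   array a a1-a3;
   do over a;
      a = _i_ * 10;            /* _I_ is maintained by SAS behind the scenes */
   end;
   put a1= a2= a3=;

   /* Explicit array doing the same work with a visible index */
   array b[3] b1-b3;
   do i = 1 to dim(b);
      b[i] = i * 10;
   end;
   put b1= b2= b3=;
run;
```

Both PUT statements should show the same pattern of values, which matches the claim that the two forms do the same work.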

data_null__
Jade | Level 19

Here is a simple macro to investigate question 1.

 

options fullstimer=1;
%macro main(arg);
   %local i x;
   %do i = 1 %to &arg;
      %let x = %sysevalf(1e&i,integer);
      %put NOTE: &=i &=x;
      data _null_;
         array a[&x];
         run;
      %end;
   %mend;
%main(5);

 

ballardw
Super User

The limit does seem to be memory-bound:

data work.junk;
   array z{10000000};
run;

ERROR: The SAS System stopped processing this step because of insufficient memory.
NOTE: DATA statement used (Total process time):
      real time           1:51.52
      cpu time            1:51.03

but whether that reflects my actual system limits or the program's memory settings, who knows.

 

There was no problem creating 1 million variables, but 10 million failed as shown above. So the limit is somewhere in between, on my system at least.

 

If by "implicit array" you mean accessing the elements using DO OVER, I would be fairly certain that implicit arrays are somewhat slower because of finding the limits and the background manipulation of whatever substitutes for an explicit index. Why not use implicit arrays? Try working with an array of two or more dimensions, or using aligned values in two or more separate arrays.

 

By "retained array" do you mean all elements of an array appear on a RETAIN statement? Or an implicit retain such as a[I] +1? Or something else?

 

And as a general comment on "processing efficiency": there are a number of ways to measure "efficiency". Is it execution time, memory use, programming time, program maintenance time, or some combination of all of them? Is it actually efficient to spend 10 hours finding a process that shaves 0.0001 seconds from run time if the program is only ever going to run 10 times? Or 10,000,000 times?

Kurt_Bremser
Super User

I ran into the insufficient memory error much earlier (400000 array elements). Since the pure data size of this array (about 3.2 MB at 8 bytes per element) is way below the MEMSIZE I have (256 MB), it has to be the allocation of names (32 bytes each) which causes the memory overflow. I could verify this by creating a temporary array, which happens noticeably faster, and overflows somewhere between 10 and 100 million elements.

Temporary arrays have no entries in the variable name table.
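The temporary-array test described here could be run with a variant of the macro posted earlier in the thread. This is a sketch only: the macro name temp_limit is hypothetical, and where the step fails depends on MEMSIZE and the host system.

```sas
options fullstimer;

%macro temp_limit(maxexp);      /* hypothetical name; maxexp = largest power of 10 to try */
   %local i x;
   %do i = 1 %to &maxexp;
      %let x = %sysevalf(1e&i, integer);
      %put NOTE: &=i &=x;
      data _null_;
         /* _TEMPORARY_ elements get no variable names */
         array a[&x] _temporary_;
      run;
   %end;
%mend temp_limit;

%temp_limit(6)
```

Comparing the FULLSTIMER memory figures with those from the variable-array version gives a rough idea of the per-element overhead of named variables.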

mkeintz
PROC Star

As other respondents to your questions have shown, every one of your questions (except 3, which is a "why") is answerable by relatively straightforward coding experiments - which I suspect can be an addictive part of programming.  After all, in programming, it's usually easy to answer questions of resource limits.

 

As to why temporary arrays are faster, consider two fundamental attributes of temporary arrays:

  1. The array values are automatically retained across iterations of the data step. There is no need to reset them to missing, or to replace them when a SET, MERGE, or UPDATE statement is encountered.
  2. The values are not written to the resulting data set, so there is no need to put them into the output buffer.
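Both attributes can be seen in a short step like the one below. It is a minimal sketch: the data set name work.demo and the array names are invented for the example.

```sas
data work.demo;                          /* hypothetical output data set */
   set sashelp.class(obs=3 keep=name);
   array t[1] _temporary_;               /* retained, never written out  */
   array v[1] v1;                        /* ordinary variable            */

   /* At the top of each iteration, compare what survived from the last one */
   put _n_= t[1]= v1=;

   t[1] = _n_;
   v[1] = _n_;
run;

/* The output data set contains name and v1, but nothing from t */
proc print data=work.demo;
run;
```

The log should show t[1] carrying the previous iteration's value while v1 starts each iteration missing.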
ballardw
Super User

Better phrased than what came to me when I was flashing-back to Assembler coding of arrays ...

Kurt_Bremser
Super User

To add to @mkeintz's list of temporary array attributes:

3. no names for individual array elements need to be created (32 bytes each).

 

This sheds new light on the discussion about increasing the possible size of SAS variable names. Imagine the SAS system allowed 256 characters: a 100000-element array would then need less than 1 MB for the numeric data, but more than 25 MB just for the names!

novinosrin
Tourmaline | Level 20

Thank you @s_lassen. While all responses are indeed great, the mechanics of lookup in contiguous/non-contiguous blocks of memory that contain addresses is something I have been desperately seeking to understand. I think you have nailed what I am after. Perhaps playing around with ADDR/ADDRLONG and PEEK/PEEKLONG will eventually give me a feel for it. Very interesting and sound direction. Tak!!!!

data_null__
Jade | Level 19

Syntax note.  Implicit array syntax has an optional index-variable parameter to define an index variable other than _I_.

 

data _null_;
   array s(j) s1-s4;
   do over s;
      s = j;
      put _all_;
      end;
   run;

j=1 s1=1 s2=. s3=. s4=. _ERROR_=0 _N_=1
j=2 s1=1 s2=2 s3=. s4=. _ERROR_=0 _N_=1
j=3 s1=1 s2=2 s3=3 s4=. _ERROR_=0 _N_=1
j=4 s1=1 s2=2 s3=3 s4=4 _ERROR_=0 _N_=1
novinosrin
Tourmaline | Level 20

Guru @data_null__  Thank you. Very nice indeed. 

mkeintz
PROC Star

In the case of non-temporary arrays, SAS appears to move the variables to make the array elements contiguous. For example, in the program below, I assigned the array N to six different orderings of the variables age, height, and weight from sashelp.class. I took 24 bytes (peekclong) starting from the address of N{1}, and then assigned the first 8 bytes to variable new1, the second 8 bytes to new2, and bytes 17-24 to new3.

 

In each case new1, new2, new3 had identical values to the original variables assigned to the array - NO MATTER WHAT ORDER I USED.  So in this case SAS moved variables to maintain an order compatible with the array declaration.   BTW,  the storage location of the array is quite different than the storage locations used when the variables are not assigned to an array - the variables are MOVED, not COPIED.  So  variables assigned to an array are re-arranged to improve performance in array usage.  (I didn't examine the case of character variables).

 

Of course, if I used those variables in a different order in a second array, that additional array would not have the benefit of contiguously stored elements.

 

data _null_;

  set sashelp.class (obs=1 );
  put (_numeric_) (=) //;

  *array n {*} height weight age;
  *array n {*} weight age height;
  *array n {*} age height weight;
  *array n {*} age weight height ;
  array n {*} weight height age ;
  *array n {*} height age weight;

  bytes24=peekclong(addrlong(n{1}),24.);
  new1=input(substr(bytes24,01,8),rb8.);
  new2=input(substr(bytes24,09,8),rb8.);
  new3=input(substr(bytes24,17,8),rb8.);
  put n{1}= @15 new1= /
      n{2}= @15 new2= /
      n{3}= @15 new3=;
run;
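The remark about a second array over the same variables could be probed by printing the element addresses of two arrays side by side. This is a sketch under assumptions: the second array m and the variables addr_n and addr_m are invented for the example, and the addresses themselves will differ on every run and system.

```sas
data _null_;
   set sashelp.class(obs=1);
   array n {*} weight height age;
   array m {*} age weight height;        /* same variables, different order */
   length addr_n addr_m $16;

   do i = 1 to dim(n);
      /* ADDRLONG returns the address as a binary string; $HEX16. makes it readable */
      addr_n = put(addrlong(n{i}), $hex16.);
      addr_m = put(addrlong(m{i}), $hex16.);
      put i= addr_n= addr_m=;
   end;
run;
```

If the addresses under n step by 8 bytes while those under m jump around, that would be consistent with only the first array getting contiguous storage.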
novinosrin
Tourmaline | Level 20

Thank you, nice work. I think the caveat here is when we include an element more than once in the same non-temporary retained array coming from the source data set:

  array n {*} weight height age weight;

This is confusing to me, as I do not yet have a sound understanding of how the physical addresses of the two weight references are generated.

 
