BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
AhmedAl_Attar
Rhodochrosite | Level 12

@yabwon @Patrick 

Guys,

Thank you for inspiring solutions, and just as an FYI, on my machine, Single level/dimension array approach was the fastest when compared to 2-Dimensional & 3-Dimensional array approaches. I wonder if you had similar findings.

 

Tom
Super User Tom
Super User

@AhmedAl_Attar wrote:

@yabwon @Patrick 

Guys,

Thank you for inspiring solutions, and just as an FYI, on my machine, Single level/dimension array approach was the fastest when compared to 2-Dimensional & 3-Dimensional array approaches. I wonder if you had similar findings.

 


Could just be a result of re-running and getting benefits of caching.  Run the two in opposite order and see if the the difference stays the same.

 

If so then I assume it because of the reduced number of checks that SAS has to do to make sure the indexes are in bounds. Doing the simple multiplication and then just checking that the result is in bounds is probably faster than checking three index values. Especially since it probably has to do the multiplication anyway to index to the requested location.

Patrick
Opal | Level 21

@yabwon And for my own upskilling and because I could I've now also been running this aggregation step in Viya 4 CAS on an environment with 192 threads (4 CAS workers with 48 CPU's each).

This would in reality only be worth it if the simulation process already runs in CAS and creates a CAS table.

 

The hash approach took 14 seconds. I've been told that if there is no hash method used that needs write access to the hash then CAS is smart enough to load the hash only once (not sure if once per worker or once at all on the controller node).

 

The temp array approach executed in CAS without errors or warning BUT it somehow skipped processing all the data. Not sure yet if this is a bug or if I'm just not understanding yet how things work.

The moment I removed the do until step that loads the array the set statement read all the rows from casuser.table_a.

 

casuser.table_A is partitioned by I - which is why I don't need a 2nd single threaded pass through the data to consolidate the results from the different nodes ("MapReduce").

casuser.table_B is replicated on all nodes.

 

And here the log.

 

Spoiler
179  /************ hash lookup ************/
180  data casuser.want_HASH(keep = i sum_x replace=yes);
181    if _n_ = 1 then
182      do;
183        if 0 then set casuser.table_A(keep=i id id_2 category) casuser.table_B(keep=x);
184        declare hash hh_bew (dataset: 'casuser.table_B');
185        hh_bew.definekey  ('id', 'id_2', 'category');
186        hh_bew.definedata ('x');
187        hh_bew.definedone ();
188      end;
189  
190    set casuser.table_A(keep=i id id_2 category);
191    by i;
192    if hh_bew.find()=0 then sum_x + x;
193    if last.i then
194      do;
195        output;
196        call missing(sum_x);
197      end;
198  run;
NOTE: Running DATA step in Cloud Analytic Services.
NOTE: The DATA step will run in multiple threads.
NOTE: There were 0 observations read from the table TABLE_A in caslib CASUSER(*****).
NOTE: There were 0 observations read from the table TABLE_B in caslib CASUSER(*****).
NOTE: There were 308519088 observations read from the table TABLE_A in caslib CASUSER(*****).
NOTE: There were 4890072 observations read from the table table_B in caslib CASUSER(*****).
NOTE: The table want_HASH in caslib CASUSER(*****) has 5552 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           13.56 seconds
      user cpu time       0.06 seconds
      system cpu time     0.02 seconds
      memory              1675.28k
      OS Memory           40812.00k
      Timestamp           03/08/2024 12:59:55 AM
      Step Count                        18  Switch Count  2
      Page Faults                       0
      Page Reclaims                     140
      Page Swaps                        0
      Voluntary Context Switches        275
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           0
      
199  
200  /************ temporary array direct key lookup ************/
201  %let max_i = 5600;
202  %let max_id = 55000;
203  %let max_category = 22;
204  %let max_ID_2=1;
205  
206  data casuser.want_ARRAY(keep = I SUM_X replace=yes) ;
207  
208    /* populate array */
209    array XX[&max_id.       /* 1 to 55000 */
210            ,&max_category. /* 1 to 22 */
211            ,0:1            /* for ID_2  values 0 to 1 */
212            ] _temporary_;
213    if _n_=1 then
214      do until(eofB);
215        set casuser.table_B end=EOFB;
216        XX[ID,CATEGORY,input(ID_2,1.)]=X;
217      end;
218  
219    set casuser.table_A;
220    by I;
221  
222    if first.I then SUM_X = 0;
223    SUM_X + XX[ID,CATEGORY,input(ID_2,1.)];
224  
225    if last.I then output;
226  
227  run;
NOTE: Running DATA step in Cloud Analytic Services.
NOTE: The DATA step will run in multiple threads.
NOTE: There were 1222518 observations read from the table TABLE_B in caslib CASUSER(*****).
NOTE: There were 76796502 observations read from the table TABLE_A in caslib CASUSER(*****).
NOTE: The table want_ARRAY in caslib CASUSER(*****) has 1382 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           12.74 seconds
      user cpu time       0.03 seconds
      system cpu time     0.03 seconds
      memory              20470.84k
      OS Memory           59720.00k
      Timestamp           03/08/2024 01:00:07 AM
      Step Count                        19  Switch Count  2
      Page Faults                       0
      Page Reclaims                     703
      Page Swaps                        0
      Voluntary Context Switches        238
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           0
      

 

 

 

 

yabwon
Onyx | Level 15

@Patrick , so for hash table approach adding some "brute computing force" gives a nice boost.

 

There is one more interesting observation (regardless the fact the array approach has some data reading issues, which is totally bizarre to me).

When you take a look at my or @Bernd_S  log from SAS 9.4, you will notice there is almost no difference between CPU time and Real time, e.g. @Bernd_S  log says:

for array:

real time 1:04.32
user cpu time 55.27 seconds

 

for hash:

real time 4:46.23
user cpu time 4:41.24

In case of Viya there is almost none CPU time reported, I wonder what is the reason? 

 

Another experiment I did was running both codes (hash and array) on my "home made Viya" which is "SAS9 + work in RAM disk", result was that there was no difference in processing time between data in RAM drive and data in SSD drive...

 

Bart

 

P.S. RamDrive tool i'm using is imdisk-toolkit (https://sourceforge.net/projects/imdisk-toolkit/)

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



yabwon
Onyx | Level 15

@Patrick 

I did one more test with my "home made parallel processing using systask" code  (basis was materials from my old parallel-systask training: https://pages.mini.pw.edu.pl/~jablonskib/SASpublic/Parallel-processing-in-BASE-SAS.sas)

 

I managed:

- with 5 parallel sas sessions I managed to reduce hash-table processing to 26.54 sec.

- with 4 parallel sas sessions I managed to reduce hash-table processing to 31.995 sec.

- with 5 parallel sas sessions I managed to reduce array processing to 4.831 sec.

- with 4 parallel sas sessions I managed to reduce array processing to 5.551 sec.

- with 3 parallel sas sessions I managed to reduce array processing to 7.005 sec.

 

I think SSD drives did the most "speed boost" but I dare to say "not bad for a laptop" 😉

 

Bart

 

 

 

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



hashman
Ammonite | Level 13

Cześć Bartku,

 

Thanks for the plug. 

Key-indexed search / aggregation always wins speed-wise if there is enough memory for it. 

If not, an array-based hash table is the next fastest option coupled with a drastic memory reduction. I'd just use the simplest collision resolution policy, the linear probing, and set the hash table size at approximately twice the number of distinct keys. Code can be adapted almost verbatim from our (with Richard DeVenezia) last year's SESUG paper "Hashes From the Ashes", Section 2:

https://www.lexjansen.com/sesug/2023/SESUG2023_Paper_203_Final_PDF.pdf  

The hash object is a great tool (it would be funny if I thought otherwise) with gobs of functionality compared to array-based hashing but the latter can be faster/lighter at specialized tasks. You know, at times one needs a dedicated blade instead of those included in a Swiss Army knife.

 

Gorące pozdrowienia

Paul D.

Bernd_S
Fluorite | Level 6

Hi,

 

@yabwon Thank you very much, that is exactly what I need. I also didn't know and didn't expect that the use of an array would be so much faster. I can make ID_2 numeric.

 

Your solution on my machine:  

real time 1:04.32
user cpu time 55.27 seconds
system cpu time 9.06 seconds
memory 96561.62k
OS Memory 132524.00k

 

@Patricks Hash solution:

real time 4:46.23
user cpu time 4:41.24
system cpu time 5.00 seconds
memory 132394.82k
OS Memory 168216.00k

 

@Patrick Thanks again for your suggestions and analysis.

mkeintz
PROC Star

@Bernd_S wrote:

Hi,

 

@yabwon Thank you very much, that is exactly what I need. I also didn't know and didn't expect that the use of an array would be so much faster. I can make ID_2 numeric.

 

Arrays are ALWAYS faster than hash lookups, because arrays use direct key indexing.  The array index effectively points directly at the location of the content.   But a hash KEY needs to go through a transformation to find its location in the hash object (but the hash transformation will be faster than the analogous binary search for a key in a sorted list).  

 

But arrays can't do this:

  1. You can index (i.e. definekey) on character values..
  2. Dynamically grow the number of dataitems in the object.
  3. Retrieve a dataitem of multiple variables with a single hash method call, whereas the array analog would require programming retrieval of the i'th element from each of a set of one-dimension arrays (or multiple columns from a given array row).  The same relative advantage exists for adding to the hash vs adding to arrays.
  4. Dynamically create the content of a hash object.
  5. Dynamically free up memory used by a hash object.
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 22 replies
  • 1305 views
  • 25 likes
  • 9 in conversation