@AhmedAl_Attar wrote:
Guys,
Thank you for inspiring solutions, and just as an FYI, on my machine, Single level/dimension array approach was the fastest when compared to 2-Dimensional & 3-Dimensional array approaches. I wonder if you had similar findings.
Could just be a result of re-running and getting benefits of caching. Run the two in opposite order and see if the the difference stays the same.
If so then I assume it because of the reduced number of checks that SAS has to do to make sure the indexes are in bounds. Doing the simple multiplication and then just checking that the result is in bounds is probably faster than checking three index values. Especially since it probably has to do the multiplication anyway to index to the requested location.
@yabwon And for my own upskilling and because I could I've now also been running this aggregation step in Viya 4 CAS on an environment with 192 threads (4 CAS workers with 48 CPU's each).
This would in reality only be worth it if the simulation process already runs in CAS and creates a CAS table.
The hash approach took 14 seconds. I've been told that if there is no hash method used that needs write access to the hash then CAS is smart enough to load the hash only once (not sure if once per worker or once at all on the controller node).
The temp array approach executed in CAS without errors or warning BUT it somehow skipped processing all the data. Not sure yet if this is a bug or if I'm just not understanding yet how things work.
The moment I removed the do until step that loads the array the set statement read all the rows from casuser.table_a.
casuser.table_A is partitioned by I - which is why I don't need a 2nd single threaded pass through the data to consolidate the results from the different nodes ("MapReduce").
casuser.table_B is replicated on all nodes.
And here the log.
179 /************ hash lookup ************/ 180 data casuser.want_HASH(keep = i sum_x replace=yes); 181 if _n_ = 1 then 182 do; 183 if 0 then set casuser.table_A(keep=i id id_2 category) casuser.table_B(keep=x); 184 declare hash hh_bew (dataset: 'casuser.table_B'); 185 hh_bew.definekey ('id', 'id_2', 'category'); 186 hh_bew.definedata ('x'); 187 hh_bew.definedone (); 188 end; 189 190 set casuser.table_A(keep=i id id_2 category); 191 by i; 192 if hh_bew.find()=0 then sum_x + x; 193 if last.i then 194 do; 195 output; 196 call missing(sum_x); 197 end; 198 run; NOTE: Running DATA step in Cloud Analytic Services. NOTE: The DATA step will run in multiple threads. NOTE: There were 0 observations read from the table TABLE_A in caslib CASUSER(*****). NOTE: There were 0 observations read from the table TABLE_B in caslib CASUSER(*****). NOTE: There were 308519088 observations read from the table TABLE_A in caslib CASUSER(*****). NOTE: There were 4890072 observations read from the table table_B in caslib CASUSER(*****). NOTE: The table want_HASH in caslib CASUSER(*****) has 5552 observations and 2 variables. NOTE: DATA statement used (Total process time): real time 13.56 seconds user cpu time 0.06 seconds system cpu time 0.02 seconds memory 1675.28k OS Memory 40812.00k Timestamp 03/08/2024 12:59:55 AM Step Count 18 Switch Count 2 Page Faults 0 Page Reclaims 140 Page Swaps 0 Voluntary Context Switches 275 Involuntary Context Switches 0 Block Input Operations 0 Block Output Operations 0 199 200 /************ temporary array direct key lookup ************/ 201 %let max_i = 5600; 202 %let max_id = 55000; 203 %let max_category = 22; 204 %let max_ID_2=1; 205 206 data casuser.want_ARRAY(keep = I SUM_X replace=yes) ; 207 208 /* populate array */ 209 array XX[&max_id. /* 1 to 55000 */ 210 ,&max_category. /* 1 to 22 */ 211 ,0:1 /* for ID_2 values 0 to 1 */ 212 ] _temporary_; 213 if _n_=1 then 214 do until(eofB); 215 set casuser.table_B end=EOFB; 216 XX[ID,CATEGORY,input(ID_2,1.)]=X; 217 end; 218 219 set casuser.table_A; 220 by I; 221 222 if first.I then SUM_X = 0; 223 SUM_X + XX[ID,CATEGORY,input(ID_2,1.)]; 224 225 if last.I then output; 226 227 run; NOTE: Running DATA step in Cloud Analytic Services. NOTE: The DATA step will run in multiple threads. NOTE: There were 1222518 observations read from the table TABLE_B in caslib CASUSER(*****). NOTE: There were 76796502 observations read from the table TABLE_A in caslib CASUSER(*****). NOTE: The table want_ARRAY in caslib CASUSER(*****) has 1382 observations and 2 variables. NOTE: DATA statement used (Total process time): real time 12.74 seconds user cpu time 0.03 seconds system cpu time 0.03 seconds memory 20470.84k OS Memory 59720.00k Timestamp 03/08/2024 01:00:07 AM Step Count 19 Switch Count 2 Page Faults 0 Page Reclaims 703 Page Swaps 0 Voluntary Context Switches 238 Involuntary Context Switches 0 Block Input Operations 0 Block Output Operations 0
@Patrick , so for hash table approach adding some "brute computing force" gives a nice boost.
There is one more interesting observation (regardless the fact the array approach has some data reading issues, which is totally bizarre to me).
When you take a look at my or @Bernd_S log from SAS 9.4, you will notice there is almost no difference between CPU time and Real time, e.g. @Bernd_S log says:
for array:
real time 1:04.32
user cpu time 55.27 seconds
for hash:
real time 4:46.23
user cpu time 4:41.24
In case of Viya there is almost none CPU time reported, I wonder what is the reason?
Another experiment I did was running both codes (hash and array) on my "home made Viya" which is "SAS9 + work in RAM disk", result was that there was no difference in processing time between data in RAM drive and data in SSD drive...
Bart
P.S. RamDrive tool i'm using is imdisk-toolkit (https://sourceforge.net/projects/imdisk-toolkit/)
I did one more test with my "home made parallel processing using systask" code (basis was materials from my old parallel-systask training: https://pages.mini.pw.edu.pl/~jablonskib/SASpublic/Parallel-processing-in-BASE-SAS.sas)
I managed:
- with 5 parallel sas sessions I managed to reduce hash-table processing to 26.54 sec.
- with 4 parallel sas sessions I managed to reduce hash-table processing to 31.995 sec.
- with 5 parallel sas sessions I managed to reduce array processing to 4.831 sec.
- with 4 parallel sas sessions I managed to reduce array processing to 5.551 sec.
- with 3 parallel sas sessions I managed to reduce array processing to 7.005 sec.
I think SSD drives did the most "speed boost" but I dare to say "not bad for a laptop" 😉
Bart
Cześć Bartku,
Thanks for the plug.
Key-indexed search / aggregation always wins speed-wise if there is enough memory for it.
If not, an array-based hash table is the next fastest option coupled with a drastic memory reduction. I'd just use the simplest collision resolution policy, the linear probing, and set the hash table size at approximately twice the number of distinct keys. Code can be adapted almost verbatim from our (with Richard DeVenezia) last year's SESUG paper "Hashes From the Ashes", Section 2:
https://www.lexjansen.com/sesug/2023/SESUG2023_Paper_203_Final_PDF.pdf
The hash object is a great tool (it would be funny if I thought otherwise) with gobs of functionality compared to array-based hashing but the latter can be faster/lighter at specialized tasks. You know, at times one needs a dedicated blade instead of those included in a Swiss Army knife.
Gorące pozdrowienia
Paul D.
Hi,
@yabwon Thank you very much, that is exactly what I need. I also didn't know and didn't expect that the use of an array would be so much faster. I can make ID_2 numeric.
Your solution on my machine:
real time 1:04.32
user cpu time 55.27 seconds
system cpu time 9.06 seconds
memory 96561.62k
OS Memory 132524.00k
@Patricks Hash solution:
real time 4:46.23
user cpu time 4:41.24
system cpu time 5.00 seconds
memory 132394.82k
OS Memory 168216.00k
@Patrick Thanks again for your suggestions and analysis.
@Bernd_S wrote:
Hi,
@yabwon Thank you very much, that is exactly what I need. I also didn't know and didn't expect that the use of an array would be so much faster. I can make ID_2 numeric.
Arrays are ALWAYS faster than hash lookups, because arrays use direct key indexing. The array index effectively points directly at the location of the content. But a hash KEY needs to go through a transformation to find its location in the hash object (but the hash transformation will be faster than the analogous binary search for a key in a sorted list).
But arrays can't do this:
Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.
Register today!Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Select SAS Training centers are offering in-person courses. View upcoming courses for: