On further thought, the 3 Key Variables are coming from the Iterator Objects and hence, there is no need to place them in the data-part of the Hash Table(H). The following statement, h.definedata('HPG','Plant','Material','X'); can be replaced with: h.definedata('X'); For your benefit, I have just commented the previous statement in the code I posted yesterday. Xia Keshan believes that his SQL solution can be faster than Hash solution. I do not find evidence to his belief and instead, the Hash solution runs AT LEAST 3 times faster than SQL besides Hash taking lesser memory. However, I concur with him that some expertise to use Hash is a must. Here is the proof. Let us create a simulated data set, HAVE and run both CODES. options fullstimer; data have; do j = 1 to 200; i1 = 64 + ceil(ranuni(123) * 26); i2 = 64 + ceil(ranuni(123) * 26); i3 = 64 + ceil(ranuni(123) * 26); HPG = catt(byte(i1), byte(i2), byte(i3)); Plant = put(ceil(ranuni(123) * 1000), z4.); Material = put(ceil(ranuni(123) * 999), z3.); X = ceil(ranuni(123) * 100); output; end; keep HPG Plant Material X; run; 371 data want; 372 if _n_ = 1 then do; 373 if 0 then set have; 374 declare hash h(hashexp:20); 375 h.definekey('HPG','Plant','Material'); 376 *h.definedata('HPG','Plant','Material','X'); 377 h.definedata('X'); 378 h.definedone(); 379 380 381 declare hash hH(ordered:'a'); 382 hH.definekey('HPG'); 383 hH.definedone(); 384 declare hash hP(ordered:'a'); 385 hP.definekey('Plant'); 386 hP.definedone(); 387 declare hash hM(ordered:'a'); 388 hM.definekey('Material'); 389 hM.definedone(); 390 391 392 do until(eof); 393 set have end = eof; 394 if hH.find() ^= 0 then hH.add(); 395 if hP.find() ^= 0 then hP.add(); 396 if hM.find() ^= 0 then hM.add(); 397 h.add(); 398 end; 399 end; 400 declare hiter hiH('hH'); 401 declare hiter hiP('hP'); 402 declare hiter hiM('hM'); 403 404 405 do while(hiH.next() = 0); 406 do while(hiP.next() = 0); 407 do while(hiM.next() = 0); 408 if h.find() ^= 0 then X = 0;; 409 output; 410 end; 411 end; 412 end; 413 stop; 414 run; NOTE: There were 200 observations read from the data set WORK.HAVE. NOTE: The data set WORK.WANT has 6812784 observations and 4 variables. NOTE: DATA statement used (Total process time): real time 14.63 seconds user cpu time 6.05 seconds system cpu time 2.18 seconds Memory 9019k OS Memory 17780k Timestamp 2/5/2015 2:46:34 PM 415 proc sql; 416 create table want as 417 select a.*,coalesce(b.x,0) as x 418 from 419 (select * from (select distinct HPG from have),(select distinct Plant from have),(select 419! distinct Material from have)) as a 420 left join 421 have as b 422 on a.HPG=b.HPG and a.Plant=b.Plant and a.Material=b.Material ; NOTE: The execution of this query involves performing one or more Cartesian product joins that can not be optimized. NOTE: Table WORK.WANT created, with 6812784 rows and 4 columns. 423 quit; NOTE: PROCEDURE SQL used (Total process time): real time 43.07 seconds user cpu time 16.73 seconds system cpu time 10.42 seconds Memory 69088k OS Memory 78196k Timestamp 2/5/2015 2:47:36 PM
... View more