Patrick's direction is definitely applaudable. I have tried SQL first, and stop it after 30-mins of running. Here is my hash approach, I think we may not need to use hiter, as both table are identical. After making sure that all of the zipcodes are included in the sashelp.zipcodes, 407 data want (drop=_:); 408 if _n_=1 then do; 409 set comp( rename=(tic=_tic zip=_zip) obs=1); 410 dcl hash h(dataset: 'comp (rename=(tic=_tic zip=_zip))', multidata: 'yes'); 411 h.definekey('fyear','sic3'); 412 h.definedata('_tic','_zip'); 413 h.definedone(); 414 end; 415 416 set comp; 417 _ct=0; 418 _rc=h.find(); 419 do while (_rc=0); 420 if .<zipcitydistance(_zip,zip) <=60 then _ct+1; 421 h.has_next(result: _r); 422 if _r ne 0 then _rc=h.find_next(); 423 else leave; 424 end; 425 DENSITY_INDUSTRY=log(_ct); 426 427 run; NOTE: There were 66452 observations read from the data set WORK.COMP. NOTE: There were 1 observations read from the data set WORK.COMP. NOTE: There were 66452 observations read from the data set WORK.COMP. NOTE: The data set WORK.WANT has 66452 observations and 5 variables. NOTE: DATA statement used (Total process time): real time 5:26.81 cpu time 5:17.99 Less than 6mins, Not too shady. BTW, the has_next() method is not a must-have here. You can just use: data want (drop=_:); if _n_=1 then do; set comp( rename=(tic=_tic zip=_zip) obs=1); dcl hash h(dataset: 'comp (rename=(tic=_tic zip=_zip))', multidata: 'yes'); h.definekey('fyear','sic3'); h.definedata('_tic','_zip'); h.definedone(); dcl hiter hi('h'); end; set comp ; _ct=0; _rc=h.find(); do while (_rc=0); if .<zipcitydistance(_zip,zip) <=60 then _ct+1; _rc=h.find_next(); end; DENSITY_INDUSTRY=log(_ct); run; I was hoping has_next will help the efficiency, well, it turned out no difference. Regards, Haikuo
... View more