I found a solution to my problem. Since I wanted to match each firm (gvkey) with its 10 nearest industry-year peers, I first created the gindfyear variable which is a combination of industry (gind) and year (fyear). This allowed me to create the pairwise combinations based upon matched industry-year. Since I began with 72,693 observations, I did not want to create 5,284,199,556 firm-firm pairwise combinations!! This first batch of code did this: proc sql;
create table want
as select distinct a.gvkey, a.zip_code as main_zip, b.gvkey as peer, b.zip_code as peer_zip, a.gindfyear
from have a, have b
where a.gindfyear eq b.gindfyear and a.gvkey ne b.gvkey;
quit; As you can see above, it creates 5 variables: gvkey, main zip, peer, peer zip, and industry-year. This resulted in 9,861,622 obs, which is far fewer than the potential 5 billion I was reticent to create. I then use the zipcitydistance function in SAS to determine the distances between each main and peer. Finally, I ranked them, keeping only the four closest. The final code is below: data want2;
set want;
dist= zipcitydistance(main_zip, peer_zip);
run;
proc sort
data= want2;
by gvkey gindfyear dist;
quit;
data want3;
set want2;
by gvkey gindfyear dist;
retain closest;
if first.gindfyear then do;
closest=1;
output;
end;
else if closest lt 4 then do;
closest = closest + 1;
output;
end;
run; This correctly yields 726,930 observations since I began with 72,693 observations (10x for each).
... View more