Hi Patrick, Thanks for your suggestion. I have no idea on how to code it as I am not familiar with hash object. My objective is to match all the firms (tem1.tic) with ONE matched firm (tic) at my best effort. The lg_asset is the firms size. I tried to matched firms with the smallest difference in firm size. The following is my code prior to the sample data I posted previously: data BENCH1(keep=fyear tic lg_asset sic1) BENCH0 (keep=fyear tic lg_asset sic1); set geodata.exfinance_match; if BENCH= 1 then output BENCH1; if BENCH= 0 then output BENCH0; run; proc sql; create table geodata.temp as select a.tic as TEM1_tic, a.fyear as tem1_fyear, b.*, abs(sum(a.LG_ASSET, -b.LG_ASSET)) as diffPS, min(calculated diffPS) as closest_size from BENCH1(where=(LG_ASSET is not missing)) a , BENCH0(where=(LG_ASSET is not missing)) b group by a.tic, a.fyear having diffPS = closest_size and a.sic1=b.sic1; quit; The data that I posted, after running the above programs; the firms, tem1.tic have been matched with tic based on the condition: diffPS = closest_size and a.sic1=b.sic1 grouped by a.tic, a.fyear. This is stage that I wish to ensure every tem1.tic firm has a matched tic (for a given year ). I have to ensure that there is no duplication, that is, a particular tic cannot be matched with more than one tem1.tic. Firms which have no (little) alternative(s) matched firm should be given a priority to be matched earlier. I hope that my explanation is clear now. Thank you. Regards, MSPAK
... View more