Hello,

I want to randomly match observations from two datasets. Data1 contains treatment firms and Data2 contains control firms.

Data1 (treatment)

gvkey  Fyear  sic3  AT       Q  Y         X1        X2
1012   1987   335   16.858   2  0.837307  0.3018    0.246012
1037   1983   366   24.403   2  0.780522  0.411516  0.309363
1050   2007   356   96.535   1  0.762024  0.192556  0.893492
1050   2008   356   120.017  1  0.131183  0.775842  0.306371
1056   2000   382   248.707  3  0.780103  0.886534  0.583451
1056   2001   382   310.252  3  0.506279  0.471617  0.82575
1056   2002   382   318.465  3  0.513091  0.701931  0.92213

Data2 (eligible control)

GVKEY  FYEAR  sic3  AT      Q  Y         X1        X2
1000   1973   308   21.771  2  0.150945  0.34661   0.820642
1000   1977   308   44.025  3  0.195324  0.823802  0.258745
1000   1971   308   29.33   2  0.402987  0.579026  0.036652
1000   1976   308   38.586  3  0.175518  0.346498  0.691525
1000   1974   308   25.638  2  0.282994  0.695814  0.549823
1000   1975   308   23.905  2  0.184889  0.634319  0.003082
1000   1970   308   33.45   3  0.227734  0.688334  0.985432
1000   1972   308   19.907  2  0.690881  0.34722   0.360407

The control and treatment datasets have the same variables. I want to match each treatment observation with one randomly chosen control that has the same SIC3 and the same Q (asset quintile). The control and the treatment do not need to be in the same fiscal year (fyear).
The matched sample should look like this:

Want dataset (matched sample)

sic3  Q  fyear  gvkey  at       Y         X1        X2        gvkey_c  at_c     Y_C       X1_C      X2_C
100   1  1980   7920   38.362   0.187701  0.301751  0.979782  8942     47.288   0.268632  0.314711  0.811104
104   2  1980   5560   53.779   0.25313   0.501512  0.317621  12822    26.643   0.710939  0.926753  0.706577
104   3  1980   5686   367.111  0.025062  0.466618  0.102241  1856     100.133  0.971438  0.77248   0.397624
104   3  1980   3153   14.781   0.103293  0.672411  0.366022  2127     9.006    0.735427  0.978271  0.624185
131   4  1980   2164   1.894    0.148038  0.320704  0.628284  8238     1.896    0.643021  0.880709  0.607355
131   1  1980   5129   4.368    0.78328   0.96015   0.670081  5498     4.474    0.022384  0.868951  0.877956
131   2  1980   2073   18.815   0.560716  0.514982  0.979269  2873     18.472   0.201125  0.922818  0.736144
131   3  1980   6870   142.959  0.784038  0.892376  0.323672  3420     140.806  0.404023  0.236646  0.159027
131   3  1980   2437   209.629  0.618297  0.118454  0.854774  1364     204.723  0.82115   0.163351  0.524918
131   4  1980   1544   210.22   0.577997  0.668211  0.218588  7560     203.871  0.29084   0.638787  0.077885

As you can see in the matched sample, the last five columns (gvkey_c, at_c, Y_C, X1_C, X2_C) belong to the controls and the other columns belong to the treatment firms; sic3 and Q are the same for the treatment and its control.

I need to repeat this process 1,000 times to get 1,000 matched samples. Then I run an OLS regression, Y = X1 + X2, on each of the 1,000 matched samples to obtain the mean, standard deviation, and p-value of the coefficients on X1 and X2. The SAS code for this regression is something like:

    proc surveyreg data=want;
      cluster gvkey;
      model Y = X1 X2 X1_C X2_C;
      ods output parameterestimates = param
                 fitstatistics = statistics
                 datasummary = datasummary;
    run;
    quit;

(I need the output datasets to obtain some of the statistics.)

I hope someone can give me some suggestions on how to perform this simulation. I can already match each treatment with the one control in the same sic3 that has the closest AT (assets), but I do not know how to do a random match.

Best regards,
Thierry
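
For concreteness, here is a rough sketch of one way the random matching and the 1,000 regressions described above could be set up. It is only an illustration under several assumptions, not a tested solution: the datasets are assumed to be named data1 and data2, the seed and the names ctrl, cells, treat, want_all, rep, and pick are made up for the example, and controls are drawn with replacement (the same control can be matched to more than one treatment). The column names Parameter, Estimate, and Probt in the ParameterEstimates output are also assumed.

```sas
%let nrep = 1000;   /* number of matched samples */
%let seed = 12345;  /* arbitrary seed, assumed for reproducibility */

/* Controls: sort by matching cell and rename to the _C names. */
proc sort data=data2
          out=ctrl(drop=fyear
                   rename=(gvkey=gvkey_c at=at_c Y=Y_C X1=X1_C X2=X2_C));
  by sic3 Q;
run;

/* For each sic3-Q cell, record the first and last row number in ctrl. */
data cells(keep=sic3 Q first_row last_row);
  set ctrl;
  by sic3 Q;
  retain first_row;
  if first.Q then first_row = _n_;
  if last.Q then do;
    last_row = _n_;
    output;
  end;
run;

proc sort data=data1 out=treat;
  by sic3 Q;
run;

/* For every treatment row and every replicate, read one control row */
/* chosen uniformly at random from the same sic3-Q cell.             */
data want_all;
  if _n_ = 1 then call streaminit(&seed);
  merge treat(in=in_t) cells(in=in_c);
  by sic3 Q;
  if in_t and in_c;  /* keep treatments that have an eligible control */
  do rep = 1 to &nrep;
    pick = first_row + floor(rand('uniform') * (last_row - first_row + 1));
    set ctrl(drop=sic3 Q) point=pick;
    output;
  end;
  drop first_row last_row;
run;

proc sort data=want_all;
  by rep;
run;

/* One regression per matched sample; printed output is suppressed. */
ods exclude all;
proc surveyreg data=want_all;
  by rep;
  cluster gvkey;
  model Y = X1 X2 X1_C X2_C;
  ods output parameterestimates = param;
run;
ods exclude none;

/* Mean and standard deviation of each coefficient and p-value across samples. */
proc means data=param mean std;
  class parameter;
  var estimate probt;
run;
```

Stacking all replicates in one dataset and using a BY statement avoids a macro loop over 1,000 separate runs. If matching without replacement is required instead, the random pick would need to be replaced by a random ordering of treatments and controls within each cell, which this sketch does not do.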