Hello All,
I am a newbie and need your help with the coding. I have a huge sample (obs around 10,000) which is divided into two subsamples-- sample firms and control firms. I am trying to match each sample firm with a control firm using the following criteria:
(1) sample firm and control firm should belong to the same year and same industry.
(2) the control firm should be the closest match with the sample firm based on SIZE and BM. That is, [(Sample firm SIZE – Control Firm Size) + (Sample firm BM – Control Firm BM)] = minimum.
(3) All sample firms should be matched with a unique control firm. So if a control firm is the closest match to two sample firms, then I want the second sample firm to be matched with its second-closest control firm.
Sample Firms
Firm Year Industry Size BM Sample_Firm
A 2011 4511 8.0 0.20 1
A 2012 4511 8.7 0.21 1
B 2012 4511 8.5 0.19 1
Control Firms
Firm Year Industry Size BM Sample_firm
X 2011 4511 7.9 0.19 0
X 2012 4511 8.5 0.2 0
X 2013 4511 8.3 0.17 0
Y 2011 7826 8.7 0.21 0
Y 2012 7826 8.2 0.19 0
Y 2013 7826 8.7 0.23 0
Z 2011 4511 8 0.5 0
Z 2012 4511 8.5 0.18 0
Desired Matched firms (the output that I am trying to get)
Firm Year Industry Size BM Sample_firm
A 2011 4511 8 0.2 1
X 2011 4511 7.9 0.19 0
A 2012 4511 8.7 0.21 1
X 2012 4511 8.5 0.2 0
B 2012 4511 8.2 0.19 1
Z 2012 4511 8.5 0.18 0
Sample_firm is a dummy variable that takes value 1 for sample firms and 0 for the control firms. So in the matched firms, I want to match A_2011 with X_2011, A_2012 with X_2012, B_2012 with Z_2012.
Can someone please share a code that will help me get the desired matched firms?
Best regards,
Gokul
Here is some code for you to look at and test with. Based on the input you provided I don't yet know how restrictions 2 and 3 can be implemented. But maybe this code provides you enough to get started and maybe add some additional information that is useful for implementing the restrictions 2 and 3.
data sample_firms; input Firm $ Year Industry Size BM Sample_Firm; datalines; A 2011 4511 8.0 0.20 1 A 2012 4511 8.7 0.21 1 B 2012 4511 8.5 0.19 1 ; run; data control_firms; input Firm $ Year Industry Size BM Sample_firm; datalines; X 2011 4511 7.9 0.19 0 X 2012 4511 8.5 0.2 0 X 2013 4511 8.3 0.17 0 Y 2011 7826 8.7 0.21 0 Y 2012 7826 8.2 0.19 0 Y 2013 7826 8.7 0.23 0 Z 2011 4511 8 0.5 0 Z 2012 4511 8.5 0.18 0 ; run; /* (1) sample firm and control firm should belong to the same year and same industry. --> proc sort and merge are used to get this point done. Question: Does this seem to work as you intend? (2) the control firm should be the closest match with the sample firm based on SIZE and BM. That is, [(Sample firm SIZE – Control Firm Size) + (Sample firm BM – Control Firm BM)] = minimum. --> check 2 is used to store calculated value. Question: Is this the way you intend the calculation to work? (3) All sample firms should be matched with a unique control firm. So if a control firm is the closest match to two sample firms, then I want the second sample firm to be matched with its second-closest control firm. --> not implemented */ proc sort data=sample_firms; by year industry; run; proc sort data=control_firms; by year industry; run; data desired_match_firms_tmp; merge sample_firms(in=a) control_firms(in=b rename=(firm=cfirm size=csize bm=cbm sample_firm=csample_firm)); if a = b; check2 = (size - csize) + (bm - cbm); by year industry; run; data desired_match_firms(keep=firm year industry size bm sample_firm check2); set desired_match_firms_tmp; output; firm = cfirm; size = csize; bm = cbm; sample_firm = csample_firm; output; run;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.