BookmarkSubscribeRSS Feed
GokulK
Calcite | Level 5

Hello All,

 

I am a newbie and need your help with the coding. I have a huge sample (obs around 10,000) which is divided into two subsamples-- sample firms and control firms. I am trying to match each sample firm with a control firm using the following criteria:

(1) sample firm and control firm should belong to the same year and same industry.

(2) the control firm should be the closest match with the sample firm based on SIZE and BM. That is, [(Sample firm SIZE – Control Firm Size) + (Sample firm BM – Control Firm BM)] = minimum.

(3) All sample firms should be matched with a unique control firm. So if a control firm is the closest match to two sample firms, then I want the second sample firm to be matched with its second-closest control firm.

 

Sample Firms

Firm      Year   Industry      Size          BM           Sample_Firm

A          2011       4511           8.0         0.20                       1

A          2012       4511          8.7         0.21                        1

B         2012       4511           8.5         0.19                        1

 

Control Firms

Firm         Year     Industry       Size          BM                   Sample_firm

X              2011       4511           7.9         0.19                        0

X              2012      4511            8.5         0.2                          0

X              2013      4511            8.3         0.17                        0

Y              2011      7826            8.7         0.21                        0

Y              2012      7826            8.2         0.19                        0

Y              2013      7826            8.7         0.23                        0

Z               2011      4511            8            0.5                          0

Z               2012     4511             8.5         0.18                        0

 

Desired Matched firms (the output that I am trying to get)

Firm             Year            Industry             Size                             BM                      Sample_firm

A                  2011             4511                    8                               0.2                          1

X                  2011             4511                    7.9                            0.19                        0

A                  2012             4511                    8.7                            0.21                        1

X                  2012             4511                    8.5                            0.2                          0

B                  2012             4511                    8.2                            0.19                        1

Z                  2012             4511                    8.5                            0.18                        0

 

Sample_firm is a dummy variable that takes value 1 for sample firms and 0 for the control firms. So in the matched firms, I want to match A_2011 with X_2011, A_2012 with X_2012, B_2012 with Z_2012.

 

Can someone please share a code that will help me get the desired matched firms?

 

Best regards,

Gokul

1 REPLY 1
JosvanderVelden
SAS Super FREQ

Here is some code for you to look at and test with. Based on the input you provided I don't yet know how restrictions 2 and 3 can be implemented. But maybe this code provides you enough to get started and maybe add some additional information that is useful for implementing the restrictions 2 and 3.

 

data sample_firms;
 input Firm $ Year Industry Size BM Sample_Firm;
datalines;
A 2011 4511 8.0 0.20 1
A 2012 4511 8.7 0.21 1
B 2012 4511 8.5 0.19 1
;
run;


data control_firms;
 input Firm $ Year Industry Size BM Sample_firm;
datalines;
X 2011 4511 7.9 0.19 0
X 2012 4511 8.5 0.2 0
X 2013 4511 8.3 0.17 0
Y 2011 7826 8.7 0.21 0
Y 2012 7826 8.2 0.19 0
Y 2013 7826 8.7 0.23 0
Z 2011 4511 8 0.5 0
Z 2012 4511 8.5 0.18 0
;
run;

/*
(1) sample firm and control firm should belong to the same year and same industry.
    --> proc sort and merge are used to get this point done.
    Question: Does this seem to work as you intend?
(2) the control firm should be the closest match with the sample firm based on 
    SIZE and BM. That is, [(Sample firm SIZE – Control Firm Size) + 
    (Sample firm BM – Control Firm BM)] = minimum.   
    --> check 2 is used to store calculated value.
    Question: Is this the way you intend the calculation to work?
(3) All sample firms should be matched with a unique control firm. So if a control 
    firm is the closest match to two sample firms, then I want the second sample 
    firm to be matched with its second-closest control firm.
    --> not implemented
*/

proc sort data=sample_firms; by year industry; run;
proc sort data=control_firms; by year industry; run;

data desired_match_firms_tmp;
   merge sample_firms(in=a) control_firms(in=b rename=(firm=cfirm size=csize bm=cbm sample_firm=csample_firm));
   if a = b;
   check2 = (size - csize) + (bm - cbm);
   by year industry;
run;

data desired_match_firms(keep=firm year industry size bm sample_firm check2); 
   set desired_match_firms_tmp;
      output;
      firm = cfirm;
      size = csize;
      bm = cbm;
      sample_firm = csample_firm;
      output;
run;

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 1 reply
  • 671 views
  • 0 likes
  • 2 in conversation