BookmarkSubscribeRSS Feed
pk2012
Calcite | Level 5


Hi All,

I need help with creating a code.  I have two groups of companies; one is the event companies and the second is the control companies. I need to match each company from the event group with one company from the control group, where both of them should have the same year and industry, and the difference in the size for the event companies and control companies should be minimum. In other words, I want to match the event companies with the control companies based on year, industry, and size. If for some reason, the closet match for the event company is already matched with a previous event company, I want the second best control company to be matched with the event company. The second best control company is the one from same industry and year as the event company and the size difference is the second best.

 

For example

The event firms

Firm     Year       industry    size (mill)        

x          2000        20            100

x          2001        20            105

y          2000         20             100  

Z          1999         25             250  

Control firms

Firm     Year       industry    size (mill)        

               

J         1999         25             250                

L         1999         25             250

R        1999         25             325

E        1999         25             275

R        1999         25             200

C        2002         25             250

              

A          2000        20            100

T         2000        20            103                 

B          2000        20            110                

d         2002        20            99               

F        2000        20            100                  

K         2001        20            99

The output

Firm     Year       industry    size (mill)          Firm

x          2000        20            100                 A

x          2001        20            105                 K

y          2000         20            100                D

Z          1999         25             350               R

I would appreciate if someone would help me with the coding.

7 REPLIES 7
EdvinasS
Calcite | Level 5

proc sql;

create table control_event as

select firm as firm_event, year as year_event, industry as industry_event, size as size_event from event_firms s1

left join

(select firm as firm_control, year as year_control, industry as industry_control, size as size_control from control_firms) s2

on s1.year_event = s2.year_control and s1.industry_event_control = s2.industry_control and ((s1.size_event - s2.size_control < size_difference_allowed and s1.size_event - s2.size_control > 0) or (s1.size_event - s2.size_control > size_difference_allowed*(-1) and s1.size_event - s2.size_control < 0)) ;

alter table control_event drop firm_control, year_control, industry_control, size_control;

quit;

didnt quite understand mathing "second best" but this should give you all matches and you fill be able to work on that by your own Smiley Happy

pk2012
Calcite | Level 5

Thanks for the code. The "second best" matching means that I want each event company to have a unique match. Let's say, there are two event companies from the same year and same industry and have the same size. And there are five control companies from the same year and the same industry.

The event firms

Firm     Year       industry    size (mill)       

x          2000        20            100

Y         2000        20            100

Control Companies

A          2000        20            100

T           2000        20            103                

B          2000        20            110               

F          2000        20            150                 

K          2000        20            99

For both event companies, the best match is A from the control sample. However, if A is matched with X, I want K to be matched with Y. In other words, I would like the output to be

The output

Firm     Year       industry    size (mill)          Firm

x          2000        20            100                 A

y          2000        20            100                 K

pk2012
Calcite | Level 5

I ran the following code to find the match:

data event;

input firm $ year industry size ;

datalines;

x 2000 20 100

y 2000 2 100

;

run;

data match;

input _firm $ _year _industry _size ;

datalines;

A 2000 20 100

T 2000 20 103

B 2000 20 110

F 2000 20 150

K 2000 20 99

;

run;

proc sql ;

create table portfolio1 as

  select *,min(abs(ratio - _ratio)) as closest

   from event,match

    where year=_year and industry=_industry and

                group by firm

        having abs(ratio - _ratio) eq calculated closest 

         order by firm

    ;

     

quit;


The output I get is

Firm     Year       industry    size (mill)          _Firm

x          2000        20            100                 A

y          2000        20            100                 A

I need to modify the above code to make sure that the match is not duplicated.

EdvinasS
Calcite | Level 5

as far as i am concerned the problem is with joining tables. when you just select * from both tables you get the union of them. you can try joining the same datasets that you created with this code by using sql left join withaut 'on' condition. you will get all possible combinations of firm combinations and then use similar select statement as you wrote to select companies that year and industry match and size difference is minimum.

trich12
Calcite | Level 5

Hi EdvinasS. I'm trying to use your code for matches since I'm looking for several matches for each event firm.  I'm getting an error message that "year_event could not be found in the table/view identified with the correlation name S1".  I get this error for year, industry, and size after the line "on s1.year_event = s2.year_control and ..."  I've followed the code closely so I'm not sure what could be going wrong.  Do you have any idea?

Thanks for your help!

Ian_cc
Calcite | Level 5

What is the variable "size_difference_allowed"?

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 2408 views
  • 0 likes
  • 5 in conversation