Hi All,
I need help with creating a code. I have two groups of companies; one is the event companies and the second is the control companies. I need to match each company from the event group with one company from the control group, where both of them should have the same year and industry, and the difference in the size for the event companies and control companies should be minimum. In other words, I want to match the event companies with the control companies based on year, industry, and size. If for some reason, the closet match for the event company is already matched with a previous event company, I want the second best control company to be matched with the event company. The second best control company is the one from same industry and year as the event company and the size difference is the second best.
For example
The event firms
Firm Year industry size (mill)
x 2000 20 100
x 2001 20 105
y 2000 20 100
Z 1999 25 250
Control firms
Firm Year industry size (mill)
J 1999 25 250
L 1999 25 250
R 1999 25 325
E 1999 25 275
R 1999 25 200
C 2002 25 250
A 2000 20 100
T 2000 20 103
B 2000 20 110
d 2002 20 99
F 2000 20 100
K 2001 20 99
The output
Firm Year industry size (mill) Firm
x 2000 20 100 A
x 2001 20 105 K
y 2000 20 100 D
Z 1999 25 350 R
I would appreciate if someone would help me with the coding.
proc sql;
create table control_event as
select firm as firm_event, year as year_event, industry as industry_event, size as size_event from event_firms s1
left join
(select firm as firm_control, year as year_control, industry as industry_control, size as size_control from control_firms) s2
on s1.year_event = s2.year_control and s1.industry_event_control = s2.industry_control and ((s1.size_event - s2.size_control < size_difference_allowed and s1.size_event - s2.size_control > 0) or (s1.size_event - s2.size_control > size_difference_allowed*(-1) and s1.size_event - s2.size_control < 0)) ;
alter table control_event drop firm_control, year_control, industry_control, size_control;
quit;
didnt quite understand mathing "second best" but this should give you all matches and you fill be able to work on that by your own
Thanks for the code. The "second best" matching means that I want each event company to have a unique match. Let's say, there are two event companies from the same year and same industry and have the same size. And there are five control companies from the same year and the same industry.
The event firms
Firm Year industry size (mill)
x 2000 20 100
Y 2000 20 100
Control Companies
A 2000 20 100
T 2000 20 103
B 2000 20 110
F 2000 20 150
K 2000 20 99
For both event companies, the best match is A from the control sample. However, if A is matched with X, I want K to be matched with Y. In other words, I would like the output to be
The output
Firm Year industry size (mill) Firm
x 2000 20 100 A
y 2000 20 100 K
I ran the following code to find the match:
data event;
input firm $ year industry size ;
datalines;
x 2000 20 100
y 2000 2 100
;
run;
data match;
input _firm $ _year _industry _size ;
datalines;
A 2000 20 100
T 2000 20 103
B 2000 20 110
F 2000 20 150
K 2000 20 99
;
run;
proc sql ;
create table portfolio1 as
select *,min(abs(ratio - _ratio)) as closest
from event,match
where year=_year and industry=_industry and
group by firm
having abs(ratio - _ratio) eq calculated closest
order by firm
;
quit;
The output I get is
Firm Year industry size (mill) _Firm
x 2000 20 100 A
y 2000 20 100 A
I need to modify the above code to make sure that the match is not duplicated.
as far as i am concerned the problem is with joining tables. when you just select * from both tables you get the union of them. you can try joining the same datasets that you created with this code by using sql left join withaut 'on' condition. you will get all possible combinations of firm combinations and then use similar select statement as you wrote to select companies that year and industry match and size difference is minimum.
Hi EdvinasS. I'm trying to use your code for matches since I'm looking for several matches for each event firm. I'm getting an error message that "year_event could not be found in the table/view identified with the correlation name S1". I get this error for year, industry, and size after the line "on s1.year_event = s2.year_control and ..." I've followed the code closely so I'm not sure what could be going wrong. Do you have any idea?
Thanks for your help!
What is the variable "size_difference_allowed"?
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.