Hi, this is kinda tricky merge problem....
with the dataset below, which is a partnership data of two firms (e.g M&A)
what i'm trying to figure out is,
First, find the number of industry of the each firms are involved in
Second, find the number of COMMON industry of the firms.
Third, find the total number of industry both firms are involved in
<Firm partnership data>
Firm1 | Firm2 |
Inkel | AND |
Inkel | SanSumg |
Sansumg | Appie |
Firms' business area are stored in a different dataset, as below:
<Firm Business area>
Firm_Name | Industry |
Inkel | Semiconductor |
Inkel | Cellphone |
Inkel | PC |
AND | Semiconductor |
Sansumg | Semiconductor |
Sansumg | Cellphone |
Sansumg | Appliances |
Sansumg | PC |
Appie | Cellphone |
Appie | Car |
Thus, with these datasets, desired output, which would be a modified version of the first dataset, would be like this:
Firm1 | Firm2 | Firm1_NumOfIND | Firm2_NumOfIND | NumOfCommonIND | NumOfTotalIND |
Inkel | AND | 3 | 1 | 1 | 3 |
Inkel | SanSumg | 3 | 4 | 3 | 4 |
Sansumg | Appie | 4 | 2 | 1 | 5 |
Is this even possible?
Hi Jimmychoi
It is possible, and it can be done in several ways. In order not to complicate things I have made a simple build-up of the result table by using a lot of SQL steps. The code could be refined and made much shorter by using transpose and macro processing, but it would be more difficult to see what is going on.
Notice that in order to get it working I changed SanSumg to Sansumg in the partner table. Alternative method would have been to use lowcase(firm) in alll joins.
data firmpartner1;
input Firm1 $ Firm2 $;
datalines;
Inkel AND
Inkel Sansumg
Sansumg Appie
;
data businessarea;
input Firm_Name $ Industry $20.;
datalines;
Inkel Semiconductor
Inkel Cellphone
Inkel PC
AND Semiconductor
Sansumg Semiconductor
Sansumg Cellphone
Sansumg Appliances
Sansumg PC
Appie Cellphone
Appie Car
;
*********** Add counts pr firm to firmpartner table *************;
* count number of industries pr. Firm;
proc sql;
create table industrycount as
select distinct
Firm_Name,
count(*) as Firm_NumOfInd
from businessarea
group by Firm_Name;
quit;
* Build final table 1 - add count column for firm 1;
proc sql;
create table firmpartner2 as
select
a.Firm1,
a.Firm2,
b.Firm_NumOfInd as Firm1_NumOfInd
from firmpartner1 as a
left join industrycount as b
on a.Firm1 = b.Firm_Name;
quit;
* Build final table 2 - add count column for firm 2;
proc sql;
create table firmpartner3 as
select
a.Firm1,
a.Firm2,
a.Firm1_NumOfInd,
b.Firm_NumOfInd as Firm2_NumOfInd
from firmpartner2 as a
left join industrycount as b
on a.Firm2 = b.Firm_Name;
quit;
*********** ADD counts in common to firmpartner table *************;
* Find all industries that one firm has in common with another firm;
proc sql;
create table commoncount1 as
select distinct
a.Firm_Name as Firm1,
b.Firm_Name as Firm2,
coalesce(a.Industry, b.Industry) as Industry
from businessarea as a
full outer join businessarea as b
on a.Industry = b.Industry
where
a.Firm_Name ne b.Firm_Name and
a.Firm_name ne '' and
b.Firm_name ne '';
quit;
* Count number of industries that one firm has in common with another firm;
* Includes common industries between unrelated Firms;
* Includes two rows for each combination, FirmA - FirmB and FirmB - FirmA;
proc sql;
create table commoncount2 as
select distinct
Firm1,
Firm2,
count(*) as NumOfCommonInd
from commoncount1
group by
Firm1,
Firm2;
quit;
* Build final table 3 - add count column for common industries;
proc sql;
create table firmpartner4 as
select
a.Firm1,
a.Firm2,
a.Firm1_NumOfInd,
a.Firm2_NumOfInd,
b.NumOfCommonInd
from firmpartner3 as a
left join commoncount2 as b
on
a.Firm1 = b.Firm1 and
a.Firm2 = b.Firm2;
quit;
*********** Add counts total to firmpartner table *************;
* Join industries on Firm1;
proc sql;
create table totalcount as
select distinct
Firm1,
Firm2,
count(*) as NumOfTotalInd
from (
select
a.Firm1,
a.Firm2,
b.industry
from firmpartner1 as a
left join businessarea as b
on a.Firm1 = b.Firm_Name
union
select
a.Firm1,
a.Firm2,
b.industry
from firmpartner1 as a
left join businessarea as b
on a.Firm2 = b.Firm_Name
)
group by
Firm1,
Firm2;
quit;
* Build final table 3 - add count column for common industries;
proc sql;
create table firmpartner5 as
select
a.Firm1,
a.Firm2,
a.Firm1_NumOfInd,
a.Firm2_NumOfInd,
a.NumOfCommonInd,
b.NumOfTotalInd
from firmpartner4 as a
left join totalcount as b
on
a.Firm1 = b.Firm1 and
a.Firm2 = b.Firm2;
quit;
proc print data=firmpartner5;
run;
Hi Jimmychoi
It is possible, and it can be done in several ways. In order not to complicate things I have made a simple build-up of the result table by using a lot of SQL steps. The code could be refined and made much shorter by using transpose and macro processing, but it would be more difficult to see what is going on.
Notice that in order to get it working I changed SanSumg to Sansumg in the partner table. Alternative method would have been to use lowcase(firm) in alll joins.
data firmpartner1;
input Firm1 $ Firm2 $;
datalines;
Inkel AND
Inkel Sansumg
Sansumg Appie
;
data businessarea;
input Firm_Name $ Industry $20.;
datalines;
Inkel Semiconductor
Inkel Cellphone
Inkel PC
AND Semiconductor
Sansumg Semiconductor
Sansumg Cellphone
Sansumg Appliances
Sansumg PC
Appie Cellphone
Appie Car
;
*********** Add counts pr firm to firmpartner table *************;
* count number of industries pr. Firm;
proc sql;
create table industrycount as
select distinct
Firm_Name,
count(*) as Firm_NumOfInd
from businessarea
group by Firm_Name;
quit;
* Build final table 1 - add count column for firm 1;
proc sql;
create table firmpartner2 as
select
a.Firm1,
a.Firm2,
b.Firm_NumOfInd as Firm1_NumOfInd
from firmpartner1 as a
left join industrycount as b
on a.Firm1 = b.Firm_Name;
quit;
* Build final table 2 - add count column for firm 2;
proc sql;
create table firmpartner3 as
select
a.Firm1,
a.Firm2,
a.Firm1_NumOfInd,
b.Firm_NumOfInd as Firm2_NumOfInd
from firmpartner2 as a
left join industrycount as b
on a.Firm2 = b.Firm_Name;
quit;
*********** ADD counts in common to firmpartner table *************;
* Find all industries that one firm has in common with another firm;
proc sql;
create table commoncount1 as
select distinct
a.Firm_Name as Firm1,
b.Firm_Name as Firm2,
coalesce(a.Industry, b.Industry) as Industry
from businessarea as a
full outer join businessarea as b
on a.Industry = b.Industry
where
a.Firm_Name ne b.Firm_Name and
a.Firm_name ne '' and
b.Firm_name ne '';
quit;
* Count number of industries that one firm has in common with another firm;
* Includes common industries between unrelated Firms;
* Includes two rows for each combination, FirmA - FirmB and FirmB - FirmA;
proc sql;
create table commoncount2 as
select distinct
Firm1,
Firm2,
count(*) as NumOfCommonInd
from commoncount1
group by
Firm1,
Firm2;
quit;
* Build final table 3 - add count column for common industries;
proc sql;
create table firmpartner4 as
select
a.Firm1,
a.Firm2,
a.Firm1_NumOfInd,
a.Firm2_NumOfInd,
b.NumOfCommonInd
from firmpartner3 as a
left join commoncount2 as b
on
a.Firm1 = b.Firm1 and
a.Firm2 = b.Firm2;
quit;
*********** Add counts total to firmpartner table *************;
* Join industries on Firm1;
proc sql;
create table totalcount as
select distinct
Firm1,
Firm2,
count(*) as NumOfTotalInd
from (
select
a.Firm1,
a.Firm2,
b.industry
from firmpartner1 as a
left join businessarea as b
on a.Firm1 = b.Firm_Name
union
select
a.Firm1,
a.Firm2,
b.industry
from firmpartner1 as a
left join businessarea as b
on a.Firm2 = b.Firm_Name
)
group by
Firm1,
Firm2;
quit;
* Build final table 3 - add count column for common industries;
proc sql;
create table firmpartner5 as
select
a.Firm1,
a.Firm2,
a.Firm1_NumOfInd,
a.Firm2_NumOfInd,
a.NumOfCommonInd,
b.NumOfTotalInd
from firmpartner4 as a
left join totalcount as b
on
a.Firm1 = b.Firm1 and
a.Firm2 = b.Firm2;
quit;
proc print data=firmpartner5;
run;
data firmpartner1;
input Firm1 $ Firm2 $;
datalines;
Inkel AND
Inkel Sansumg
Sansumg Appie
;
data businessarea;
input Firm_Name $ Industry $20.;
datalines;
Inkel Semiconductor
Inkel Cellphone
Inkel PC
AND Semiconductor
Sansumg Semiconductor
Sansumg Cellphone
Sansumg Appliances
Sansumg PC
Appie Cellphone
Appie Car
;
data w ;
if _n_=1 then do;
if 0 then set businessarea;
declare hash H (dataset:'businessarea',multidata:'y') ;
h.definekey ("Firm_Name") ;
h.definedata ("Industry") ;
h.definedone () ;
end;
set firmpartner1;
array f(*) firm1-firm2;
array t(9) $20;
NumOfCommonIND=0;
n=0;
do i=1 to dim(f);
vn=cats(vname(f(i)),'_NumofIND');
/* n=1;*/
NumOfIND=0;
do rc=h.find(key:f(i)) by 0 while(rc=0);
NumOfIND+1;
if industry in t then NumOfCommonIND+1;
else do;n+1;t(n)=industry;end;
rc=h.find_next();
end;
NumOfTotalIND=dim(t)-cmiss(of t(*));
output;
end;
keep firm1 firm2 NumOfCommonIND numofind vn NumOfTotalIND ;
run;
proc sql;
create table w2 as
select firm1, firm2,vn,numofind, max(NumOfCommonIND) as NumOfCommonIND, max(NumOfTotalIND) as NumOfTotalIND
from w
group by firm1, firm2
order by firm1,firm2,vn;
quit;
proc transpose data= w2 out=want(drop=_name_);
by firm: NumOfCommonIND NumOfTotalIND;
var NumofIND;
id vn;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.