BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jimmychoi
Obsidian | Level 7

Hi, this is kinda tricky merge problem....

with the dataset below, which is a partnership data of two firms (e.g M&A)

what i'm trying to figure out is,

  First, find the number of industry of the each firms are involved in

  Second, find the number of COMMON industry of the firms.

  Third, find the total number of industry both firms are involved in

 

<Firm partnership data>

Firm1  Firm2
InkelAND
InkelSanSumg
SansumgAppie

 

Firms' business area are stored in a different dataset, as below:

 

<Firm Business area>

Firm_NameIndustry
InkelSemiconductor
InkelCellphone
InkelPC
ANDSemiconductor
SansumgSemiconductor
SansumgCellphone
SansumgAppliances
SansumgPC
AppieCellphone
AppieCar

 

 

 

Thus, with these datasets, desired output, which would be a modified version of the first dataset, would be like this:

Firm1Firm2Firm1_NumOfINDFirm2_NumOfINDNumOfCommonINDNumOfTotalIND
InkelAND3113
InkelSanSumg3434
SansumgAppie4215

 

 

Is this even possible?

1 ACCEPTED SOLUTION

Accepted Solutions
ErikLund_Jensen
Rhodochrosite | Level 12

Hi Jimmychoi

 

It is possible, and it can be done in several ways. In order not to complicate things I have made a simple build-up of the result table by using a lot of SQL steps. The code could be refined and made much shorter by using transpose and macro processing, but it would be more difficult to see what is going on.

 

Notice that in order to get it working I changed SanSumg to Sansumg in the partner table. Alternative method would have been to use lowcase(firm) in alll joins.

 


data firmpartner1;
input Firm1 $ Firm2 $;
datalines;
Inkel	AND
Inkel	Sansumg
Sansumg	Appie
;

data businessarea;
input Firm_Name $ Industry $20.;
datalines;
Inkel	Semiconductor
Inkel	Cellphone
Inkel	PC
AND	Semiconductor
Sansumg	Semiconductor
Sansumg	Cellphone
Sansumg	Appliances
Sansumg	PC
Appie	Cellphone
Appie	Car
;

*********** Add counts pr firm to firmpartner table *************;
* count number of industries pr. Firm;
proc sql;
	create table industrycount as
		select distinct 
			Firm_Name,
			count(*) as Firm_NumOfInd
		from businessarea
		group by Firm_Name;
quit;

* Build final table 1 - add count column for firm 1;
proc sql;
	create table firmpartner2 as 
		select 
			a.Firm1,
			a.Firm2,
			b.Firm_NumOfInd as Firm1_NumOfInd
		from firmpartner1 as a
		left join industrycount as b
		on a.Firm1 = b.Firm_Name;
quit;

* Build final table 2 - add count column for firm 2;
proc sql;
	create table firmpartner3 as 
		select 
			a.Firm1,
			a.Firm2,
			a.Firm1_NumOfInd,
			b.Firm_NumOfInd as Firm2_NumOfInd
		from firmpartner2 as a
		left join industrycount as b
		on a.Firm2 = b.Firm_Name;
quit;

*********** ADD counts in common  to firmpartner table *************;
* Find all industries that one firm has in common with another firm;
proc sql;
	create table commoncount1 as
		select distinct
			a.Firm_Name as Firm1,
			b.Firm_Name as Firm2,
			coalesce(a.Industry, b.Industry) as Industry
		from businessarea as a
		full outer join businessarea as b
		on a.Industry = b.Industry
		where 
			a.Firm_Name ne b.Firm_Name and
			a.Firm_name ne '' and 
			b.Firm_name ne '';
quit;

* Count number of industries that one firm has in common with another firm;
* Includes common industries between unrelated Firms;
* Includes two rows for each combination, FirmA - FirmB and FirmB - FirmA;
proc sql;
	create table commoncount2 as
		select distinct
			Firm1,
			Firm2,
			count(*) as NumOfCommonInd
		from commoncount1
		group by 
			Firm1,
			Firm2;
quit;

* Build final table 3 - add count column for common industries;
proc sql;
	create table firmpartner4 as 
		select
			a.Firm1,
			a.Firm2,
			a.Firm1_NumOfInd,
			a.Firm2_NumOfInd,
			b.NumOfCommonInd
		from firmpartner3 as a
		left join commoncount2 as b
		on 
			a.Firm1 = b.Firm1 and
			a.Firm2 = b.Firm2; 
quit;

*********** Add counts total to firmpartner table *************;
* Join industries on Firm1; 
proc sql;
	create table totalcount as
		select distinct
			Firm1,
			Firm2,
			count(*) as NumOfTotalInd
		from (
			select 
				a.Firm1,
				a.Firm2,
				b.industry
			from firmpartner1 as a
			left join businessarea as b
			on a.Firm1 = b.Firm_Name
			union
			select 
				a.Firm1,
				a.Firm2,
				b.industry
			from firmpartner1 as a
			left join businessarea as b
			on a.Firm2 = b.Firm_Name
		)
		group by 
			Firm1,
			Firm2;
quit;

* Build final table 3 - add count column for common industries;
proc sql;
	create table firmpartner5 as 
		select
			a.Firm1,
			a.Firm2,
			a.Firm1_NumOfInd,
			a.Firm2_NumOfInd,
			a.NumOfCommonInd,
			b.NumOfTotalInd
		from firmpartner4 as a
		left join totalcount as b
		on 
			a.Firm1 = b.Firm1 and
			a.Firm2 = b.Firm2; 
quit;

proc print data=firmpartner5;
run;

2019-01-06 18_22_35-Clipboard.png

View solution in original post

2 REPLIES 2
ErikLund_Jensen
Rhodochrosite | Level 12

Hi Jimmychoi

 

It is possible, and it can be done in several ways. In order not to complicate things I have made a simple build-up of the result table by using a lot of SQL steps. The code could be refined and made much shorter by using transpose and macro processing, but it would be more difficult to see what is going on.

 

Notice that in order to get it working I changed SanSumg to Sansumg in the partner table. Alternative method would have been to use lowcase(firm) in alll joins.

 


data firmpartner1;
input Firm1 $ Firm2 $;
datalines;
Inkel	AND
Inkel	Sansumg
Sansumg	Appie
;

data businessarea;
input Firm_Name $ Industry $20.;
datalines;
Inkel	Semiconductor
Inkel	Cellphone
Inkel	PC
AND	Semiconductor
Sansumg	Semiconductor
Sansumg	Cellphone
Sansumg	Appliances
Sansumg	PC
Appie	Cellphone
Appie	Car
;

*********** Add counts pr firm to firmpartner table *************;
* count number of industries pr. Firm;
proc sql;
	create table industrycount as
		select distinct 
			Firm_Name,
			count(*) as Firm_NumOfInd
		from businessarea
		group by Firm_Name;
quit;

* Build final table 1 - add count column for firm 1;
proc sql;
	create table firmpartner2 as 
		select 
			a.Firm1,
			a.Firm2,
			b.Firm_NumOfInd as Firm1_NumOfInd
		from firmpartner1 as a
		left join industrycount as b
		on a.Firm1 = b.Firm_Name;
quit;

* Build final table 2 - add count column for firm 2;
proc sql;
	create table firmpartner3 as 
		select 
			a.Firm1,
			a.Firm2,
			a.Firm1_NumOfInd,
			b.Firm_NumOfInd as Firm2_NumOfInd
		from firmpartner2 as a
		left join industrycount as b
		on a.Firm2 = b.Firm_Name;
quit;

*********** ADD counts in common  to firmpartner table *************;
* Find all industries that one firm has in common with another firm;
proc sql;
	create table commoncount1 as
		select distinct
			a.Firm_Name as Firm1,
			b.Firm_Name as Firm2,
			coalesce(a.Industry, b.Industry) as Industry
		from businessarea as a
		full outer join businessarea as b
		on a.Industry = b.Industry
		where 
			a.Firm_Name ne b.Firm_Name and
			a.Firm_name ne '' and 
			b.Firm_name ne '';
quit;

* Count number of industries that one firm has in common with another firm;
* Includes common industries between unrelated Firms;
* Includes two rows for each combination, FirmA - FirmB and FirmB - FirmA;
proc sql;
	create table commoncount2 as
		select distinct
			Firm1,
			Firm2,
			count(*) as NumOfCommonInd
		from commoncount1
		group by 
			Firm1,
			Firm2;
quit;

* Build final table 3 - add count column for common industries;
proc sql;
	create table firmpartner4 as 
		select
			a.Firm1,
			a.Firm2,
			a.Firm1_NumOfInd,
			a.Firm2_NumOfInd,
			b.NumOfCommonInd
		from firmpartner3 as a
		left join commoncount2 as b
		on 
			a.Firm1 = b.Firm1 and
			a.Firm2 = b.Firm2; 
quit;

*********** Add counts total to firmpartner table *************;
* Join industries on Firm1; 
proc sql;
	create table totalcount as
		select distinct
			Firm1,
			Firm2,
			count(*) as NumOfTotalInd
		from (
			select 
				a.Firm1,
				a.Firm2,
				b.industry
			from firmpartner1 as a
			left join businessarea as b
			on a.Firm1 = b.Firm_Name
			union
			select 
				a.Firm1,
				a.Firm2,
				b.industry
			from firmpartner1 as a
			left join businessarea as b
			on a.Firm2 = b.Firm_Name
		)
		group by 
			Firm1,
			Firm2;
quit;

* Build final table 3 - add count column for common industries;
proc sql;
	create table firmpartner5 as 
		select
			a.Firm1,
			a.Firm2,
			a.Firm1_NumOfInd,
			a.Firm2_NumOfInd,
			a.NumOfCommonInd,
			b.NumOfTotalInd
		from firmpartner4 as a
		left join totalcount as b
		on 
			a.Firm1 = b.Firm1 and
			a.Firm2 = b.Firm2; 
quit;

proc print data=firmpartner5;
run;

2019-01-06 18_22_35-Clipboard.png

novinosrin
Tourmaline | Level 20

data firmpartner1;
input Firm1 $ Firm2 $;
datalines;
Inkel	AND
Inkel	Sansumg
Sansumg	Appie
;

data businessarea;
input Firm_Name $ Industry $20.;
datalines;
Inkel	Semiconductor
Inkel	Cellphone
Inkel	PC
AND	Semiconductor
Sansumg	Semiconductor
Sansumg	Cellphone
Sansumg	Appliances
Sansumg	PC
Appie	Cellphone
Appie	Car
;

data w ;
if _n_=1 then do;
if 0 then set businessarea;
   declare hash H (dataset:'businessarea',multidata:'y') ;
   h.definekey  ("Firm_Name") ;
   h.definedata ("Industry") ;
   h.definedone () ;
end;
set firmpartner1;
array f(*) firm1-firm2;
array t(9) $20;
NumOfCommonIND=0;
n=0;
do i=1 to dim(f);
	vn=cats(vname(f(i)),'_NumofIND');
/*	n=1;*/
	NumOfIND=0;
	do rc=h.find(key:f(i)) by 0 while(rc=0);
		NumOfIND+1;
		if industry in t then NumOfCommonIND+1;
		else do;n+1;t(n)=industry;end;
		rc=h.find_next();
	end;
	NumOfTotalIND=dim(t)-cmiss(of t(*));
	output;
end;
keep firm1 firm2 NumOfCommonIND numofind vn NumOfTotalIND ;
run;
proc sql;
create table w2 as
select firm1, firm2,vn,numofind, max(NumOfCommonIND) as NumOfCommonIND, max(NumOfTotalIND) as NumOfTotalIND
from w
group by firm1, firm2
order by firm1,firm2,vn;
quit;
proc transpose data= w2 out=want(drop=_name_);
by firm: NumOfCommonIND NumOfTotalIND;
var NumofIND;
id vn;
run;

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 708 views
  • 4 likes
  • 3 in conversation