Solved: Re: Making a new dataset (with new columns) with data from two existin...

jimmychoi · Posted 01-06-2019 09:27 AM

Hi, this is kinda tricky merge problem....

with the dataset below, which is a partnership data of two firms (e.g M&A)

what i'm trying to figure out is,

First, find the number of industry of the each firms are involved in

Second, find the number of COMMON industry of the firms.

Third, find the total number of industry both firms are involved in

<Firm partnership data>

Firm1	Firm2
Inkel	AND
Inkel	SanSumg
Sansumg	Appie

Firms' business area are stored in a different dataset, as below:

<Firm Business area>

Firm_Name	Industry
Inkel	Semiconductor
Inkel	Cellphone
Inkel	PC
AND	Semiconductor
Sansumg	Semiconductor
Sansumg	Cellphone
Sansumg	Appliances
Sansumg	PC
Appie	Cellphone
Appie	Car

Thus, with these datasets, desired output, which would be a modified version of the first dataset, would be like this:

Firm1	Firm2	Firm1_NumOfIND	Firm2_NumOfIND	NumOfCommonIND	NumOfTotalIND
Inkel	AND	3	1	1	3
Inkel	SanSumg	3	4	3	4
Sansumg	Appie	4	2	1	5

Is this even possible?

ErikLund_Jensen · Posted 01-06-2019 12:24 PM

Hi Jimmychoi

It is possible, and it can be done in several ways. In order not to complicate things I have made a simple build-up of the result table by using a lot of SQL steps. The code could be refined and made much shorter by using transpose and macro processing, but it would be more difficult to see what is going on.

Notice that in order to get it working I changed SanSumg to Sansumg in the partner table. Alternative method would have been to use lowcase(firm) in alll joins.


data firmpartner1;
input Firm1 $ Firm2 $;
datalines;
Inkel	AND
Inkel	Sansumg
Sansumg	Appie
;

data businessarea;
input Firm_Name $ Industry $20.;
datalines;
Inkel	Semiconductor
Inkel	Cellphone
Inkel	PC
AND	Semiconductor
Sansumg	Semiconductor
Sansumg	Cellphone
Sansumg	Appliances
Sansumg	PC
Appie	Cellphone
Appie	Car
;

*********** Add counts pr firm to firmpartner table *************;
* count number of industries pr. Firm;
proc sql;
	create table industrycount as
		select distinct 
			Firm_Name,
			count(*) as Firm_NumOfInd
		from businessarea
		group by Firm_Name;
quit;

* Build final table 1 - add count column for firm 1;
proc sql;
	create table firmpartner2 as 
		select 
			a.Firm1,
			a.Firm2,
			b.Firm_NumOfInd as Firm1_NumOfInd
		from firmpartner1 as a
		left join industrycount as b
		on a.Firm1 = b.Firm_Name;
quit;

* Build final table 2 - add count column for firm 2;
proc sql;
	create table firmpartner3 as 
		select 
			a.Firm1,
			a.Firm2,
			a.Firm1_NumOfInd,
			b.Firm_NumOfInd as Firm2_NumOfInd
		from firmpartner2 as a
		left join industrycount as b
		on a.Firm2 = b.Firm_Name;
quit;

*********** ADD counts in common  to firmpartner table *************;
* Find all industries that one firm has in common with another firm;
proc sql;
	create table commoncount1 as
		select distinct
			a.Firm_Name as Firm1,
			b.Firm_Name as Firm2,
			coalesce(a.Industry, b.Industry) as Industry
		from businessarea as a
		full outer join businessarea as b
		on a.Industry = b.Industry
		where 
			a.Firm_Name ne b.Firm_Name and
			a.Firm_name ne '' and 
			b.Firm_name ne '';
quit;

* Count number of industries that one firm has in common with another firm;
* Includes common industries between unrelated Firms;
* Includes two rows for each combination, FirmA - FirmB and FirmB - FirmA;
proc sql;
	create table commoncount2 as
		select distinct
			Firm1,
			Firm2,
			count(*) as NumOfCommonInd
		from commoncount1
		group by 
			Firm1,
			Firm2;
quit;

* Build final table 3 - add count column for common industries;
proc sql;
	create table firmpartner4 as 
		select
			a.Firm1,
			a.Firm2,
			a.Firm1_NumOfInd,
			a.Firm2_NumOfInd,
			b.NumOfCommonInd
		from firmpartner3 as a
		left join commoncount2 as b
		on 
			a.Firm1 = b.Firm1 and
			a.Firm2 = b.Firm2; 
quit;

*********** Add counts total to firmpartner table *************;
* Join industries on Firm1; 
proc sql;
	create table totalcount as
		select distinct
			Firm1,
			Firm2,
			count(*) as NumOfTotalInd
		from (
			select 
				a.Firm1,
				a.Firm2,
				b.industry
			from firmpartner1 as a
			left join businessarea as b
			on a.Firm1 = b.Firm_Name
			union
			select 
				a.Firm1,
				a.Firm2,
				b.industry
			from firmpartner1 as a
			left join businessarea as b
			on a.Firm2 = b.Firm_Name
		)
		group by 
			Firm1,
			Firm2;
quit;

* Build final table 3 - add count column for common industries;
proc sql;
	create table firmpartner5 as 
		select
			a.Firm1,
			a.Firm2,
			a.Firm1_NumOfInd,
			a.Firm2_NumOfInd,
			a.NumOfCommonInd,
			b.NumOfTotalInd
		from firmpartner4 as a
		left join totalcount as b
		on 
			a.Firm1 = b.Firm1 and
			a.Firm2 = b.Firm2; 
quit;

proc print data=firmpartner5;
run;

View solution in original post

ErikLund_Jensen · Posted 01-06-2019 12:24 PM

Hi Jimmychoi

It is possible, and it can be done in several ways. In order not to complicate things I have made a simple build-up of the result table by using a lot of SQL steps. The code could be refined and made much shorter by using transpose and macro processing, but it would be more difficult to see what is going on.

Notice that in order to get it working I changed SanSumg to Sansumg in the partner table. Alternative method would have been to use lowcase(firm) in alll joins.


data firmpartner1;
input Firm1 $ Firm2 $;
datalines;
Inkel	AND
Inkel	Sansumg
Sansumg	Appie
;

data businessarea;
input Firm_Name $ Industry $20.;
datalines;
Inkel	Semiconductor
Inkel	Cellphone
Inkel	PC
AND	Semiconductor
Sansumg	Semiconductor
Sansumg	Cellphone
Sansumg	Appliances
Sansumg	PC
Appie	Cellphone
Appie	Car
;

*********** Add counts pr firm to firmpartner table *************;
* count number of industries pr. Firm;
proc sql;
	create table industrycount as
		select distinct 
			Firm_Name,
			count(*) as Firm_NumOfInd
		from businessarea
		group by Firm_Name;
quit;

* Build final table 1 - add count column for firm 1;
proc sql;
	create table firmpartner2 as 
		select 
			a.Firm1,
			a.Firm2,
			b.Firm_NumOfInd as Firm1_NumOfInd
		from firmpartner1 as a
		left join industrycount as b
		on a.Firm1 = b.Firm_Name;
quit;

* Build final table 2 - add count column for firm 2;
proc sql;
	create table firmpartner3 as 
		select 
			a.Firm1,
			a.Firm2,
			a.Firm1_NumOfInd,
			b.Firm_NumOfInd as Firm2_NumOfInd
		from firmpartner2 as a
		left join industrycount as b
		on a.Firm2 = b.Firm_Name;
quit;

*********** ADD counts in common  to firmpartner table *************;
* Find all industries that one firm has in common with another firm;
proc sql;
	create table commoncount1 as
		select distinct
			a.Firm_Name as Firm1,
			b.Firm_Name as Firm2,
			coalesce(a.Industry, b.Industry) as Industry
		from businessarea as a
		full outer join businessarea as b
		on a.Industry = b.Industry
		where 
			a.Firm_Name ne b.Firm_Name and
			a.Firm_name ne '' and 
			b.Firm_name ne '';
quit;

* Count number of industries that one firm has in common with another firm;
* Includes common industries between unrelated Firms;
* Includes two rows for each combination, FirmA - FirmB and FirmB - FirmA;
proc sql;
	create table commoncount2 as
		select distinct
			Firm1,
			Firm2,
			count(*) as NumOfCommonInd
		from commoncount1
		group by 
			Firm1,
			Firm2;
quit;

* Build final table 3 - add count column for common industries;
proc sql;
	create table firmpartner4 as 
		select
			a.Firm1,
			a.Firm2,
			a.Firm1_NumOfInd,
			a.Firm2_NumOfInd,
			b.NumOfCommonInd
		from firmpartner3 as a
		left join commoncount2 as b
		on 
			a.Firm1 = b.Firm1 and
			a.Firm2 = b.Firm2; 
quit;

*********** Add counts total to firmpartner table *************;
* Join industries on Firm1; 
proc sql;
	create table totalcount as
		select distinct
			Firm1,
			Firm2,
			count(*) as NumOfTotalInd
		from (
			select 
				a.Firm1,
				a.Firm2,
				b.industry
			from firmpartner1 as a
			left join businessarea as b
			on a.Firm1 = b.Firm_Name
			union
			select 
				a.Firm1,
				a.Firm2,
				b.industry
			from firmpartner1 as a
			left join businessarea as b
			on a.Firm2 = b.Firm_Name
		)
		group by 
			Firm1,
			Firm2;
quit;

* Build final table 3 - add count column for common industries;
proc sql;
	create table firmpartner5 as 
		select
			a.Firm1,
			a.Firm2,
			a.Firm1_NumOfInd,
			a.Firm2_NumOfInd,
			a.NumOfCommonInd,
			b.NumOfTotalInd
		from firmpartner4 as a
		left join totalcount as b
		on 
			a.Firm1 = b.Firm1 and
			a.Firm2 = b.Firm2; 
quit;

proc print data=firmpartner5;
run;

novinosrin · Posted 01-06-2019 07:20 PM


data firmpartner1;
input Firm1 $ Firm2 $;
datalines;
Inkel	AND
Inkel	Sansumg
Sansumg	Appie
;

data businessarea;
input Firm_Name $ Industry $20.;
datalines;
Inkel	Semiconductor
Inkel	Cellphone
Inkel	PC
AND	Semiconductor
Sansumg	Semiconductor
Sansumg	Cellphone
Sansumg	Appliances
Sansumg	PC
Appie	Cellphone
Appie	Car
;

data w ;
if _n_=1 then do;
if 0 then set businessarea;
   declare hash H (dataset:'businessarea',multidata:'y') ;
   h.definekey  ("Firm_Name") ;
   h.definedata ("Industry") ;
   h.definedone () ;
end;
set firmpartner1;
array f(*) firm1-firm2;
array t(9) $20;
NumOfCommonIND=0;
n=0;
do i=1 to dim(f);
	vn=cats(vname(f(i)),'_NumofIND');
/*	n=1;*/
	NumOfIND=0;
	do rc=h.find(key:f(i)) by 0 while(rc=0);
		NumOfIND+1;
		if industry in t then NumOfCommonIND+1;
		else do;n+1;t(n)=industry;end;
		rc=h.find_next();
	end;
	NumOfTotalIND=dim(t)-cmiss(of t(*));
	output;
end;
keep firm1 firm2 NumOfCommonIND numofind vn NumOfTotalIND ;
run;
proc sql;
create table w2 as
select firm1, firm2,vn,numofind, max(NumOfCommonIND) as NumOfCommonIND, max(NumOfTotalIND) as NumOfTotalIND
from w
group by firm1, firm2
order by firm1,firm2,vn;
quit;
proc transpose data= w2 out=want(drop=_name_);
by firm: NumOfCommonIND NumOfTotalIND;
var NumofIND;
id vn;
run;

Making a new dataset (with new columns) with data from two existing datasets

Re: Making a new dataset (with new columns) with data from two existing datasets

Re: Making a new dataset (with new columns) with data from two existing datasets

Re: Making a new dataset (with new columns) with data from two existing datasets

Making a new dataset (with new columns) with data from two existing datasets

Re: Making a new dataset (with new columns) with data from two existing datasets

Re: Making a new dataset (with new columns) with data from two existing datasets

Re: Making a new dataset (with new columns) with data from two existing datasets

Registration is open

SAS Training: Just a Click Away