BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
dapenDaniel
Obsidian | Level 7

Hi Sas Master,

 

I have three datasets. 

 

Dataset Have1 lists the products that several firms have in each year.

 

FirmID1YearProductID
10011995A1
10011995B3
10011995R9
10011995V5
10011996B2
10011996Y8
10011997A1
10011997O6
10011997U3
10011998E4
10011998W1
10011998P3
10022003D5
10022003I8
10022003V5
10022003A1
10022004G7
10022004N6
10022005I8
10022005P2
10022005M5
10022006N6
10022006K9
10022006H4
10022007V5
10022007P2
10022007R4

 

Have2 lists the products that other firms have in each year.

 

FirmID2YearProductID
250641995A1
250641995B3
250641995T12
250641995V5
250641995G8
250641995U22
250641995O91
250641995R52
250641995D2
250641996B3
250641996Y8
250641996A1
250641997B11
250641997T4
250641998E4
250641998P3
250641998A32
250641998B4
250641998V3
352412003V42
352412003H76
352412003I88
352412004G7
352412004N6
352412004J98
352412004T42
352412005A1
352412006H4
352412006L14
352412006Y72
352412007I6
352412007N9

 

Firms in Have1 and Have2 are partners and Have3 list the match for these two groups of firms.

 

FirmID1FirmID2
100125064
100235241

 

I would like to count the number of same products in each pair each year first and then this number is divided by the total number of products for firmID2. For example, firm 1001 and firm 25064 have three (A1, B3, V5) same products in 1995 and the number of products for Firm 25064 is nine in 1995. So the result will be 0.33333. The result called Want is below.

 

Want

FirmID1FirmID2YearProportion
10012506419950.333333
10012506419960.333333
10012506419970
10012504619980.4
10023524120030
10023524120040.5
10023524120050
10023524120060.333333
10023524120070

 

What code do I need to use? Thanks.

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20

Hi @dapenDaniel  Pretty straight forward admittedly boring SQL 🙂



data have1;
input FirmID1	Year	ProductID $;
cards;
1001	1995	A1
1001	1995	B3
1001	1995	R9
1001	1995	V5
1001	1996	B2
1001	1996	Y8
1001	1997	A1
1001	1997	O6
1001	1997	U3
1001	1998	E4
1001	1998	W1
1001	1998	P3
1002	2003	D5
1002	2003	I8
1002	2003	V5
1002	2003	A1
1002	2004	G7
1002	2004	N6
1002	2005	I8
1002	2005	P2
1002	2005	M5
1002	2006	N6
1002	2006	K9
1002	2006	H4
1002	2007	V5
1002	2007	P2
1002	2007	R4
;

data have2;
input FirmID2	Year	ProductID $;
cards;
25064	1995	A1
25064	1995	B3
25064	1995	T12
25064	1995	V5
25064	1995	G8
25064	1995	U22
25064	1995	O91
25064	1995	R52
25064	1995	D2
25064	1996	B3
25064	1996	Y8
25064	1996	A1
25064	1997	B11
25064	1997	T4
25064	1998	E4
25064	1998	P3
25064	1998	A32
25064	1998	B4
25064	1998	V3
35241	2003	V42
35241	2003	H76
35241	2003	I88
35241	2004	G7
35241	2004	N6
35241	2004	J98
35241	2004	T42
35241	2005	A1
35241	2006	H4
35241	2006	L14
35241	2006	Y72
35241	2007	I6
35241	2007	N9
;

data have3;
input FirmID1	FirmID2;
cards;
1001	25064
1002	35241
;

proc sql;
create table want as
select b.firmid1, a.firmid2,a.year,n(c.productid)/count(*) as Proportion
from have2 a left join have3 b
on a.firmid2=b.firmid2
left join 
have1 c
on b.firmid1=c.firmid1
and a.year=c.year and a.productid=c.productid
group by b.firmid1,a.firmid2,a.year;
quit;


 

 

View solution in original post

10 REPLIES 10
novinosrin
Tourmaline | Level 20

Hi @dapenDaniel  Pretty straight forward admittedly boring SQL 🙂



data have1;
input FirmID1	Year	ProductID $;
cards;
1001	1995	A1
1001	1995	B3
1001	1995	R9
1001	1995	V5
1001	1996	B2
1001	1996	Y8
1001	1997	A1
1001	1997	O6
1001	1997	U3
1001	1998	E4
1001	1998	W1
1001	1998	P3
1002	2003	D5
1002	2003	I8
1002	2003	V5
1002	2003	A1
1002	2004	G7
1002	2004	N6
1002	2005	I8
1002	2005	P2
1002	2005	M5
1002	2006	N6
1002	2006	K9
1002	2006	H4
1002	2007	V5
1002	2007	P2
1002	2007	R4
;

data have2;
input FirmID2	Year	ProductID $;
cards;
25064	1995	A1
25064	1995	B3
25064	1995	T12
25064	1995	V5
25064	1995	G8
25064	1995	U22
25064	1995	O91
25064	1995	R52
25064	1995	D2
25064	1996	B3
25064	1996	Y8
25064	1996	A1
25064	1997	B11
25064	1997	T4
25064	1998	E4
25064	1998	P3
25064	1998	A32
25064	1998	B4
25064	1998	V3
35241	2003	V42
35241	2003	H76
35241	2003	I88
35241	2004	G7
35241	2004	N6
35241	2004	J98
35241	2004	T42
35241	2005	A1
35241	2006	H4
35241	2006	L14
35241	2006	Y72
35241	2007	I6
35241	2007	N9
;

data have3;
input FirmID1	FirmID2;
cards;
1001	25064
1002	35241
;

proc sql;
create table want as
select b.firmid1, a.firmid2,a.year,n(c.productid)/count(*) as Proportion
from have2 a left join have3 b
on a.firmid2=b.firmid2
left join 
have1 c
on b.firmid1=c.firmid1
and a.year=c.year and a.productid=c.productid
group by b.firmid1,a.firmid2,a.year;
quit;


 

 

dapenDaniel
Obsidian | Level 7

Hi @novinosrin ,

 

I have a follow-up question. Dataset Have3 has a thrid variable called year, which means the year that two companies formed the partnership.

 

Have3 

FirmID1     FirmID2         Year

1001           25046           1997

1002           35241           2006

 

I would like to complete two tasks:

1) count the number of the same products by the year that they formed the partnership;

2) the number calculated in 1) is divided by the total number of products for FirmID2 by the year that they formed the partnership.

 

For example, 1001 and 25046 formed the partnership in 1997. They shared 4 same products (A1, B3, V5, Y8) by 1997. The total number of products for 25064 is 14 by 1997. So the ration is 4/14=0.2857

 

Can you tell me how to revise the code? Thanks.

novinosrin
Tourmaline | Level 20

Hi @dapenDaniel  Wiil take a look a bit later towards the end of the day. My day has just barely begun at work. I hope it's not urgent. Hang in there

dapenDaniel
Obsidian | Level 7

Hi @novinosrin ,

 

It is so nice of you! This is not urgent. I am looking forward to your reply! Thank you so much!

 

novinosrin
Tourmaline | Level 20

Hi @dapenDaniel  Just had a quick look, and I seek a couple of clarifications as

 

 This seems to actually  pave way in revising the logic to do rather create a separate table as output. Am i correct? Coz I can't see how you imagine as an additional statistic in the same WANT table structure as the previous considering the grouping variable would change.  To help clarify, Can you please post the wanted table structure of the new requirement in order to avoid any wrong assumptions 

dapenDaniel
Obsidian | Level 7

Hi @novinosrin ,

 

Sorry for the confusion.

 

Have3

FirmID1     FirmID2         Year

1001           25046           1997

1002           35241           2006

 

The new wanted table is below.

 

FirmID1FirmID2YearProportion
10012506419970.285714
10023524120060.272727

 

1001 and 25046 formed the partnership in 1997. They shared 4 same products (A1, B3, V5, Y8) by 1997 (including 1997 and before). The total number of unique products for 25064 is 14 by 1997. So the ration is 4/14=0.285714.

 

1002 and 35241 formed the partnership in 2006. They shared 3 same products (A1, G7, N6) by 2006 (including 2006 and before). The total number of unique products for 35241 is 11 by 2006. So the ration is 3/11=0.272727.

 

Does that make more sense? Thanks.

novinosrin
Tourmaline | Level 20

Got it. Will work on it soon as I find sometime

novinosrin
Tourmaline | Level 20

HI @dapenDaniel  Just a matter of carefully knowing what and how to group. See if the following works

 



   
data have1;
input FirmID1	Year	ProductID $;
cards;
1001	1995	A1
1001	1995	B3
1001	1995	R9
1001	1995	V5
1001	1996	B2
1001	1996	Y8
1001	1997	A1
1001	1997	O6
1001	1997	U3
1001	1998	E4
1001	1998	W1
1001	1998	P3
1002	2003	D5
1002	2003	I8
1002	2003	V5
1002	2003	A1
1002	2004	G7
1002	2004	N6
1002	2005	I8
1002	2005	P2
1002	2005	M5
1002	2006	N6
1002	2006	K9
1002	2006	H4
1002	2007	V5
1002	2007	P2
1002	2007	R4
;

data have2;
input FirmID2	Year	ProductID $;
cards;
25064	1995	A1
25064	1995	B3
25064	1995	T12
25064	1995	V5
25064	1995	G8
25064	1995	U22
25064	1995	O91
25064	1995	R52
25064	1995	D2
25064	1996	B3
25064	1996	Y8
25064	1996	A1
25064	1997	B11
25064	1997	T4
25064	1998	E4
25064	1998	P3
25064	1998	A32
25064	1998	B4
25064	1998	V3
35241	2003	V42
35241	2003	H76
35241	2003	I88
35241	2004	G7
35241	2004	N6
35241	2004	J98
35241	2004	T42
35241	2005	A1
35241	2006	H4
35241	2006	L14
35241	2006	Y72
35241	2007	I6
35241	2007	N9
;

data have3;
input FirmID1	FirmID2 year;
cards;
1001           25046           1997
1002           35241           2006
;


proc sql;
create table want(drop=t) as
select b.firmid1, a.firmid2,b.year,
a.year<=b.year as t, sum( c.productid=a.productid)/count(*) as Proportion
from have2 a left join have3 b
on a.firmid2=b.firmid2
left join 
have1 c
on b.firmid1=c.firmid1
and a.year=c.year and a.productid=c.productid
group by b.firmid1,a.firmid2,b.year,t
having t;
quit;
dapenDaniel
Obsidian | Level 7

Thanks, @novinosrin ! You are awesome! I appreciate your help!!!

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 10 replies
  • 776 views
  • 0 likes
  • 2 in conversation