Hi Sas Master,
I have three datasets.
Dataset Have1 lists the products that several firms have in each year.
FirmID1 | Year | ProductID |
1001 | 1995 | A1 |
1001 | 1995 | B3 |
1001 | 1995 | R9 |
1001 | 1995 | V5 |
1001 | 1996 | B2 |
1001 | 1996 | Y8 |
1001 | 1997 | A1 |
1001 | 1997 | O6 |
1001 | 1997 | U3 |
1001 | 1998 | E4 |
1001 | 1998 | W1 |
1001 | 1998 | P3 |
1002 | 2003 | D5 |
1002 | 2003 | I8 |
1002 | 2003 | V5 |
1002 | 2003 | A1 |
1002 | 2004 | G7 |
1002 | 2004 | N6 |
1002 | 2005 | I8 |
1002 | 2005 | P2 |
1002 | 2005 | M5 |
1002 | 2006 | N6 |
1002 | 2006 | K9 |
1002 | 2006 | H4 |
1002 | 2007 | V5 |
1002 | 2007 | P2 |
1002 | 2007 | R4 |
Have2 lists the products that other firms have in each year.
FirmID2 | Year | ProductID |
25064 | 1995 | A1 |
25064 | 1995 | B3 |
25064 | 1995 | T12 |
25064 | 1995 | V5 |
25064 | 1995 | G8 |
25064 | 1995 | U22 |
25064 | 1995 | O91 |
25064 | 1995 | R52 |
25064 | 1995 | D2 |
25064 | 1996 | B3 |
25064 | 1996 | Y8 |
25064 | 1996 | A1 |
25064 | 1997 | B11 |
25064 | 1997 | T4 |
25064 | 1998 | E4 |
25064 | 1998 | P3 |
25064 | 1998 | A32 |
25064 | 1998 | B4 |
25064 | 1998 | V3 |
35241 | 2003 | V42 |
35241 | 2003 | H76 |
35241 | 2003 | I88 |
35241 | 2004 | G7 |
35241 | 2004 | N6 |
35241 | 2004 | J98 |
35241 | 2004 | T42 |
35241 | 2005 | A1 |
35241 | 2006 | H4 |
35241 | 2006 | L14 |
35241 | 2006 | Y72 |
35241 | 2007 | I6 |
35241 | 2007 | N9 |
Firms in Have1 and Have2 are partners and Have3 list the match for these two groups of firms.
FirmID1 | FirmID2 |
1001 | 25064 |
1002 | 35241 |
I would like to count the number of same products in each pair each year first and then this number is divided by the total number of products for firmID2. For example, firm 1001 and firm 25064 have three (A1, B3, V5) same products in 1995 and the number of products for Firm 25064 is nine in 1995. So the result will be 0.33333. The result called Want is below.
Want
FirmID1 | FirmID2 | Year | Proportion |
1001 | 25064 | 1995 | 0.333333 |
1001 | 25064 | 1996 | 0.333333 |
1001 | 25064 | 1997 | 0 |
1001 | 25046 | 1998 | 0.4 |
1002 | 35241 | 2003 | 0 |
1002 | 35241 | 2004 | 0.5 |
1002 | 35241 | 2005 | 0 |
1002 | 35241 | 2006 | 0.333333 |
1002 | 35241 | 2007 | 0 |
What code do I need to use? Thanks.
Hi @dapenDaniel Pretty straight forward admittedly boring SQL 🙂
data have1;
input FirmID1 Year ProductID $;
cards;
1001 1995 A1
1001 1995 B3
1001 1995 R9
1001 1995 V5
1001 1996 B2
1001 1996 Y8
1001 1997 A1
1001 1997 O6
1001 1997 U3
1001 1998 E4
1001 1998 W1
1001 1998 P3
1002 2003 D5
1002 2003 I8
1002 2003 V5
1002 2003 A1
1002 2004 G7
1002 2004 N6
1002 2005 I8
1002 2005 P2
1002 2005 M5
1002 2006 N6
1002 2006 K9
1002 2006 H4
1002 2007 V5
1002 2007 P2
1002 2007 R4
;
data have2;
input FirmID2 Year ProductID $;
cards;
25064 1995 A1
25064 1995 B3
25064 1995 T12
25064 1995 V5
25064 1995 G8
25064 1995 U22
25064 1995 O91
25064 1995 R52
25064 1995 D2
25064 1996 B3
25064 1996 Y8
25064 1996 A1
25064 1997 B11
25064 1997 T4
25064 1998 E4
25064 1998 P3
25064 1998 A32
25064 1998 B4
25064 1998 V3
35241 2003 V42
35241 2003 H76
35241 2003 I88
35241 2004 G7
35241 2004 N6
35241 2004 J98
35241 2004 T42
35241 2005 A1
35241 2006 H4
35241 2006 L14
35241 2006 Y72
35241 2007 I6
35241 2007 N9
;
data have3;
input FirmID1 FirmID2;
cards;
1001 25064
1002 35241
;
proc sql;
create table want as
select b.firmid1, a.firmid2,a.year,n(c.productid)/count(*) as Proportion
from have2 a left join have3 b
on a.firmid2=b.firmid2
left join
have1 c
on b.firmid1=c.firmid1
and a.year=c.year and a.productid=c.productid
group by b.firmid1,a.firmid2,a.year;
quit;
Hi @dapenDaniel Pretty straight forward admittedly boring SQL 🙂
data have1;
input FirmID1 Year ProductID $;
cards;
1001 1995 A1
1001 1995 B3
1001 1995 R9
1001 1995 V5
1001 1996 B2
1001 1996 Y8
1001 1997 A1
1001 1997 O6
1001 1997 U3
1001 1998 E4
1001 1998 W1
1001 1998 P3
1002 2003 D5
1002 2003 I8
1002 2003 V5
1002 2003 A1
1002 2004 G7
1002 2004 N6
1002 2005 I8
1002 2005 P2
1002 2005 M5
1002 2006 N6
1002 2006 K9
1002 2006 H4
1002 2007 V5
1002 2007 P2
1002 2007 R4
;
data have2;
input FirmID2 Year ProductID $;
cards;
25064 1995 A1
25064 1995 B3
25064 1995 T12
25064 1995 V5
25064 1995 G8
25064 1995 U22
25064 1995 O91
25064 1995 R52
25064 1995 D2
25064 1996 B3
25064 1996 Y8
25064 1996 A1
25064 1997 B11
25064 1997 T4
25064 1998 E4
25064 1998 P3
25064 1998 A32
25064 1998 B4
25064 1998 V3
35241 2003 V42
35241 2003 H76
35241 2003 I88
35241 2004 G7
35241 2004 N6
35241 2004 J98
35241 2004 T42
35241 2005 A1
35241 2006 H4
35241 2006 L14
35241 2006 Y72
35241 2007 I6
35241 2007 N9
;
data have3;
input FirmID1 FirmID2;
cards;
1001 25064
1002 35241
;
proc sql;
create table want as
select b.firmid1, a.firmid2,a.year,n(c.productid)/count(*) as Proportion
from have2 a left join have3 b
on a.firmid2=b.firmid2
left join
have1 c
on b.firmid1=c.firmid1
and a.year=c.year and a.productid=c.productid
group by b.firmid1,a.firmid2,a.year;
quit;
Thank you, @novinosrin
Hi @novinosrin ,
I have a follow-up question. Dataset Have3 has a thrid variable called year, which means the year that two companies formed the partnership.
Have3
FirmID1 FirmID2 Year
1001 25046 1997
1002 35241 2006
I would like to complete two tasks:
1) count the number of the same products by the year that they formed the partnership;
2) the number calculated in 1) is divided by the total number of products for FirmID2 by the year that they formed the partnership.
For example, 1001 and 25046 formed the partnership in 1997. They shared 4 same products (A1, B3, V5, Y8) by 1997. The total number of products for 25064 is 14 by 1997. So the ration is 4/14=0.2857
Can you tell me how to revise the code? Thanks.
Hi @dapenDaniel Wiil take a look a bit later towards the end of the day. My day has just barely begun at work. I hope it's not urgent. Hang in there
Hi @novinosrin ,
It is so nice of you! This is not urgent. I am looking forward to your reply! Thank you so much!
Hi @dapenDaniel Just had a quick look, and I seek a couple of clarifications as
This seems to actually pave way in revising the logic to do rather create a separate table as output. Am i correct? Coz I can't see how you imagine as an additional statistic in the same WANT table structure as the previous considering the grouping variable would change. To help clarify, Can you please post the wanted table structure of the new requirement in order to avoid any wrong assumptions
Hi @novinosrin ,
Sorry for the confusion.
Have3
FirmID1 FirmID2 Year
1001 25046 1997
1002 35241 2006
The new wanted table is below.
FirmID1 | FirmID2 | Year | Proportion |
1001 | 25064 | 1997 | 0.285714 |
1002 | 35241 | 2006 | 0.272727 |
1001 and 25046 formed the partnership in 1997. They shared 4 same products (A1, B3, V5, Y8) by 1997 (including 1997 and before). The total number of unique products for 25064 is 14 by 1997. So the ration is 4/14=0.285714.
1002 and 35241 formed the partnership in 2006. They shared 3 same products (A1, G7, N6) by 2006 (including 2006 and before). The total number of unique products for 35241 is 11 by 2006. So the ration is 3/11=0.272727.
Does that make more sense? Thanks.
Got it. Will work on it soon as I find sometime
HI @dapenDaniel Just a matter of carefully knowing what and how to group. See if the following works
data have1;
input FirmID1 Year ProductID $;
cards;
1001 1995 A1
1001 1995 B3
1001 1995 R9
1001 1995 V5
1001 1996 B2
1001 1996 Y8
1001 1997 A1
1001 1997 O6
1001 1997 U3
1001 1998 E4
1001 1998 W1
1001 1998 P3
1002 2003 D5
1002 2003 I8
1002 2003 V5
1002 2003 A1
1002 2004 G7
1002 2004 N6
1002 2005 I8
1002 2005 P2
1002 2005 M5
1002 2006 N6
1002 2006 K9
1002 2006 H4
1002 2007 V5
1002 2007 P2
1002 2007 R4
;
data have2;
input FirmID2 Year ProductID $;
cards;
25064 1995 A1
25064 1995 B3
25064 1995 T12
25064 1995 V5
25064 1995 G8
25064 1995 U22
25064 1995 O91
25064 1995 R52
25064 1995 D2
25064 1996 B3
25064 1996 Y8
25064 1996 A1
25064 1997 B11
25064 1997 T4
25064 1998 E4
25064 1998 P3
25064 1998 A32
25064 1998 B4
25064 1998 V3
35241 2003 V42
35241 2003 H76
35241 2003 I88
35241 2004 G7
35241 2004 N6
35241 2004 J98
35241 2004 T42
35241 2005 A1
35241 2006 H4
35241 2006 L14
35241 2006 Y72
35241 2007 I6
35241 2007 N9
;
data have3;
input FirmID1 FirmID2 year;
cards;
1001 25046 1997
1002 35241 2006
;
proc sql;
create table want(drop=t) as
select b.firmid1, a.firmid2,b.year,
a.year<=b.year as t, sum( c.productid=a.productid)/count(*) as Proportion
from have2 a left join have3 b
on a.firmid2=b.firmid2
left join
have1 c
on b.firmid1=c.firmid1
and a.year=c.year and a.productid=c.productid
group by b.firmid1,a.firmid2,b.year,t
having t;
quit;
Thanks, @novinosrin ! You are awesome! I appreciate your help!!!
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.