BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
dapenDaniel
Obsidian | Level 7

Hi,

 

I have two datasets below. In Have1, it lists all products for one industry. Pid is the product ID. Year is the year that the product was introduced. Components are the components of each product.

data have1;
input PID $ Year Components $;
datalines;
AA 1995 A15 
AA 1995 B15
AA 1995 C36
AA 1995 F25
BB 1996 D14 
BB 1996 E41
CC 2005 U14
CC 2005 T11
CC 2005 K14
GG 2005 A15
GG 2005 B15
run;

 

In have2, Pid is the product ID, Fid is the firm that produces the product. Year is the year that this product was introduced and Components are the components in this product.

data have2;
input PID $ Fid $ Year Components $;
datalines;
DD 10001 1997 A15 
DD 10001 1997 B15
DD 10001 1997 C36 
EE 10002 1999 D14 
EE 10002 1999 H67
FF 10003 2007 U14
FF 10003 2007 T11
FF 10003 2007 K14
HH 10003 2008 D14
HH 10003 2008 E41
run;

I would like to know whether the component combination exists or not compared with its prvious products in the industry.

 

If the component combinations exist in the products that were introduced in the preceding 5 years, the flag is 0. Otherwise, the flag is 1. For example, the component combination of DD (A15, B15, C36) has been presented in previous product AA (A15, B15, C36, F25). So the flag is 0. The component combination of EE (D14, H67) has not shown in previous products, so the flag is 1.

The data I want is below.

 

Fid          Year       Pid     Flag

10001     1997      DD       0

10002     1999      EE       1

10003     2007      FF        0

10003     2008      HH       1

 

 

What program do I need to use? Thanks.

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20

@dapenDaniel  Thank you for the fast response. So, is it  1992 to 1997 inclusive or 1993 to 1996 excluding 92,97

 


a.year-5<=b.year<=a.year  /*this is inclusive*/

a.year-5<b.year<a.year/*this is exclusive*/

 

 

 

Assuming my understanding is correct, I am going with  a.year-5<=b.year<=a.year  /*this is inclusive*/ with the revised modified sample HAVE1 and HAVE2

 

data have1;
input PID $ Year Components $;
datalines;
AA 1995 A15 
AA 1995 B15
AA 1995 C36
AA 1995 F25
BB 1996 D14 
BB 1996 E41
CC 2005 U14
CC 2005 T11
CC 2005 K14
GG 2005 A15
GG 2005 B15
;
run;
data have2;
input PID $ Fid $ Year Components $;
datalines;
DD 10001 1997 A15 
DD 10001 1997 B15
DD 10001 1997 C36 
EE 10002 1999 D14 
EE 10002 1999 H67
FF 10003 2007 U14
FF 10003 2007 T11
FF 10003 2007 K14
HH 10003 2008 D14
HH 10003 2008 E41
;
run;


proc sql;
create table want as
select a.Fid , a.Year, a.Pid,  ^max(t) as Flag
from
(select a.Fid , a.Year, a.Pid, c=count(distinct b.components) as t
from 
(select *,count(distinct components) as c from have2 group by PID, Fid,Year) a left join have1 b
on a.components=b.components and a.year-5<=b.year<=a.year
group by a.Fid , a.Year, a.Pid,b.pid)
group by a.Fid , a.Year, a.Pid;
quit;
Fid Year PID Flag
10001 1997 DD 0
10002 1999 EE 1
10003 2007 FF 0
10003 2008 HH 1

 

Please do test thoroughly  . Thanks!

View solution in original post

7 REPLIES 7
novinosrin
Tourmaline | Level 20

Hi @dapenDaniel  Not sure if I understand the requirement thoroughly well, though taking a chance with some assumption

 


data have1;
input PID $ Year Components $;
datalines;
AA 1995 A15 
AA 1995 B15
AA 1995 C36
AA 1995 F25
BB 1996 D14 
BB 1996 E41
CC 2005 U14
CC 2005 T11
CC 2005 K14
GG 2005 A15
GG 2005 B15
run;

data have2;
input PID $ Fid $ Year Components $;
datalines;
DD 10001 1997 A15 
DD 10001 1997 B15
DD 10001 1997 C36 
EE 10002 1999 D14 
EE 10002 1999 H67
FF 10003 2007 U14
FF 10003 2007 T11
FF 10003 2007 K14
;
proc sql;
create table want as
select a.Fid , a.Year, a.Pid,  ^max(t) as Flag
from
(select a.Fid , a.Year, a.Pid, c=count(distinct b.components) as t
from 
(select *,count(distinct components) as c from have2 group by PID, Fid,Year) a left join have1 b
on a.components=b.components 
group by a.Fid , a.Year, a.Pid,b.pid)
group by a.Fid , a.Year, a.Pid;
quit;
dapenDaniel
Obsidian | Level 7

Hi, @novinosrin Thanks for your reply. One question to your program, is it only look at the products in the previous 5 years? For example, DD was introduced in 1997 and I want to check whether its component combination exists between 1992 to 1996. Thanks.

novinosrin
Tourmaline | Level 20

Hi @dapenDaniel  Good catch. No it doesn't. I will modify that. Before that, Can you modify the sample with a couple of records that will not qualify for testing purpose.Also, please be aware it's 5:15 where I am, so I might head home shortly but I am sure somebody else will, nonethless I will see if I can login from home. 

dapenDaniel
Obsidian | Level 7

Hi @novinosrin HH is the case. The components of HH were introduced in 2008 but its component combination exists in 1996. So its flag is still 1. 

novinosrin
Tourmaline | Level 20

@dapenDaniel  Thank you for the fast response. So, is it  1992 to 1997 inclusive or 1993 to 1996 excluding 92,97

 


a.year-5<=b.year<=a.year  /*this is inclusive*/

a.year-5<b.year<a.year/*this is exclusive*/

 

 

 

Assuming my understanding is correct, I am going with  a.year-5<=b.year<=a.year  /*this is inclusive*/ with the revised modified sample HAVE1 and HAVE2

 

data have1;
input PID $ Year Components $;
datalines;
AA 1995 A15 
AA 1995 B15
AA 1995 C36
AA 1995 F25
BB 1996 D14 
BB 1996 E41
CC 2005 U14
CC 2005 T11
CC 2005 K14
GG 2005 A15
GG 2005 B15
;
run;
data have2;
input PID $ Fid $ Year Components $;
datalines;
DD 10001 1997 A15 
DD 10001 1997 B15
DD 10001 1997 C36 
EE 10002 1999 D14 
EE 10002 1999 H67
FF 10003 2007 U14
FF 10003 2007 T11
FF 10003 2007 K14
HH 10003 2008 D14
HH 10003 2008 E41
;
run;


proc sql;
create table want as
select a.Fid , a.Year, a.Pid,  ^max(t) as Flag
from
(select a.Fid , a.Year, a.Pid, c=count(distinct b.components) as t
from 
(select *,count(distinct components) as c from have2 group by PID, Fid,Year) a left join have1 b
on a.components=b.components and a.year-5<=b.year<=a.year
group by a.Fid , a.Year, a.Pid,b.pid)
group by a.Fid , a.Year, a.Pid;
quit;
Fid Year PID Flag
10001 1997 DD 0
10002 1999 EE 1
10003 2007 FF 0
10003 2008 HH 1

 

Please do test thoroughly  . Thanks!

Reeza
Super User
You may want to ask your company to get the SAS/OR module, your questions relate to Bill of Materials analysis, which OR is designed to handle more easily than manually doing this analysis.
Ksharp
Super User
data have1;
input PID $ Year Components $;
datalines;
AA 1995 A15 
AA 1995 B15
AA 1995 C36
AA 1995 F25
BB 1996 D14 
BB 1996 E41
CC 2005 U14
CC 2005 T11
CC 2005 K14
GG 2005 A15
GG 2005 B15
;


data have2;
input PID $ Fid $ Year Components $;
datalines;
DD 10001 1997 A15 
DD 10001 1997 B15
DD 10001 1997 C36 
EE 10002 1999 D14 
EE 10002 1999 H67
FF 10003 2007 U14
FF 10003 2007 T11
FF 10003 2007 K14
HH 10003 2008 D14
HH 10003 2008 E41
;


data temp;
if _n_=1 then do;
 if 0 then set have1;
 declare hash h(dataset:'have1',hashexp:20);
 h.definekey('Year','Components');
 h.definedone();
end;
 set have2;
 found=0;
 do _year=Year-5 to Year-1;
   if h.check(key:_year, key:Components)=0 then do;found=1;leave;end;
 end;
drop _year;
run;


proc sql;
create table want as
select 	FID,PID,Year , (sum(found)/count(*)) ne 1 as flag
 from temp
  group by FID,PID,Year ;
quit;



sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 7 replies
  • 1025 views
  • 0 likes
  • 4 in conversation