Hi,
I have two datasets below. In Have1, it lists all products for one industry. Pid is the product ID. Year is the year that the product was introduced. Components are the components of each product.
data have1;
input PID $ Year Components $;
datalines;
AA 1995 A15
AA 1995 B15
AA 1995 C36
AA 1995 F25
BB 1996 D14
BB 1996 E41
CC 2005 U14
CC 2005 T11
CC 2005 K14
GG 2005 A15
GG 2005 B15
run;
In have2, Pid is the product ID, Fid is the firm that produces the product. Year is the year that this product was introduced and Components are the components in this product.
data have2;
input PID $ Fid $ Year Components $;
datalines;
DD 10001 1997 A15
DD 10001 1997 B15
DD 10001 1997 C36
EE 10002 1999 D14
EE 10002 1999 H67
FF 10003 2007 U14
FF 10003 2007 T11
FF 10003 2007 K14
HH 10003 2008 D14
HH 10003 2008 E41
run;
I would like to know whether the component combination exists or not compared with its prvious products in the industry.
If the component combinations exist in the products that were introduced in the preceding 5 years, the flag is 0. Otherwise, the flag is 1. For example, the component combination of DD (A15, B15, C36) has been presented in previous product AA (A15, B15, C36, F25). So the flag is 0. The component combination of EE (D14, H67) has not shown in previous products, so the flag is 1.
The data I want is below.
Fid Year Pid Flag
10001 1997 DD 0
10002 1999 EE 1
10003 2007 FF 0
10003 2008 HH 1
What program do I need to use? Thanks.
@dapenDaniel Thank you for the fast response. So, is it 1992 to 1997 inclusive or 1993 to 1996 excluding 92,97
a.year-5<=b.year<=a.year /*this is inclusive*/
a.year-5<b.year<a.year/*this is exclusive*/
Assuming my understanding is correct, I am going with a.year-5<=b.year<=a.year /*this is inclusive*/ with the revised modified sample HAVE1 and HAVE2
data have1;
input PID $ Year Components $;
datalines;
AA 1995 A15
AA 1995 B15
AA 1995 C36
AA 1995 F25
BB 1996 D14
BB 1996 E41
CC 2005 U14
CC 2005 T11
CC 2005 K14
GG 2005 A15
GG 2005 B15
;
run;
data have2;
input PID $ Fid $ Year Components $;
datalines;
DD 10001 1997 A15
DD 10001 1997 B15
DD 10001 1997 C36
EE 10002 1999 D14
EE 10002 1999 H67
FF 10003 2007 U14
FF 10003 2007 T11
FF 10003 2007 K14
HH 10003 2008 D14
HH 10003 2008 E41
;
run;
proc sql;
create table want as
select a.Fid , a.Year, a.Pid, ^max(t) as Flag
from
(select a.Fid , a.Year, a.Pid, c=count(distinct b.components) as t
from
(select *,count(distinct components) as c from have2 group by PID, Fid,Year) a left join have1 b
on a.components=b.components and a.year-5<=b.year<=a.year
group by a.Fid , a.Year, a.Pid,b.pid)
group by a.Fid , a.Year, a.Pid;
quit;
Fid | Year | PID | Flag |
---|---|---|---|
10001 | 1997 | DD | 0 |
10002 | 1999 | EE | 1 |
10003 | 2007 | FF | 0 |
10003 | 2008 | HH | 1 |
Please do test thoroughly . Thanks!
Hi @dapenDaniel Not sure if I understand the requirement thoroughly well, though taking a chance with some assumption
data have1;
input PID $ Year Components $;
datalines;
AA 1995 A15
AA 1995 B15
AA 1995 C36
AA 1995 F25
BB 1996 D14
BB 1996 E41
CC 2005 U14
CC 2005 T11
CC 2005 K14
GG 2005 A15
GG 2005 B15
run;
data have2;
input PID $ Fid $ Year Components $;
datalines;
DD 10001 1997 A15
DD 10001 1997 B15
DD 10001 1997 C36
EE 10002 1999 D14
EE 10002 1999 H67
FF 10003 2007 U14
FF 10003 2007 T11
FF 10003 2007 K14
;
proc sql;
create table want as
select a.Fid , a.Year, a.Pid, ^max(t) as Flag
from
(select a.Fid , a.Year, a.Pid, c=count(distinct b.components) as t
from
(select *,count(distinct components) as c from have2 group by PID, Fid,Year) a left join have1 b
on a.components=b.components
group by a.Fid , a.Year, a.Pid,b.pid)
group by a.Fid , a.Year, a.Pid;
quit;
Hi, @novinosrin Thanks for your reply. One question to your program, is it only look at the products in the previous 5 years? For example, DD was introduced in 1997 and I want to check whether its component combination exists between 1992 to 1996. Thanks.
Hi @dapenDaniel Good catch. No it doesn't. I will modify that. Before that, Can you modify the sample with a couple of records that will not qualify for testing purpose.Also, please be aware it's 5:15 where I am, so I might head home shortly but I am sure somebody else will, nonethless I will see if I can login from home.
Hi @novinosrin HH is the case. The components of HH were introduced in 2008 but its component combination exists in 1996. So its flag is still 1.
@dapenDaniel Thank you for the fast response. So, is it 1992 to 1997 inclusive or 1993 to 1996 excluding 92,97
a.year-5<=b.year<=a.year /*this is inclusive*/
a.year-5<b.year<a.year/*this is exclusive*/
Assuming my understanding is correct, I am going with a.year-5<=b.year<=a.year /*this is inclusive*/ with the revised modified sample HAVE1 and HAVE2
data have1;
input PID $ Year Components $;
datalines;
AA 1995 A15
AA 1995 B15
AA 1995 C36
AA 1995 F25
BB 1996 D14
BB 1996 E41
CC 2005 U14
CC 2005 T11
CC 2005 K14
GG 2005 A15
GG 2005 B15
;
run;
data have2;
input PID $ Fid $ Year Components $;
datalines;
DD 10001 1997 A15
DD 10001 1997 B15
DD 10001 1997 C36
EE 10002 1999 D14
EE 10002 1999 H67
FF 10003 2007 U14
FF 10003 2007 T11
FF 10003 2007 K14
HH 10003 2008 D14
HH 10003 2008 E41
;
run;
proc sql;
create table want as
select a.Fid , a.Year, a.Pid, ^max(t) as Flag
from
(select a.Fid , a.Year, a.Pid, c=count(distinct b.components) as t
from
(select *,count(distinct components) as c from have2 group by PID, Fid,Year) a left join have1 b
on a.components=b.components and a.year-5<=b.year<=a.year
group by a.Fid , a.Year, a.Pid,b.pid)
group by a.Fid , a.Year, a.Pid;
quit;
Fid | Year | PID | Flag |
---|---|---|---|
10001 | 1997 | DD | 0 |
10002 | 1999 | EE | 1 |
10003 | 2007 | FF | 0 |
10003 | 2008 | HH | 1 |
Please do test thoroughly . Thanks!
data have1; input PID $ Year Components $; datalines; AA 1995 A15 AA 1995 B15 AA 1995 C36 AA 1995 F25 BB 1996 D14 BB 1996 E41 CC 2005 U14 CC 2005 T11 CC 2005 K14 GG 2005 A15 GG 2005 B15 ; data have2; input PID $ Fid $ Year Components $; datalines; DD 10001 1997 A15 DD 10001 1997 B15 DD 10001 1997 C36 EE 10002 1999 D14 EE 10002 1999 H67 FF 10003 2007 U14 FF 10003 2007 T11 FF 10003 2007 K14 HH 10003 2008 D14 HH 10003 2008 E41 ; data temp; if _n_=1 then do; if 0 then set have1; declare hash h(dataset:'have1',hashexp:20); h.definekey('Year','Components'); h.definedone(); end; set have2; found=0; do _year=Year-5 to Year-1; if h.check(key:_year, key:Components)=0 then do;found=1;leave;end; end; drop _year; run; proc sql; create table want as select FID,PID,Year , (sum(found)/count(*)) ne 1 as flag from temp group by FID,PID,Year ; quit;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.