Hello,
I have two datasets. The first one records the products that companies produce each year and the components for each product ("Categories" column is only for convenience and it is not in my real dataset). The second file is a component base for each year. It is updated annually as firms introduce more products. The second file has been sorted by year and components.
If all the components of a new product do not overlap with the existing component base, it will be considered as radical products. For example, Firm 10001 introduced AC449P in 1993 and its components were Y38 and I92, which do not exist in the component base before 1993 in file 2. Thus AC449P is a radical product.
If one of the components of a new product overlaps with the existing component base, it will be considered as incremental products. For example, Firm 10001 introduced AA386G in 1992 and its components were B30, V55, and T92. B30 and V55 exist in the component base before 1992 (both are in the component base of 1991). So AA386G is an incremental product.
I would like to know how many incremental products and radical products that each firm has for each year. The output that I want is below. In case you need the file, I also attached the excel file.
File 1
FirmID | ProductID | BeginYear | Components | Categories |
10001 | AA386G | 1992 | B30 | Incremental |
10001 | AA386G | 1992 | V55 | |
10001 | AA386G | 1992 | T92 | |
10001 | AB259T | 1993 | R26 | Incremental |
10001 | AB259T | 1993 | V55 | |
10001 | AB259T | 1993 | U38 | |
10001 | AC449P | 1993 | Y38 | radical |
10001 | AC449P | 1993 | I92 | |
10002 | CT746B | 1991 | X48 | radical |
10002 | CT746B | 1991 | K93 | |
10002 | CT746B | 1991 | O28 | |
10002 | RU239F | 1992 | A47 | Incremental |
10002 | RU239F | 1992 | Q15 | |
10002 | DH469E | 1993 | W91 | Incremental |
10002 | DH469E | 1993 | Q15 | |
10003 | CN726Q | 1991 | E55 | radical |
10003 | SZ116R | 1992 | E55 | Incremental |
10003 | SZ116R | 1992 | W87 | |
10003 | TU692E | 1992 | N67 | Incremental |
10003 | TU692E | 1992 | I92 |
File 2
Year | Components |
1990 | H32 |
1990 | N67 |
1990 | P99 |
1990 | R88 |
1991 | A47 |
1991 | B30 |
1991 | E55 |
1991 | K93 |
1991 | O28 |
1991 | Q15 |
1991 | V55 |
1991 | W87 |
1991 | X48 |
1992 | C11 |
1992 | T92 |
1992 | W91 |
1992 | Y67 |
1993 | I92 |
1993 | R26 |
1993 | S23 |
1993 | U38 |
1993 | Y38 |
Output
FirmID | BeginYear | nb_Incremental | nb_Radical |
10001 | 1992 | 1 | 0 |
10001 | 1993 | 1 | 1 |
10002 | 1991 | 0 | 1 |
10002 | 1992 | 1 | 0 |
10002 | 1993 | 1 | 0 |
10003 | 1991 | 0 | 1 |
10003 | 1992 | 2 | 0 |
data products;
infile datalines truncover dlm=' ';
input FirmID $ ProductID $ BeginYear Components $;
datalines;
10001 AA386G 1992 B30
10001 AA386G 1992 V55
10001 AA386G 1992 T92
10001 AB259T 1993 R26
10001 AB259T 1993 V55
10001 AB259T 1993 U38
10001 AC449P 1993 Y38
10001 AC449P 1993 I92
10002 CT746B 1991 X48
10002 CT746B 1991 K93
10002 CT746B 1991 O28
10002 RU239F 1992 A47
10002 RU239F 1992 Q15
10002 DH469E 1993 W91
10002 DH469E 1993 Q15
10003 CN726Q 1991 E55
10003 SZ116R 1992 E55
10003 SZ116R 1992 W87
10003 TU692E 1992 N67
10003 TU692E 1992 I92
;
data components;
infile datalines truncover dlm=' ';
input Year Components $;
datalines;
1990 H32
1990 N67
1990 P99
1990 R88
1991 A47
1991 B30
1991 E55
1991 K93
1991 O28
1991 Q15
1991 V55
1991 W87
1991 X48
1992 C11
1992 T92
1992 W91
1992 Y67
1993 I92
1993 R26
1993 S23
1993 U38
1993 Y38
;
proc sql noprint;
select min(year) into : start from components;
quit;
data temp;
if _n_=1 then do;
if 0 then set components;
declare hash h(dataset:'components',hashexp:20);
h.definekey('Year',' Components');
h.definedone();
end;
set products;
do year=&start to BeginYear-1;
if h.check()=0 then do;found=1;leave;end;
end;
drop year;
run;
proc sql;
create table temp2 as
select FirmID,BeginYear,ProductID,ifn(sum(found)>0,1,0) as flag
from temp
group by FirmID,BeginYear,ProductID ;
create table want as
select FirmID,BeginYear,sum(flag) as nb_Incremental,sum(flag=0) as nb_Radical
from temp2
group by FirmID,BeginYear ;
quit;
Here an approach using SQL.
data products;
infile datalines truncover dlm=' ';
input FirmID $ ProductID $ BeginYear Components $;
datalines;
10001 AA386G 1992 B30
10001 AA386G 1992 V55
10001 AA386G 1992 T92
10001 AB259T 1993 R26
10001 AB259T 1993 V55
10001 AB259T 1993 U38
10001 AC449P 1993 Y38
10001 AC449P 1993 I92
10002 CT746B 1991 X48
10002 CT746B 1991 K93
10002 CT746B 1991 O28
10002 RU239F 1992 A47
10002 RU239F 1992 Q15
10002 DH469E 1993 W91
10002 DH469E 1993 Q15
10003 CN726Q 1991 E55
10003 SZ116R 1992 E55
10003 SZ116R 1992 W87
10003 TU692E 1992 N67
10003 TU692E 1992 I92
;
data components;
infile datalines truncover dlm=' ';
input Year Components $;
datalines;
1990 H32
1990 N67
1990 P99
1990 R88
1991 A47
1991 B30
1991 E55
1991 K93
1991 O28
1991 Q15
1991 V55
1991 W87
1991 X48
1992 C11
1992 T92
1992 W91
1992 Y67
1993 I92
1993 R26
1993 S23
1993 U38
1993 Y38
;
proc sql;
select
FirmID,
BeginYear,
sum(radical_flg=0) as nb_Incremental,
sum(radical_flg) as nb_Radical
from
(
select
p.FirmID,
p.ProductID,
p.BeginYear,
sum(missing(c.components))=count(*) as radical_flg
from
products p left join components c
on p.components=c.components and p.BeginYear > c.Year
group by FirmID, BeginYear, ProductID
)
group by FirmID, BeginYear
;
quit;
data products;
infile datalines truncover dlm=' ';
input FirmID $ ProductID $ BeginYear Components $;
datalines;
10001 AA386G 1992 B30
10001 AA386G 1992 V55
10001 AA386G 1992 T92
10001 AB259T 1993 R26
10001 AB259T 1993 V55
10001 AB259T 1993 U38
10001 AC449P 1993 Y38
10001 AC449P 1993 I92
10002 CT746B 1991 X48
10002 CT746B 1991 K93
10002 CT746B 1991 O28
10002 RU239F 1992 A47
10002 RU239F 1992 Q15
10002 DH469E 1993 W91
10002 DH469E 1993 Q15
10003 CN726Q 1991 E55
10003 SZ116R 1992 E55
10003 SZ116R 1992 W87
10003 TU692E 1992 N67
10003 TU692E 1992 I92
;
data components;
infile datalines truncover dlm=' ';
input Year Components $;
datalines;
1990 H32
1990 N67
1990 P99
1990 R88
1991 A47
1991 B30
1991 E55
1991 K93
1991 O28
1991 Q15
1991 V55
1991 W87
1991 X48
1992 C11
1992 T92
1992 W91
1992 Y67
1993 I92
1993 R26
1993 S23
1993 U38
1993 Y38
;
proc sql noprint;
select min(year) into : start from components;
quit;
data temp;
if _n_=1 then do;
if 0 then set components;
declare hash h(dataset:'components',hashexp:20);
h.definekey('Year',' Components');
h.definedone();
end;
set products;
do year=&start to BeginYear-1;
if h.check()=0 then do;found=1;leave;end;
end;
drop year;
run;
proc sql;
create table temp2 as
select FirmID,BeginYear,ProductID,ifn(sum(found)>0,1,0) as flag
from temp
group by FirmID,BeginYear,ProductID ;
create table want as
select FirmID,BeginYear,sum(flag) as nb_Incremental,sum(flag=0) as nb_Radical
from temp2
group by FirmID,BeginYear ;
quit;
For your second question.
data temp;
if _n_=1 then do;
if 0 then set components;
declare hash h(dataset:'components',hashexp:20);
h.definekey('Year',' Components');
h.definedone();
end;
set products;
do year=BeginYear-5 to BeginYear-1; /*<------------*/
if h.check()=0 then do;found=1;leave;end;
end;
drop year;
run;
proc sql;
create table temp2 as
select FirmID,BeginYear,ProductID,ifn(sum(found)>0,1,0) as flag
from temp
group by FirmID,BeginYear,ProductID ;
create table want as
select FirmID,BeginYear,sum(flag) as nb_Incremental,sum(flag=0) as nb_Radical
from temp2
group by FirmID,BeginYear ;
quit;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.