Hi,
I am working on a dataset like the one below.
FirmID ProductID BeginYear MarketYear Components dummy
10001 AB259T 1996 1998 B30 | A 0
10001 AB259T 1996 1998 V55 | U 0
10001 AA386G 1999 2001 T92 | B 1
10001 AA386G 1999 2001 R26 | L 1
10001 AA386G 1999 2001 V55 | U 1
10002 CT746B 2007 2009 U38 | P 0
10002 CT746B 2007 2009 X48 | O 0
10002 CT746B 2007 2009 K93 | E 0
10002 RU239F 2008 2010 O28 | L 0
10002 RU239F 2008 2010 A47 | W 0
10003 TY569I 2008 2010 Q15 | P 0
10003 GC992M 2009 2011 W91 | O 1
10003 GC992M 2009 2011 Q15 | P 1
I would like to count how many zeros and ones in each year for each firm. The expected output is below.
FirmID BeginYear nb_ones nb_zeros
10001 1996 0 1
10001 1999 1 0
10002 2007 0 1
10002 2008 0 1
10003 2008 0 1
10003 2009 1 0
nb_ones = number of ones
nb_zeros = number of zeros
This dataset lists the products that are produced by each firm and their components. For example, Firm 10001 produced AB259T in 1996. B30 | A and V55 | U are both its components. So for firm 10001 in 1996, it only has one "zero" and no "one" for this company this year.
what code can I use? Thanks!
I'm still not entirely sure about the logic here, but I have a feeling this may be what you want
data have;
input FirmID ProductID $ BeginYear MarketYear Components $ dummy;
datalines;
10001 AB259T 1996 1998 B30|A 0
10001 AB259T 1996 1998 V55|U 0
10001 AA386G 1999 2001 T92|B 1
10001 AA386G 1999 2001 R26|L 1
10001 AA386G 1999 2001 V55|U 1
10002 CT746B 2007 2009 U38|P 0
10002 CT746B 2007 2009 X48|O 0
10002 CT746B 2007 2009 K93|E 0
10002 RU239F 2008 2010 O28|L 0
10002 RU239F 2008 2010 A47|W 0
10003 TY569I 2008 2010 Q15|P 0
10003 GC992M 2009 2011 W91|O 1
10003 GC992M 2009 2011 Q15|P 1
;
proc sql;
create table want as
select FirmID,
BeginYear,
count(distinct ProductID)*min(dummy=1) as nb_ones,
count(distinct ProductID)*min(dummy=0) as nb_zeros
from have
group by FirmID, BeginYear;
quit;
Result:
FirmID ProductID nb_ones nb_zeros 10001 1996 0 1 10001 1999 1 0 10002 2007 0 1 10002 2008 0 1 10003 2008 0 1 10003 2009 1 0
I think this is what you want, even though my results differ from yours?
data have;
input FirmID ProductID $ BeginYear MarketYear Components $ dummy;
datalines;
10001 AB259T 1996 1998 B30|A 0
10001 AB259T 1996 1998 V55|U 0
10001 AA386G 1999 2001 T92|B 1
10001 AA386G 1999 2001 R26|L 1
10001 AA386G 1999 2001 V55|U 1
10002 CT746B 2007 2009 U38|P 0
10002 CT746B 2007 2009 X48|O 0
10002 CT746B 2007 2009 K93|E 0
10002 RU239F 2008 2010 O28|L 0
10002 RU239F 2008 2010 A47|W 0
10003 TY569I 2008 2010 Q15|P 0
10003 GC992M 2009 2011 W91|O 1
10003 GC992M 2009 2011 Q15|P 1
;
proc sql;
create table want as
select FirmID,
BeginYear,
sum(dummy=1) as nb_ones,
sum(dummy=0) as nb_zeros
from have
group by FirmID, BeginYear;
quit;
Result:
FirmID BeginYear nb_ones nb_zeros 10001 1996 0 2 10001 1999 3 0 10002 2007 0 3 10002 2008 0 2 10003 2008 0 1 10003 2009 2 0
Thanks for your reply. I think I need to explain my dataset more.
This dataset lists the products that are produced by each firm and their components. For example, Firm 10001 produced AB259T in 1996. B30 | A and V55 | U are both its components. So for firm 10001 in 1996, it only has one "zero" and no "one" for this company this year.
I hope it makes sense to you.
@dapenDaniel wrote:
Thanks for your reply. I think I need to explain my dataset more.
This dataset lists the products that are produced by each firm and their components. For example, Firm 10001 produced AB259T in 1996. B30 | A and V55 | U are both its components. So for firm 10001 in 1996, it only has one "zero" and no "one" for this company this year.
I hope it makes sense to you.
So what is the generic rule that lets us know that two (or more things) are components? If there is nothing in the data we need another source that may allow making that determinations. If all of the components for a single productid for a year then productid would need to be summarized first then a second summary without the product id.
Please make your "want" data match the input. You show as desired output:
10001 2001 2 1
but your example data does not show beginyear=2001 for Firmid=10001.
Hi @ballardw
Thanks for your reply and pointing out my error. I have revised the output.
For your first question, I am not sure whether I understand correctly. If several components have the same productID, these are all the components for this product. Does it make sense?
Thanks.
I'm still not entirely sure about the logic here, but I have a feeling this may be what you want
data have;
input FirmID ProductID $ BeginYear MarketYear Components $ dummy;
datalines;
10001 AB259T 1996 1998 B30|A 0
10001 AB259T 1996 1998 V55|U 0
10001 AA386G 1999 2001 T92|B 1
10001 AA386G 1999 2001 R26|L 1
10001 AA386G 1999 2001 V55|U 1
10002 CT746B 2007 2009 U38|P 0
10002 CT746B 2007 2009 X48|O 0
10002 CT746B 2007 2009 K93|E 0
10002 RU239F 2008 2010 O28|L 0
10002 RU239F 2008 2010 A47|W 0
10003 TY569I 2008 2010 Q15|P 0
10003 GC992M 2009 2011 W91|O 1
10003 GC992M 2009 2011 Q15|P 1
;
proc sql;
create table want as
select FirmID,
BeginYear,
count(distinct ProductID)*min(dummy=1) as nb_ones,
count(distinct ProductID)*min(dummy=0) as nb_zeros
from have
group by FirmID, BeginYear;
quit;
Result:
FirmID ProductID nb_ones nb_zeros 10001 1996 0 1 10001 1999 1 0 10002 2007 0 1 10002 2008 0 1 10003 2008 0 1 10003 2009 1 0
data have;
input FirmID ProductID $ BeginYear MarketYear Components $ dummy;
datalines;
10001 AB259T 1996 1998 B30|A 0
10001 AB259T 1996 1998 V55|U 0
10001 AA386G 1999 2001 T92|B 1
10001 AA386G 1999 2001 R26|L 1
10001 AA386G 1999 2001 V55|U 1
10002 CT746B 2007 2009 U38|P 0
10002 CT746B 2007 2009 X48|O 0
10002 CT746B 2007 2009 K93|E 0
10002 RU239F 2008 2010 O28|L 0
10002 RU239F 2008 2010 A47|W 0
10003 TY569I 2008 2010 Q15|P 0
10003 GC992M 2009 2011 W91|O 1
10003 GC992M 2009 2011 Q15|P 1
;
proc sql;
create table want as
select FirmID,
BeginYear,
ifn(sum(dummy=1),1,0) as nb_ones,
ifn(sum(dummy=0),1,0) as nb_zeros
from have
group by FirmID, BeginYear;
quit;
It is depended on your data.
data have;
input FirmID ProductID $ BeginYear MarketYear Components $ dummy;
datalines;
10001 AB259T 1996 1998 B30|A 0
10001 AB259T 1996 1998 V55|U 0
10001 AA386G 1999 2001 T92|B 1
10001 AA386G 1999 2001 R26|L 1
10001 AA386G 1999 2001 V55|U 1
10002 CT746B 2007 2009 U38|P 0
10002 CT746B 2007 2009 X48|O 0
10002 CT746B 2007 2009 K93|E 0
10002 RU239F 2008 2010 O28|L 0
10002 RU239F 2008 2010 A47|W 0
10003 TY569I 2008 2010 Q15|P 0
10003 GC992M 2009 2011 W91|O 1
10003 GC992M 2009 2011 Q15|P 1
;
proc sql;
create table want as
select FirmID,
BeginYear,
ifn(sum(dummy=1),1,0) as nb_ones,
ifn(sum(dummy=0),1,0) as nb_zeros
from have
group by FirmID, BeginYear;
quit;
It is depended on your data.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.