BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
dapenDaniel
Obsidian | Level 7

Hi,

 

I am working on a dataset like the one below.

 

FirmID   ProductID  BeginYear   MarketYear   Components   dummy                

10001     AB259T      1996             1998              B30 | A           0             

10001     AB259T      1996             1998              V55 | U           0           

10001    AA386G      1999             2001              T92 | B            1            

10001    AA386G      1999             2001              R26 | L            1           

10001    AA386G      1999             2001              V55 | U           1

10002    CT746B       2007            2009              U38 | P           0

10002    CT746B       2007            2009              X48 | O           0

10002    CT746B       2007            2009              K93 | E           0

10002    RU239F       2008            2010              O28 | L           0

10002    RU239F       2008            2010              A47 | W          0

10003    TY569I         2008            2010              Q15 | P          0

10003    GC992M      2009            2011              W91 | O          1

10003    GC992M      2009            2011              Q15 | P          1

 

I would like to count how many zeros and ones in each year for each firm. The expected output is below.

 

FirmID   BeginYear   nb_ones             nb_zeros

10001      1996              0                            1

10001      1999              1                            0

10002      2007              0                            1

10002      2008              0                            1

10003      2008              0                            1

10003      2009              1                            0

 

nb_ones = number of ones

nb_zeros = number of zeros

 

 

This dataset lists the products that are produced by each firm and their components. For example, Firm 10001 produced AB259T in 1996. B30 | A and V55 | U are both its components. So for firm 10001 in 1996, it only has one "zero" and no "one" for this company this year.

 

what code can I use? Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions
PeterClemmensen
Tourmaline | Level 20

I'm still not entirely sure about the logic here, but I have a feeling this may be what you want

 

data have;
input FirmID ProductID $ BeginYear MarketYear Components $ dummy;
datalines;
10001 AB259T 1996 1998 B30|A 0
10001 AB259T 1996 1998 V55|U 0
10001 AA386G 1999 2001 T92|B 1
10001 AA386G 1999 2001 R26|L 1
10001 AA386G 1999 2001 V55|U 1
10002 CT746B 2007 2009 U38|P 0
10002 CT746B 2007 2009 X48|O 0
10002 CT746B 2007 2009 K93|E 0
10002 RU239F 2008 2010 O28|L 0
10002 RU239F 2008 2010 A47|W 0
10003 TY569I 2008 2010 Q15|P 0
10003 GC992M 2009 2011 W91|O 1
10003 GC992M 2009 2011 Q15|P 1
;

proc sql;
    create table want as
    select FirmID,
           BeginYear,
           count(distinct ProductID)*min(dummy=1) as nb_ones,
           count(distinct ProductID)*min(dummy=0) as nb_zeros
    from have 
    group by FirmID, BeginYear;
quit;

 

Result:

 

FirmID  ProductID  nb_ones  nb_zeros
10001   1996       0        1
10001   1999       1        0
10002   2007       0        1
10002   2008       0        1
10003   2008       0        1
10003   2009       1        0

View solution in original post

7 REPLIES 7
PeterClemmensen
Tourmaline | Level 20

I think this is what you want, even though my results differ from yours?

 

data have;
input FirmID ProductID $ BeginYear MarketYear Components $ dummy;
datalines;
10001 AB259T 1996 1998 B30|A 0 
10001 AB259T 1996 1998 V55|U 0 
10001 AA386G 1999 2001 T92|B 1 
10001 AA386G 1999 2001 R26|L 1 
10001 AA386G 1999 2001 V55|U 1
10002 CT746B 2007 2009 U38|P 0
10002 CT746B 2007 2009 X48|O 0
10002 CT746B 2007 2009 K93|E 0
10002 RU239F 2008 2010 O28|L 0
10002 RU239F 2008 2010 A47|W 0
10003 TY569I 2008 2010 Q15|P 0
10003 GC992M 2009 2011 W91|O 1
10003 GC992M 2009 2011 Q15|P 1
;

proc sql;
   create table want as
   select FirmID,
          BeginYear,
          sum(dummy=1) as nb_ones,
          sum(dummy=0) as nb_zeros
   from have
   group by FirmID, BeginYear;
quit;

 

Result:

 

 

FirmID BeginYear nb_ones nb_zeros 
10001  1996      0       2
10001  1999      3       0 
10002  2007      0       3 
10002  2008      0       2 
10003  2008      0       1 
10003  2009      2       0 
dapenDaniel
Obsidian | Level 7

Hi @PeterClemmensen 

 

Thanks for your reply. I think I need to explain my dataset more.

 

This dataset lists the products that are produced by each firm and their components. For example, Firm 10001 produced AB259T in 1996. B30 | A and V55 | U are both its components. So for firm 10001 in 1996, it only has one "zero" and no "one" for this company this year.

 

I hope it makes sense to you. 

ballardw
Super User

@dapenDaniel wrote:

Hi @PeterClemmensen 

 

Thanks for your reply. I think I need to explain my dataset more.

 

This dataset lists the products that are produced by each firm and their components. For example, Firm 10001 produced AB259T in 1996. B30 | A and V55 | U are both its components. So for firm 10001 in 1996, it only has one "zero" and no "one" for this company this year.

 

I hope it makes sense to you. 


So what is the generic rule that lets us know that two (or more things) are components? If there is nothing in the data we need another source that may allow making that determinations. If all of the components for a single productid for a year then productid would need to be summarized first then a second summary without the product id.

 

Please make your "want" data match the input. You show as desired output:

10001      2001              2                            1

but your example data does not show beginyear=2001 for Firmid=10001.

 

dapenDaniel
Obsidian | Level 7

Hi @ballardw 

 

Thanks for your reply and pointing out my error. I have revised the output.

 

For your first question, I am not sure whether I understand correctly. If several components have the same productID, these are all the components for this product. Does it make sense?

 

Thanks.

PeterClemmensen
Tourmaline | Level 20

I'm still not entirely sure about the logic here, but I have a feeling this may be what you want

 

data have;
input FirmID ProductID $ BeginYear MarketYear Components $ dummy;
datalines;
10001 AB259T 1996 1998 B30|A 0
10001 AB259T 1996 1998 V55|U 0
10001 AA386G 1999 2001 T92|B 1
10001 AA386G 1999 2001 R26|L 1
10001 AA386G 1999 2001 V55|U 1
10002 CT746B 2007 2009 U38|P 0
10002 CT746B 2007 2009 X48|O 0
10002 CT746B 2007 2009 K93|E 0
10002 RU239F 2008 2010 O28|L 0
10002 RU239F 2008 2010 A47|W 0
10003 TY569I 2008 2010 Q15|P 0
10003 GC992M 2009 2011 W91|O 1
10003 GC992M 2009 2011 Q15|P 1
;

proc sql;
    create table want as
    select FirmID,
           BeginYear,
           count(distinct ProductID)*min(dummy=1) as nb_ones,
           count(distinct ProductID)*min(dummy=0) as nb_zeros
    from have 
    group by FirmID, BeginYear;
quit;

 

Result:

 

FirmID  ProductID  nb_ones  nb_zeros
10001   1996       0        1
10001   1999       1        0
10002   2007       0        1
10002   2008       0        1
10003   2008       0        1
10003   2009       1        0
Ksharp
Super User
data have;
input FirmID ProductID $ BeginYear MarketYear Components $ dummy;
datalines;
10001 AB259T 1996 1998 B30|A 0 
10001 AB259T 1996 1998 V55|U 0 
10001 AA386G 1999 2001 T92|B 1 
10001 AA386G 1999 2001 R26|L 1 
10001 AA386G 1999 2001 V55|U 1
10002 CT746B 2007 2009 U38|P 0
10002 CT746B 2007 2009 X48|O 0
10002 CT746B 2007 2009 K93|E 0
10002 RU239F 2008 2010 O28|L 0
10002 RU239F 2008 2010 A47|W 0
10003 TY569I 2008 2010 Q15|P 0
10003 GC992M 2009 2011 W91|O 1
10003 GC992M 2009 2011 Q15|P 1
;

proc sql;
   create table want as
   select FirmID,
          BeginYear,
          ifn(sum(dummy=1),1,0) as nb_ones,
          ifn(sum(dummy=0),1,0) as nb_zeros
   from have
   group by FirmID, BeginYear;
quit;

It is depended on your data.

Ksharp
Super User
data have;
input FirmID ProductID $ BeginYear MarketYear Components $ dummy;
datalines;
10001 AB259T 1996 1998 B30|A 0 
10001 AB259T 1996 1998 V55|U 0 
10001 AA386G 1999 2001 T92|B 1 
10001 AA386G 1999 2001 R26|L 1 
10001 AA386G 1999 2001 V55|U 1
10002 CT746B 2007 2009 U38|P 0
10002 CT746B 2007 2009 X48|O 0
10002 CT746B 2007 2009 K93|E 0
10002 RU239F 2008 2010 O28|L 0
10002 RU239F 2008 2010 A47|W 0
10003 TY569I 2008 2010 Q15|P 0
10003 GC992M 2009 2011 W91|O 1
10003 GC992M 2009 2011 Q15|P 1
;

proc sql;
   create table want as
   select FirmID,
          BeginYear,
          ifn(sum(dummy=1),1,0) as nb_ones,
          ifn(sum(dummy=0),1,0) as nb_zeros
   from have
   group by FirmID, BeginYear;
quit;

It is depended on your data.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 7 replies
  • 1122 views
  • 0 likes
  • 4 in conversation