Ronein
Onyx | Level 15

Hello

What is the data step way to compute a running (accumulated) count of unique values within each Id?

For example:

Id    Z1    Z2
1     A     1
1     B     1
1     A     2
1     D     4
2     S     3
2     A     1
3     A     1

The wanted result is:

Id    Z1    Z2    Accum_distinct_Z1    Accum_distinct_Z2
1     A     1     1                    1
1     B     1     2                    1
1     A     2     2                    2
1     D     4     3                    3
2     S     3     1                    1
2     A     1     2                    2
3     A     1     1                    1

 

sbxkoenk
SAS Super FREQ

Hello,

 

This is one way to do it:

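data abc;
  length Id $ 1 Z1 $ 1 Z2 $ 1;
  input Id $ Z1 $ Z2 $;
  cards;
1      A         1
1      B         1
1      A         2
1      D         4
2      S         3
2      A         1
3      A         1
;
run;

data def(drop=accumZ:);
  length accumZ1 accumZ2 $ 100;
  set abc;
  by Id;  /* input must be sorted by Id */
  retain accumZ1 '' accumZ2 '';
  /* Reset the lists of seen values and the counters at the start of each Id */
  if first.Id then do;
    accumZ1 = ''; accumZ2 = '';
    Accum_distinct_Z1 = 0; Accum_distinct_Z2 = 0;
  end;
  /* INDEXC returns 0 when the character is not yet in the list.     */
  /* This test relies on Z1 and Z2 being exactly one character long. */
  /* A value is appended only when it is new, so the list stays short. */
  if indexc(accumZ1, Z1) = 0 then do;
    Accum_distinct_Z1 + 1;
    accumZ1 = strip(accumZ1) !! Z1;
  end;
  if indexc(accumZ2, Z2) = 0 then do;
    Accum_distinct_Z2 + 1;
    accumZ2 = strip(accumZ2) !! Z2;
  end;
run;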
/* end of program */
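
INDEXC searches for any single character of its second argument, so the step above only works because Z1 and Z2 are one character long; it cannot tell AB from A. Below is a minimal sketch of the same idea for longer values, not a definitive implementation. It assumes that Z1 and Z2 are character, are never missing, and never contain the delimiter '|'. The names abc_long, def_long, seenZ1 and seenZ2 are made up for this illustration.

```sas
/* Hypothetical test data with values longer than one character */
data abc_long;
  length Id $ 1 Z1 $ 8 Z2 $ 8;
  input Id $ Z1 $ Z2 $;
  cards;
1 AB 10
1 B  10
1 AB 1
1 A  0
2 B  1
;
run;

data def_long(drop=seenZ:);
  length seenZ1 seenZ2 $ 200;
  retain seenZ1 seenZ2;
  set abc_long;
  by Id;  /* input must be sorted by Id */
  /* Start each Id with empty lists and zeroed counters */
  if first.Id then do;
    seenZ1 = ' '; seenZ2 = ' ';
    Accum_distinct_Z1 = 0; Accum_distinct_Z2 = 0;
  end;
  /* FINDW matches whole delimited words, so 'A' is not found in 'AB|B'. */
  /* TRIM removes the trailing padding so the last word compares cleanly. */
  if findw(trim(seenZ1), strip(Z1), '|') = 0 then do;
    Accum_distinct_Z1 + 1;
    seenZ1 = catx('|', seenZ1, Z1);  /* append only values not seen before */
  end;
  if findw(trim(seenZ2), strip(Z2), '|') = 0 then do;
    Accum_distinct_Z2 + 1;
    seenZ2 = catx('|', seenZ2, Z2);
  end;
run;
```

The length of 200 for the lists is an arbitrary choice. If one Id can have more distinct values than fit in that length, the list gets truncated and later values would be counted again, so size it for your data.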

Cheers,

Koen

 

mkeintz
PROC Star

This is a good use case for hash objects, which can store distinct values of the hash key in memory.  In this case one hash object (hz1) stores distinct values of z1, the other (hz2) does the same for z2:

 

data have;
  input id Z1 $    Z2;
datalines;
1      A         1
1      B         1
1      A         2
1      D         4
2      S         3
2      A         1
3      A         1
;
run;

data want;
  set have;
  by id;
  if _n_=1 then do;
    declare hash hz1 ();
      hz1.definekey('Z1');
      hz1.definedone();
    declare hash hz2 ();
      hz2.definekey('Z2');
      hz2.definedone();
  end;
  if first.id then do;
    hz1.clear();
    hz2.clear();
  end;
  hz1.replace();
  hz2.replace();
  accum_distinct_z1=hz1.num_items;
  accum_distinct_z2=hz2.num_items;
run;

One attribute of a hash object is the "num_items" attribute, which tracks the number of distinct key values, so a simple assignment to accum_distinct_z1 (and ..._z2) works fine.   Just remember to clear the hashes at the start of each id.

 

Another benefit of this approach is that the distinct values can be character values of any length, or can be numeric values.
