Count accumulated number of unique values - DATA step

Ronein · Posted 03-27-2021 09:22 AM

Hello

What is the DATA step way to count the accumulated (running) number of unique values of Z1 and Z2 within each Id?

For example:

Id Z1 Z2
1  A  1
1  B  1
1  A  2
1  D  4
2  S  3
2  A  1
3  A  1

Wanted result:

Id Z1 Z2 Accum_distinct_Z1 Accum_distinct_Z2
1  A  1  1                 1
1  B  1  2                 1
1  A  2  2                 2
1  D  4  3                 3
2  S  3  1                 1
2  A  1  2                 2
3  A  1  1                 1

sbxkoenk · Posted 03-27-2021 10:15 AM

Hello,

This is one way to do it:

data abc;
Length Id $ 1 Z1 $ 1 Z2 $ 1;
input  Id $   Z1 $   Z2 $;
cards;
1      A         1
1      B         1
1      A         2
1      D         4
2      S         3
2      A         1
3      A         1
;
run;

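data def(drop=accumZ:);
/* accumZ1/accumZ2 hold the Z1/Z2 values already seen in the current Id */
Length accumZ1 accumZ2 $ 100;
 set abc;
 retain accumZ1 '' accumZ2 '';
 by Id;
 /* Reset the lists and the counters at the start of each Id */
 if first.Id then do; accumZ1=''         ; accumZ2=''; 
                      Accum_distinct_Z1=0; Accum_distinct_Z2=0; end;
 /* INDEXC returns 0 when none of the characters of Z1 occur in accumZ1.  */
 /* It compares single characters, so this relies on one-character values. */
 if indexc(accumZ1,Z1)=0 then Accum_distinct_Z1+1;
 if indexc(accumZ2,Z2)=0 then Accum_distinct_Z2+1;
 /* Append the current value to the list of values seen so far */
 accumZ1 = strip(accumZ1)!!strip(Z1);
 accumZ2 = strip(accumZ2)!!strip(Z2);
run;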
/* end of program */

Cheers,

Koen

mkeintz · Posted 03-27-2021 12:48 PM

This is a good use case for hash objects, which can store distinct values of the hash key in memory. In this case one hash object (hz1) stores the distinct values of z1; the other (hz2) does the same for z2:

data have;
  input id Z1 $    Z2;
datalines;
1      A         1
1      B         1
1      A         2
1      D         4
2      S         3
2      A         1
3      A         1
;
run;

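data want;
  set have;
  by id;
  /* Declare both hashes once. Each stores only a key, so every item */
  /* is one distinct value of Z1 (hz1) or of Z2 (hz2).               */
  if _n_=1 then do;
    declare hash hz1 ();
      hz1.definekey('Z1');
      hz1.definedone();
    declare hash hz2 ();
      hz2.definekey('Z2');
      hz2.definedone();
  end;
  /* Empty both hashes at the start of each id */
  if first.id then do;
    hz1.clear();
    hz2.clear();
  end;
  /* REPLACE adds the current key, or overwrites it if already present */
  hz1.replace();
  hz2.replace();
  /* NUM_ITEMS = number of distinct keys stored so far for this id */
  accum_distinct_z1=hz1.num_items;
  accum_distinct_z2=hz2.num_items;
run;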

A hash object has a "num_items" attribute that holds the number of items it contains. Because each item here is one distinct key value, a simple assignment to accum_distinct_z1 (and ..._z2) works fine. Just remember to clear the hashes at the start of each id.
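A minimal stand-alone sketch of the two hash behaviors the step above relies on. The key variable k and the data set name nitems_demo are hypothetical, made up for illustration. Adding a key that is already present with REPLACE leaves NUM_ITEMS unchanged, and CLEAR() resets it to 0:

```sas
/* Hypothetical demo: NUM_ITEMS counts distinct keys, CLEAR() empties the hash */
data nitems_demo;
  length k $ 8;            /* k is a made-up key variable */
  declare hash h ();
  h.definekey('k');
  h.definedone();

  k = 'A'; h.replace();
  k = 'B'; h.replace();
  k = 'A'; h.replace();    /* key already present: overwritten, not added */
  n_before = h.num_items;  /* two distinct keys stored: A and B */

  h.clear();               /* the same call the answer makes at first.id */
  n_after = h.num_items;   /* hash is now empty */

  put n_before= n_after=;
  keep n_before n_after;
run;
```

With no SET or INPUT statement the step runs once, so nitems_demo has a single observation and the log shows n_before=2 n_after=0.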

Another benefit of this approach is that the distinct values can be character values of any length, or can be numeric values.
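To illustrate, here is the same hash step run against hypothetical data (have2 and want2 are made-up names, and the values are invented) in which Z1 holds multi-character strings and Z2 is numeric. INDEXC compares individual characters, so a character-list approach is only reliable for one-character values. The hash keys handle these values directly, and the counts follow the same pattern as the wanted result in the question:

```sas
/* Hypothetical data: Z1 holds multi-character strings, Z2 is numeric */
data have2;
  length Z1 $ 10;
  input id Z1 $ Z2;
datalines;
1 Apple 10
1 Banana 10
1 Apple 20
1 Date 40
2 Cherry 30
2 Apple 10
3 Apple 10
;
run;

data want2;
  set have2;
  by id;
  if _n_=1 then do;
    declare hash hz1 ();        /* character key of length 10 */
    hz1.definekey('Z1');
    hz1.definedone();
    declare hash hz2 ();        /* numeric key */
    hz2.definekey('Z2');
    hz2.definedone();
  end;
  if first.id then do;          /* restart the counts for each id */
    hz1.clear();
    hz2.clear();
  end;
  hz1.replace();
  hz2.replace();
  accum_distinct_z1 = hz1.num_items;
  accum_distinct_z2 = hz2.num_items;
run;
```

For these rows accum_distinct_z1 is 1 2 2 3 1 2 1 and accum_distinct_z2 is 1 1 2 3 1 2 1.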

