Hello
What is the DATA step way to compute a running (accumulated) count of distinct values within each Id?
For example:
Id Z1 Z2
1 A 1
1 B 1
1 A 2
1 D 4
2 S 3
2 A 1
3 A 1
The wanted result is:
Id Z1 Z2 Accum_distinct_Z1 Accum_distinct_Z2
1 A 1 1 1
1 B 1 2 1
1 A 2 2 2
1 D 4 3 3
2 S 3 1 1
2 A 1 2 2
3 A 1 1 1
Hello,
This is one way to do it:
data abc;
Length Id $ 1 Z1 $ 1 Z2 $ 1;
input Id $ Z1 $ Z2 $;
cards;
1 A 1
1 B 1
1 A 2
1 D 4
2 S 3
2 A 1
3 A 1
;
run;
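data def(drop=accumZ:);
Length accumZ1 accumZ2 $ 100;
set abc;
retain accumZ1 '' accumZ2 '';
by Id;
/* reset the list of seen values and both counters at the start of each Id */
if first.Id then do; accumZ1='' ; accumZ2='';
Accum_distinct_Z1=0; Accum_distinct_Z2=0; end;
/* INDEXC searches for single characters: this works because every value is one character long */
if indexc(accumZ1,Z1)=0 then Accum_distinct_Z1+1;
if indexc(accumZ2,Z2)=0 then Accum_distinct_Z2+1;
/* append the current value to the string of values seen so far in this Id */
accumZ1 = strip(accumZ1)!!strip(Z1);
accumZ2 = strip(accumZ2)!!strip(Z2);
run;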
/* end of program */
Cheers,
Koen
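The INDEXC test above relies on every value being a single character: with a multi-character value such as 'AB', INDEXC reports a match as soon as any one of its characters has been seen. Below is a sketch of the same retained-string idea for longer values, swapping INDEXC for FINDW with a delimiter. The data set names (abc2, def2), the '|' delimiter and the 200-character list length are assumptions for illustration only; the sketch assumes no value contains '|' and that each Id's list of seen values fits in 200 characters.

```sas
/* Hypothetical data with multi-character values ('AB' vs 'A', '10' vs '1') */
data abc2;
  length Id $ 1 Z1 $ 8 Z2 $ 8;
  input Id $ Z1 $ Z2 $;
  datalines;
1 AB 10
1 A 1
1 AB 1
2 B 10
;
run;

/* Assumptions: values never contain '|', and each Id's list fits in 200 chars */
data def2(drop=seen:);
  length seenZ1 seenZ2 $ 200;
  set abc2;
  by Id;
  retain seenZ1 seenZ2;
  /* start a fresh list of seen values and fresh counters for each Id */
  if first.Id then do;
    seenZ1=''; seenZ2='';
    Accum_distinct_Z1=0; Accum_distinct_Z2=0;
  end;
  /* FINDW matches whole '|'-delimited words; modifier 't' trims trailing blanks */
  if findw(seenZ1, Z1, '|', 't')=0 then do;
    Accum_distinct_Z1+1;
    seenZ1=catx('|', seenZ1, Z1);
  end;
  if findw(seenZ2, Z2, '|', 't')=0 then do;
    Accum_distinct_Z2+1;
    seenZ2=catx('|', seenZ2, Z2);
  end;
run;
```

CATX skips a blank first argument, so the first value of each Id is stored without a leading delimiter.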
This is a good use case for hash objects, which can store distinct values of the hash key in memory. In this case one hash object (hz1) stores distinct values of z1, the other (hz2) does the same for z2:
data have;
input id Z1 $ Z2;
datalines;
1 A 1
1 B 1
1 A 2
1 D 4
2 S 3
2 A 1
3 A 1
;
run;
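data want;
set have;
by id;
/* create the two hash objects once, on the first iteration */
if _n_=1 then do;
declare hash hz1 ();
hz1.definekey('Z1');
hz1.definedone();
declare hash hz2 ();
hz2.definekey('Z2');
hz2.definedone();
end;
/* empty both hashes when a new id starts */
if first.id then do;
hz1.clear();
hz2.clear();
end;
/* REPLACE adds the key if it is new; an existing key is overwritten, so the item count does not change */
hz1.replace();
hz2.replace();
/* NUM_ITEMS = number of distinct values stored so far for this id */
accum_distinct_z1=hz1.num_items;
accum_distinct_z2=hz2.num_items;
run;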
A hash object has a NUM_ITEMS attribute that holds the number of items it contains. Because each distinct key value is stored only once, that equals the number of distinct values seen so far, so a simple assignment to accum_distinct_z1 (and ..._z2) works. Just remember to clear the hashes at the start of each id.
Another benefit of this approach is that the distinct values can be character values of any length, or can be numeric values.
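A variant of the hash approach, sketched here as an alternative rather than as the method in the reply above, keys each hash on both id and the value and increments a retained counter only when the ADD method succeeds (ADD returns 0 only when the key is new). The output data set name want2 is an assumption for illustration; the logic assumes the data are sorted by id, as in the original.

```sas
data have;
  input id Z1 $ Z2;
  datalines;
1 A 1
1 B 1
1 A 2
1 D 4
2 S 3
2 A 1
3 A 1
;
run;

data want2;
  set have;
  by id;
  /* one hash per variable, keyed on the (id, value) pair */
  if _n_=1 then do;
    declare hash hz1();
    hz1.definekey('id', 'Z1');
    hz1.definedone();
    declare hash hz2();
    hz2.definekey('id', 'Z2');
    hz2.definedone();
  end;
  /* counters restart at each id; the sum statements below retain them */
  if first.id then do;
    accum_distinct_z1=0;
    accum_distinct_z2=0;
  end;
  /* ADD returns 0 only if this (id, value) pair was not already stored */
  if hz1.add()=0 then accum_distinct_z1+1;
  if hz2.add()=0 then accum_distinct_z2+1;
run;
```

This avoids CLEAR, but the hashes keep every distinct (id, value) pair for the whole data set, so memory grows with the total number of pairs; the CLEAR-based version only ever holds one id's values.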