Hello,
I am struggling with achieving something like this:
VAR1 | VAR2 | COUNT_BY_GROUP |
A | 10 | 2 |
A | 20 | 2 |
B | 243 | 1 |
C | 4 | 3 |
C | 21 | 3 |
C | 4 | 3 |
Where the COUNT_BY_GROUP variable is the total number of elements of that group.
Here is the code that I'm currently using:
data want;
set have;
by VAR1;
retain COUNT_BY_GROUP;
if first.VAR1 then COUNT_BY_GROUP = 0;
COUNT_BY_GROUP +1;
run;
But it gives me a cumulative count. I want to save the total count of elements in a group.
Thank you in advance.
Hi @pkopersk Your approach is correct, however you need an extra pass of the by group to assign the computed cumulative count to each record of the by group. So, Basically I have tweaked your code to read the BY group again by using a technique called interleave aka sorted append and conditionally compute the cumulative count and assign appropriately.
data have;
input VAR1 $ VAR2;* COUNT_BY_GROUP;
cards;
A 10 2
A 20 2
B 243 1
C 4 3
C 21 3
C 4 3
;
data want;
set have(in=a) have(in=b);
by VAR1;
retain COUNT_BY_GROUP;
if a then do;
if first.VAR1 then COUNT_BY_GROUP = 0;
COUNT_BY_GROUP +1;
end;
if b;
run;
The genius @mkeintz taught me this among many others he taught me personally over the last couple of years. Have fun learning!
proc freq data=have;
tables var1/noprint out=_a_;
run;
data want;
merge have _a_;
by var1;
run;
PROC FREQ is where you should start when you have a problem that involves counting a number of observations in a group. It's a very good thing to learn, and a far superior way to go than writing your own data step code each time you want to perform counting.
Hi @pkopersk
Here is another approach to the one proposed by @PaigeMiller :
proc sql;
create table want as
select VAR1, VAR2, n(VAR1) as COUNT_BY_GROUP
from have
group by VAR1;
quit;
Hi @pkopersk Your approach is correct, however you need an extra pass of the by group to assign the computed cumulative count to each record of the by group. So, Basically I have tweaked your code to read the BY group again by using a technique called interleave aka sorted append and conditionally compute the cumulative count and assign appropriately.
data have;
input VAR1 $ VAR2;* COUNT_BY_GROUP;
cards;
A 10 2
A 20 2
B 243 1
C 4 3
C 21 3
C 4 3
;
data want;
set have(in=a) have(in=b);
by VAR1;
retain COUNT_BY_GROUP;
if a then do;
if first.VAR1 then COUNT_BY_GROUP = 0;
COUNT_BY_GROUP +1;
end;
if b;
run;
The genius @mkeintz taught me this among many others he taught me personally over the last couple of years. Have fun learning!
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.