Hello,
I am struggling with achieving something like this:
| VAR1 | VAR2 | COUNT_BY_GROUP |
| A | 10 | 2 |
| A | 20 | 2 |
| B | 243 | 1 |
| C | 4 | 3 |
| C | 21 | 3 |
| C | 4 | 3 |
Where the COUNT_BY_GROUP variable is the total number of elements of that group.
Here is the code that I'm currently using:
data want;
set have;
by VAR1;
retain COUNT_BY_GROUP;
if first.VAR1 then COUNT_BY_GROUP = 0;
COUNT_BY_GROUP +1;
run;But it gives me a cumulative count. I want to save the total count of elements in a group.
Thank you in advance.
Hi @pkopersk Your approach is correct, however you need an extra pass of the by group to assign the computed cumulative count to each record of the by group. So, Basically I have tweaked your code to read the BY group again by using a technique called interleave aka sorted append and conditionally compute the cumulative count and assign appropriately.
data have;
input VAR1 $ VAR2;* COUNT_BY_GROUP;
cards;
A 10 2
A 20 2
B 243 1
C 4 3
C 21 3
C 4 3
;
data want;
set have(in=a) have(in=b);
by VAR1;
retain COUNT_BY_GROUP;
if a then do;
if first.VAR1 then COUNT_BY_GROUP = 0;
COUNT_BY_GROUP +1;
end;
if b;
run;
The genius @mkeintz taught me this among many others he taught me personally over the last couple of years. Have fun learning!
proc freq data=have;
tables var1/noprint out=_a_;
run;
data want;
merge have _a_;
by var1;
run;
PROC FREQ is where you should start when you have a problem that involves counting a number of observations in a group. It's a very good thing to learn, and a far superior way to go than writing your own data step code each time you want to perform counting.
Hi @pkopersk
Here is another approach to the one proposed by @PaigeMiller :
proc sql;
create table want as
select VAR1, VAR2, n(VAR1) as COUNT_BY_GROUP
from have
group by VAR1;
quit;
Hi @pkopersk Your approach is correct, however you need an extra pass of the by group to assign the computed cumulative count to each record of the by group. So, Basically I have tweaked your code to read the BY group again by using a technique called interleave aka sorted append and conditionally compute the cumulative count and assign appropriately.
data have;
input VAR1 $ VAR2;* COUNT_BY_GROUP;
cards;
A 10 2
A 20 2
B 243 1
C 4 3
C 21 3
C 4 3
;
data want;
set have(in=a) have(in=b);
by VAR1;
retain COUNT_BY_GROUP;
if a then do;
if first.VAR1 then COUNT_BY_GROUP = 0;
COUNT_BY_GROUP +1;
end;
if b;
run;
The genius @mkeintz taught me this among many others he taught me personally over the last couple of years. Have fun learning!
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.