Hi,
How can i add count rows by groups through proc sql AND THEN (this is the key thing I can't find instructions on) create a new "total" variable that is the same for each member of the group -- NOT just the Last member of the group. In conceptual terms, I want each group member to know how many members of its group have x=1.
Have dataset
x y n
0 a 0
1 a 1
0 b 0
1 b 1
1 b 2
0 c 0
1 c 1
1 c 2
1 c 3
wanted output
x y n n2
0 a 0 1
1 a 1 1
0 b 0 2
1 b 1 2
1 b 2 2
0 c 0 3
1 c 1 3
1 c 2 3
1 c 3 3
Grouping variable is y, the variable to be counted/totaled is x.
i know how to do it with data step:
Proc sort data = have ;
by y;
run;
data want;
set have;
by y;
if first.y then n =0;
n + x;
run;
Since you are bothering to sort the data set, here is a single data step solution to use that sorted data set (changed "n+x" to "n2+x" below):
data want;
set have;
by y;
if first.y then n2 =0;
n2 + x;
if last.y;
do until (last.y);
set have;
by y;
output;
end;
run;
There are two SET plus BY pairs. So the first. and last. dummy variables appear to be paired with the most-recently-used SET/BY combination.
@morichard wrote:
Hi,
How can i add count rows by groups through proc sql AND THEN (this is the key thing I can't find instructions on) create a new "total" variable that is the same for each member of the group -- NOT just the Last member of the group. In conceptual terms, I want each group member to know how many members of its group have x=1.
Have dataset
x y n
0 a 0
1 a 1
0 b 0
1 b 1
1 b 2
0 c 0
1 c 1
1 c 2
1 c 3
wanted output
x y n n2
0 a 0 1
1 a 1 1
0 b 0 2
1 b 1 2
1 b 2 2
0 c 0 3
1 c 1 3
1 c 2 3
1 c 3 3
Grouping variable is y, the variable to be counted/totaled is x.
i know how to do it with data step:
Proc sort data = have ;
by y;
run;
data want;
set have;
by y;
if first.y then n =0;
n + x;
run;
For future reference please specify which variable(s) define a group. It really isn't nice to make us guess.
A possible solution if the values of X are only ever 0, 1 and possibly missing:
data have; input x y $ n; datalines; 0 a 0 1 a 1 0 b 0 1 b 1 1 b 2 0 c 0 1 c 1 1 c 2 1 c 3 ; proc sql; create table want as select x,y,n, sum(x) as n2 from have group by y ; quit;
There will be a note in the log:
NOTE: The query requires remerging summary statistics back with the original data.
which is because it is doing exactly what you request: merging a single summary statistic (the sum of X) to all of the rows of the group.
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.