Hi,
I have a dataset with 5 variables--ID, a group variable that can be anywhere from -50 to +50, and three count variables. The dataset is at the person/group level (so each ID could occur 100 times if a person has a count for every group):
ID group_var count_var1 count_var2 count_var3
1 -50 0 1 3
1 -45 2 1 5
1 -1 2 0 1
1 4 4 3 2
1 12 1 0 0
2 -25 1 2 3
2 -7 0 0 1
2 -2 3 4 2
What I want is to obtain summative counts using values of group. So for each ID, I would want to know the sum of count_var1, count_var2, and count_var3, for group_var=-50 to -26, -25 to -1, +1 to +25, and +26 to +50. The output would ideally look like this:
ID sum_count_var1_gp_neg_50_to_neg_26 sum_count_var2_gp_neg_50_to_neg_26 sum_count_var3_gp_neg_50_to_neg_26
1 2 2 8
and continuing across with the sums for the other group ranges.
Any help is much appreciated!
Well, I would first assign the groupings in a new variable. Then its just a matter of aggregation of that data. In your logic you will miss out on 0 for group I would point out. Also, do you really need the transposed output? Its generally better to work with normalised data, and only transpose for outputs.
data have;
input ID group_var count_var1 count_var2 count_var3;
datalines;
1 -50 0 1 3
1 -45 2 1 5
1 -1 2 0 1
1 4 4 3 2
1 12 1 0 0
2 -25 1 2 3
2 -7 0 0 1
2 -2 3 4 2
;
run;
data inter;
set have;
length group $20;
if -50 <= group_var <= -26 then group="-50 to -26";
if -25 <= group_var <= -1 then group="-25 to -1";
if 1 <= group_var <= 25 then group="1 to 25";
if 26 <= group_var <= 50 then group="26 to 50";
run;
proc sql;
create table WANT as
select ID,
GROUP,
sum(COUNT_VAR1) as COUNT_1_SUM,
sum(COUNT_VAR2) as COUNT_2_SUM,
sum(COUNT_VAR3) as COUNT_3_SUM
from INTER
group by ID,
GROUP;
quit;
data have;
input ID group_var count_var1 count_var2 count_var3;
datalines;
1 -50 0 1 3
1 -45 2 1 5
1 -1 2 0 1
1 4 4 3 2
1 12 1 0 0
2 -25 1 2 3
2 -7 0 0 1
2 -2 3 4 2
;
run;
proc sql;
create table want as
select id,
(select sum(count_var1) from have where id=a.id and group_var between -50 and -26) as sum_count_var1_neg_50_to_neg_26,
(select sum(count_var2) from have where id=a.id and group_var between -50 and -26) as sum_count_var2_neg_50_to_neg_26,
(select sum(count_var3) from have where id=a.id and group_var between -50 and -26) as sum_count_var3_neg_50_to_neg_26
from (select distinct id from have) as a ;
quit;
The proc sql solution yielded the correct output, but would you mind explaining to me exactly what the code is doing so I can understand how it works?
Thanks!
Often formats may be helpful:
proc format;
value mygroup
-50 - -26 = "-50 to -26"
-25 - -1 ="-25 to -1"
1 - 25 ="+1 to +25"
26 - 50 = "+26 to +50"
other = "Out of Range"
;
run;
proc summary data=have nway;
class ID group;
format group mygroup. ;
var var1 var2 var3;
output out=sumset (drop = _type_ _freq_) sum=;
run;
Transpose the result if you need to.
Hi, another idea ...
data want (keep = id tot:);
array tot(12);
do j=1 to 12; tot(j)=0; end;
do until(last.id);
set have;
by id;
k = 3 * (ceil(group_var/25) + (group_var eq -50) + 1);
tot(k+1) + count_var1;
tot(k+2) + count_var2;
tot(k+3) + count_var3;
end;
run;
ID tot1 tot2 tot3 tot4 tot5 tot6 tot7 tot8 tot9 tot10 tot11 tot12
1 2 2 8 2 0 1 5 3 2 0 0 0
2 1 2 3 3 4 3 0 0 0 0 0 0
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.