Hi,
I have a dataset with 5 variables--ID, a group variable that can be anywhere from -50 to +50, and three count variables. The dataset is at the person/group level (so each ID could occur 100 times if a person has a count for every group):
ID group_var count_var1 count_var2 count_var3
1 -50 0 1 3
1 -45 2 1 5
1 -1 2 0 1
1 4 4 3 2
1 12 1 0 0
2 -25 1 2 3
2 -7 0 0 1
2 -2 3 4 2
What I want is to obtain summative counts using values of group. So for each ID, I would want to know the sum of count_var1, count_var2, and count_var3, for group_var=-50 to -26, -25 to -1, +1 to +25, and +26 to +50. The output would ideally look like this:
ID sum_count_var1_gp_neg_50_to_neg_26 sum_count_var2_gp_neg_50_to_neg_26 sum_count_var3_gp_neg_50_to_neg_26
1 2 2 8
and continuing across with the sums for the other group ranges.
Any help is much appreciated!
Well, I would first assign the groupings in a new variable. Then its just a matter of aggregation of that data. In your logic you will miss out on 0 for group I would point out. Also, do you really need the transposed output? Its generally better to work with normalised data, and only transpose for outputs.
data have;
input ID group_var count_var1 count_var2 count_var3;
datalines;
1 -50 0 1 3
1 -45 2 1 5
1 -1 2 0 1
1 4 4 3 2
1 12 1 0 0
2 -25 1 2 3
2 -7 0 0 1
2 -2 3 4 2
;
run;
data inter;
set have;
length group $20;
if -50 <= group_var <= -26 then group="-50 to -26";
if -25 <= group_var <= -1 then group="-25 to -1";
if 1 <= group_var <= 25 then group="1 to 25";
if 26 <= group_var <= 50 then group="26 to 50";
run;
proc sql;
create table WANT as
select ID,
GROUP,
sum(COUNT_VAR1) as COUNT_1_SUM,
sum(COUNT_VAR2) as COUNT_2_SUM,
sum(COUNT_VAR3) as COUNT_3_SUM
from INTER
group by ID,
GROUP;
quit;
data have;
input ID group_var count_var1 count_var2 count_var3;
datalines;
1 -50 0 1 3
1 -45 2 1 5
1 -1 2 0 1
1 4 4 3 2
1 12 1 0 0
2 -25 1 2 3
2 -7 0 0 1
2 -2 3 4 2
;
run;
proc sql;
create table want as
select id,
(select sum(count_var1) from have where id=a.id and group_var between -50 and -26) as sum_count_var1_neg_50_to_neg_26,
(select sum(count_var2) from have where id=a.id and group_var between -50 and -26) as sum_count_var2_neg_50_to_neg_26,
(select sum(count_var3) from have where id=a.id and group_var between -50 and -26) as sum_count_var3_neg_50_to_neg_26
from (select distinct id from have) as a ;
quit;
The proc sql solution yielded the correct output, but would you mind explaining to me exactly what the code is doing so I can understand how it works?
Thanks!
Often formats may be helpful:
proc format;
value mygroup
-50 - -26 = "-50 to -26"
-25 - -1 ="-25 to -1"
1 - 25 ="+1 to +25"
26 - 50 = "+26 to +50"
other = "Out of Range"
;
run;
proc summary data=have nway;
class ID group;
format group mygroup. ;
var var1 var2 var3;
output out=sumset (drop = _type_ _freq_) sum=;
run;
Transpose the result if you need to.
Hi, another idea ...
data want (keep = id tot:);
array tot(12);
do j=1 to 12; tot(j)=0; end;
do until(last.id);
set have;
by id;
k = 3 * (ceil(group_var/25) + (group_var eq -50) + 1);
tot(k+1) + count_var1;
tot(k+2) + count_var2;
tot(k+3) + count_var3;
end;
run;
ID tot1 tot2 tot3 tot4 tot5 tot6 tot7 tot8 tot9 tot10 tot11 tot12
1 2 2 8 2 0 1 5 3 2 0 0 0
2 1 2 3 3 4 3 0 0 0 0 0 0
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.