Summing vars based on values of another variable

Walternate · Posted 08-28-2015 09:06 AM

Hi,

I have a dataset with 5 variables--ID, a group variable that can be anywhere from -50 to +50, and three count variables. The dataset is at the person/group level (so each ID could occur 100 times if a person has a count for every group):

ID group_var count_var1 count_var2 count_var3

1 -50 0 1 3

1 -45 2 1 5

1 -1 2 0 1

1 4 4 3 2

1 12 1 0 0

2 -25 1 2 3

2 -7 0 0 1

2 -2 3 4 2

What I want is to obtain summative counts using values of group. So for each ID, I would want to know the sum of count_var1, count_var2, and count_var3, for group_var=-50 to -26, -25 to -1, +1 to +25, and +26 to +50. The output would ideally look like this:

ID sum_count_var1_gp_neg_50_to_neg_26 sum_count_var2_gp_neg_50_to_neg_26 sum_count_var3_gp_neg_50_to_neg_26

1 2 2 8

and continuing across with the sums for the other group ranges.

Any help is much appreciated!

RW9 · Posted 08-28-2015 09:33 AM

Well, I would first assign the groupings in a new variable. Then its just a matter of aggregation of that data. In your logic you will miss out on 0 for group I would point out. Also, do you really need the transposed output? Its generally better to work with normalised data, and only transpose for outputs.

data have;

input ID group_var count_var1 count_var2 count_var3;

datalines;

1 -50 0 1 3

1 -45 2 1 5

1 -1 2 0 1

1 4 4 3 2

1 12 1 0 0

2 -25 1 2 3

2 -7 0 0 1

2 -2 3 4 2

;

run;

data inter;

set have;

length group $20;

if -50 <= group_var <= -26 then group="-50 to -26";

if -25 <= group_var <= -1 then group="-25 to -1";

if 1 <= group_var <= 25 then group="1 to 25";

if 26 <= group_var <= 50 then group="26 to 50";

run;

proc sql;

create table WANT as

select ID,

GROUP,

sum(COUNT_VAR1) as COUNT_1_SUM,

sum(COUNT_VAR2) as COUNT_2_SUM,

sum(COUNT_VAR3) as COUNT_3_SUM

from INTER

group by ID,

GROUP;

quit;

Ksharp · Posted 08-28-2015 10:34 AM

Code: Program

data have;
  input ID   group_var   count_var1   count_var2   count_var3;
datalines;
1 -50 0 1 3
1 -45 2 1 5
1 -1 2 0 1
1 4 4 3 2
1 12 1 0 0  
2 -25 1 2 3
2 -7 0 0 1
2 -2 3 4 2
;
run;
proc sql;
create table want as
 select id,
   (select sum(count_var1) from have where id=a.id and group_var between -50 and -26) as sum_count_var1_neg_50_to_neg_26,
   (select sum(count_var2) from have where id=a.id and group_var between -50 and -26) as sum_count_var2_neg_50_to_neg_26,
   (select sum(count_var3) from have where id=a.id and group_var between -50 and -26) as sum_count_var3_neg_50_to_neg_26
  from (select distinct id from have) as a ;
quit;

Walternate · Posted 09-16-2015 02:12 PM

The proc sql solution yielded the correct output, but would you mind explaining to me exactly what the code is doing so I can understand how it works?

Thanks!

ballardw · Posted 08-28-2015 10:54 AM

Often formats may be helpful:

proc format;

value mygroup

-50 - -26 = "-50 to -26"

-25 - -1 ="-25 to -1"

1 - 25 ="+1 to +25"

26 - 50 = "+26 to +50"

other = "Out of Range"

;

run;

proc summary data=have nway;

class ID group;

format group mygroup. ;

var var1 var2 var3;

output out=sumset (drop = _type_ _freq_) sum=;

run;

Transpose the result if you need to.

MikeZdeb · Posted 08-28-2015 11:43 AM

Hi, another idea ...

data want (keep = id tot:);

array tot(12);

do j=1 to 12; tot(j)=0; end;

do until(last.id);

set have;

by id;

k = 3 * (ceil(group_var/25) + (group_var eq -50) + 1);

tot(k+1) + count_var1;

tot(k+2) + count_var2;

tot(k+3) + count_var3;

end;

run;

ID tot1 tot2 tot3 tot4 tot5 tot6 tot7 tot8 tot9 tot10 tot11 tot12

1 2 2 8 2 0 1 5 3 2 0 0 0

2 1 2 3 3 4 3 0 0 0 0 0 0