DATA Step, Macro, Functions and more

Summing vars based on values of another variable

Reply
Frequent Contributor
Posts: 138

Summing vars based on values of another variable

Hi,

I have a dataset with 5 variables--ID, a group variable that can be anywhere from -50 to +50, and three count variables. The dataset is at the person/group level (so each ID could occur 100 times if a person has a count for every group):

ID     group_var     count_var1     count_var2     count_var3

1            -50                0                     1                    3

1            -45                2                     1                    5

1            -1                  2                     0                    1

1             4                  4                     3                    2

1            12                 1                     0                    0    

2           -25                 1                     2                    3

2            -7                  0                     0                    1

2            -2                  3                     4                    2

What I want is to obtain summative counts using values of group. So for each ID, I would want to know the sum of count_var1, count_var2, and count_var3, for group_var=-50 to -26, -25 to -1, +1 to +25, and +26 to +50. The output would ideally look like this:

ID     sum_count_var1_gp_neg_50_to_neg_26 sum_count_var2_gp_neg_50_to_neg_26     sum_count_var3_gp_neg_50_to_neg_26

1                                 2                                                              2                                                                 8

and continuing across with the sums for the other group ranges. 

Any help is much appreciated!

Super User
Super User
Posts: 7,988

Re: Summing vars based on values of another variable

Posted in reply to Walternate

Well, I would first assign the groupings in a new variable.  Then its just a matter of aggregation of that data.  In your logic you will miss out on 0 for group I would point out.  Also, do you really need the transposed output?   Its generally better to work with normalised data, and only transpose for outputs. 

data have;

  input ID     group_var     count_var1     count_var2     count_var3;

datalines;

1            -50                0                     1                    3

1            -45                2                     1                    5

1            -1                  2                     0                    1

1             4                  4                     3                    2

1            12                 1                     0                    0   

2           -25                 1                     2                    3

2            -7                  0                     0                    1

2            -2                  3                     4                    2

;

run;

data inter;

  set have;

  length group $20;

  if -50 <= group_var <= -26 then group="-50 to -26";

  if -25 <= group_var <= -1 then group="-25 to -1";

  if 1 <= group_var <= 25 then group="1 to 25";

  if 26 <= group_var <= 50 then group="26 to 50";

run;

proc sql;

  create table WANT as

  select  ID,

          GROUP,

          sum(COUNT_VAR1) as COUNT_1_SUM,

          sum(COUNT_VAR2) as COUNT_2_SUM,

          sum(COUNT_VAR3) as COUNT_3_SUM

  from    INTER

  group by ID,

           GROUP;

quit;

Super User
Posts: 10,044

Re: Summing vars based on values of another variable

Posted in reply to Walternate

Code: Program

data have;
  input ID   group_var   count_var1   count_var2   count_var3;
datalines;
1 -50 0 1 3
1 -45 2 1 5
1 -1 2 0 1
1 4 4 3 2
1 12 1 0 0 
2 -25 1 2 3
2 -7 0 0 1
2 -2 3 4 2
;
run;
proc sql;
create table want as
select id,
   (select sum(count_var1) from have where id=a.id and group_var between -50 and -26) as sum_count_var1_neg_50_to_neg_26,
   (select sum(count_var2) from have where id=a.id and group_var between -50 and -26) as sum_count_var2_neg_50_to_neg_26,
   (select sum(count_var3) from have where id=a.id and group_var between -50 and -26) as sum_count_var3_neg_50_to_neg_26
  from (select distinct id from have) as a ;
quit;
Frequent Contributor
Posts: 138

Re: Summing vars based on values of another variable

The proc sql solution yielded the correct output, but would you mind explaining to me exactly what the code is doing so I can understand how it works?

 

Thanks!

Super User
Posts: 11,343

Re: Summing vars based on values of another variable

Posted in reply to Walternate

Often formats may be helpful:

proc format;

value mygroup

-50 - -26 = "-50 to -26"

-25 - -1 ="-25 to -1"

1 - 25 ="+1 to +25"

26 - 50 = "+26 to +50"

other = "Out of Range"

;

run;

proc summary data=have nway;

     class ID group;

     format group mygroup. ;

     var var1 var2 var3;

     output out=sumset (drop = _type_ _freq_) sum=;

run;

Transpose the result if you need to.

Valued Guide
Posts: 765

Re: Summing vars based on values of another variable

Posted in reply to Walternate

Hi, another idea ...

data want (keep = id totSmiley Happy;

array tot(12);

do j=1 to 12; tot(j)=0; end;

do until(last.id);

   set have;

   by id;

   k = 3 * (ceil(group_var/25) + (group_var eq -50) + 1);

   tot(k+1) + count_var1;

   tot(k+2) + count_var2;

   tot(k+3) + count_var3;

end;

run;

ID    tot1    tot2    tot3    tot4    tot5    tot6    tot7    tot8    tot9    tot10    tot11    tot12

1      2       2       8       2       0       1       5       3       2       0        0        0

2      1       2       3       3       4       3       0       0       0       0        0        0

Ask a Question
Discussion stats
  • 5 replies
  • 272 views
  • 0 likes
  • 5 in conversation