BookmarkSubscribeRSS Feed
Walternate
Obsidian | Level 7

Hi,

I have a dataset with 5 variables--ID, a group variable that can be anywhere from -50 to +50, and three count variables. The dataset is at the person/group level (so each ID could occur 100 times if a person has a count for every group):

ID     group_var     count_var1     count_var2     count_var3

1            -50                0                     1                    3

1            -45                2                     1                    5

1            -1                  2                     0                    1

1             4                  4                     3                    2

1            12                 1                     0                    0    

2           -25                 1                     2                    3

2            -7                  0                     0                    1

2            -2                  3                     4                    2

What I want is to obtain summative counts using values of group. So for each ID, I would want to know the sum of count_var1, count_var2, and count_var3, for group_var=-50 to -26, -25 to -1, +1 to +25, and +26 to +50. The output would ideally look like this:

ID     sum_count_var1_gp_neg_50_to_neg_26 sum_count_var2_gp_neg_50_to_neg_26     sum_count_var3_gp_neg_50_to_neg_26

1                                 2                                                              2                                                                 8

and continuing across with the sums for the other group ranges. 

Any help is much appreciated!

5 REPLIES 5
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Well, I would first assign the groupings in a new variable.  Then its just a matter of aggregation of that data.  In your logic you will miss out on 0 for group I would point out.  Also, do you really need the transposed output?   Its generally better to work with normalised data, and only transpose for outputs. 

data have;

  input ID     group_var     count_var1     count_var2     count_var3;

datalines;

1            -50                0                     1                    3

1            -45                2                     1                    5

1            -1                  2                     0                    1

1             4                  4                     3                    2

1            12                 1                     0                    0   

2           -25                 1                     2                    3

2            -7                  0                     0                    1

2            -2                  3                     4                    2

;

run;

data inter;

  set have;

  length group $20;

  if -50 <= group_var <= -26 then group="-50 to -26";

  if -25 <= group_var <= -1 then group="-25 to -1";

  if 1 <= group_var <= 25 then group="1 to 25";

  if 26 <= group_var <= 50 then group="26 to 50";

run;

proc sql;

  create table WANT as

  select  ID,

          GROUP,

          sum(COUNT_VAR1) as COUNT_1_SUM,

          sum(COUNT_VAR2) as COUNT_2_SUM,

          sum(COUNT_VAR3) as COUNT_3_SUM

  from    INTER

  group by ID,

           GROUP;

quit;

Ksharp
Super User

Code: Program

data have;
  input ID   group_var   count_var1   count_var2   count_var3;
datalines;
1 -50 0 1 3
1 -45 2 1 5
1 -1 2 0 1
1 4 4 3 2
1 12 1 0 0 
2 -25 1 2 3
2 -7 0 0 1
2 -2 3 4 2
;
run;
proc sql;
create table want as
select id,
   (select sum(count_var1) from have where id=a.id and group_var between -50 and -26) as sum_count_var1_neg_50_to_neg_26,
   (select sum(count_var2) from have where id=a.id and group_var between -50 and -26) as sum_count_var2_neg_50_to_neg_26,
   (select sum(count_var3) from have where id=a.id and group_var between -50 and -26) as sum_count_var3_neg_50_to_neg_26
  from (select distinct id from have) as a ;
quit;
Walternate
Obsidian | Level 7

The proc sql solution yielded the correct output, but would you mind explaining to me exactly what the code is doing so I can understand how it works?

 

Thanks!

ballardw
Super User

Often formats may be helpful:

proc format;

value mygroup

-50 - -26 = "-50 to -26"

-25 - -1 ="-25 to -1"

1 - 25 ="+1 to +25"

26 - 50 = "+26 to +50"

other = "Out of Range"

;

run;

proc summary data=have nway;

     class ID group;

     format group mygroup. ;

     var var1 var2 var3;

     output out=sumset (drop = _type_ _freq_) sum=;

run;

Transpose the result if you need to.

MikeZdeb
Rhodochrosite | Level 12

Hi, another idea ...

data want (keep = id tot:);

array tot(12);

do j=1 to 12; tot(j)=0; end;

do until(last.id);

   set have;

   by id;

   k = 3 * (ceil(group_var/25) + (group_var eq -50) + 1);

   tot(k+1) + count_var1;

   tot(k+2) + count_var2;

   tot(k+3) + count_var3;

end;

run;

ID    tot1    tot2    tot3    tot4    tot5    tot6    tot7    tot8    tot9    tot10    tot11    tot12

1      2       2       8       2       0       1       5       3       2       0        0        0

2      1       2       3       3       4       3       0       0       0       0        0        0

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1363 views
  • 0 likes
  • 5 in conversation