BookmarkSubscribeRSS Feed
Walternate
Obsidian | Level 7

Hi,

I have a dataset with 5 variables--ID, a group variable that can be anywhere from -50 to +50, and three count variables. The dataset is at the person/group level (so each ID could occur 100 times if a person has a count for every group):

ID     group_var     count_var1     count_var2     count_var3

1            -50                0                     1                    3

1            -45                2                     1                    5

1            -1                  2                     0                    1

1             4                  4                     3                    2

1            12                 1                     0                    0    

2           -25                 1                     2                    3

2            -7                  0                     0                    1

2            -2                  3                     4                    2

What I want is to obtain summative counts using values of group. So for each ID, I would want to know the sum of count_var1, count_var2, and count_var3, for group_var=-50 to -26, -25 to -1, +1 to +25, and +26 to +50. The output would ideally look like this:

ID     sum_count_var1_gp_neg_50_to_neg_26 sum_count_var2_gp_neg_50_to_neg_26     sum_count_var3_gp_neg_50_to_neg_26

1                                 2                                                              2                                                                 8

and continuing across with the sums for the other group ranges. 

Any help is much appreciated!

5 REPLIES 5
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Well, I would first assign the groupings in a new variable.  Then its just a matter of aggregation of that data.  In your logic you will miss out on 0 for group I would point out.  Also, do you really need the transposed output?   Its generally better to work with normalised data, and only transpose for outputs. 

data have;

  input ID     group_var     count_var1     count_var2     count_var3;

datalines;

1            -50                0                     1                    3

1            -45                2                     1                    5

1            -1                  2                     0                    1

1             4                  4                     3                    2

1            12                 1                     0                    0   

2           -25                 1                     2                    3

2            -7                  0                     0                    1

2            -2                  3                     4                    2

;

run;

data inter;

  set have;

  length group $20;

  if -50 <= group_var <= -26 then group="-50 to -26";

  if -25 <= group_var <= -1 then group="-25 to -1";

  if 1 <= group_var <= 25 then group="1 to 25";

  if 26 <= group_var <= 50 then group="26 to 50";

run;

proc sql;

  create table WANT as

  select  ID,

          GROUP,

          sum(COUNT_VAR1) as COUNT_1_SUM,

          sum(COUNT_VAR2) as COUNT_2_SUM,

          sum(COUNT_VAR3) as COUNT_3_SUM

  from    INTER

  group by ID,

           GROUP;

quit;

Ksharp
Super User

Code: Program

data have;
  input ID   group_var   count_var1   count_var2   count_var3;
datalines;
1 -50 0 1 3
1 -45 2 1 5
1 -1 2 0 1
1 4 4 3 2
1 12 1 0 0 
2 -25 1 2 3
2 -7 0 0 1
2 -2 3 4 2
;
run;
proc sql;
create table want as
select id,
   (select sum(count_var1) from have where id=a.id and group_var between -50 and -26) as sum_count_var1_neg_50_to_neg_26,
   (select sum(count_var2) from have where id=a.id and group_var between -50 and -26) as sum_count_var2_neg_50_to_neg_26,
   (select sum(count_var3) from have where id=a.id and group_var between -50 and -26) as sum_count_var3_neg_50_to_neg_26
  from (select distinct id from have) as a ;
quit;
Walternate
Obsidian | Level 7

The proc sql solution yielded the correct output, but would you mind explaining to me exactly what the code is doing so I can understand how it works?

 

Thanks!

ballardw
Super User

Often formats may be helpful:

proc format;

value mygroup

-50 - -26 = "-50 to -26"

-25 - -1 ="-25 to -1"

1 - 25 ="+1 to +25"

26 - 50 = "+26 to +50"

other = "Out of Range"

;

run;

proc summary data=have nway;

     class ID group;

     format group mygroup. ;

     var var1 var2 var3;

     output out=sumset (drop = _type_ _freq_) sum=;

run;

Transpose the result if you need to.

MikeZdeb
Rhodochrosite | Level 12

Hi, another idea ...

data want (keep = id tot:);

array tot(12);

do j=1 to 12; tot(j)=0; end;

do until(last.id);

   set have;

   by id;

   k = 3 * (ceil(group_var/25) + (group_var eq -50) + 1);

   tot(k+1) + count_var1;

   tot(k+2) + count_var2;

   tot(k+3) + count_var3;

end;

run;

ID    tot1    tot2    tot3    tot4    tot5    tot6    tot7    tot8    tot9    tot10    tot11    tot12

1      2       2       8       2       0       1       5       3       2       0        0        0

2      1       2       3       3       4       3       0       0       0       0        0        0

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 1026 views
  • 0 likes
  • 5 in conversation