DATA Step, Macro, Functions and more

Adding grouped variable without rejoining

Reply
Frequent Contributor
Posts: 138

Adding grouped variable without rejoining

Hi,

I have two datasets. One has ID number and several categorical variables, like this:

ID     cat_var1    cat_var2     cat_var3

1            a                d                 i

2            b                f                 h

3            c                e                 j

4            a                e                h

5            c                d                h

Dataset 1 is unique at the ID level. Dataset 2 has an ID variable; a numeric variable, which always equals 1; and a group variable. Each ID recurs several times in Dataset 2, like this:

ID     count_var     group_var

1          1                        1

1          1                        1

1          1                        2    

1          1                        2

1          1                        2

2          1                        1

2          1                        1

2          1                        1

What I want is to sum the count_var by ID and group (that is, ID=1 would have sum=2 for group 1 and sum=3 for group 2) and add the sum variables to Dataset 1. I know the inefficient way to do this--do a proc sql sum of count_var grouping by ID and group_var, then transpose, then join to Dataset 1 on ID--but I'm hoping there's a more efficient way to get what I want.

Any help is much appreciated.

Trusted Advisor
Posts: 1,137

Re: Adding grouped variable without rejoining

Posted in reply to Walternate

if you are not interested to use the proc sql , please try the datastep

dataset 2:

data dataset2;

set have;

by id group_var;

retain sum;

if first.group_var then sum=1;

else sum+1;

if last.group_var;

run;

Then this dataset will have one or more records per id, group_var with the sum information, which you could merge with the dataset1 by id variable.

Thanks,

Jag

Thanks,
Jag
Super User
Posts: 11,343

Re: Adding grouped variable without rejoining

Posted in reply to Walternate

To get your group counts

proc summary data=dataset2 nway;

class id group_var;

var count;

output out=summarycount(drop=_type_ _freq_) sum=;

run;

Then either  a join using proc sql or a data step merge by ID with the first data set

Valued Guide
Posts: 860

Re: Adding grouped variable without rejoining

Posted in reply to Walternate

I think you were looking for a way to do all of this in one step, not so much getting away from sql, is that right?  If so this will get you what you want:

data have1;

  input ID cat_var1 $ cat_var2 $ cat_var3 $;

cards;

1 a d i

2 b f h

3 c e j

4 a e h

5 c d h

;

data have2;

  input ID     count_var     group_var;

cards;

1 1 1

1 1 1

1 1 2

1 1 2

1 1 2

2 1 1

2 1 1

2 1 1

;

proc sql;

create table want as

select distinct

a.id,

a.cat_var1,

a.cat_var2,

a.cat_var3,

count(b.group_var) as count_group

from have1 a left join

     have2 b on

a.id = b.id

group by b.id,b.group_var;

Super User
Posts: 10,041

Re: Adding grouped variable without rejoining

Posted in reply to Walternate

Hash Table.

Ask a Question
Discussion stats
  • 4 replies
  • 264 views
  • 0 likes
  • 5 in conversation