Re: Adding grouped variable without rejoining

Walternate · Posted 06-16-2015 11:08 AM

Hi,

I have two datasets. One has ID number and several categorical variables, like this:

ID cat_var1 cat_var2 cat_var3

1 a d i

2 b f h

3 c e j

4 a e h

5 c d h

Dataset 1 is unique at the ID level. Dataset 2 has an ID variable; a numeric variable, which always equals 1; and a group variable. Each ID recurs several times in Dataset 2, like this:

ID count_var group_var

1 1 1

1 1 2

2 1 1

What I want is to sum the count_var by ID and group (that is, ID=1 would have sum=2 for group 1 and sum=3 for group 2) and add the sum variables to Dataset 1. I know the inefficient way to do this--do a proc sql sum of count_var grouping by ID and group_var, then transpose, then join to Dataset 1 on ID--but I'm hoping there's a more efficient way to get what I want.

Any help is much appreciated.

Jagadishkatam · Posted 06-16-2015 11:16 AM

if you are not interested to use the proc sql , please try the datastep

dataset 2:

data dataset2;

set have;

by id group_var;

retain sum;

if first.group_var then sum=1;

else sum+1;

if last.group_var;

run;

Then this dataset will have one or more records per id, group_var with the sum information, which you could merge with the dataset1 by id variable.

Thanks,

Jag

Thanks,
Jag

ballardw · Posted 06-16-2015 11:39 AM

To get your group counts

proc summary data=dataset2 nway;

class id group_var;

var count;

output out=summarycount(drop=_type_ _freq_) sum=;

run;

Then either a join using proc sql or a data step merge by ID with the first data set

Steelers_In_DC · Posted 06-16-2015 12:04 PM

I think you were looking for a way to do all of this in one step, not so much getting away from sql, is that right? If so this will get you what you want:

data have1;

input ID cat_var1 $ cat_var2 $ cat_var3 $;

cards;

1 a d i

2 b f h

3 c e j

4 a e h

5 c d h

;

data have2;

input ID count_var group_var;

cards;

1 1 1

1 1 2

2 1 1

;

proc sql;

create table want as

select distinct

a.id,

a.cat_var1,

a.cat_var2,

a.cat_var3,

count(b.group_var) as count_group

from have1 a left join

have2 b on

a.id = b.id

group by b.id,b.group_var;

Ksharp · Posted 06-17-2015 10:18 AM

Hash Table.

Adding grouped variable without rejoining