Solved: Re: combining datasets

sophia_SAS · Posted 09-13-2012 01:26 PM

SAS community,

I would like to combine databases which includes all eligible entries from group 1 (where N ge 1 and X le 5) and group 2 (where N ge 1 and X gt 5). At the bottom is a sample dataset and a beginning code. The problem with my current code is that it won't include all entries for the same ID.

The final dataset should be resemble the dataset below:

data want;

ID X N

A 1 5

A 7 4

E 3 4

E 5 2

E 7 3

Thanks!

data have;

input ID $ X N;

datalines;

A 1 5

A 5 0

A 7 4

B 1 1

C 7 0

D 9 4

D 10 1

E 3 4

E 5 2

E 7 3

;;

run;

data group_1;

set have;

if N ge 1 and X le 5;

run;

data group_2;

set have;

if N ge 1 and X gt 5;

run;

proc sort data=group_1; by ID;

proc sort data=group_2; by ID;

quit;

data all;

merge area_1 (in=a)

area_2 (in=b);

by id;

if a and b;

run;

Reeza · Posted 09-13-2012 01:52 PM

Can you use proc sql? (not tested)

proc sql;

create table want as

select * from

(select * from group1 where id in (select distinct id from group2)

union

select * from group2 where id in (select distinct id from group1)) a;

quit;

View solution in original post

Reeza · Posted 09-13-2012 01:41 PM

Why isn't B included?

It meets the criteria for Group 1 - N Ge 1 and X LE 5

sophia_SAS · Posted 09-13-2012 01:44 PM

Because the ID has to be in both Group 1 and Group 2 in order to be in the final dataset. B only satisfies the criteria for group 1.

Reeza · Posted 09-13-2012 01:52 PM

Can you use proc sql? (not tested)

proc sql;

create table want as

select * from

(select * from group1 where id in (select distinct id from group2)

union

select * from group2 where id in (select distinct id from group1)) a;

quit;

sophia_SAS · Posted 09-13-2012 01:59 PM

I think that worked! Thanks

Astounding · Posted 09-13-2012 02:00 PM

Here's one way:

proc sort data=have;

by id;

run;

data group_1 group_2;

set have;

if N ge 1;

if X le 5 then output group_1;

else output group_2;

keep id;

run;

data want;

merge have group_1 (in=in1) group_2 (in=in2);

by id;

if in1 and in2;

run;

You may get messages related to a many-to-many merge, but they are harmless in this particular case. (Most any other time, you would definitely want to investigate if you got this type of message.) You could eliminate those messages by re-sorting group_1 and group_2 before merging:

proc sort data=group_1 NODUPKEY;

by id;

run;

But for this particular case it won't hurt to skip this because HAVE will have at least as many observations for each ID as the other two data sets have.

Good luck.

Ksharp · Posted 09-13-2012 10:03 PM

It is so weird.

data have;
input ID $ X N;
datalines;
A 1 5
A 5 0
A 7 4
B 1 1
C 7 0
D 9 4
D 10 1
E 3 4
E 5 2
E 7 3
;
run;
proc sql;
create table want as
 select *
  from have
   where n ge 1
    group by id
     having sum(x le 5) gt 0 and sum(X gt 5) gt 0;
quit;

Ksharp

Registration is open

SAS Training: Just a Click Away