SAS community,
I would like to combine databases which includes all eligible entries from group 1 (where N ge 1 and X le 5) and group 2 (where N ge 1 and X gt 5). At the bottom is a sample dataset and a beginning code. The problem with my current code is that it won't include all entries for the same ID.
The final dataset should be resemble the dataset below:
data want;
ID X N
A 1 5
A 7 4
E 3 4
E 5 2
E 7 3
Thanks!
data have;
input ID $ X N;
datalines;
A 1 5
A 5 0
A 7 4
B 1 1
C 7 0
D 9 4
D 10 1
E 3 4
E 5 2
E 7 3
;;
run;
data group_1;
set have;
if N ge 1 and X le 5;
run;
data group_2;
set have;
if N ge 1 and X gt 5;
run;
proc sort data=group_1; by ID;
proc sort data=group_2; by ID;
quit;
data all;
merge area_1 (in=a)
area_2 (in=b);
by id;
if a and b;
run;
Can you use proc sql? (not tested)
proc sql;
create table want as
select * from
(select * from group1 where id in (select distinct id from group2)
union
select * from group2 where id in (select distinct id from group1)) a;
quit;
Why isn't B included?
It meets the criteria for Group 1 - N Ge 1 and X LE 5
Because the ID has to be in both Group 1 and Group 2 in order to be in the final dataset. B only satisfies the criteria for group 1.
Can you use proc sql? (not tested)
proc sql;
create table want as
select * from
(select * from group1 where id in (select distinct id from group2)
union
select * from group2 where id in (select distinct id from group1)) a;
quit;
I think that worked! Thanks
Here's one way:
proc sort data=have;
by id;
run;
data group_1 group_2;
set have;
if N ge 1;
if X le 5 then output group_1;
else output group_2;
keep id;
run;
data want;
merge have group_1 (in=in1) group_2 (in=in2);
by id;
if in1 and in2;
run;
You may get messages related to a many-to-many merge, but they are harmless in this particular case. (Most any other time, you would definitely want to investigate if you got this type of message.) You could eliminate those messages by re-sorting group_1 and group_2 before merging:
proc sort data=group_1 NODUPKEY;
by id;
run;
But for this particular case it won't hurt to skip this because HAVE will have at least as many observations for each ID as the other two data sets have.
Good luck.
It is so weird.
data have; input ID $ X N; datalines; A 1 5 A 5 0 A 7 4 B 1 1 C 7 0 D 9 4 D 10 1 E 3 4 E 5 2 E 7 3 ; run; proc sql; create table want as select * from have where n ge 1 group by id having sum(x le 5) gt 0 and sum(X gt 5) gt 0; quit;
Ksharp
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.