BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
buddha_d
Pyrite | Level 9

Dear community,

              I have 2 huge datasets with millions of records. I want to simulate that in this small example. 

I have dataset AA & BB. I want to get unique records (no duplicates) when I merge these two datasets with ID field. and I also want get the records whose age is greater than or equal to 60 years (even with duplicates whose age is greater than or equal to 60). If there are any other lower age with 60 years then I don't want to see them (like BB dataset has ID 'D' which has age of 60 & 59). 

Data AA;
Input ID Name$ Height age;
cards;
1 A 1 30
3 B 2 40
5 C 2 50
7 D 2 60
9 E 2 70
;
run;

Data BB;
Input ID Name1$ height1 age1;
cards;
1 A 1 30
3 B 4 40
3 B 2 40
5 C 2 50
5 C 2 50
5 C 2 50
7 D 2 60
7 D 2 60
7 D 2 59
9 E 2 70
9 E 2 70
;
run;

PROC SQL;
	CREATE TABLE LOOK AS 
	SELECT A.*, B.* 
	FROM AA AS A INNER JOIN BB AS B ON A.ID=B.ID 
	;
QUIT;

DATA ONE TWO;
	SET LOOK ;
	BY ID; 
	IF AGE >= 60 and age1 >=60 THEN OUTPUT ONE;
	ELSE IF FIRST.ID AND LAST.ID THEN OUTPUT ONE;
	ELSE OUTPUT TWO;
RUN;

The desired output should look like below. 


ID Name Height age Name1 Height1 age1
1  A        1          30    A          1           30
9  E        2         70     E          2          70
9  E        2         70     E          2          70

Any help is much appreciated. Thanks in advance. 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

This does it:

Data AA;
Input ID Name$ Height age;
cards;
1 A 1 30
3 B 2 40
5 C 2 50
7 D 2 60
9 E 2 70
;

Data BB;
Input ID Name1$ height1 age1;
cards;
1 A 1 30
3 B 4 40
3 B 2 40
5 C 2 50
5 C 2 50
5 C 2 50
7 D 2 60
7 D 2 60
7 D 2 59
9 E 2 70
9 E 2 70
;

data want;
merge
  aa
  bb
;
by id;
if _n_ = 1
then do;
  declare hash b (dataset:"bb (where=(age1 le 60))");
  b.definekey("id");
  b.definedone();
end;
if (first.id and last.id) or b.check() ne 0;
run;

proc print data=want noobs;
run;

Result:

D	Name	Height	age	Name1	height1	age1
1	A	1	30	A	1	30
9	E	2	70	E	2	70
9	E	2	70	E	2	70

View solution in original post

12 REPLIES 12
PeterClemmensen
Tourmaline | Level 20

I don't follow this logic. Why should ID = 1 be in the result but ID = 2 should not?

buddha_d
Pyrite | Level 9
This is just made up data and I picked odd numbers as IDs. There is no logic why ID=2 is not there. But, the point here is I want unique numbers (when merged both datasets) as well as any records which has some duplicates with age is equal or greater than 60 . At the same time, if there are any ages below 60 along with age of 60 then I don't want to see it.
Please let me know if you need more clarification
PeterClemmensen
Tourmaline | Level 20

Please post the exact result that you expect from the posted sample data. Makes it much easier to provide a usable code answer.

buddha_d
Pyrite | Level 9

ID Name Height age Name1 Height1 age1
1  A        1          30    A          1           30
9  E        2         70     E          2          70
9  E        2         70     E          2          70

PeterClemmensen
Tourmaline | Level 20

Again, since this is your desired output, you must be able to explain why ID = 2 is not present and ID = 1 is.

buddha_d
Pyrite | Level 9
The above dataset is the desired result.
Shmuel
Garnet | Level 18

You asked for: "unique records (no duplicates) when I merge these two datasets with ID field. and I also want get the records whose age is greater than or equal to 60 years ".

1) Why have you added next lines to desired output?

ID Name Height age Name1 Height1 age1
1  A        1          30    A          1           30     -   Age is less than 60 ?
9  E        2         70     E          2          70
9  E        2         70     E          2          70      - This is a duplicate ID ?!

You will get what you asked by next code:

 

PROC SQL;
	CREATE TABLE look AS 
	SELECT distinct A.*, B.* 
	FROM AA AS A INNER JOIN BB AS B
	ON A.ID=B.ID and min(A.age,B.age1) >= 60
	order by A.ID
	;
quit;

Result:

ID Name 

ID            Name       Height         Age           Name1      Height1     Age1

 

 

 

buddha_d
Pyrite | Level 9

Shmuel,

 I have a condition where I want to have unique records (which can have no age limit restrictions) and second condition is if the age is >= 60 then I have a condition where I don't want to see any age that is below 60. since ID "D" has one record (age 59 below 60), I don't want to see that output at all. 
Thanks !!

Shmuel
Garnet | Level 18

@buddha_d wrote:

Shmuel,

 I have a condition where I want to have unique records (which can have no age limit restrictions) and second condition is if the age is >= 60 then I have a condition where I don't want to see any age that is below 60. since ID "D" has one record (age 59 below 60), I don't want to see that output at all. 
Thanks !!


Next code fits your conditions:

/* first codition - age>= 60 and age2 >= 60 */
PROC SQL;
	CREATE TABLE tmp1 AS 
	SELECT A.*, B.*
	FROM AA AS A INNER JOIN BB AS B
	ON A.ID=B.ID and A.age>= 60 and B.age1 >= 60
	order by A.ID
	;
	create table tmp1a as 
	select distinct ID	 
	from BB(where=(age1 < 60))
	group by ID;
quit;
/* data tmp1; set tmp1; if age1 < 60 then delete; run; */

/* second condition - uniqe ID */
proc sort data=AA; by ID; run;
proc sort data=BB; by ID; run;
data tmp2;
 merge AA(in=inA) bb(in=inB);
  by ID;
     if first.ID and last.ID;    
run;
data look;
 merge tmp1(in=in1) tmp1a(in=in1a) tmp2(in=in2) ;
   by ID;
      if (in1 and not in1a) or in2;
run;
proc sort data=look; by ID; run;
proc print data=look; run;
Kurt_Bremser
Super User

This does it:

Data AA;
Input ID Name$ Height age;
cards;
1 A 1 30
3 B 2 40
5 C 2 50
7 D 2 60
9 E 2 70
;

Data BB;
Input ID Name1$ height1 age1;
cards;
1 A 1 30
3 B 4 40
3 B 2 40
5 C 2 50
5 C 2 50
5 C 2 50
7 D 2 60
7 D 2 60
7 D 2 59
9 E 2 70
9 E 2 70
;

data want;
merge
  aa
  bb
;
by id;
if _n_ = 1
then do;
  declare hash b (dataset:"bb (where=(age1 le 60))");
  b.definekey("id");
  b.definedone();
end;
if (first.id and last.id) or b.check() ne 0;
run;

proc print data=want noobs;
run;

Result:

D	Name	Height	age	Name1	height1	age1
1	A	1	30	A	1	30
9	E	2	70	E	2	70
9	E	2	70	E	2	70
buddha_d
Pyrite | Level 9

Thanks Kurt

s_lassen
Meteorite | Level 14

It can be done in a single data step, like this:

data want;
  merge aa(in=in1) bb(in=in2) bb(where=(age1<60) in=young);
  by ID;
  if young then
    if first.id and last.id;
run;

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 12 replies
  • 2095 views
  • 2 likes
  • 5 in conversation