Selecting rows satisfying condition linked to another dataset

Accepted Solution Solved
Reply
Contributor
Posts: 35
Accepted Solution

Selecting rows satisfying condition linked to another dataset

Dear All,

I have the following 2 datasets:

data db1;

     input id V1 V2;

datalines;

101     2     2.1

101     2     2.1

102     0     2.9

103     2     3.1

104     4     1.7

105     1     4.5

106     0     0.5;

data db2;

     input id;

datalines;

101    

103 

104    

106;

I would like to obtain a dataset want including all rows (and all columns) of the database db1 such that db1.id is different from db2.id. That is, I would like to obtain the following result:

data want;

     input id V1 V2;

datalines;

102     0     2.9

105     1     4.5;


My attempt involves PROC SQL:


proc sql;

     create table want as

     select db1.*

     from db1, db2

     where db1.id ^= db2.id;

quit;

However, my dataset db1 has >20 million rows and I think this is not the most efficient solution (it's super slow). Any help would be highly appreciated.


Accepted Solutions
Solution
‎09-17-2014 11:56 AM
Trusted Advisor
Posts: 1,203

Re: Selecting rows satisfying condition linked to another dataset

proc sql;

     create table want as

     select db1.*

     from db1

     where id not in (select id from db2);

quit;

View solution in original post


All Replies
Solution
‎09-17-2014 11:56 AM
Trusted Advisor
Posts: 1,203

Re: Selecting rows satisfying condition linked to another dataset

proc sql;

     create table want as

     select db1.*

     from db1

     where id not in (select id from db2);

quit;

Respected Advisor
Posts: 3,124

Re: Selecting rows satisfying condition linked to another dataset

If both of your tables are sorted by 'id', then MERGE will work. Otherwise, Hash will be handy, load db2 with one variable 'id' into Hash.

Regards,

Haikuo

Super User
Super User
Posts: 6,318

Re: Selecting rows satisfying condition linked to another dataset

If you keep both datasets sorted by ID then a simple merge would be very efficient.

data want ;

  merge db1 db2 (in=in2);

by id;

if not in2 ;

run;

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 3 replies
  • 193 views
  • 6 likes
  • 4 in conversation