- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Posted 03-11-2011 04:34 PM
(5851 views)
Hi, all
Suppose I have two data sets Main and Minor each of which has a Variable called Key.
Main Key
1
2
3
4
5
Minor Key
1
3
5
7
9
Inner Join on Key will give:
1
3
5
However what i want are those Main Key that are NOT included in Minor Key, i.e., the result I am looking for is:
2
4
What kind of join can do this? or How to write Proc SQL codes to do this? Thanks.
Suppose I have two data sets Main and Minor each of which has a Variable called Key.
Main Key
1
2
3
4
5
Minor Key
1
3
5
7
9
Inner Join on Key will give:
1
3
5
However what i want are those Main Key that are NOT included in Minor Key, i.e., the result I am looking for is:
2
4
What kind of join can do this? or How to write Proc SQL codes to do this? Thanks.
9 REPLIES 9
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
This is how I would do it, but there may be a better way:
data set1;
x = 1 ; output;
x = 2 ; output;
x = 3 ; output;
x = 4 ; output;
x = 5 ; output;
run;
data set2;
x = 1 ; output;
x = 3 ; output;
x = 5 ; output;
x = 7 ; output;
x = 9 ; output;
run;
proc sql;
create table NotInner as
select set1.x
from set1
left join
set2
on set1.x = set2.x
where set2.x is null;
quit;
data set1;
x = 1 ; output;
x = 2 ; output;
x = 3 ; output;
x = 4 ; output;
x = 5 ; output;
run;
data set2;
x = 1 ; output;
x = 3 ; output;
x = 5 ; output;
x = 7 ; output;
x = 9 ; output;
run;
proc sql;
create table NotInner as
select set1.x
from set1
left join
set2
on set1.x = set2.x
where set2.x is null;
quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
thank you for help. I guess join may not be the best choice for such operation. Maybe EXCEPT is better?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
There's also a not exists.
select t1.*
from main t1
where not exists (select * from minor where key=t1.key);
quit;
select t1.*
from main t1
where not exists (select * from minor where key=t1.key);
quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi.
data set1;
x = 1 ; output;
x = 2 ; output;
x = 3 ; output;
x = 4 ; output;
x = 5 ; output;
run;
data set2;
x = 1 ; output;
x = 3 ; output;
x = 5 ; output;
x = 7 ; output;
x = 9 ; output;
run;
proc sql;
/* this way is suited only for all variables has the same value*/
create table t1 as
select *
from set1
except
select *
from set2;
/*another way*/
create table t2 as
select *
from set1
where x not in (select x from set2);
quit;
Ksharp Message was edited by: Ksharp
data set1;
x = 1 ; output;
x = 2 ; output;
x = 3 ; output;
x = 4 ; output;
x = 5 ; output;
run;
data set2;
x = 1 ; output;
x = 3 ; output;
x = 5 ; output;
x = 7 ; output;
x = 9 ; output;
run;
proc sql;
/* this way is suited only for all variables has the same value*/
create table t1 as
select *
from set1
except
select *
from set2;
/*another way*/
create table t2 as
select *
from set1
where x not in (select x from set2);
quit;
Ksharp Message was edited by: Ksharp
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
thank you all for help.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
proc sql;
select t1.*
from
main t1
left outer join minor t2
on t1.key=t2.key
where
t2.key is null;
quit;
select t1.*
from
main t1
left outer join minor t2
on t1.key=t2.key
where
t2.key is null;
quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello Littlestone,
You asking for proc SQL but the best way to do it is to use a datastep merge like this:
[pre]
data Inner Notinner;
merge set1(in=s1) set2 (in=s2);
if s1 and s2 then output Inner;
if s1 and not s2 then output Notinner;
by x;
run;
[/pre]
This merge requires a preliminary sorting of set1 and set2 but simultaneously creates two desired outputs.
Sincerely,
SPR
You asking for proc SQL but the best way to do it is to use a datastep merge like this:
[pre]
data Inner Notinner;
merge set1(in=s1) set2 (in=s2);
if s1 and s2 then output Inner;
if s1 and not s2 then output Notinner;
by x;
run;
[/pre]
This merge requires a preliminary sorting of set1 and set2 but simultaneously creates two desired outputs.
Sincerely,
SPR
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@SPR: In the case that the input datasets have some common satellite (or non-key) vars, your output dataset can be incorrect. Here's an example:
data one;
input id var @@;
cards;
1 1 2 2 3 3 4 4 5 5
;
run;
data two;
input id var @@;
cards;
1 6 3 7 5 8 7 9 9 10
;
run;
proc sort data=one; by id; run;
proc sort data=two; by id; run;
data inner notinner;
merge one(in=s1) two(in=s2);
if s1 and s2 then output Inner;
if s1 and not s2 then output Notinner;
by id;
run;
proc print data=inner;
run;
/* on lst -- Uh-oh!
Obs id var
1 1 6
2 3 7
3 5 8
*/
data one;
input id var @@;
cards;
1 1 2 2 3 3 4 4 5 5
;
run;
data two;
input id var @@;
cards;
1 6 3 7 5 8 7 9 9 10
;
run;
proc sort data=one; by id; run;
proc sort data=two; by id; run;
data inner notinner;
merge one(in=s1) two(in=s2);
if s1 and s2 then output Inner;
if s1 and not s2 then output Notinner;
by id;
run;
proc print data=inner;
run;
/* on lst -- Uh-oh!
Obs id var
1 1 6
2 3 7
3 5 8
*/
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello Chang_y_chung,
It is very easy to overcome this obstacle simply renaming VAR for the second dataset.
Sincerely,
SPR
It is very easy to overcome this obstacle simply renaming VAR for the second dataset.
Sincerely,
SPR