- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I have a left join like this:
proc sql;
select * from a left join b on
(
case when a.code <> "" then a.code = b.code
else a.code2 = b.code2
end
)
;
quit;
Basically join a to b on code if it is non null in A else join A to B on code2.
Problem is that A has 100 rows but output has 90 rows only.
Why is this?
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Yes, of course, the WHERE clause will eliminate the cases where there is no match on CODE and on CODE2. Compare results from the following:
data a;
length code code2 $1;
input code code2 x;
datalines;
a a 1
b . 2
. c 3
. . 4
d . 5
. d 6
d c 7
b d 8
;
data b;
length code code2 $1;
input code code2 x;
datalines;
a a 1
b . 2
. c 3
. . 4
;
title "Original query";
proc sql;
select * from a left join b on
(
case when a.code <> "" then a.code = b.code
else a.code2 = b.code2
end
)
;
quit;
title "Generated query";
proc sql;
select * from a left join b on
(
case when a.code <> "" then a.code = b.code
else a.code2 = b.code2
end
)
where
(
case when a.code <> "" then a.code = b.code
else a.code2 = b.code2
end
)
;
quit;
PG
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hmm puzzling...
I tried toying with a small example and can't get the join to drop obs from A. Can you post a trimmed down version of your data that causes at least one obs from A to be dropped from the join result? - PG
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
<> is not NOT EQUALS in SAS, it's MAX I believe.
Try using NE.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
, when you run the query as written you get:
150 proc sql;
151 select * from a left join b on
152 (
153 case when a.code <> "" then a.code = b.code
154 else a.code2 = b.code2
155 end
156 )
157 ;
NOTE: The "<>" operator is interpreted as "not equals".
NOTE: The execution of this query involves performing one or more Cartesian product
joins that can not be optimized.
158 quit;
In the small examples I tried, the join worked as expected.
PG
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Try two joining conditions separately to check the overlap.
proc sql;
select * from a left join b on
a.code = b.code ;
quit;
proc sql;
select * from a left join b on
a.code2 = b.code2 ;
quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
I would agree with stat@sas, check the data first. Personally I wouldn't go with the kind of coding which has case statements in a join like that. Me I would (without being able to test as having no test data) do:
proc sql;
create table WANT as
select *
from A
left join B
on (A.CODE <> "" and A.CODE=B.CODE)
or (A.CODE="" and A.CODE2=B.CODE2);
quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I the OP's question involving a RDBMS? The Null in a RDBMS (three value logic) is not the same as a missing in SAS. The space as missing value is assumed to be logical equal and can cause some weird effects
Seeing the "<>" operator it could be translated a NE. What says it is producing a "Cartesian product". Theoretically that can solve anything but is assuming unlimited speed and unlimited other resources when the data grows.
With some tests using sas-datasets I can get more records then the original (a count). Did not succeed in getting in less records.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks. Posting the data is not possible ...i will get into all kinds of data confidentiality issues.
I did this in DI studio and i saw DIS rewrote the code like:
proc sql;
select * from a left join b on
(
case when a.code <> "" then a.code = b.code
else a.code2 = b.code2
end
)
where
(
case when a.code <> "" then a.code = b.code
else a.code2 = b.code2
end
)
;
quit;
Can that be a possible reason?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Well, I suppose there's a good illustration of why "Visual Coding" isn't necessarily the best way to do coding. What it appears to be doing is to work out what your doing wrong and try to fix it. My advice, think about what you want out at the end, where that data will come from, then manipulations involved. Then its simply a matter of putting the bits together.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The di generation of code does not given an answer whether is an external RDBMS or not. It is often generating common SAS code.
The effects of an implicit pass through SQL translation is not shown. Is it an exterenal DBMS?
The number of records is relative low (100/90). Is there a test to define and sashelp.datasets that shows the effect.
Did the code run and where there special messages (log)?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Yes, of course, the WHERE clause will eliminate the cases where there is no match on CODE and on CODE2. Compare results from the following:
data a;
length code code2 $1;
input code code2 x;
datalines;
a a 1
b . 2
. c 3
. . 4
d . 5
. d 6
d c 7
b d 8
;
data b;
length code code2 $1;
input code code2 x;
datalines;
a a 1
b . 2
. c 3
. . 4
;
title "Original query";
proc sql;
select * from a left join b on
(
case when a.code <> "" then a.code = b.code
else a.code2 = b.code2
end
)
;
quit;
title "Generated query";
proc sql;
select * from a left join b on
(
case when a.code <> "" then a.code = b.code
else a.code2 = b.code2
end
)
where
(
case when a.code <> "" then a.code = b.code
else a.code2 = b.code2
end
)
;
quit;
PG
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Pgstars I did a same like test. but kept the numbers separated and ordered them on the first table.
Result:
a a 1 a a 1
b 2 b 2
c 3 c 3
4 b 2
4 4
d 5 .
d 6 .
d c 7 .
b d 8 b 2
No one in the input records of the table a is missing, it is a left-join. One repetition on each side 3 non joined.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
That's the result I get from the first query above. The second query returns only 6 records in my tests. Records where x = 5, 6, and 7 are missing. - PG
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The first code of eagles dare is joining on several conditions in one pass. That was my the focus on.
The second query as generated by DI is different the results are (correct indicated by PG) :
a a 1 a a 1
b 2 b 2
c 3 c 3
4 b 2
4 4
b d 8 b 2
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Ok question is not on the original Sql querey but what DI studio is doing with joins and generating code.
SAS(R) Data Integration Studio 4.7: User's Guide In the mentioned case it should not generated the second part with the where.