Building models with SAS Enterprise Miner, SAS Factory Miner, SAS Visual Data Mining and Machine Learning or just with programming

Left join not keeping all records from left hand table

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 82
Accepted Solution

Left join not keeping all records from left hand table

I have a left join like this:

proc sql;

select * from a left join b on

         (

            case when a.code <> "" then a.code = b.code 

            else a.code2 =  b.code2

            end

         )

;

quit;

Basically join a to b on code if it is non null in A else join A to B on code2.

Problem is that A has 100 rows but output has 90 rows only.

Why is this?


Accepted Solutions
Solution
‎07-11-2017 01:43 PM
Respected Advisor
Posts: 4,659

Re: Left join not keeping all records from left hand table

Yes, of course, the WHERE clause will eliminate the cases where there is no match on CODE and on CODE2. Compare results from the following:

data a;

length code code2 $1;

input code code2 x;

datalines;

a a 1

b . 2

. c 3

. . 4

d . 5

. d 6

d c 7

b d 8

;

data b;

length code code2 $1;

input code code2 x;

datalines;

a a 1

b . 2

. c 3

. . 4

;

title "Original query";

proc sql;

select * from a left join b on

         (

            case when a.code <> "" then a.code = b.code

            else a.code2 =  b.code2

            end

         )

;

quit;

title "Generated query";

proc sql;

select * from a left join b on

         (

            case when a.code <> "" then a.code = b.code

            else a.code2 =  b.code2

            end

         )

where

         (

            case when a.code <> "" then a.code = b.code

            else a.code2 =  b.code2

            end

         )

;

quit;

PG

PG

View solution in original post


All Replies
Respected Advisor
Posts: 4,659

Re: Left join not keeping all records from left hand table

Hmm puzzling...

I tried toying with a small example and can't get the join to drop obs from A. Can you post a trimmed down version of your data that causes at least one obs from A to be dropped from the join result? - PG

PG
Super User
Posts: 17,912

Re: Left join not keeping all records from left hand table

<> is not NOT EQUALS in SAS, it's MAX I believe.

Try using NE.

Respected Advisor
Posts: 4,659

Re: Left join not keeping all records from left hand table

, when you run the query as written you get:

150  proc sql;

151  select * from a left join b on

152           (

153              case when a.code <> "" then a.code = b.code

154              else a.code2 =  b.code2

155              end

156           )

157  ;

NOTE: The "<>" operator is interpreted as "not equals".

NOTE: The execution of this query involves performing one or more Cartesian product

      joins that can not be optimized.

158  quit;

In the small examples I tried, the join worked as expected.

PG

PG
Trusted Advisor
Posts: 1,204

Re: Left join not keeping all records from left hand table

Try two joining conditions separately to check the overlap.

proc sql;

select * from a left join b on

a.code = b.code ;

quit;

proc sql;

select * from a left join b on

a.code2 = b.code2 ;

quit;

Super User
Super User
Posts: 7,423

Re: Left join not keeping all records from left hand table

Hi,

I would agree with stat@sas, check the data first.  Personally I wouldn't go with the kind of coding which has case statements in a join like that.  Me I would (without being able to test as having no test data) do:

proc sql;

  create table WANT as

  select  *

  from    A

  left join B

  on      (A.CODE <> "" and A.CODE=B.CODE)

  or      (A.CODE="" and A.CODE2=B.CODE2);

quit;

Valued Guide
Posts: 3,208

Re: Left join not keeping all records from left hand table

I the OP's question involving a RDBMS? The Null in a RDBMS (three value logic) is not the same as a missing in SAS. The space as missing value is assumed to be logical equal and can cause some weird effects

Seeing the "<>" operator it could be translated a NE. What says it is producing a "Cartesian product". Theoretically that can solve anything but is assuming unlimited speed and unlimited other resources when the data grows.
With some tests using sas-datasets I can get more records then the original (a count). Did not succeed in getting in less records. 

---->-- ja karman --<-----
Frequent Contributor
Posts: 82

Re: Left join not keeping all records from left hand table

Thanks. Posting the data is not possible ...i will get into all kinds of data confidentiality issues.

I did  this in DI studio and i saw DIS rewrote the code like:

proc sql;

select * from a left join b on

         (

            case when a.code <> "" then a.code = b.code 

            else a.code2 =  b.code2

            end

         )

where

         (

            case when a.code <> "" then a.code = b.code 

            else a.code2 =  b.code2

            end

         )

;

quit;

Can that be a possible reason?

Super User
Super User
Posts: 7,423

Re: Left join not keeping all records from left hand table

Well, I suppose there's a good illustration of why "Visual Coding" isn't necessarily the best way to do coding.  What it appears to be doing is to work out what your doing wrong and try to fix it.  My advice, think about what you want out at the end, where that data will come from, then manipulations involved.  Then its simply a matter of putting the bits together.

Valued Guide
Posts: 3,208

Re: Left join not keeping all records from left hand table

The di generation of code does not given an answer whether is an external RDBMS or not. It is often generating  common SAS code.
The effects of an implicit pass through SQL translation is not shown. Is it an exterenal DBMS?

The number of records is relative low (100/90). Is there a test to define and sashelp.datasets that shows the effect.
Did the code run and where there special messages (log)?

---->-- ja karman --<-----
Solution
‎07-11-2017 01:43 PM
Respected Advisor
Posts: 4,659

Re: Left join not keeping all records from left hand table

Yes, of course, the WHERE clause will eliminate the cases where there is no match on CODE and on CODE2. Compare results from the following:

data a;

length code code2 $1;

input code code2 x;

datalines;

a a 1

b . 2

. c 3

. . 4

d . 5

. d 6

d c 7

b d 8

;

data b;

length code code2 $1;

input code code2 x;

datalines;

a a 1

b . 2

. c 3

. . 4

;

title "Original query";

proc sql;

select * from a left join b on

         (

            case when a.code <> "" then a.code = b.code

            else a.code2 =  b.code2

            end

         )

;

quit;

title "Generated query";

proc sql;

select * from a left join b on

         (

            case when a.code <> "" then a.code = b.code

            else a.code2 =  b.code2

            end

         )

where

         (

            case when a.code <> "" then a.code = b.code

            else a.code2 =  b.code2

            end

         )

;

quit;

PG

PG
Valued Guide
Posts: 3,208

Re: Left join not keeping all records from left hand table

Pgstars I did a same like test.  but kept the numbers separated and ordered them on the first table.

Result:

a a 1 a a 1

b   2 b   2

  c 3   c 3

    4 b   2

    4     4

d   5     .

  d 6     .

d c 7     .

b d 8 b   2

No one in the input records of the table a is missing, it is a left-join. One repetition on each side 3 non joined.

---->-- ja karman --<-----
Respected Advisor
Posts: 4,659

Re: Left join not keeping all records from left hand table

That's the result I get from the first query above. The second query returns only 6 records in my tests. Records where x = 5, 6, and 7 are missing. - PG

PG
Valued Guide
Posts: 3,208

Re: Left join not keeping all records from left hand table

The first code of eagles dare is joining on several conditions in one pass. That was my the focus on.

The second query as generated by DI is different the results are (correct indicated by PG) :

a a 1 a a 1

b   2 b   2

  c 3   c 3

    4 b   2

    4     4

b d 8 b   2

---->-- ja karman --<-----
Valued Guide
Posts: 3,208

Re: Left join not keeping all records from left hand table

Ok question is not on the original Sql querey but what DI studio is doing with joins and generating code.

SAS(R) Data Integration Studio 4.7: User's Guide  In the mentioned case it should not generated the second part with the where. 

---->-- ja karman --<-----
🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 14 replies
  • 1260 views
  • 8 likes
  • 6 in conversation