BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
eagles_dare13
Obsidian | Level 7

I have a left join like this:

proc sql;

select * from a left join b on

         (

            case when a.code <> "" then a.code = b.code 

            else a.code2 =  b.code2

            end

         )

;

quit;

Basically join a to b on code if it is non null in A else join A to B on code2.

Problem is that A has 100 rows but output has 90 rows only.

Why is this?

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

Yes, of course, the WHERE clause will eliminate the cases where there is no match on CODE and on CODE2. Compare results from the following:

data a;

length code code2 $1;

input code code2 x;

datalines;

a a 1

b . 2

. c 3

. . 4

d . 5

. d 6

d c 7

b d 8

;

data b;

length code code2 $1;

input code code2 x;

datalines;

a a 1

b . 2

. c 3

. . 4

;

title "Original query";

proc sql;

select * from a left join b on

         (

            case when a.code <> "" then a.code = b.code

            else a.code2 =  b.code2

            end

         )

;

quit;

title "Generated query";

proc sql;

select * from a left join b on

         (

            case when a.code <> "" then a.code = b.code

            else a.code2 =  b.code2

            end

         )

where

         (

            case when a.code <> "" then a.code = b.code

            else a.code2 =  b.code2

            end

         )

;

quit;

PG

PG

View solution in original post

14 REPLIES 14
PGStats
Opal | Level 21

Hmm puzzling...

I tried toying with a small example and can't get the join to drop obs from A. Can you post a trimmed down version of your data that causes at least one obs from A to be dropped from the join result? - PG

PG
Reeza
Super User

<> is not NOT EQUALS in SAS, it's MAX I believe.

Try using NE.

PGStats
Opal | Level 21

, when you run the query as written you get:

150  proc sql;

151  select * from a left join b on

152           (

153              case when a.code <> "" then a.code = b.code

154              else a.code2 =  b.code2

155              end

156           )

157  ;

NOTE: The "<>" operator is interpreted as "not equals".

NOTE: The execution of this query involves performing one or more Cartesian product

      joins that can not be optimized.

158  quit;

In the small examples I tried, the join worked as expected.

PG

PG
stat_sas
Ammonite | Level 13

Try two joining conditions separately to check the overlap.

proc sql;

select * from a left join b on

a.code = b.code ;

quit;

proc sql;

select * from a left join b on

a.code2 = b.code2 ;

quit;

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Hi,

I would agree with stat@sas, check the data first.  Personally I wouldn't go with the kind of coding which has case statements in a join like that.  Me I would (without being able to test as having no test data) do:

proc sql;

  create table WANT as

  select  *

  from    A

  left join B

  on      (A.CODE <> "" and A.CODE=B.CODE)

  or      (A.CODE="" and A.CODE2=B.CODE2);

quit;

jakarman
Barite | Level 11

I the OP's question involving a RDBMS? The Null in a RDBMS (three value logic) is not the same as a missing in SAS. The space as missing value is assumed to be logical equal and can cause some weird effects

Seeing the "<>" operator it could be translated a NE. What says it is producing a "Cartesian product". Theoretically that can solve anything but is assuming unlimited speed and unlimited other resources when the data grows.
With some tests using sas-datasets I can get more records then the original (a count). Did not succeed in getting in less records. 

---->-- ja karman --<-----
eagles_dare13
Obsidian | Level 7

Thanks. Posting the data is not possible ...i will get into all kinds of data confidentiality issues.

I did  this in DI studio and i saw DIS rewrote the code like:

proc sql;

select * from a left join b on

         (

            case when a.code <> "" then a.code = b.code 

            else a.code2 =  b.code2

            end

         )

where

         (

            case when a.code <> "" then a.code = b.code 

            else a.code2 =  b.code2

            end

         )

;

quit;

Can that be a possible reason?

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Well, I suppose there's a good illustration of why "Visual Coding" isn't necessarily the best way to do coding.  What it appears to be doing is to work out what your doing wrong and try to fix it.  My advice, think about what you want out at the end, where that data will come from, then manipulations involved.  Then its simply a matter of putting the bits together.

jakarman
Barite | Level 11

The di generation of code does not given an answer whether is an external RDBMS or not. It is often generating  common SAS code.
The effects of an implicit pass through SQL translation is not shown. Is it an exterenal DBMS?

The number of records is relative low (100/90). Is there a test to define and sashelp.datasets that shows the effect.
Did the code run and where there special messages (log)?

---->-- ja karman --<-----
PGStats
Opal | Level 21

Yes, of course, the WHERE clause will eliminate the cases where there is no match on CODE and on CODE2. Compare results from the following:

data a;

length code code2 $1;

input code code2 x;

datalines;

a a 1

b . 2

. c 3

. . 4

d . 5

. d 6

d c 7

b d 8

;

data b;

length code code2 $1;

input code code2 x;

datalines;

a a 1

b . 2

. c 3

. . 4

;

title "Original query";

proc sql;

select * from a left join b on

         (

            case when a.code <> "" then a.code = b.code

            else a.code2 =  b.code2

            end

         )

;

quit;

title "Generated query";

proc sql;

select * from a left join b on

         (

            case when a.code <> "" then a.code = b.code

            else a.code2 =  b.code2

            end

         )

where

         (

            case when a.code <> "" then a.code = b.code

            else a.code2 =  b.code2

            end

         )

;

quit;

PG

PG
jakarman
Barite | Level 11

Pgstars I did a same like test.  but kept the numbers separated and ordered them on the first table.

Result:

a a 1 a a 1

b   2 b   2

  c 3   c 3

    4 b   2

    4     4

d   5     .

  d 6     .

d c 7     .

b d 8 b   2

No one in the input records of the table a is missing, it is a left-join. One repetition on each side 3 non joined.

---->-- ja karman --<-----
PGStats
Opal | Level 21

That's the result I get from the first query above. The second query returns only 6 records in my tests. Records where x = 5, 6, and 7 are missing. - PG

PG
jakarman
Barite | Level 11

The first code of eagles dare is joining on several conditions in one pass. That was my the focus on.

The second query as generated by DI is different the results are (correct indicated by PG) :

a a 1 a a 1

b   2 b   2

  c 3   c 3

    4 b   2

    4     4

b d 8 b   2

---->-- ja karman --<-----
jakarman
Barite | Level 11

Ok question is not on the original Sql querey but what DI studio is doing with joins and generating code.

SAS(R) Data Integration Studio 4.7: User's Guide  In the mentioned case it should not generated the second part with the where. 

---->-- ja karman --<-----

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to choose a machine learning algorithm

Use this tutorial as a handy guide to weigh the pros and cons of these commonly used machine learning algorithms.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 14 replies
  • 15388 views
  • 8 likes
  • 6 in conversation