SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

DI Studio join differ from Oracle SQL on the same tables

Reply
Contributor
Posts: 27

DI Studio join differ from Oracle SQL on the same tables

[ Edited ]

Hi,

 

Here is the query in Oracle SQL:

 

select s.OWNER_CUSTOMER_KEY, s.SUBSCRIPTION_KEY, rap.COMM_SHAPE_ID, rap.RES_APP_INDICATOR_ID
from SUBSCRIPTION_DIM s 
left join MASTER_CUSTOMER mc on mc.KURT_ID = s.USER_CUSTOMER_KEY and s.DRM_MARKET_PRODUCT_GROUP = 'Mobil Tale'
left join RES_AND_APPROVAL rap on rap.MASTER_ID = mc.MASTER_ID and rap.comm_shape_id in (3,4) and rap.RES_APP_VALID_TO_DATE is null

This one reruns 4 160 298 rows.

 

 

SAS DI job with join transformation returns ridiculous 227 million rows. 

proc sql;
   create table work.W5JS0I6C as
   select
      SUBSCRIPTION_DIM.OWNER_CUSTOMER_KEY length = 8   
         label = 'OWNER_CUSTOMER_KEY',
      SUBSCRIPTION_DIM.SUBSCRIPTION_KEY length = 8   
         label = 'SUBSCRIPTION_KEY',
      RES_AND_APPROVAL.COMM_SHAPE_ID length = 8   
         label = 'COMM_SHAPE_ID',
      RES_AND_APPROVAL.RES_APP_INDICATOR_ID length = 8   
         label = 'RES_APP_INDICATOR_ID'
   from
      ciccm.SUBSCRIPTION_DIM as SUBSCRIPTION_DIM left join 
      CDC.MASTER_CUSTOMER as MASTER_CUSTOMER
         on
         (
            MASTER_CUSTOMER.KURT_ID  = SUBSCRIPTION_DIM.USER_CUSTOMER_KEY 
            and SUBSCRIPTION_DIM.DRM_MARKET_PRODUCT_GROUP = 'Mobil Tale'
         ) left join 
      CDC.RES_AND_APPROVAL as RES_AND_APPROVAL
         on
         (
            RES_AND_APPROVAL.MASTER_ID  = MASTER_CUSTOMER.MASTER_ID 
            and RES_AND_APPROVAL.COMM_SHAPE_ID  IN (3,4)
            and RES_AND_APPROVAL.RES_APP_VALID_TO_DATE  = .
         )
   ;
quit;

 

 

I would say I am doing something very wrong here, I just don't see it.

I tried to variate DI job: extract subset from source tables first to make join "lighter", but regardless I get huge and wrong result data set.

 

Interesting is that when I run join on only first 2 tables, result is correct.

When I include 3rd one in this case, everything goes down the drain.

 

Any hint is more than welcomed.

 

Best regards,

--
Mario
Super User
Posts: 5,852

Re: DI Studio join differ from Oracle SQL on the same tables

If you only moved the query, add
Options sastrace = ',, d' sastraceloc = saslog nostsuffix;
You'll get the what the query transform to in Oracle.
Your Oracle query seems to accessing the same schema, but in SAS you have two libnames? Shouldn't change the logic though, just being curious...
Data never sleeps
Contributor
Posts: 27

Re: DI Studio join differ from Oracle SQL on the same tables

Hi,

 

Thanks for the hint.

I removed scheme names from Oracle SQL for things to be clearer. 

In both cases, the same schemes are accessed.

 

I checked the log with those options turned ON (attached). SAS accesses two Oracle schemas as two different users, if that makes any difference.

I didn't find anything that would point the cause of the problem in that log.

 

However, I did try to do an extract to work tables first (only subset of original oracle tables that will be used in a join later. The result was exactly the same .

 

 

--
Mario
Super User
Posts: 3,860

Re: DI Studio join differ from Oracle SQL on the same tables

I suggest you try a SAS passthru query. That should definitely match Oracle.

Super User
Posts: 5,852

Re: DI Studio join differ from Oracle SQL on the same tables

Oh that's a mouthful...
Sorry my fingers stumbling on my phone, sastrace should read
',,,d'
Which hopefully give a nicer log...
Data never sleeps
Ask a Question
Discussion stats
  • 4 replies
  • 137 views
  • 0 likes
  • 3 in conversation