SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Combination of INNER and LEFT joins (DI Studio)

Reply
Contributor
Posts: 26

Combination of INNER and LEFT joins (DI Studio)

[ Edited ]

Hi,

 

I tend to have combination of inner and left joins in my join transformations.

For example:

 

FROM
   BUS_DATA_MAN.FACT_CUSTOMER_LOCATION CLF 
      INNER JOIN BUS_DATA_MAN.DIM_LOCATION DL 
      ON CLF.ADDRESS_ID = DL.ADDRESS_ID 
      INNER JOIN BUS_DATA_MAN.DIM_CUSTOMER DC 
      ON CLF.KID_USER = DC.KID 
      LEFT JOIN BUS_DATA_MAN.DIM_CORPORATE_STRUCTURE_FLAT DCS 
      ON DC.ORG_NO = DCS.ORG_NO 
      LEFT JOIN KOG.B2B_COMPANIES B2B 
      ON DC.KID = B2B.KID 
      LEFT JOIN BB_XDSL 
      ON BB_XDSL.KID = DC.KID AND
      BB_XDSL.ADDRESS_ID = DL.ADDRESS_ID 
      LEFT JOIN BBF 
      ON BBF.KID = DC.KID AND
      BBF.ADDRESS_ID = DL.ADDRESS_ID 
      LEFT JOIN NORDIC_CONNECT 
      ON NORDIC_CONNECT.KID = DC.KID AND
      NORDIC_CONNECT.ADDRESS_ID = DL.ADDRESS_ID 
      LEFT JOIN ETHERNET_CONNECT 
      ON ETHERNET_CONNECT.KID = DC.KID AND
      ETHERNET_CONNECT.ADDRESS_ID = DL.ADDRESS_ID 
      LEFT JOIN POTS_BCHANNELS_CL 
      ON POTS_BCHANNELS_CL.KID = DC.KID 
      LEFT JOIN MOBIL_TALE 
      ON MOBIL_TALE.KID = DC.KID 
      LEFT JOIN MOBILT_BREDBÅND 
      ON MOBILT_BREDBÅND.KID = DC.KID

Setting this in the same order as listed above in DI Studio leads to work space fill-up issues in many use-cases.

I wonder it there is some best practice in regards to in which order join members are put in join transformation (first LEFT, then INNER, vice versa or it plays no role)?

 

I tried to find SAS recommendation on this, but no luck.

 

I am using DI Studio 4.902 on SAS 9.4 platform.

 

Thanks in advance!

 

Best regards,

 

--
Mario
Respected Advisor
Posts: 4,665

Re: Combination of INNER and LEFT joins (DI Studio)

[ Edited ]

@strsljen

One thing I've observed with SAS SQL is that joining many tables in a single SQL can lead to significant performance degradation and it's often better to split such a SQL up into more than one step (creating an intermediary table).

Not sure why this happens but it could be that the SQL optimizer doesn't always find a good execution path.

 

If you're taking such a "split into multiple steps" approach then of course try and break this up into logical junks with the intermediary table(s) having as low data volume as possible.

 

Also consider using Hash lookups where appropriate as this doesn't require sorting like SQL joins. And also consider creating indeces for your join conditions.

Super User
Posts: 5,849

Re: Combination of INNER and LEFT joins (DI Studio)

Hard to give one rule fits all kind of answer.
For starters order of inner and outer joins us there to fulfil a logic, so you can't tamper with freely.
- give as much RAM as possible to the session
- start with the biggest table
- have that table indexed on join columns
- join in an order that limits the data as much as possible

Is this really a SAS query? Since you are having a libname with more than eight chars...
Data never sleeps
Contributor
Posts: 26

Re: Combination of INNER and LEFT joins (DI Studio)

Hi,

 

The code above is plain SQL code.

 

Portion of SAS code withing that join:

   from
      work.FACT_CUSTOMER_LOCATION_TMP01_QA as FACT_CUSTOMER_LOCATION_TMP01_QA inner join 
      work.DIM_LOCATION_TMP01_QA as DIM_LOCATION_TMP01_QA
         on
         (
            FACT_CUSTOMER_LOCATION_TMP01_QA.ADDRESS_ID = DIM_LOCATION_TMP01_QA.ADDRESS_ID
         ) inner join 
      work.DIM_CUSTOMER_TMP01_QA as DIM_CUSTOMER_TMP01_QA
         on
         (
            FACT_CUSTOMER_LOCATION_TMP01_QA.KID_USER = DIM_CUSTOMER_TMP01_QA.KID
         ) left join 
      work.DIM_CORPORATE_STRUCTURE_FLAT_T_Q as DIM_CORPORATE_STRUCTURE_FLAT_T_Q
         on
         (
            DIM_CUSTOMER_TMP01_QA.ORG_NO = DIM_CORPORATE_STRUCTURE_FLAT_T_Q.ORG_NO
         ) left join 
      work.B2B_COMPANIES_TMP01_QA as B2B_COMPANIES_TMP01_QA
         on
         (
            DIM_CUSTOMER_TMP01_QA.KID = B2B_COMPANIES_TMP01_QA.KID
         ) left join 
      work.BB_XDSL_QA as BB_XDSL_QA
         on
         (
            DIM_CUSTOMER_TMP01_QA.KID = BB_XDSL_QA.KID
            and BB_XDSL_QA.ADDRESS_ID = DIM_LOCATION_TMP01_QA.ADDRESS_ID
         ) left join 
      work.BBF_QA as BBF_QA
         on
         (
            DIM_CUSTOMER_TMP01_QA.KID = BBF_QA.KID
            and BBF_QA.ADDRESS_ID = DIM_LOCATION_TMP01_QA.ADDRESS_ID
         ) left join 
      work.NORDIC_CONNECT_QA as NORDIC_CONNECT_QA
         on
         (
            DIM_CUSTOMER_TMP01_QA.KID = NORDIC_CONNECT_QA.KID
            and NORDIC_CONNECT_QA.ADDRESS_ID = DIM_LOCATION_TMP01_QA.ADDRESS_ID
         ) left join 
      work.ETHERNET_CONNECT_QA as ETHERNET_CONNECT_QA
         on
         (
            DIM_CUSTOMER_TMP01_QA.KID = ETHERNET_CONNECT_QA.KID
            and ETHERNET_CONNECT_QA.ADDRESS_ID = DIM_LOCATION_TMP01_QA.ADDRESS_ID
         ) left join 
      work.POTS_BCHANNELS_CL_QA as POTS_BCHANNELS_CL_QA
         on
         (
            DIM_CUSTOMER_TMP01_QA.KID = POTS_BCHANNELS_CL_QA.KID
         ) left join 
      work.MOBIL_TALE_QA as MOBIL_TALE_QA
         on
         (
            DIM_CUSTOMER_TMP01_QA.KID = MOBIL_TALE_QA.KID
         ) left join 
      work.MOBILT_BREDBAND_QA as MOBILT_BREDBAND_QA
         on
         (
            DIM_CUSTOMER_TMP01_QA.KID = MOBILT_BREDBAND_QA.KID
         )

I tried to reduce columns before join step to only the ones I will need. I still experience a lot of issues in regards to work space usage.

--
Mario
Ask a Question
Discussion stats
  • 3 replies
  • 158 views
  • 1 like
  • 3 in conversation