Hi,
I tend to have combination of inner and left joins in my join transformations.
For example:
FROM BUS_DATA_MAN.FACT_CUSTOMER_LOCATION CLF INNER JOIN BUS_DATA_MAN.DIM_LOCATION DL ON CLF.ADDRESS_ID = DL.ADDRESS_ID INNER JOIN BUS_DATA_MAN.DIM_CUSTOMER DC ON CLF.KID_USER = DC.KID LEFT JOIN BUS_DATA_MAN.DIM_CORPORATE_STRUCTURE_FLAT DCS ON DC.ORG_NO = DCS.ORG_NO LEFT JOIN KOG.B2B_COMPANIES B2B ON DC.KID = B2B.KID LEFT JOIN BB_XDSL ON BB_XDSL.KID = DC.KID AND BB_XDSL.ADDRESS_ID = DL.ADDRESS_ID LEFT JOIN BBF ON BBF.KID = DC.KID AND BBF.ADDRESS_ID = DL.ADDRESS_ID LEFT JOIN NORDIC_CONNECT ON NORDIC_CONNECT.KID = DC.KID AND NORDIC_CONNECT.ADDRESS_ID = DL.ADDRESS_ID LEFT JOIN ETHERNET_CONNECT ON ETHERNET_CONNECT.KID = DC.KID AND ETHERNET_CONNECT.ADDRESS_ID = DL.ADDRESS_ID LEFT JOIN POTS_BCHANNELS_CL ON POTS_BCHANNELS_CL.KID = DC.KID LEFT JOIN MOBIL_TALE ON MOBIL_TALE.KID = DC.KID LEFT JOIN MOBILT_BREDBÅND ON MOBILT_BREDBÅND.KID = DC.KID
Setting this in the same order as listed above in DI Studio leads to work space fill-up issues in many use-cases.
I wonder it there is some best practice in regards to in which order join members are put in join transformation (first LEFT, then INNER, vice versa or it plays no role)?
I tried to find SAS recommendation on this, but no luck.
I am using DI Studio 4.902 on SAS 9.4 platform.
Thanks in advance!
Best regards,
One thing I've observed with SAS SQL is that joining many tables in a single SQL can lead to significant performance degradation and it's often better to split such a SQL up into more than one step (creating an intermediary table).
Not sure why this happens but it could be that the SQL optimizer doesn't always find a good execution path.
If you're taking such a "split into multiple steps" approach then of course try and break this up into logical junks with the intermediary table(s) having as low data volume as possible.
Also consider using Hash lookups where appropriate as this doesn't require sorting like SQL joins. And also consider creating indeces for your join conditions.
Hi,
The code above is plain SQL code.
Portion of SAS code withing that join:
from
work.FACT_CUSTOMER_LOCATION_TMP01_QA as FACT_CUSTOMER_LOCATION_TMP01_QA inner join
work.DIM_LOCATION_TMP01_QA as DIM_LOCATION_TMP01_QA
on
(
FACT_CUSTOMER_LOCATION_TMP01_QA.ADDRESS_ID = DIM_LOCATION_TMP01_QA.ADDRESS_ID
) inner join
work.DIM_CUSTOMER_TMP01_QA as DIM_CUSTOMER_TMP01_QA
on
(
FACT_CUSTOMER_LOCATION_TMP01_QA.KID_USER = DIM_CUSTOMER_TMP01_QA.KID
) left join
work.DIM_CORPORATE_STRUCTURE_FLAT_T_Q as DIM_CORPORATE_STRUCTURE_FLAT_T_Q
on
(
DIM_CUSTOMER_TMP01_QA.ORG_NO = DIM_CORPORATE_STRUCTURE_FLAT_T_Q.ORG_NO
) left join
work.B2B_COMPANIES_TMP01_QA as B2B_COMPANIES_TMP01_QA
on
(
DIM_CUSTOMER_TMP01_QA.KID = B2B_COMPANIES_TMP01_QA.KID
) left join
work.BB_XDSL_QA as BB_XDSL_QA
on
(
DIM_CUSTOMER_TMP01_QA.KID = BB_XDSL_QA.KID
and BB_XDSL_QA.ADDRESS_ID = DIM_LOCATION_TMP01_QA.ADDRESS_ID
) left join
work.BBF_QA as BBF_QA
on
(
DIM_CUSTOMER_TMP01_QA.KID = BBF_QA.KID
and BBF_QA.ADDRESS_ID = DIM_LOCATION_TMP01_QA.ADDRESS_ID
) left join
work.NORDIC_CONNECT_QA as NORDIC_CONNECT_QA
on
(
DIM_CUSTOMER_TMP01_QA.KID = NORDIC_CONNECT_QA.KID
and NORDIC_CONNECT_QA.ADDRESS_ID = DIM_LOCATION_TMP01_QA.ADDRESS_ID
) left join
work.ETHERNET_CONNECT_QA as ETHERNET_CONNECT_QA
on
(
DIM_CUSTOMER_TMP01_QA.KID = ETHERNET_CONNECT_QA.KID
and ETHERNET_CONNECT_QA.ADDRESS_ID = DIM_LOCATION_TMP01_QA.ADDRESS_ID
) left join
work.POTS_BCHANNELS_CL_QA as POTS_BCHANNELS_CL_QA
on
(
DIM_CUSTOMER_TMP01_QA.KID = POTS_BCHANNELS_CL_QA.KID
) left join
work.MOBIL_TALE_QA as MOBIL_TALE_QA
on
(
DIM_CUSTOMER_TMP01_QA.KID = MOBIL_TALE_QA.KID
) left join
work.MOBILT_BREDBAND_QA as MOBILT_BREDBAND_QA
on
(
DIM_CUSTOMER_TMP01_QA.KID = MOBILT_BREDBAND_QA.KID
)
I tried to reduce columns before join step to only the ones I will need. I still experience a lot of issues in regards to work space usage.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.