BookmarkSubscribeRSS Feed
MadhuriDara
Calcite | Level 5

Hello  Experts,

I am new to SAS programming and I am supporting a PROD ETL process written in SAS. It connects to Oracle DB and the processing is between Oracle and SAS. These processes run every day.

These ETL processes run for more than 17hrs. Looking to tune these processes, we noticed that 

at one particular point the script seems to hang for more than 8 hrs. On a good day this step completes in couple of mins.

The ETL processes are inter-related. Each step depends on its previous step.

Attached is a pic of the WORK Library where I see the server is chugging to create big TMP utility files.

 

Please help me understand the bottleneck here. 

 

Here is a snippet that just hangs: 

NDS_RDB is our oracle database. RDB is the SAS library. LDF_DIMENSIONAL_DATA is an object created in the previous step.

It creates BASE_FOODBORNE, LINKED_FOODBORNE and ALL_FOODBORNE objects in the WORK library.

In the WORK library I see BASE_FOODBORNE and LINKED_FOODBORNE created. 

ALL_FOODBORNE is not seen and I see temp utility files growing big.

 

PROC SQL;
DROP TABLE NBS_RDB.LDF_FOODBORNE;

 

CREATE TABLE BASE_FOODBORNE AS
SELECT * FROM LDF_DIMENSIONAL_DATA WHERE PHC_CD IN (SELECT CONDITION_CD FROM
NBS_RDB.LDF_DATAMART_TABLE_REF WHERE DATAMART_NAME = 'LDF_FOODBORNE');

 

CREATE TABLE LINKED_FOODBORNE AS
SELECT GEN_LDF.*,
INV.INVESTIGATION_KEY,
INV.INV_LOCAL_ID AS INVESTIGATION_LOCAL_ID 'INVESTIGATION_LOCAL_ID',
INV.CASE_OID AS PROGRAM_JURISDICTION_OID 'PROGRAM_JURISDICTION_OID',
GEN.PATIENT_KEY,
PATIENT.PATIENT_LOCAL_ID AS PATIENT_LOCAL_ID 'PATIENT_LOCAL_ID',
CONDITION.CONDITION_SHORT_NM AS DISEASE_NAME 'DISEASE_NAME',
CONDITION.CONDITION_CD,
GEN_LDF.PHC_CD
FROM
BASE_FOODBORNE GEN_LDF
INNER JOIN NBS_RDB.INVESTIGATION INV
ON
GEN_LDF.INVESTIGATION_UID=INV.CASE_UID
INNER JOIN NBS_RDB.GENERIC_CASE GEN
ON
GEN.INVESTIGATION_KEY=INV.INVESTIGATION_KEY
INNER JOIN NBS_RDB.CONDITION
ON
CONDITION.CONDITION_KEY= GEN.CONDITION_KEY
INNER JOIN NBS_RDB.D_PATIENT PATIENT
ON
PATIENT.PATIENT_KEY=GEN.PATIENT_KEY
ORDER BY
INVESTIGATION_UID;


CREATE TABLE ALL_FOODBORNE AS
SELECT A.*,
B.DATAMART_COLUMN_NM AS DM 'DM'
FROM NBS_RDB.LDF_DATAMART_COLUMN_REF B
FULL OUTER JOIN LINKED_FOODBORNE A
ON A.LDF_UID= B.LDF_UID WHERE
(B.LDF_PAGE_SET ='OTHER'
OR B.CONDITION_CD IN (SELECT CONDITION_CD FROM
NBS_RDB.LDF_DATAMART_TABLE_REF WHERE DATAMART_NAME = 'LDF_FOODBORNE')
)
ORDER BY
INVESTIGATION_UID;
QUIT;

 

2 REPLIES 2
Shmuel
Garnet | Level 18

I don't know how big are your tables but in your code there are many times approach to oracle database table which uses a lot of resources.

 

you can save a lot of time if you work in a sas as much as possible.

I have attached a ms-word document with your code with NDS_RDS. marked in yellow.

 

I suggest to load the oracle table into sas and eliminate all those references to oracle.

 

 

ChrisNZ
Tourmaline | Level 20

A few further comments:

 

1. Try to split the process happening in oracle and the process happening in SAS. 

Bringing whole tables to SAS is inefficient.

For example

 

FROM
BASE_FOODBORNE GEN_LDF
INNER JOIN NBS_RDB.INVESTIGATION INV
ON
GEN_LDF.INVESTIGATION_UID=INV.CASE_UID
INNER JOIN NBS_RDB.GENERIC_CASE GEN
ON
GEN.INVESTIGATION_KEY=INV.INVESTIGATION_KEY
INNER JOIN NBS_RDB.CONDITION
ON
CONDITION.CONDITION_KEY= GEN.CONDITION_KEY
INNER JOIN NBS_RDB.D_PATIENT PATIENT
ON
PATIENT.PATIENT_KEY=GEN.PATIENT_KEY

All the joins in blue are oracle tables.

 

Make the join happen there, or example by coding something like

 

FROM
BASE_FOODBORNE GEN_LDF
INNER JOIN (select ???  from NBS_RDB.INVESTIGATION INV
INNER JOIN NBS_RDB.GENERIC_CASE GEN
ON
GEN.INVESTIGATION_KEY=INV.INVESTIGATION_KEY
INNER JOIN NBS_RDB.CONDITION
ON
CONDITION.CONDITION_KEY= GEN.CONDITION_KEY
INNER JOIN NBS_RDB.D_PATIENT PATIENT
ON
PATIENT.PATIENT_KEY=GEN.PATIENT_KEY  )  k

 

2. Use option sastrace to verify what queries are sent to Oracle.

 

 

3. In my experience a clause such as    where VAL in (select ..from... )    is less efficient then a join. Use (typically inner) joins.

 

4. How many values are returned by    SELECT CONDITION_CD FROM NBS_RDB.LDF_DATAMART_TABLE_REF WHERE DATAMART_NAME = 'LDF_FOODBORNE'   ?

It might be better to create a static string using macro variables. 

 

5. You should also add a DISTINCT in the query above if applicable

 

6. Do us and yourself a favour: format your code. It is illegible. For example

CREATE TABLE ALL_FOODBORNE AS
SELECT A.*,
B.DATAMART_COLUMN_NM AS DM 'DM'
FROM NBS_RDB.LDF_DATAMART_COLUMN_REF B
FULL OUTER JOIN LINKED_FOODBORNE A
ON A.LDF_UID= B.LDF_UID WHERE
(B.LDF_PAGE_SET ='OTHER'
OR B.CONDITION_CD IN (SELECT CONDITION_CD FROM
NBS_RDB.LDF_DATAMART_TABLE_REF WHERE DATAMART_NAME = 'LDF_FOODBORNE')

could be something like

create table ALL_FOODBORNE as
select A.* 
      ,B.DATAMART_COLUMN_NM as DM 'DM'
from 
  NBS_RDB.LDF_DATAMART_COLUMN_REF  B
    full outer join 
  LINKED_FOODBORNE                 A
    on A.LDF_UID= B.LDF_UID
where
  (B.LDF_PAGE_SET ='OTHER'
  or B.CONDITION_CD in (select unique CONDITION_CD 
                        from NBS_RDB.LDF_DATAMART_TABLE_REF
                        where DATAMART_NAME = 'LDF_FOODBORNE')
  )

A couple more lines, but 1000% easier to understand and debug.

Note that in this example too, the 2 oracle tables should be queried together before being joined to the SAS table,

which is perfect since the where clause is on the Oracle data.

 

 

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 296 views
  • 1 like
  • 3 in conversation