I have the following SAS code that references and joins two large Oracle tables via Oracle libnames (~500 million records per table). This query takes hours and repeats with varying &rule conditions, so the whole program runs for over a day. Are there any strategies I can employ to make the query run faster?
libname iv_ora oracle user=&orauser pass=&orapass path="IVASPRD" schema="IVAS";
libname bids_ora oracle user=exfcread pass="mypw" path="IBSCRPRD" schema="IMAPSSCR";
proc sql;
create table QueryData as
select b.actual_dlvry_date as AD_DT,
b.imb_code length = 31,
b.spm_calc_batch_date
from iv_ora.ods_iv_recon_selected_mp a, bids_ora.ods_bi_recon_selected_mp b
where a.imb_dlvry_zip_5 = '14221'
and a.imb_code = b.imb_code AND &rule;
quit;
When your working with two different server SAS has to first import the data and then do the processing. If possible you may want to find a way to minimize the querying of multiple tables.
What do the rules look like? Without an example it's hard to comment beyond this.
When your working with two different server SAS has to first import the data and then do the processing. If possible you may want to find a way to minimize the querying of multiple tables.
What do the rules look like? Without an example it's hard to comment beyond this.
The rules look like this...
1 A.ACTUAL_DLVRY_DATE IS NULL AND B.ACTUAL_DLVRY_DATE IS NOT NULL
1.5 A.ACTUAL_DLVRY_DATE > B.ACTUAL_DLVRY_DATE
2 NVL(A.ACTUAL_DLVRY_DATE,SYSDATE) <> NVL(B.ACTUAL_DLVRY_DATE,SYSDATE)
3 A.LAST_SCAN_DATETIME < B.LAST_SCAN_DATETIME
3.1 A.LAST_SCAN_DATETIME > B.LAST_SCAN_DATETIME
3.2 A.LAST_SCAN_DATETIME IS NOT NULL AND B.LAST_SCAN_DATETIME IS NULL
3.3 A.LAST_SCAN_DATETIME IS NULL AND B.LAST_SCAN_DATETIME IS NOT NULL
3.5 A.ACTUAL_ENTRY_DATETIME IS NULL AND B.ACTUAL_ENTRY_DATETIME IS NOT NULL
4 NVL(A.ACTUAL_ENTRY_DATETIME,SYSDATE) <> NVL(B.ACTUAL_ENTRY_DATETIME,SYSDATE)
6 NVL(A.CRITICAL_ENTRY_TIME,SYSDATE) <> NVL(B.CRITICAL_ENTRY_TIME,SYSDATE)
10 A.INDCTN_MTHD IS NULL AND B.INDCTN_MTHD IS NOT NULL
11 NVL(A.INDCTN_MTHD,-1)<>NVL(B.INDCTN_MTHD,-1)
15 NVL(A.EXPECTED_DLVRY_DATE , SYSDATE ) <> NVL(B.EXPECTED_DLVRY_DATE , SYSDATE )
19 NVL(A.SVC_STD_ADJ , -1 ) <> NVL(B.SVC_STD_ADJ , -1 )
20 NVL(A.EPFED_FAC_TYPE_CODE , -1 ) <> NVL(B.EPFED_FAC_TYPE_CODE , -1 )
21 NVL(A.SVC_VARIANCE , -1 ) <> NVL(B.SVC_VARIANCE , -1 )
22 NVL(A.ML_CL_CODE , -1 ) <> NVL(B.ML_CL_CODE , -1 )
25 NVL(A.START_THE_CLOCK_DATE , SYSDATE ) <> NVL(B.START_THE_CLOCK_DATE , SYSDATE )
26 NVL(A.CLEARANCE_TIME ,'T') <> NVL(B.CLEARANCE_TIME ,'T')
27 NVL(A.STOP_THE_CLOCK_DATE , SYSDATE ) <> NVL(B.STOP_THE_CLOCK_DATE , SYSDATE )
28 NVL(A.SVC_STD , -1 ) <> NVL(B.SVC_STD , -1 )
29 NVL(A.INTER_SCF_EXCL_IND ,'T') <> NVL(B.INTER_SCF_EXCL_IND ,'T')
30 NVL(A.SUPPL_CONTR_EXCL_IND ,'T') <> NVL(B.SUPPL_CONTR_EXCL_IND ,'T')
34 NVL(A.SCHD_SHIP_DATE_EXCL_IND ,'T') <> NVL(B.SCHD_SHIP_DATE_EXCL_IND ,'T')
37 NVL(A.SATURDAY_HOLD_IND ,'T') <> NVL(B.SATURDAY_HOLD_IND ,'T')
38 NVL(A.ORGN_FAC_ZIP_5,'T') <> NVL(B.ORGN_FAC_ZIP_5,'T')
39 NVL(A.NON_UNIQ_IMCB_EXCL_IND ,'T') <> NVL(B.NON_UNIQ_IMCB_EXCL_IND ,'T')
40 NVL(A.STOP_SCAN_BEF_AET_EXCL_IND ,'T') <> NVL(B.STOP_SCAN_BEF_AET_EXCL_IND ,'T')
41 NVL(A.CONTR_IRREG_EXCL_IND ,'T') <> NVL(B.CONTR_IRREG_EXCL_IND ,'T')
42 NVL(A.NON_UNIQ_IMTB_EXCL_IND,'T') <> NVL(B.NON_UNIQ_IMTB_EXCL_IND,'T')
43 NVL(A.BMEU_PC_SCAN_BEF_AET_EXCL_IND ,'T') <> NVL(B.BMEU_PC_SCAN_BEF_AET_EXCL_IND ,'T')
44 NVL(A.NON_UNIQ_IMB_EXCL_IND ,'T') <> NVL(B.NON_UNIQ_IMB_EXCL_IND ,'T')
45 NVL(A.UNKNOWN_FAC_EXCL_IND ,'T') <> NVL(B.UNKNOWN_FAC_EXCL_IND ,'T')
48 NVL(A.LONG_HAUL_EXCL_IND,'T') <> NVL(B.LONG_HAUL_EXCL_IND,'T')
51 NVL(A.IMB_CODE ,'T') <> NVL(B.IMB_CODE ,'T')
52 NVL(A.DFLT_IMTB_EXCL_IND ,'T') <> NVL(B.DFLT_IMTB_EXCL_IND ,'T')
53 NVL(A.ML_DRCTN_EXCL_IND ,'T') <> NVL(B.ML_DRCTN_EXCL_IND ,'T')
54 NVL(A.INVLD_CTR_LVL_ENT_FAC_EXCL_IND ,'T') <> NVL(B.INVLD_CTR_LVL_ENT_FAC_EXCL_IND ,'T')
56 NVL(A.ACS_NIXIE_EXCL_IND ,'T') <> NVL(B.ACS_NIXIE_EXCL_IND ,'T')
57 NVL(A.PBV_EXCL_IND ,'T') <> NVL(B.PBV_EXCL_IND ,'T')
58 NVL(A.UNLD_STRT_BEF_ACTL_APPT_IND ,'T') <> NVL(B.UNLD_STRT_BEF_ACTL_APPT_IND ,'T')
60 NVL(A.LAST_SCAN_DV_EXCL_IND ,'T') <> NVL(B.LAST_SCAN_DV_EXCL_IND ,'T')
61 NVL(A.ACS_COA_EXCL_IND ,'T') <> NVL(B.ACS_COA_EXCL_IND ,'T')
62 NVL(A.INVALID_ORGN_ZIP3_EXCL_IND ,'T') <> NVL(B.INVALID_ORGN_ZIP3_EXCL_IND ,'T')
63 NVL(A.IMB_DLVRY_ZIP_3 ,'T') <> NVL(B.IMB_DLVRY_ZIP_3 ,'T')
65 NVL(A.NON_MTCHG_SCAN_APPT_EXCL_IND ,'T') <> NVL(B.NON_MTCHG_SCAN_APPT_EXCL_IND ,'T')
67 NVL(A.NO_SVC_STD_EXCL_IND ,'T') <> NVL(B.NO_SVC_STD_EXCL_IND ,'T')
68 NVL(A.INVALID_ENTRY_FAC_EXCL_IND,'T')<>NVL(B.INVALID_ENTRY_FAC_EXCL_IND,'T')
69 NVL(A.INVLD_FNL_PRCS_ZIP5_EXCL_IND ,'T') <> NVL(B.INVLD_FNL_PRCS_ZIP5_EXCL_IND ,'T')
74 NVL(A.NULL_STC_DV_EXCL_IND ,'T') <> NVL(B.NULL_STC_DV_EXCL_IND ,'T')
77 NVL(A.PW_RGE_CNT_EXCL_IND ,'T') <> NVL(B.PW_RGE_CNT_EXCL_IND ,'T')
78 NVL(A.HIGH_DLVRY_EXCL_IND ,'T') <> NVL(B.HIGH_DLVRY_EXCL_IND ,'T')
79 NVL(A.ML_CAT_CODE , -1 ) <> NVL(B.ML_CAT_CODE , -1 )
80 NVL(A.STOP_SCAN_BEF_STC_EXCL_IND ,'T') <> NVL(B.STOP_SCAN_BEF_STC_EXCL_IND ,'T')
81 NVL(A.INVALID_DPV_EXCL_IND ,'T') <> NVL(B.INVALID_DPV_EXCL_IND ,'T')
82 NVL(A.IMB_DLVRY_ZIP_3_EXCL_IND ,'T') <> NVL(B.IMB_DLVRY_ZIP_3_EXCL_IND ,'T')
86 NVL(A.STOP_SCAN_DATETIME , SYSDATE ) <> NVL(B.STOP_SCAN_DATETIME , SYSDATE )
87 NVL(A.NON_MATCH_APPT_EXCL_IND ,'T') <> NVL(B.NON_MATCH_APPT_EXCL_IND ,'T')
88 NVL(A.STOP_THE_CLOCK_EXCL_IND ,'T') <> NVL(B.STOP_THE_CLOCK_EXCL_IND ,'T')
89 NVL(A.ORPHAN_HU_NOT_BMEU_EXCL_IND ,'T') <> NVL(B.ORPHAN_HU_NOT_BMEU_EXCL_IND ,'T')
91 NVL(A.PARS_EXCL_IND ,'T') <> NVL(B.PARS_EXCL_IND ,'T')
92 NVL(A.SV_DUP_CR_EXCL_IND ,'T') <> NVL(B.SV_DUP_CR_EXCL_IND ,'T')
94 NVL(A.FCM_DESTN_EXCL_IND ,'T') <> NVL(B.FCM_DESTN_EXCL_IND ,'T')
95 NVL(A.PC_SCN_BEF_UNLD_SCAN_EXCL_IND ,'T') <> NVL(B.PC_SCN_BEF_UNLD_SCAN_EXCL_IND ,'T')
96 NVL(A.NON_MTCHG_DLVRY_PT_EXCL_IND ,'T') <> NVL(B.NON_MTCHG_DLVRY_PT_EXCL_IND ,'T')
98 NVL(A.MPE_NON_MATCH_ZIP_EXCL_IND ,'T') <> NVL(B.MPE_NON_MATCH_ZIP_EXCL_IND ,'T')
99 NVL(A.INVALID_ORGN_ZIP5_EXCL_IND ,'T') <> NVL(B.INVALID_ORGN_ZIP5_EXCL_IND ,'T')
102 NVL(A.FS_IMD_NESTING_EXCL_IND ,'T') <> NVL(B.FS_IMD_NESTING_EXCL_IND ,'T')
104 NVL(A.APPT_IRREG_EXCL_IND ,'T') <> NVL(B.APPT_IRREG_EXCL_IND ,'T')
105 NVL(A.EPFED_UPDT_EXCL_IND ,'T') <> NVL(B.EPFED_UPDT_EXCL_IND ,'T')
107 NVL(A.INVALID_STC_DATE_EXCL_IND ,'T') <> NVL(B.INVALID_STC_DATE_EXCL_IND ,'T')
108 NVL(A.OTHER_EXCL_IND ,'T') <> NVL(B.OTHER_EXCL_IND ,'T')
110 A.NON_MATCH_FAC_EXCL_IND <> B.NON_MATCH_FAC_EXCL_IND
113 NVL(A.START_THE_CLOCK_EXCL_IND,'T') <> NVL(B.START_THE_CLOCK_EXCL_IND,'T')
199 NVL(A.EXCL_STS_CODE,-7) <> NVL(B.EXCL_STS_CODE,-7)
200 NVL(A.CERTIFIED_MAILER_IND ,'T') <> NVL(B.CERTIFIED_MAILER_IND ,'T')
300 NVL(A.FIRST_SCAN_DATETIME,SYSDATE) <> NVL(B.FIRST_SCAN_DATETIME,SYSDATE)
301 NVL(A.SV_SCAN_IND,'T') <> NVL(B.SV_SCAN_IND,'T')
302 NVL(A.MAILG_DATE , SYSDATE ) <> NVL(B.MAILG_DATE , SYSDATE )
304 NVL(A.PLANT_DU_FAIL_RSN_CODE ,'T') <> NVL(B.PLANT_DU_FAIL_RSN_CODE ,'T')
305 NVL(A.LAST_SCAN_OP_CODE ,'T') <> NVL(B.LAST_SCAN_OP_CODE ,'T')
306 NVL(A.PART_ONE_DAY_CNT , -1 ) <> NVL(B.PART_ONE_DAY_CNT , -1 )
307 NVL(A.FEDEX_AIR_SCAN_IND ,'T') <> NVL(B.FEDEX_AIR_SCAN_IND ,'T')
308 NVL(A.FIRST_SCAN_MACH_TYPE ,'T') <> NVL(B.FIRST_SCAN_MACH_TYPE ,'T')
309 NVL(A.RECORD_TYPE_CODE , -1 ) <> NVL(B.RECORD_TYPE_CODE , -1 )
310 NVL(A.LPO_TYPE_CODE , -1 ) <> NVL(B.LPO_TYPE_CODE , -1 )
311 NVL(A.LAST_OFD_SCAN_DATETIME , SYSDATE ) <> NVL(B.LAST_OFD_SCAN_DATETIME , SYSDATE )
312 NVL(A.END_TO_END_TRANSIT_HRS_ADJ , -1 ) <> NVL(B.END_TO_END_TRANSIT_HRS_ADJ , -1 )
313 NVL(A.STOP_SCAN_IND ,'T') <> NVL(B.STOP_SCAN_IND ,'T')
314 NVL(A.STOP_SCAN_OP_CODE ,'T') <> NVL(B.STOP_SCAN_OP_CODE ,'T')
315 NVL(A.RATE_TYPE_CODE , -1 ) <> NVL(B.RATE_TYPE_CODE , -1 )
316 NVL(A.THS_CLEAN_AIR_SCAN_IND ,'T') <> NVL(B.THS_CLEAN_AIR_SCAN_IND ,'T')
317 NVL(A.CAIR_SCAN_IND ,'T') <> NVL(B.CAIR_SCAN_IND ,'T')
318 NVL(A.STOP_SCAN_FAC_ZIP_CODE ,'T') <> NVL(B.STOP_SCAN_FAC_ZIP_CODE ,'T')
319 NVL(A.END_TO_END_TRANSIT_HRS , -1 ) <> NVL(B.END_TO_END_TRANSIT_HRS , -1 )
320 NVL(A.IMB_DLVRY_ZIP_7 ,'T') <> NVL(B.IMB_DLVRY_ZIP_7 ,'T')
321 NVL(A.LAST_SCAN_FAC_ZIP_CODE ,'T') <> NVL(B.LAST_SCAN_FAC_ZIP_CODE ,'T')
322 NVL(A.PRDCL_COUNTY_IND ,'T') <> NVL(B.PRDCL_COUNTY_IND ,'T')
323 NVL(A.IMB_DLVRY_ZIP_5 ,'T') <> NVL(B.IMB_DLVRY_ZIP_5 ,'T')
324 NVL(A.COMAIL_IND ,'T') <> NVL(B.COMAIL_IND ,'T')
325 NVL(A.RATE_CAT_CODE,-1) <> NVL(B.RATE_CAT_CODE,-1)
326 NVL(A.FIRST_OFD_SCAN_DATETIME,SYSDATE)<>NVL(B.FIRST_OFD_SCAN_DATETIME,SYSDATE)
327 A.HU_LVL_CODE IS NULL AND B.HU_LVL_CODE IS NOT NULL
328 NVL(A.FSS_SCAN_IND ,'T') <> NVL(B.FSS_SCAN_IND ,'T')
329 NVL(A.RANGE_IND ,'T') <> NVL(B.RANGE_IND ,'T')
330 NVL(A.IMTB_CODE ,'T') <> NVL(B.IMTB_CODE ,'T')
331 NVL(A.FIRST_MILE_HOURS , -1 ) <> NVL(B.FIRST_MILE_HOURS , -1 )
332 NVL(A.LAST_SCAN_MACH_TYPE,-1)<>NVL(B.LAST_SCAN_MACH_TYPE,-1)
333 NVL(A.HU_LVL_CODE,-1)<>NVL(B.HU_LVL_CODE,-1)
334 NVL(A.HU_TYPE_CODE , -1 ) <> NVL(B.HU_TYPE_CODE , -1 )
335 NVL(A.HU_LBL_CIN_CODE ,'T') <> NVL(B.HU_LBL_CIN_CODE ,'T')
336 NVL(A.CONTR_LVL_CODE,-1) <> NVL(B.CONTR_LVL_CODE,-1)
337 NVL(A.PREP_TYPE_CODE,-1) <> NVL(B.PREP_TYPE_CODE,-1)
113 rows selected.
Strategy 1: rewrite to an explicit SQL pass thru query. Perhaps Oracle can optimize this better, or have a better connection between the two servers.
Strategy 2: Since you have what it seems a narrow filter on iv_ora.ods_iv_recon_selected_mp, save a copy if that first. Then see if you can use that to either upload that as a temp table to
IBSCRPRD or use use DBKEY. If your &rule also is narrow, you might want to download
bids_ora.ods_bi_recon_selected_mp filtered to SAS, and to the join explicitly in SAS.
Some questions. Why do you have two separate databases with linking information? That doesn't sound like the best setup to start with. Secondly, can you not setup some views on the database side to show the data you want with the rules, that would seem more efficient, then you just query the results of the view. Finally, back to the multiple databases again, can the zip not be added to the one dataset, that would remove the whole joining part completely.
Would Views really help with efficiency? Wouldn't Oracle just be running a similar join every time I referenced the View?
Both datasets do contain the imb_dlvry_zip_5 column also. Would adding the imb_dlvry_zip_5 to the join speed things up? Like:
where a.imb_dlvry_zip_5 = '14221'
and a.imb_dlvry_zip_5 = b.imb_dlvry_zip_5
and a.imb_code = b.imb_code
AND &rule;
To clarify, the tables are on two different databases because they are samples of larger tables that originate on two separate databases to begin with. One database is not completely controlled by my company and they will no allow a database link to be placed between the databases. Without a database link I don't believe I can recode this using SQL Pass-thru either.
If that is the case, then I you would be better off extracting each database to a SAS dataset as @Reeza has mentioned, and then running your secondary queries and merges on your system. I would imagine the bottleneck is the pulling of data form two sources and merging it using a third system. So pulling the data locally and then just using one sytem to do the processing should reduce the overhead.
proc sql; create table ods_iv_recon_selected_mp as select * from ...; create table ...; quit; /* Disconnect from databases */ /* Do processing of data here */
Ok, if I go about this by extracting each database table to a SAS dataset does it make sense to also index the SAS-side datasets? Would this help in performance also since I have no choice but to perform this query for 100+ different rules?
It may do, sorry, can't really be much more help there, just try it. I would add to my original part:
proc sql; create table ods_iv_recon_selected_mp as select * from ...; create table ...; quit; /* Disconnect from databases */
proc sql;
create joined_data as
...
where a.imb_dlvry_zip_5 = '14221'
and a.imb_code = b.imb_code;
quit;
/* Do processing of data here */
Then you have a dataset which fulfils that criteria which you use in all the other tests. That can then be your source, and on that one dataset you can do your tests. So data from one DB to SAS, data from other DB to SAS, SAS combines the two to create master dataset. Then you checks run over master dataset with a simple datastep (this will create testX with the result of the where condition, note I have assumed that in the merge for master dataset you will rename the necessary variables):
data tests; test=1; check="ACTUAL_DLVRY_DATE IS NULL AND B_ACTUAL_DLVRY_DATE IS NOT NULL"; output; test=1.5; check="ACTUAL_DLVRY_DATE > B_ACTUAL_DLVRY_DATE"; output; run; data _null_; set tests; call execute(cat('proc sql; create table test',strip(put(test,best.)),' as select * from MASTER_DATASET where ',strip(check))); run;
Thank you everyone for all the ideas. I decided to bring the datasets local to SAS from both Oracle database table and am going to create indexes on each SAS dataset on IMB_Code to address my performance issues. I appreciate the time you took to help me with this.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.