DATA Step, Macro, Functions and more

SAS Efficiency Help Needed

Accepted Solution Solved
Reply
Regular Contributor
Posts: 212
Accepted Solution

SAS Efficiency Help Needed

[ Edited ]

 

I have the following SAS code that references and joins two large Oracle tables via Oracle libnames (~500 million records per table). This query takes hours and repeats with varying &rule conditions, so the whole program runs for over a day. Are there any strategies I can employ to make the query run faster?

 

libname iv_ora   oracle user=&orauser pass=&orapass path="IVASPRD" schema="IVAS";
libname bids_ora oracle user=exfcread pass="mypw" path="IBSCRPRD" schema="IMAPSSCR";

proc sql;
	create table QueryData as 
		select  b.actual_dlvry_date as AD_DT, 
				b.imb_code length = 31, 
				b.spm_calc_batch_date
	    from iv_ora.ods_iv_recon_selected_mp a,  bids_ora.ods_bi_recon_selected_mp b                                                                                                                                                                                                                                                                                                                                                                                                                                                   
		where a.imb_dlvry_zip_5 = '14221'
		   and a.imb_code = b.imb_code AND &rule;
quit;

Accepted Solutions
Solution
‎06-06-2016 10:52 AM
Super User
Posts: 17,750

Re: SAS Efficiency Help Needed

When your working with two different server SAS has to first import the data and then do the processing. If possible you may want to find a way to minimize the querying of multiple tables. 

 

What do the rules look like? Without an example it's hard to comment beyond this. 

 

 

View solution in original post


All Replies
Solution
‎06-06-2016 10:52 AM
Super User
Posts: 17,750

Re: SAS Efficiency Help Needed

When your working with two different server SAS has to first import the data and then do the processing. If possible you may want to find a way to minimize the querying of multiple tables. 

 

What do the rules look like? Without an example it's hard to comment beyond this. 

 

 

Regular Contributor
Posts: 212

Re: SAS Efficiency Help Needed

[ Edited ]

The rules look like this...

 

1                                A.ACTUAL_DLVRY_DATE IS NULL AND B.ACTUAL_DLVRY_DATE IS NOT NULL
1.5                              A.ACTUAL_DLVRY_DATE > B.ACTUAL_DLVRY_DATE
2                                NVL(A.ACTUAL_DLVRY_DATE,SYSDATE) <> NVL(B.ACTUAL_DLVRY_DATE,SYSDATE)
3                                A.LAST_SCAN_DATETIME < B.LAST_SCAN_DATETIME
3.1                              A.LAST_SCAN_DATETIME > B.LAST_SCAN_DATETIME
3.2                              A.LAST_SCAN_DATETIME IS NOT NULL AND B.LAST_SCAN_DATETIME IS NULL
3.3                              A.LAST_SCAN_DATETIME IS NULL AND B.LAST_SCAN_DATETIME IS NOT NULL
3.5                              A.ACTUAL_ENTRY_DATETIME IS NULL AND B.ACTUAL_ENTRY_DATETIME IS NOT NULL
4                                NVL(A.ACTUAL_ENTRY_DATETIME,SYSDATE) <> NVL(B.ACTUAL_ENTRY_DATETIME,SYSDATE)
6                                NVL(A.CRITICAL_ENTRY_TIME,SYSDATE) <> NVL(B.CRITICAL_ENTRY_TIME,SYSDATE)
10                               A.INDCTN_MTHD IS NULL AND B.INDCTN_MTHD IS NOT NULL
11                               NVL(A.INDCTN_MTHD,-1)<>NVL(B.INDCTN_MTHD,-1)
15                               NVL(A.EXPECTED_DLVRY_DATE , SYSDATE ) <> NVL(B.EXPECTED_DLVRY_DATE , SYSDATE )
19                               NVL(A.SVC_STD_ADJ , -1 ) <> NVL(B.SVC_STD_ADJ , -1 )
20                               NVL(A.EPFED_FAC_TYPE_CODE , -1 ) <> NVL(B.EPFED_FAC_TYPE_CODE , -1 )
21                               NVL(A.SVC_VARIANCE , -1 ) <> NVL(B.SVC_VARIANCE , -1 )
22                               NVL(A.ML_CL_CODE , -1 ) <> NVL(B.ML_CL_CODE , -1 )
25                               NVL(A.START_THE_CLOCK_DATE , SYSDATE ) <> NVL(B.START_THE_CLOCK_DATE , SYSDATE )
26                               NVL(A.CLEARANCE_TIME ,'T') <> NVL(B.CLEARANCE_TIME ,'T')
27                               NVL(A.STOP_THE_CLOCK_DATE , SYSDATE ) <> NVL(B.STOP_THE_CLOCK_DATE , SYSDATE )
28                               NVL(A.SVC_STD , -1 ) <> NVL(B.SVC_STD , -1 )
29                               NVL(A.INTER_SCF_EXCL_IND ,'T') <> NVL(B.INTER_SCF_EXCL_IND ,'T')
30                               NVL(A.SUPPL_CONTR_EXCL_IND ,'T') <> NVL(B.SUPPL_CONTR_EXCL_IND ,'T')
34                               NVL(A.SCHD_SHIP_DATE_EXCL_IND ,'T') <> NVL(B.SCHD_SHIP_DATE_EXCL_IND ,'T')
37                               NVL(A.SATURDAY_HOLD_IND ,'T') <> NVL(B.SATURDAY_HOLD_IND ,'T')
38                               NVL(A.ORGN_FAC_ZIP_5,'T') <> NVL(B.ORGN_FAC_ZIP_5,'T')
39                               NVL(A.NON_UNIQ_IMCB_EXCL_IND ,'T') <> NVL(B.NON_UNIQ_IMCB_EXCL_IND ,'T')
40                               NVL(A.STOP_SCAN_BEF_AET_EXCL_IND ,'T') <> NVL(B.STOP_SCAN_BEF_AET_EXCL_IND ,'T')
41                               NVL(A.CONTR_IRREG_EXCL_IND ,'T') <> NVL(B.CONTR_IRREG_EXCL_IND ,'T')
42                               NVL(A.NON_UNIQ_IMTB_EXCL_IND,'T') <> NVL(B.NON_UNIQ_IMTB_EXCL_IND,'T')
43                               NVL(A.BMEU_PC_SCAN_BEF_AET_EXCL_IND ,'T') <> NVL(B.BMEU_PC_SCAN_BEF_AET_EXCL_IND ,'T')
44                               NVL(A.NON_UNIQ_IMB_EXCL_IND ,'T') <> NVL(B.NON_UNIQ_IMB_EXCL_IND ,'T')
45                               NVL(A.UNKNOWN_FAC_EXCL_IND ,'T') <> NVL(B.UNKNOWN_FAC_EXCL_IND ,'T')
48                               NVL(A.LONG_HAUL_EXCL_IND,'T') <> NVL(B.LONG_HAUL_EXCL_IND,'T')
51                               NVL(A.IMB_CODE ,'T') <> NVL(B.IMB_CODE ,'T')
52                               NVL(A.DFLT_IMTB_EXCL_IND ,'T') <> NVL(B.DFLT_IMTB_EXCL_IND ,'T')
53                               NVL(A.ML_DRCTN_EXCL_IND ,'T') <> NVL(B.ML_DRCTN_EXCL_IND ,'T')
54                               NVL(A.INVLD_CTR_LVL_ENT_FAC_EXCL_IND ,'T') <> NVL(B.INVLD_CTR_LVL_ENT_FAC_EXCL_IND ,'T')
56                               NVL(A.ACS_NIXIE_EXCL_IND ,'T') <> NVL(B.ACS_NIXIE_EXCL_IND ,'T')
57                               NVL(A.PBV_EXCL_IND ,'T') <> NVL(B.PBV_EXCL_IND ,'T')
58                               NVL(A.UNLD_STRT_BEF_ACTL_APPT_IND ,'T') <> NVL(B.UNLD_STRT_BEF_ACTL_APPT_IND ,'T')
60                               NVL(A.LAST_SCAN_DV_EXCL_IND ,'T') <> NVL(B.LAST_SCAN_DV_EXCL_IND ,'T')
61                               NVL(A.ACS_COA_EXCL_IND ,'T') <> NVL(B.ACS_COA_EXCL_IND ,'T')
62                               NVL(A.INVALID_ORGN_ZIP3_EXCL_IND ,'T') <> NVL(B.INVALID_ORGN_ZIP3_EXCL_IND ,'T')
63                               NVL(A.IMB_DLVRY_ZIP_3 ,'T') <> NVL(B.IMB_DLVRY_ZIP_3 ,'T')
65                               NVL(A.NON_MTCHG_SCAN_APPT_EXCL_IND ,'T') <> NVL(B.NON_MTCHG_SCAN_APPT_EXCL_IND ,'T')
67                               NVL(A.NO_SVC_STD_EXCL_IND ,'T') <> NVL(B.NO_SVC_STD_EXCL_IND ,'T')
68                               NVL(A.INVALID_ENTRY_FAC_EXCL_IND,'T')<>NVL(B.INVALID_ENTRY_FAC_EXCL_IND,'T')
69                               NVL(A.INVLD_FNL_PRCS_ZIP5_EXCL_IND ,'T') <> NVL(B.INVLD_FNL_PRCS_ZIP5_EXCL_IND ,'T')
74                               NVL(A.NULL_STC_DV_EXCL_IND ,'T') <> NVL(B.NULL_STC_DV_EXCL_IND ,'T')
77                               NVL(A.PW_RGE_CNT_EXCL_IND ,'T') <> NVL(B.PW_RGE_CNT_EXCL_IND ,'T')
78                               NVL(A.HIGH_DLVRY_EXCL_IND ,'T') <> NVL(B.HIGH_DLVRY_EXCL_IND ,'T')
79                               NVL(A.ML_CAT_CODE , -1 ) <> NVL(B.ML_CAT_CODE , -1 )
80                               NVL(A.STOP_SCAN_BEF_STC_EXCL_IND ,'T') <> NVL(B.STOP_SCAN_BEF_STC_EXCL_IND ,'T')
81                               NVL(A.INVALID_DPV_EXCL_IND ,'T') <> NVL(B.INVALID_DPV_EXCL_IND ,'T')
82                               NVL(A.IMB_DLVRY_ZIP_3_EXCL_IND ,'T') <> NVL(B.IMB_DLVRY_ZIP_3_EXCL_IND ,'T')
86                               NVL(A.STOP_SCAN_DATETIME , SYSDATE ) <> NVL(B.STOP_SCAN_DATETIME , SYSDATE )
87                               NVL(A.NON_MATCH_APPT_EXCL_IND ,'T') <> NVL(B.NON_MATCH_APPT_EXCL_IND ,'T')
88                               NVL(A.STOP_THE_CLOCK_EXCL_IND ,'T') <> NVL(B.STOP_THE_CLOCK_EXCL_IND ,'T')
89                               NVL(A.ORPHAN_HU_NOT_BMEU_EXCL_IND ,'T') <> NVL(B.ORPHAN_HU_NOT_BMEU_EXCL_IND ,'T')
91                               NVL(A.PARS_EXCL_IND ,'T') <> NVL(B.PARS_EXCL_IND ,'T')
92                               NVL(A.SV_DUP_CR_EXCL_IND ,'T') <> NVL(B.SV_DUP_CR_EXCL_IND ,'T')
94                               NVL(A.FCM_DESTN_EXCL_IND ,'T') <> NVL(B.FCM_DESTN_EXCL_IND ,'T')
95                               NVL(A.PC_SCN_BEF_UNLD_SCAN_EXCL_IND ,'T') <> NVL(B.PC_SCN_BEF_UNLD_SCAN_EXCL_IND ,'T')
96                               NVL(A.NON_MTCHG_DLVRY_PT_EXCL_IND ,'T') <> NVL(B.NON_MTCHG_DLVRY_PT_EXCL_IND ,'T')
98                               NVL(A.MPE_NON_MATCH_ZIP_EXCL_IND ,'T') <> NVL(B.MPE_NON_MATCH_ZIP_EXCL_IND ,'T')
99                               NVL(A.INVALID_ORGN_ZIP5_EXCL_IND ,'T') <> NVL(B.INVALID_ORGN_ZIP5_EXCL_IND ,'T')
102                              NVL(A.FS_IMD_NESTING_EXCL_IND ,'T') <> NVL(B.FS_IMD_NESTING_EXCL_IND ,'T')
104                              NVL(A.APPT_IRREG_EXCL_IND ,'T') <> NVL(B.APPT_IRREG_EXCL_IND ,'T')
105                              NVL(A.EPFED_UPDT_EXCL_IND ,'T') <> NVL(B.EPFED_UPDT_EXCL_IND ,'T')
107                              NVL(A.INVALID_STC_DATE_EXCL_IND ,'T') <> NVL(B.INVALID_STC_DATE_EXCL_IND ,'T')
108                              NVL(A.OTHER_EXCL_IND ,'T') <> NVL(B.OTHER_EXCL_IND ,'T')
110                              A.NON_MATCH_FAC_EXCL_IND <> B.NON_MATCH_FAC_EXCL_IND
113                              NVL(A.START_THE_CLOCK_EXCL_IND,'T') <> NVL(B.START_THE_CLOCK_EXCL_IND,'T')
199                              NVL(A.EXCL_STS_CODE,-7) <> NVL(B.EXCL_STS_CODE,-7)
200                              NVL(A.CERTIFIED_MAILER_IND ,'T') <> NVL(B.CERTIFIED_MAILER_IND ,'T')
300                              NVL(A.FIRST_SCAN_DATETIME,SYSDATE) <> NVL(B.FIRST_SCAN_DATETIME,SYSDATE)
301                              NVL(A.SV_SCAN_IND,'T') <> NVL(B.SV_SCAN_IND,'T')
302                              NVL(A.MAILG_DATE , SYSDATE ) <> NVL(B.MAILG_DATE , SYSDATE )
304                              NVL(A.PLANT_DU_FAIL_RSN_CODE ,'T') <> NVL(B.PLANT_DU_FAIL_RSN_CODE ,'T')
305                              NVL(A.LAST_SCAN_OP_CODE ,'T') <> NVL(B.LAST_SCAN_OP_CODE ,'T')
306                              NVL(A.PART_ONE_DAY_CNT , -1 ) <> NVL(B.PART_ONE_DAY_CNT , -1 )
307                              NVL(A.FEDEX_AIR_SCAN_IND ,'T') <> NVL(B.FEDEX_AIR_SCAN_IND ,'T')
308                              NVL(A.FIRST_SCAN_MACH_TYPE ,'T') <> NVL(B.FIRST_SCAN_MACH_TYPE ,'T')
309                              NVL(A.RECORD_TYPE_CODE , -1 ) <> NVL(B.RECORD_TYPE_CODE , -1 )
310                              NVL(A.LPO_TYPE_CODE , -1 ) <> NVL(B.LPO_TYPE_CODE , -1 )
311                              NVL(A.LAST_OFD_SCAN_DATETIME , SYSDATE ) <> NVL(B.LAST_OFD_SCAN_DATETIME , SYSDATE )
312                              NVL(A.END_TO_END_TRANSIT_HRS_ADJ , -1 ) <> NVL(B.END_TO_END_TRANSIT_HRS_ADJ , -1 )
313                              NVL(A.STOP_SCAN_IND ,'T') <> NVL(B.STOP_SCAN_IND ,'T')
314                              NVL(A.STOP_SCAN_OP_CODE ,'T') <> NVL(B.STOP_SCAN_OP_CODE ,'T')
315                              NVL(A.RATE_TYPE_CODE , -1 ) <> NVL(B.RATE_TYPE_CODE , -1 )
316                              NVL(A.THS_CLEAN_AIR_SCAN_IND ,'T') <> NVL(B.THS_CLEAN_AIR_SCAN_IND ,'T')
317                              NVL(A.CAIR_SCAN_IND ,'T') <> NVL(B.CAIR_SCAN_IND ,'T')
318                              NVL(A.STOP_SCAN_FAC_ZIP_CODE ,'T') <> NVL(B.STOP_SCAN_FAC_ZIP_CODE ,'T')
319                              NVL(A.END_TO_END_TRANSIT_HRS , -1 ) <> NVL(B.END_TO_END_TRANSIT_HRS , -1 )
320                              NVL(A.IMB_DLVRY_ZIP_7 ,'T') <> NVL(B.IMB_DLVRY_ZIP_7 ,'T')
321                              NVL(A.LAST_SCAN_FAC_ZIP_CODE ,'T') <> NVL(B.LAST_SCAN_FAC_ZIP_CODE ,'T')
322                              NVL(A.PRDCL_COUNTY_IND ,'T') <> NVL(B.PRDCL_COUNTY_IND ,'T')
323                              NVL(A.IMB_DLVRY_ZIP_5 ,'T') <> NVL(B.IMB_DLVRY_ZIP_5 ,'T')
324                              NVL(A.COMAIL_IND ,'T') <> NVL(B.COMAIL_IND ,'T')
325                              NVL(A.RATE_CAT_CODE,-1) <> NVL(B.RATE_CAT_CODE,-1)
326                              NVL(A.FIRST_OFD_SCAN_DATETIME,SYSDATE)<>NVL(B.FIRST_OFD_SCAN_DATETIME,SYSDATE)
327                              A.HU_LVL_CODE IS NULL AND B.HU_LVL_CODE IS NOT NULL
328                              NVL(A.FSS_SCAN_IND ,'T') <> NVL(B.FSS_SCAN_IND ,'T')
329                              NVL(A.RANGE_IND ,'T') <> NVL(B.RANGE_IND ,'T')
330                              NVL(A.IMTB_CODE ,'T') <> NVL(B.IMTB_CODE ,'T')
331                              NVL(A.FIRST_MILE_HOURS , -1 ) <> NVL(B.FIRST_MILE_HOURS , -1 )
332                              NVL(A.LAST_SCAN_MACH_TYPE,-1)<>NVL(B.LAST_SCAN_MACH_TYPE,-1)
333                              NVL(A.HU_LVL_CODE,-1)<>NVL(B.HU_LVL_CODE,-1)
334                              NVL(A.HU_TYPE_CODE , -1 ) <> NVL(B.HU_TYPE_CODE , -1 )
335                              NVL(A.HU_LBL_CIN_CODE ,'T') <> NVL(B.HU_LBL_CIN_CODE ,'T')
336                              NVL(A.CONTR_LVL_CODE,-1) <> NVL(B.CONTR_LVL_CODE,-1)
337                              NVL(A.PREP_TYPE_CODE,-1) <> NVL(B.PREP_TYPE_CODE,-1)

113 rows selected.
Super User
Posts: 5,254

Re: SAS Efficiency Help Needed

Strategy 1: rewrite to an explicit SQL pass thru query. Perhaps Oracle can optimize this better, or have a better connection between the two servers.

 

Strategy 2: Since you have what it seems a narrow filter on iv_ora.ods_iv_recon_selected_mp, save a copy if that first. Then see if you can use that to either upload that as a temp table to IBSCRPRD or use use DBKEY. If your &rule also is narrow, you might want to download bids_ora.ods_bi_recon_selected_mp filtered to SAS, and to the join explicitly in SAS.

 

Data never sleeps
Super User
Super User
Posts: 7,392

Re: SAS Efficiency Help Needed

Some questions.  Why do you have two separate databases with linking information?  That doesn't sound like the best setup to start with.  Secondly, can you not setup some views on the database side to show the data you want with the rules, that would seem more efficient, then you just query the results of the view.  Finally, back to the multiple databases again, can the zip not be added to the one dataset, that would remove the whole joining part completely.

Regular Contributor
Posts: 212

Re: SAS Efficiency Help Needed

[ Edited ]

Would Views really help with efficiency? Wouldn't Oracle just be running a similar join every time I referenced the View?

Both datasets do contain the imb_dlvry_zip_5 column also. Would adding the imb_dlvry_zip_5 to the join speed things up? Like:

where a.imb_dlvry_zip_5 = '14221'
and a.imb_dlvry_zip_5 = b.imb_dlvry_zip_5
and a.imb_code = b.imb_code
AND &rule;

Regular Contributor
Posts: 212

Re: SAS Efficiency Help Needed

To clarify, the tables are on two different databases because they are samples of larger tables that originate on two separate databases to begin with.  One database is not completely controlled by my company and they will no allow a database link to be placed between the databases.  Without a database link I don't believe I can recode this using SQL Pass-thru either.

 

 

Super User
Super User
Posts: 7,392

Re: SAS Efficiency Help Needed

If that is the case, then I you would be better off extracting each database to a SAS dataset as @Reeza has mentioned, and then running your secondary queries and merges on your system.  I would imagine the bottleneck is the pulling of data form two sources and merging it using a third system.  So pulling the data locally and then just using one sytem to do the processing should reduce the overhead.

proc sql;
  create table ods_iv_recon_selected_mp as select * from ...;
  create table ...;
quit;

/* Disconnect from databases */
/* Do processing of data here */
Regular Contributor
Posts: 212

Re: SAS Efficiency Help Needed

Ok, if I go about this by extracting each database table to a SAS dataset does it make sense to also index the SAS-side datasets? Would this help in performance also since I have no choice but to perform this query for 100+ different rules?

Super User
Super User
Posts: 7,392

Re: SAS Efficiency Help Needed

It may do, sorry, can't really be much more help there, just try it.  I would add to my original part:

proc sql;
  create table ods_iv_recon_selected_mp as select * from ...;
  create table ...;
quit;

/* Disconnect from databases */

proc sql;
create joined_data as
...
where a.imb_dlvry_zip_5 = '14221'
and a.imb_code = b.imb_code;
quit;
/* Do processing of data here */

Then you have a dataset which fulfils that criteria which you use in all the other tests.  That can then be your source, and on that one dataset you can do your tests.  So data from one DB to SAS, data from other DB to SAS, SAS combines the two to create master dataset.  Then you checks run over master dataset with a simple datastep (this will create testX with the result of the where condition, note I have assumed that in the merge for master dataset you will rename the necessary variables):

data tests;
  test=1; check="ACTUAL_DLVRY_DATE IS NULL AND B_ACTUAL_DLVRY_DATE IS NOT NULL"; output;
  test=1.5; check="ACTUAL_DLVRY_DATE > B_ACTUAL_DLVRY_DATE"; output;
run;
data _null_;
  set tests;
  call execute(cat('proc sql; create table test',strip(put(test,best.)),' as select * from MASTER_DATASET where ',strip(check)));
run;

 

 

Regular Contributor
Posts: 212

Re: SAS Efficiency Help Needed

[ Edited ]

Thank you everyone for all the ideas. I decided to bring the datasets local to SAS from both Oracle database table and am going to create indexes on each SAS dataset on IMB_Code to address my performance issues.  I appreciate the time you took to help me with this.

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 10 replies
  • 404 views
  • 5 likes
  • 4 in conversation