BookmarkSubscribeRSS Feed
yashpande
Obsidian | Level 7

 

Hi,

 

I have below query which is taking lot of time 


proc sql;
create table agent1
as
select a.*,
B.DEAL_BRANCH_CITY,
B.DEAL_BRANCH_KEY,
B.DEAL_BRANCH_NAME,
B.DEAL_BRANCH_STATE,
B.DEAL_PRIMARY_SUBVERTICAL_KEY,
B.DEAL_PRIMARY_VERTICAL_KEY,
B.DEAL_PRI_SSUB_VERTICAL_DESC,
B.DEAL_PRI_VVERTICAL_DESC,
B.DEAL_RM_CODE,
B.DEAL_RM_NAME,
B.DEAL_SEC_RM_CODE,
B.DEAL_SEC_RM_NAME,
B.DEAL_SEC_SSUB_VERTICAL_DESC,
B.DEAL_SEC_SUBVERTICAL_KEY,
B.DEAL_SEC_VERTICAL_KEY,
B.DEAL_SEC_VVERTICAL_DESC,
B.POL_DEAL_DSA_NAME,
B.POL_DEAL_NUM
FROM
work.agent as a
LEFT JOIN
STAGE.master_LOOKUP AS B
ON
UPCASE(A.POLICY_NUMBER)=UPCASE(B.POL_NUM)
;
quit;

 

 

Here, Agent table has 8731382 records and no duplicates It has 36 variables.  My B table is pain it has 167171496 and has duplicates on the key column and has total 29 columns. Both of these tables are SAS tables and have indexes on respective columns. I tried with HASH join but it still takes huge time.

 

How can I reduce the execution time for this ? Any suggestion is highly appreciated

 

 

5 REPLIES 5
KachiM
Rhodochrosite | Level 12
 

Here, Agent table has 8731382 records and no duplicates It has 36 variables.  My B table is pain it has 167171496 and has duplicates on the key column and has total 29 columns. Both of these tables are SAS tables and have indexes on respective columns. I tried with HASH join but it still takes huge time.

 

Do you want the output to have 36 + 167 variables? What is length of POLICY_NUMBER  ? From the use of UPCASE() I understand that POLICY_Number is not digital string. 

yashpande
Obsidian | Level 7

I want 36 columns from Agent table and 18 columns from B table. Length of POLICY_NUMBER is 200. 

Kurt_Bremser
Super User

@yashpande wrote:

Length of POLICY_NUMBER is 200. 


Check if that is actually the case in your original source data. I work for an insurance company, and our policy number column is just 10 bytes.

Since 200 is the default length that SAS uses in unclear cases, you might accidentally have introduced a mistake there by not setting a length explicitly when using a character function..

KachiM
Rhodochrosite | Level 12

@yashpande 

 

 

I consider loading data from Agent as described below into a hash table seems promising. @Kurt_Bremser suggested other ways including Datastep Merge. Merge requires sorting of both the data sets which in your circumstance is very expensive.

The length of 200 is too large to go into a hash table. If you are sure of 200 bytes for Policy_Number, you can use md5() function to bring it down to 16 bytes to save memory space for KEY-Part. Further, you can use Observation number (Record ID) of Agent as DATA-Part of Hash table. When you find a match with your LOOKUP Data set, you can recover your 36 columns of Agent by using POINT= option of Set statement.

It is a challenging problem.

Kurt_Bremser
Super User

Your indexes do not help because of the use of functions in the join condition. Indexes work with the raw values of their columns.

Things to do:

  • create unambiguous columns for the keys (e.g. all uppercase)
  • sort by (or create indexes for) these columns

Then you can rerun your join, using either method you already tried.

Since you have a one-to-many join, you can use a data step merge if both tables are sorted accordingly.

 

Alternatively, use the table without duplicates in a hash. Bringing the keys to all uppercase will also be necessary for this.

 

Keep in mind that reading 167 million observation as such is not done in seconds. What kind of library is STAGE? If it is a connection to a remote database, this may be your biggest bottleneck.

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 1271 views
  • 0 likes
  • 3 in conversation