SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

how to improve update performance

Reply
Contributor
Posts: 61

how to improve update performance

hi all,

I am working with sas data management studio.In that i was using update node in one of my job.My target node is having 65 lakh records and my transaction is having 74000 records. when i execute update node with following query

update master a

set a.action_status=( select b.action_status from trans b where a.cust_id=b.cust_id)

when i ran the update node it is running for 1 1/2 day...and it is still running for the second day also..

can anyone help me on this and also share your views to increase the performance..

MANY THANKS IN ADVANCE.

Super User
Posts: 10,046

Re: how to improve update performance

Posted in reply to sandeep_reddy

Hash Table.

SAS Employee
Posts: 85

Re: how to improve update performance

Posted in reply to sandeep_reddy

Index the fields you are joining and the action_status field.

Occasional Contributor
Posts: 13

Re: how to improve update performance

Posted in reply to sandeep_reddy

Hi,

I don't have SAS Data Management Studio but in from your code I have one idea what could be the problem:

your update is making select from table b for every row in table a which means 6,5 million times a select from (74000 rows inner join with 6,5 million rows) with a answer of 1 action status.

this can't be fast!

However you could make a filter to limit the rows in a big table like this

update master a

set a.action_status=(select b.action_status from trans b where a.cust_id=b.cus_id)

where a.cust_id IN (select DISTINCT cust_id from trans)

or you rewrite it to be a data step:

data master;

set master trans;

BY cust_id;

run;

However, the second option had a need that data are sorted by cust_id. If you build an index by cust_id this could help also.

Just few thoughts

Trusted Advisor
Posts: 3,215

Re: how to improve update performance

Posted in reply to sandeep_reddy

Many attention points to check and evaluate as there is too little background information. 

I = ksharp, although this will need a special user-transformation for DI

      The hash approach could be extended to a datastep vs SQL approach 

II = skillman optimizing datasets within SQL behavior

III = Change some SAS system options for better performance

More?

---->-- ja karman --<-----
SAS Employee
Posts: 85

Re: how to improve update performance

Posted in reply to sandeep_reddy

Are you using the Data Update node or the SQL Execute node? If you are not passing data to the data update node, and using database tables to determine the records to update (which it looks like you are doing based on the information provided) then I would use the SQL Execute node to push this update to the database and take advantage of it's memory/cpu.

Ask a Question
Discussion stats
  • 5 replies
  • 528 views
  • 0 likes
  • 5 in conversation