SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

How DI studio handles concurrent db table updates

Reply
Frequent Contributor
Posts: 82

How DI studio handles concurrent db table updates

In SAS DI studio, if an ETL job is running and wriitng data to a target table and another ETL job tries to write to the same table (in a separate session), what will happen?

1.Both jobs will fail (lock each other out)?

2.First job will succeed and second will fail?

3.Data will become corrupted in some way

4.Both jobs will succeed and the second job will overwrite the data written by the first job

Does the answer change based on the type of the target table ( SAS, Oracle, Netezza, SQL server)

Super User
Posts: 5,260

Re: How DI studio handles concurrent db table updates

This is not really a DI Studio, rather an engine question.

Generally, Base engine does not allow for concurrent rad/updates (unless when operating via a SAS/SHARE server).

Most external RDBMS allows concurrent reads/updates, you may need to check for you DBA what kind of configuration that have been made for your schema.

If there is a conflict, and there is member lock involved, situation 2 applies. In rare cases data could be corrupt (depending on the update method used).

Data never sleeps
SAS Employee
Posts: 203

Re: How DI studio handles concurrent db table updates

Linus is right, this is a question about SAS/ACCESS engines and what kind of connections have been made.

This is an expansive topic. It would be impossible to cover it all in a post. That being said ...

The answer does depend on the data source. For example, SAS data sets have no provisions for simultaneous update (really transactions). You would have to place the SAS data set under the control of SAS Share for that. All the databases you mention have provisions for simultaneous update, I think.

It also matters what is being done. For example, in many database systems if I am running a load utility the table is locked. In that case any job that tries to write will fail (or perhaps sit there a while). DB2 has mechanisms to allow reading during loads, but many others do not.

If both jobs are doing updates, they probably will run correctly but could hang due to a deadlock. There is always a possibility that the "other" job will UPDATE/DELETE a row so you don't get the results that you expect. This isn't corruption, it's the way it works. If read consistency is important you can lock the table using READ_LOCK_TYPE= SAS LIBNAME statement option. It is best to not do this; usually dirty reads (reading uncommitted data) is fine. You have to be careful with locking. If a table is locke, the second job will sit and wait until the lock is released or the wait threshold is reached. If the threshold is reached the job will fail. Taking unnecessary locks is a great way to get your DBA to visit you.

To know the exact behavior you need to know your database and what you are asking it to do. The SAS/ACCESS Reference is a great place to start. There are LIBNAME statement options that enable you to control this behavior. Also, talk to your DBA regarding your intent. They will help you out.

Here is a link to the SAS 9.4 SAS/ACCESS reference.

http://support.sas.com/documentation/cdl/en/acreldb/66787/PDF/default/acreldb.pdf

Respected Advisor
Posts: 3,900

Re: How DI studio handles concurrent db table updates

Just to add one piece to the puzzle:

There is now a new option "FILELOCKWAIT" available since SAS 9.4 which will help to avoid file locking issues using the Base SAS engine and having multiple jobs writing to the same SAS file.

http://support.sas.com/documentation/cdl/en/hostwin/67241/HTML/default/viewer.htm#hostwinwhatsnew94....

This is not really concurrent access but I believe is a very useful new option when for example writing to a control table while running jobs in parallel.

Ask a Question
Discussion stats
  • 3 replies
  • 592 views
  • 0 likes
  • 4 in conversation