learn_SAS_23
Quartz | Level 8
Hello Team,

We are using the DB2 bulk loader (CLILOAD) transformation while loading to our target database.

Every day we receive 9 to 10 million rows to load into the target table; the target table is partitioned by month and indexed on a date column.

Loading usually takes 3 to 4 hours; if some other process uses the table in the meantime, it creates a table lock that blocks the load process.

How can we gain more speed in the loading process, and how can we resolve the database locks?

Please post your tips and tricks on this issue.
learn_SAS_23
Quartz | Level 8

I am loading from DB2 (stage table to target table); both are on the same UNIX server.

Kurt_Bremser
Super User

Since you already have the data in the DBMS, use explicit pass-through in a user-written transformation. Otherwise you run the risk of SAS pulling the data back from the DBMS for the bulk load, causing double, unnecessary network traffic.
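A minimal sketch of such explicit pass-through, as it could appear in a user-written transformation (the database alias, schema, and table names are placeholders, not from your setup):

proc sql;
  /* connect straight to DB2; DATABASE= takes the DB2 alias (placeholder here) */
  connect to db2 (database=PRODDB user=&dbuser password="&dbpass");
  /* the INSERT runs entirely inside DB2, so no rows travel through SAS */
  execute (
    insert into TRGSCHEMA.TARGET_TABLE
    select * from STGSCHEMA.STAGE_TABLE
  ) by db2;
  disconnect from db2;
quit;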

Patrick
Opal | Level 21

This sounds more like a DB2 problem than a SAS problem. If another process is issuing the table lock, then I guess that's nothing you can change. If it's your process requiring an exclusive table lock and some read process prevents this, then there might be options.

 

What about loading the data first from SAS into a staging table in DB2 (an empty table with identical columns but no partitioning, constraints, or indexes)?

Then load the data within DB2 from the staging table into the target table (using explicit pass-through SQL). Because this process then runs fully within DB2, it will also be much easier to get support from a DB2 admin/expert.

 

It's certainly possible to insert into a table without needing an exclusive table lock, but which loading approach to choose will depend on your target table (i.e. whether there is also referential integrity, triggers, etc. on it).
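To sketch that first step, here is a minimal bulk load from SAS into an empty DB2 staging table via CLI LOAD (the libref, connection options, and table names are placeholders):

libname db2stg db2 database=PRODDB schema=STGSCHEMA user=&dbuser password="&dbpass";

/* BULKLOAD=YES with BL_METHOD=CLILOAD invokes the DB2 CLI LOAD facility */
data db2stg.stage_table (bulkload=yes bl_method=cliload);
  set work.daily_delta;   /* the daily delta rows; dataset name is hypothetical */
run;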

 

learn_SAS_23
Quartz | Level 8
Thanks Patrick,

Yes, we are using a similar way of loading into an empty DB2 stage table and then loading to the target table using the SAS DIS DB2 bulk loader transformation (CLILOAD). As the target table already holds a huge amount of data, we are appending new rows to the existing table.

Can you post a bit more detail on which options to use in the DB2 loader transformation to avoid table locks and allow other processes to read the existing data?
Patrick
Opal | Level 21

Hmmm... The only way I've ever used bulk load, and the DIS transformations that allow bulk loading, was from an external data source (i.e. a SAS table) into the target DBMS table. I never even considered bulk-loading from one DBMS table to another within the same DB, and I'm also not sure whether this would be faster, and especially whether SAS-generated code would execute such a process fully in-database or would first pull the source table to the SAS side and only then bulk-load from the SAS side to the target table.

 

What I had in mind is to first bulk-load your SAS table into an empty DBMS staging table to which you can have exclusive access, and then as a second step run an INSERT from the staging table into the target table. There you can of course turn on parallelism, load directly into a partition, commit once, control when indexes and statistics get updated, or do whatever else can speed up the load process.

 

If you can't get the second step (the INSERT) to perform as you need, then you've got both source and target already in DB2, which allows you to "take SAS out of the picture" and get support from a DB2 expert. Any code that can be executed from a DB2 client can also be executed via SAS (in a PROC SQL EXECUTE block as explicit pass-through SQL that just gets sent "as is" to the DB for execution).

In DIS this second step would then likely become user-written code or possibly a custom transformation, along the lines of the sketch below.
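As a sketch of that second step (all names are placeholders, and the RUNSTATS call via ADMIN_CMD is just one example of controlling when statistics get refreshed; the actual DB2-side tuning should come from your DBA):

proc sql;
  connect to db2 (database=PRODDB user=&dbuser password="&dbpass");
  /* a plain INSERT...SELECT normally takes row-level locks only,
     so concurrent readers are not blocked (unless locks escalate) */
  execute (
    insert into TRGSCHEMA.TARGET_TABLE
    select * from STGSCHEMA.STAGE_TABLE
  ) by db2;
  /* refresh statistics explicitly after the load via the ADMIN_CMD procedure */
  execute (
    call sysproc.admin_cmd('runstats on table TRGSCHEMA.TARGET_TABLE and indexes all')
  ) by db2;
  disconnect from db2;
quit;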

 

I haven't worked with DB2 for many years, so I can't help much more with details. Here are some documentation links; the SAS options for CLI LOAD are related in name to the parameters available within the DB2 utilities.

https://go.documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/acreldb/p1w9ial7770y6tn1rtfiuwtfg088.htm

https://www.ibm.com/docs/en/db2/11.5?topic=data-importing-cli-load

https://www.ibm.com/docs/en/db2/11.5?topic=load-options-improving-performance

 

3-4 hours to load 10M rows feels awfully long. It depends of course on the row size of your source table and on what the target table looks like (e.g. foreign key constraints can slow things down a lot).

learn_SAS_23
Quartz | Level 8
Thanks Patrick and Kurt for your valuable suggestions and link references; they are insightful.
As Patrick suspected, the target table is huge, with close to 60 columns and 11 foreign keys, which slows down the process. We are using bulk load from a SAS dataset to the stage DB2 table and then the bulk loader again from the stage DB2 table to the target DB2 table. I will try inserting the data within DB2, and we are also checking with our DB2 DBA for tips.


Kurt_Bremser
Super User
(Accepted Solution)

Using Bulk Load to create/populate the staging table is the right way to go, but from there all processing has to be done in the DB, and that means explicit pass-through, or an SQL job done exclusively from the same scheduler that initiated the SAS job.

 

Just to give you an alternate perspective: we create flat files from SAS, let the data center people (who control the main scheduler) upload the files to the DB server, and run an insert there that loads the data into the DB. Our SAS does not have (and does not need) any direct access to the DB.

learn_SAS_23
Quartz | Level 8
Thanks Kurt for the detailed information.
One of our DB2 experts suggested a similar way of handling this situation: creating flat files from SAS with the delta data and uploading them to the DB2 table through the INGEST method (which doesn't create any table locks and allows users to read committed data while the table is loaded with the delta data). A sketch of that route is below.
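For reference, a minimal sketch of writing the delta as a delimited flat file from SAS; the DB2-side INGEST command is shown only as a comment, since it runs from a DB2 client rather than from SAS, and all paths and names here are hypothetical:

/* write the delta rows as a pipe-delimited flat file for the DB2 side to pick up */
proc export data=work.daily_delta
            outfile='/data/transfer/daily_delta.del'
            dbms=dlm replace;
  delimiter='|';
run;

/* on the DB2 server, a DBA would then run something along these lines:
   db2 "INGEST FROM FILE /data/transfer/daily_delta.del
        FORMAT DELIMITED BY '|'
        INSERT INTO TRGSCHEMA.TARGET_TABLE"
*/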
Kurt_Bremser
Super User

All our DB to SAS communication is done with flat files.

  • the text files are kept in an archive for future reference and always show the state of the DB at the time the snapshot for the unload was made
  • text files are always readable, on every platform
  • the DB admins keep full control over the time when the DB system will have to deal with the stress of bulk unload/insert
  • we save the cost of the ACCESS license
  • we do not need to keep a DB client on the SAS server (needs additional maintenance)
