Let’s say I have a SAS dataset with 40 million records that I need to load into an Oracle dimension table. Is there a way I can split this file of 40 million into, say, two chunks of 20 million each and process them in parallel, so that the load time to the Oracle dimension is reduced? This is like how we use DBSLICE when reading from Oracle or other RDBMS systems — but while writing to Oracle, can we do some parallel processing in the same dimension load?
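For context, here is a minimal sketch of the read-side parallelism mentioned above. DBSLICEPARM= is the SAS/ACCESS option behind threaded reads; the connection details and table name are placeholders, not from this thread:

```sas
/* Parallel READ from Oracle via DBSLICEPARM (SAS/ACCESS for Oracle).     */
/* User, password, path and table name below are illustrative only.       */
libname ora oracle user=scott password=tiger path=orcl
        dbsliceparm=(all, 2);   /* allow up to 2 threaded read connections */

data work.dim_source;
   set ora.big_dimension;       /* rows are fetched over parallel threads */
run;
```

There is no direct write-side equivalent of this option, which is what the question below is getting at.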
The fastest way to write a large amount of data to Oracle is by using bulk load. I assume that you have already looked into that. I don't know whether this is done in parallel inside Oracle — maybe the indexes are created in parallel with the table load — so from my point of view this might be more of an Oracle DBA question. You may also want to look into parallel processing by loading different tables in parallel; that way you will probably shorten the total run-time.
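As a concrete (hedged) illustration of the bulk-load path: with SAS/ACCESS for Oracle it is just a dataset option, which routes the rows through SQL*Loader direct path instead of row-by-row inserts. Connection details and table names here are placeholders:

```sas
/* Bulk load from SAS to Oracle: BULKLOAD=YES invokes SQL*Loader          */
/* (direct path) under the covers. BL_OPTIONS= passes SQL*Loader options. */
libname ora oracle user=scott password=tiger path=orcl;

data ora.dim_customer (bulkload=yes bl_options='ERRORS=0');
   set work.dim_stage;          /* the 40M-row staging dataset */
run;
```

Note that bulk load only helps the insert path; it does not apply to updates.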
Thanks Linus for your inputs! This is really helpful!
I went through the document you mentioned.
Looks like parallel load is only possible in the case of bulk load.
Let me briefly explain the scenario I’m looking at.
I have a job in DI studio which is as follows –
SAS dataset (40M records) --> SQL Join transform --> temp table --> Lookup transform --> temp table --> SQL Join transform --> temp table --> Table Loader (update/insert technique) --> Oracle DIMENSION table
So this is ideally not a case of bulk insert. Do you know of any parallel processing that can be done in such cases? By parallel processing I mean splitting the 40 million record source file into, say, two sets of 20 million each and then loading them in parallel to Oracle.
I’m doubtful whether this is possible. Any thoughts?
One option is to load the data into a target SAS dataset and then bulk load that to the Oracle table. I'm trying to see if there are ways to avoid this, as I do not want to keep a replica SAS dataset of the Oracle table.
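If the split-and-load idea is pursued anyway, one common pattern (not verified for this particular job) is SAS/CONNECT MP CONNECT: split the source by observation range and load each half from its own SAS session. Library, table, and macro names below are hypothetical:

```sas
/* MP CONNECT sketch (requires SAS/CONNECT): two local sessions, each     */
/* appending one half of the 40M-row dataset to Oracle in parallel.       */
options autosignon sascmd="!sascmd";

%macro load_half(sess, first, last);
   rsubmit &sess wait=no inheritlib=(ora work=pwork);
      /* pwork is the parent session's WORK, inherited under an alias */
      proc append base=ora.dimension
                  data=pwork.src (firstobs=&first obs=&last);
      run;
   endrsubmit;
%mend;

%load_half(s1, 1,        20000000)
%load_half(s2, 20000001, 40000000)

waitfor _all_;     /* block until both loads finish */
signoff _all_;
```

Be aware that two sessions doing update/insert against the same dimension concurrently can contend for row locks and index maintenance on the Oracle side, so this tends to pay off mainly for insert-only loads.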
I guess that your 40 million records are potential updates/inserts. So the question is: how many of them become updates versus inserts in a typical load?
If the major part is inserts, I still think that bulk load is an alternative for the inserts. Carefully go through the log to find out where the bottlenecks of your load step are. There are many things going on in the Table Loader...
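One way to act on this, sketched here with placeholder table and key names: split the staging rows into updates (keys already in the dimension) and inserts (new keys), then bulk load only the insert stream:

```sas
/* Hedged sketch: separate inserts from updates so the (usually larger)   */
/* insert stream can use bulk load. Names are illustrative only.          */
libname ora oracle user=scott password=tiger path=orcl;

proc sql;
   /* keys already present in the dimension -> update path */
   create table work.updates as
   select s.* from work.src as s
   where s.cust_key in (select cust_key from ora.dimension);

   /* everything else is a new row -> bulk-load insert path */
   create table work.inserts as
   select s.* from work.src as s
   where s.cust_key not in (select cust_key from ora.dimension);
quit;

proc append base=ora.dimension (bulkload=yes) data=work.inserts;
run;

/* work.updates then goes through the normal update path, e.g. a         */
/* PROC SQL UPDATE or the Table Loader's update technique.                */
```

Turning on `options sastrace=',,,d' sastraceloc=saslog;` before the load is also a quick way to see in the log which SQL is actually being sent to Oracle and where the time goes.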