03-30-2016 03:52 AM
Please give us some more information:
- where does the data come from
- how is your database stored (native SAS table, or in a RDBMS)
- do you only get new records or
- do you need to update existing records that are already in your database
- what are the current sizes, and what are your time constraints
- what does your SAS environment look like (operating system, CPU power, storage)
03-30-2016 04:28 AM
So you are updating a table in Oracle from different sources, but use SAS?
Since you only want to add records, proc append will be your best option.
I guess your Oracle resides on a dedicated server, so network performance will be a major influence.
Other factors are record (observation) size. 20 million of 100 bytes each is not so much, 20 million of 5k bytes each is a lot if it has to go over the network.
03-30-2016 06:05 AM
Slowly the picture is getting clearer
First, I'd try to identify the records to update on the Oracle side and perform an unload of specific records that need updating into a SAS dataset. This reduces the amount of data that needs to be transferred out of the Oracle database.
Then run a proc sql with update.
03-30-2016 06:15 AM
After performing a test here with proc sql and update, I strongly recommend using a data step update. Needs more space, but is faster by orders of magnitude.
03-30-2016 07:15 AM
So, your large master table is in SAS!?
First, make sure that you have indexed the PK for that table. If not already there, consider moving to a SPDE library - since it has much more efficient index maintenance.
Then which method is bets, depends on the data, and your preferences.
Already mentioned is using a data step (using modify by).
Another option is to query the master table first (something like key not in(select * from master) - but perhaps it may hard for SAS to optimize that query.
You may need to try some different techniques before settling for a solution.
03-30-2016 03:55 AM
Could you provide more information, i.e. what OS, what software you have available etc. Do you have existing data which needs to be updated - if so look at the datastep update syntax:
If its new data to be added, then use proc append:
Show what it is your doing, if your data is big, then any operation is going to take time - the append doesn't create a new file, just adds rows to the end, so processing and resource wise this is probably the quickest, however it doesn't update any data already existing so depends on your needs.