DATA Step, Macro, Functions and more

load incremental data

Reply
Contributor
Posts: 32

load incremental data

Hi,

 

I want to load incremental data because the size of dataset is so large and it is taking more time to execute.

I want to load only updated records.?

 

Super User
Posts: 6,936

Re: load incremental data

Please give us some more information:

- where does the data come from

- how is your database stored (native SAS table, or in a RDBMS)

- do you only get new records or

- do you need to update existing records that are already in your database

- what are the current sizes, and what are your time constraints

- what does your SAS environment look like (operating system, CPU power, storage)

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Contributor
Posts: 32

Re: load incremental data

 

-database is oracle tables from oracle library  and some sas data sets from sas library.

-want only new records.

-data size aprox 2cr.

-working on windows environment 

 

 

Super User
Posts: 6,936

Re: load incremental data

So you are updating a table in Oracle from different sources, but use SAS?

 

Since you only want to add records, proc append will be your best option.

 

I guess your Oracle resides on a dedicated server, so network performance will be a major influence.

Other factors are record (observation) size. 20 million of 100 bytes each is not so much, 20 million of 5k bytes each is a lot if it has to go over the network.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Contributor
Posts: 32

Re: load incremental data

Sorce table avalable in oracle and need to load it into sas dataset.

 if i want to add updated records as well as new records then what should i do?

 

Super User
Posts: 6,936

Re: load incremental data

Slowly the picture is getting clearer Smiley Wink

 

First, I'd try to identify the records to update on the Oracle side and perform an unload of specific records that need updating into a SAS dataset. This reduces the amount of data that needs to be transferred out of the Oracle database.

Then run a proc sql with update.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Super User
Posts: 6,936

Re: load incremental data

After performing a test here with proc sql and update, I strongly recommend using a data step update. Needs more space, but is faster by orders of magnitude.

 

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Super User
Posts: 5,256

Re: load incremental data

So, your large master table is in SAS!?

First, make sure that you have indexed the PK for that table. If not already there, consider moving to a SPDE library - since it has much more efficient index maintenance.

Then which method is bets, depends on the data, and your preferences. 

Already mentioned is using a data step (using modify by).

Another option is to query the master table first (something like key not in(select * from master) - but perhaps it may hard for SAS to optimize that query.

You may need to try some different techniques before settling for a solution.

Data never sleeps
Super User
Super User
Posts: 7,401

Re: load incremental data

Hi,

 

Could you provide more information, i.e. what OS, what software you have available etc. Do you have existing data which needs to be updated - if so look at the datastep update syntax:

http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000202975.htm

 

If its new data to be added, then use proc append:

http://support.sas.com/documentation/cdl/en/proc/61895/HTML/default/viewer.htm#a000070936.htm

 

Show what it is your doing, if your data is big, then any operation is going to take time - the append doesn't create a new file, just adds rows to the end, so processing and resource wise this is probably the quickest, however it doesn't update any data already existing so depends on your needs.

Ask a Question
Discussion stats
  • 8 replies
  • 452 views
  • 0 likes
  • 4 in conversation