How does Proc Append work?

Accepted Solution Solved
Reply
Respected Advisor
Posts: 4,173
Accepted Solution

How does Proc Append work?

I realised lately that I don’t understand how PROC APPEND appends data to a SAS data set (not talking about a DBMS but SAS files on a file system).

We learn that we always should append the small data set to the large data set, so:

PROC APPEND BASE=<large SAS file> data=<small SAS file>;

quit;

I do understand that the “data” data set needs to be copied so it makes sense to me that this one should be the small table.

What I don’t understand is how the small table gets appended to the large table.

For example: If I append every week 50T rows to a base table will performance be stable or deteriorate over time?

I currently believe that how an append works also depends on OS and file system – but as said: I don’t know so please let me know what you know – or where I could find some documentation/a white paper which explains in-depth what’s going on.

Thanks

Patrick


Accepted Solutions
Solution
‎04-18-2012 12:47 PM
SAS Employee
Posts: 1

Re: How does Proc Append work?

Hi, Patrick.

The DATA data set's observations are read - nothing else is done to the DATA data set. As those observations are read, the BASE data set is updated, adding an observation at a time,  or observations, depending on whether the fast-append method is being used or not. When all the observations have been added, the descriptor of the BASE data set (if this is using the BASE engine - other engines may act differently) is updated appropriately. The BASE data set is not rewritten.

As far as your stability question, the BASE data set will remain stable.

Thanks!

Diane

View solution in original post


All Replies
Regular Contributor
Posts: 184

Re: How does Proc Append work?

There are a number of reasons to use the smaller data set as the BASE. See

http://www.sascommunity.org/wiki/Concatenation_Operations_with_PROC_APPEND#Designating_the_Master

Patrick wrote:

I realised lately that I don’t understand how PROC APPEND appends data to a SAS data set (not talking about a DBMS but SAS files on a file system).

We learn that we always should append the small data set to the large data set, so:

PROC APPEND BASE=<large SAS file> data=<small SAS file>;

quit;

I do understand that the “data” data set needs to be copied so it makes sense to me that this one should be the small table.

What I don’t understand is how the small table gets appended to the large table.

For example: If I append every week 50T rows to a base table will performance be stable or deteriorate over time?

I currently believe that how an append works also depends on OS and file system – but as said: I don’t know so please let me know what you know – or where I could find some documentation/a white paper which explains in-depth what’s going on.

Thanks

Patrick

Respected Advisor
Posts: 4,173

Re: How does Proc Append work?

Hi Howles

Thanks for the link but: That's all stuff I know and understand.

I believe that the answer and explanation for the following question could give me what I'm after: 

"If I append every week 50T rows to a base table (SAS file, no indexes) will performance be stable or deteriorate over time?"

Thanks

Patrick

Trusted Advisor
Posts: 1,301

Re: How does Proc Append work?

Appending data should be fairly linear however there are lots of things that can affect this negatively, especially things like index's or other other table constraints.

Regular Contributor
Posts: 184

Re: How does Proc Append work?

Patrick wrote:

Hi Howles

Thanks for the link but: That's all stuff I know and understand.

I believe that the answer and explanation for the following question could give me what I'm after:

"If I append every week 50T rows to a base table (SAS file, no indexes) will performance be stable or deteriorate over time?"

Thanks

Patrick

Should be stable. The BASE data set is not rewritten

Respected Advisor
Posts: 4,173

Re: How does Proc Append work?

Hmm...

Now totally demonstrating my ignorance with the following question: Does that mean that Proc Append will somehow copy the "data=" table (and eventually remove the descripter part of it), update table attributes like "number of observations" in the descriptor part of the master table and then somehow tell the filesystem (eg. NTFS) that it now has to link the last cluster of the "old" master table to the first cluster of the now copied "data" table, though creating a new bigger master table?

Solution
‎04-18-2012 12:47 PM
SAS Employee
Posts: 1

Re: How does Proc Append work?

Hi, Patrick.

The DATA data set's observations are read - nothing else is done to the DATA data set. As those observations are read, the BASE data set is updated, adding an observation at a time,  or observations, depending on whether the fast-append method is being used or not. When all the observations have been added, the descriptor of the BASE data set (if this is using the BASE engine - other engines may act differently) is updated appropriately. The BASE data set is not rewritten.

As far as your stability question, the BASE data set will remain stable.

Thanks!

Diane

Super User
Super User
Posts: 7,065

Re: How does Proc Append work?

When you use a data step to append two datasets ( data base; set base new ; run; ) then SAS reads all of both tables and writes data from both tables to a new temporary file.  When the step is finished to original file for the base dataset is deleted and the temporary file is renamed.

When you use append to do the same thing. (proc append base=base data=new; runSmiley Wink then SAS only was to read and write the observations from the new dataset.

So proc append allows the elimination of the need to copy the existing observations in the base table. The details of how it actually appends the data to the existing dataset and updates the internal structures to keep it consistent are not that important. 

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 7 replies
  • 867 views
  • 0 likes
  • 5 in conversation