I was wondering if anyone has tested the following scenarios and can share what sort of performance gain/loss you get compared to pass through sql
1. Using proc append where base is a sas dataset and data is from an Oracle source (and using a where clause)
2. Datastep with a where clause grabbing only changed records from an Oracle source.
3. Datastep view with a where clause to an oracle source as opposed to physically writing the data.
I currently have an ETL process that uses datastep to update / append a local copy of data on our server and want to see if I can make it more efficient. I am particularly interested in the use of proc append to get my daily extracts.
None of the scenarios you are proposing are using SQL Pass Through.
2 and 3 are almost identical. The view must be resolved in order to retrieve the data. So, yes the creation of the view is done immediately, but resolution will take probably the same time as 2 (depends how you are defining the view).
Append works fast, because it does not need to scan the whole base table, only data is effectively processed row by row. And again it depends how data is retrieved.
I would indeed consider adding a 4th scenario, which would be, a SQL Pass Through extract to a sas dataset, followed by proc append to the base dataset. It depends on the amount of data you're trying to retrieve, on the DBMS engine, on how complex is your WHERE condition.
From my experience, you won't get a clear answer without doing some serious testing of the 4 approachs (although I would put aside rightaway 2 and 3).