Hi Peter, Thanks for your reply(ies), much appreciated. Let me restate what you said above to make sure I understand what you're saying... 1) Use the SPDE engine over the BASE engine for my scenario. 2) Partition my data by year. For example (pseudocode and untested): libname spde spde "some path"; data spde.mydata_2008 spde.mydata_2009 spde.mydata_2010 spde.mydata_2011; set libref.mydata; select(transaction_year); when(2008) output spde.mydata_2008; when(2009) output spde.mydata_2009; when(2010) output spde.mydata_2010; when(2011) output spde.mydata_2011; end; run; 3) Index the separate SPDE datasets "appropriately" based on the typical queries the end users make. 4) If one of those indexes is unique, i.e. the concatenation of the keys uniquely identifies a record, then I can use this code to update/insert *: proc append base=spde.mydata_2011 data=work.transaction_data_2011 uniqsave=rep; run; Don't use this technique if transaction_data_2011 is "large", i.e. it's a "bulk" update. 5) By splitting the data by year, we can parallel process the updates, i.e. schedule multiple SAS transaction load jobs to run concurrently. 6) What my colleague (ok, my boss) suggested was to create separate data step views for ALL the permutations of the years. For example (pseudocode): v_mydata_all (all years) v_mydata_2008_2009_2010 v_mydata_2008_2009 v_mydata_2009_2010_2011 v_mydata_2009_2010 v_mydata_2010_2011 etc (I may have missed some combination) Then, the end user would have to select the correct view when doing their analysis (both in EG, Futrix, and Stored Process Web Application) to get their desired performance. Which IMO is not ideal. Instead, are you suggesting that a single proc sql view defined as above (outer union corresponding) will get acceptable performance due to the underlying SPDE dataset and related indexes? (That would be great). Did you mean to use different librefs in your proc sql view, rather than different datasets? Would it be "smart enough" to stop reading data when a where clause is filtering by year (eg. transaction_year in (2009,2010). What about transaction_year in (2009, 2011), i.e non-consecutive (probably wouldn't happen, but...) Can you comment on why an SQL view will be more performant than a data step view? Keep in mind that the main reason for this performance tuning/data restructure is to give better perfornance for interactive queries; we can live with current batch job perfornance, although that could change in the future as data volumes increase. I mention this to emphasize that the easier the architecture is for the end users, the better. * As an aside, the proc append functionality reminds me of an Oracle "upsert" (update / insert). It's an update technique which first tries an update, then an insert if the key was not found. Google "Oracle upsert" for more details. Please let me know if I've misunderstood what you've stated above. Thanks, Scott
... View more