Re: Replacing tables with views

MartyS · Posted 03-13-2015 02:57 PM

We have a lot of code which follows a similar pattern where we collect data, do a bunch of joins and sorts, then use the data.

* gather records from some source and put it into a table temp1;

proc sql _method;

CREATE TABLE temp1 AS

SELECT t.*,

a.another_field

FROM temp1 LEFT JOIN another_table a

ON t.acct_id = a.acct_id

;

quit;

proc sort data=temp1 nodupkey; by acct_id; run;

proc sql _method;

CREATE TABLE temp1 AS

SELECT t.*,

y.yet_another_field

FROM temp1 LEFT JOIN yet_another_table y

ON t.acct_id = y.acct_id

;

quit;

proc sort proc sort data=temp1 nodupkey; by acct_id; run;

proc sql _method;

CREATE TABLE temp1 AS

SELECT t.*,

e.even_yet_another_field

FROM temp1 LEFT JOIN even_yet_another_table e

ON t.acct_id = e.acct_id

;

quit;

* and so on and so on until we actually use temp1;

This evolved over the decades because every so often someone decides they need another field in the data from another source, so the developer will just inject another CREATE TABLE AS in the middle of the pile. This results in the same data being sloshed around in WORK over and over again. Ideally, we would refactor and test thousands of lines of code every time a join is added, but that's just not viable. Each new CREATE TABLE AS is relatively easy to test in isolation without resorting to full testing.

We have refactored one program (of hundreds) which ends with this crazy long join at the end. It will be more difficult to test changes in the future.

We have discussed replacing all (or most) of the tables with views, deferring the joins and avoiding sloshing around hundreds of fields and millions of records at each proc sql step. There seem to be two problems:

1. There is no view-friendly substitute for proc sort

2. Creating a view with the same name (as is done above with tables) seriously confuses SAS

The hope was to make incremental changes to the code to use views, not to perform a complete rewrite.

How are others handling this?

Thanks,

Marty

Tom · Posted 03-13-2015 03:45 PM

If you are using views created in SAS using PROC SQL then the views can have an ORDER BY clause to eliminate the PROC SORT.

What is hard to eliminate is the NODUPKEY option.

What are you using that to do?

MartyS · Posted 03-13-2015 03:50 PM

Tom,

Thanks for the response. Primarily we use nodupkey to hedge against a joining table having duplicates.

Tom · Posted 03-13-2015 04:05 PM

Why are you using PROC SQL instead of normal data steps?

If you just want one observation per ACCT_ID then it is probably easier to do with data step.

data want ;

merge table1 (in=in1)

table2 (keep=acct_id field1 field2)

table3 (keep=acct_id field3)

....

tableN (keep=acct_id fieldZ)

;

by acct_id ;

if in1 and first.acct_id ;

run;

Replacing tables with views