BookmarkSubscribeRSS Feed
MartyS
Calcite | Level 5

We have a lot of code which follows a similar pattern where we collect data, do a bunch of joins and sorts, then use the data.

* gather records from some source and put it into a table temp1;

proc sql _method;

     CREATE TABLE temp1 AS

     SELECT     t.*,

                       a.another_field

     FROM     temp1 LEFT JOIN another_table a

                         ON t.acct_id = a.acct_id

     ;

quit;

proc sort data=temp1 nodupkey; by acct_id; run;

proc sql _method;

     CREATE TABLE temp1 AS

     SELECT     t.*,

                       y.yet_another_field

     FROM     temp1 LEFT JOIN yet_another_table y

                         ON t.acct_id = y.acct_id

     ;

quit;

proc sort proc sort data=temp1 nodupkey; by acct_id; run;

proc sql _method;

     CREATE TABLE temp1 AS

     SELECT     t.*,

                       e.even_yet_another_field

     FROM     temp1 LEFT JOIN even_yet_another_table e

                         ON t.acct_id = e.acct_id

     ;

quit;

* and so on and so on until we actually use temp1;

This evolved over the decades because every so often someone decides they need another field in the data from another source, so the developer will just inject another CREATE TABLE AS in the middle of the pile. This results in the same data being sloshed around in WORK over and over again. Ideally, we would refactor and test thousands of lines of code every time a join is added, but that's just not viable. Each new CREATE TABLE AS is relatively easy to test in isolation without resorting to full testing.

We have refactored one program (of hundreds) which ends with this crazy long join at the end. It will be more difficult to test changes in the future.

We have discussed replacing all (or most) of the tables with views, deferring the joins and avoiding sloshing around hundreds of fields and millions of records at each proc sql step. There seem to be two problems:

1. There is no view-friendly substitute for proc sort

2. Creating a view with the same name (as is done above with tables) seriously confuses SAS

The hope was to make incremental changes to the code to use views, not to perform a complete rewrite.

How are others handling this?

Thanks,

Marty

3 REPLIES 3
Tom
Super User Tom
Super User

If you are using views created in SAS using PROC SQL then the views can have an ORDER BY clause to eliminate the PROC SORT.

What is hard to eliminate is the NODUPKEY option.

What are you using that to do?

MartyS
Calcite | Level 5

Tom,

Thanks for the response. Primarily we use nodupkey to hedge against a joining table having duplicates.

Tom
Super User Tom
Super User

Why are you using PROC SQL instead of normal data steps?

If you just want one observation per ACCT_ID then it is probably easier to do with data step.

data want ;

  merge table1 (in=in1)

            table2 (keep=acct_id field1 field2)

            table3 (keep=acct_id field3)

....

            tableN (keep=acct_id fieldZ)

;

by acct_id ;

if in1 and first.acct_id ;

run;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 797 views
  • 6 likes
  • 2 in conversation