DATA Step, Macro, Functions and more

Optimizing a SQL code

Reply
Contributor
Posts: 38

Optimizing a SQL code

I have a SAS SQL code which really takes a long time to run. Are there anybody can help me to make it efficient?

proc sql;

delete from recover as a

where exists (select * from nbcase as b where a.customer_id=b.customer_id and (a.nb_created_date < b.nb_first_date or a.nb_created_date > b.nb_last_date));

quit;

Thanks.

Frequent Contributor
Frequent Contributor
Posts: 94

Optimizing a SQL code

Can you provide some background information about what the fields in the table mean, and what the purpose of each table is?  It's difficult to make any suggestions without understanding the problem.

Contributor
Posts: 38

Optimizing a SQL code

I want to delete some records from Table recover where customer id matched the same id in table nbcase and nb_created _date < nb_first_date or nb_created_date > nb_last_date.

Both tables have duplicate customer ids and table nbcase unique on customer id and nb_first_date.

Frequent Contributor
Frequent Contributor
Posts: 94

Optimizing a SQL code

Ok, so I'm assuming the table nb_case is some kind of history for a customer, where each change is recorded with First_Date and Last_Date as the effective start/end dates for the other information on that row.

What is the relationship between Created Date and first/last, and what is the meaning of the rows in the Recover table?

It might also help to show some made up sample data for both tables.

Trusted Advisor
Posts: 1,301

Re: Optimizing a SQL code

This is just to provide sample data for your example as well as a duplicate of your SQL.

data recover(rename=(i=customer_id) sortedby=customer_id);

do i=1 to 10000;

  nb_created_date=round(ranuni(1234)*100,11)+'01AUG2011'd;

  output;

end;

run;

data nbcase(rename=(i=customer_id) sortedby=customer_id drop=j k);

do i=1 to 10000;

  j=round(ranuni(1234)*10,1);

  do k=1 to j;

   nb_first_date=round(ranuni(1234)*100,11)+'01AUG2011'd;

   nb_last_date=round(ranuni(2345)*100,11)+'01AUG2011'd;

   output;

  end;

end;

run;

proc sql;

delete from _recover r

where exists ( select *

                  from nbcase nb

                 where r.customer_id=nb.customer_id

                       and ( r.nb_created_date < nb.nb_first_date

                             or r.nb_created_date > nb.nb_last_date )

);

quit;

Message was edited by: Matthew Kastin Reason: Previous posting was erroneous after testing.

SAS Employee
Posts: 104

Re: Optimizing a SQL code

The resource hog is the correlated subquery.  Try to do it with an non-correlated subquery instead. 

For example, if FriedEgg's data sets are a good simulation, try this SQL query instead:

proc sql stimer;
create table del as
select distinct r.customer_ID 
   from nbcase as nb
      , Recover as r
   where r.customer_id=nb.customer_id
     and ( r.nb_created_date < nb.nb_first_date
          or r.nb_created_date > nb.nb_last_date )
;
delete from recover r
where Customer_ID in (select Customer_ID
                  from del)
;
quit;

Even though I created a temp table, the 2 queries above consumed a total of 0.8 CPU seconds vice 16.5 CPU seconds.  Elapsed time was 0.3 seconds vice 16 seconds for the original query (I have a quad core processor, so CPU time sometimes exceeds Elapsed Time when multi-threading). 

With the constant re-execution of the correlated subquery (inner query ) in the original code, the process is probably I/O bound.  The non-correlated sub query in the code above (inner query) executes once and passes the resulting list of values to the outer query. This is much less I/O intensive. YMMV but it's worth a shot.

Message was edited by: Mark Jordan due to spelling errors in the original entry.

Ask a Question
Discussion stats
  • 5 replies
  • 176 views
  • 0 likes
  • 4 in conversation