Optimizing a SQL code

c8826024 · Posted 08-23-2011 09:16 AM

I have a SAS SQL code which really takes a long time to run. Are there anybody can help me to make it efficient?

proc sql;

delete from recover as a

where exists (select * from nbcase as b where a.customer_id=b.customer_id and (a.nb_created_date < b.nb_first_date or a.nb_created_date > b.nb_last_date));

quit;

Thanks.

DF · Posted 08-23-2011 09:26 AM

Can you provide some background information about what the fields in the table mean, and what the purpose of each table is? It's difficult to make any suggestions without understanding the problem.

c8826024 · Posted 08-23-2011 09:40 AM

I want to delete some records from Table recover where customer id matched the same id in table nbcase and nb_created _date < nb_first_date or nb_created_date > nb_last_date.

Both tables have duplicate customer ids and table nbcase unique on customer id and nb_first_date.

DF · Posted 08-23-2011 12:12 PM

Ok, so I'm assuming the table nb_case is some kind of history for a customer, where each change is recorded with First_Date and Last_Date as the effective start/end dates for the other information on that row.

What is the relationship between Created Date and first/last, and what is the meaning of the rows in the Recover table?

It might also help to show some made up sample data for both tables.

FriedEgg · Posted 08-23-2011 01:03 PM

This is just to provide sample data for your example as well as a duplicate of your SQL.

data recover(rename=(i=customer_id) sortedby=customer_id);

do i=1 to 10000;

nb_created_date=round(ranuni(1234)*100,11)+'01AUG2011'd;

output;

end;

run;

data nbcase(rename=(i=customer_id) sortedby=customer_id drop=j k);

do i=1 to 10000;

j=round(ranuni(1234)*10,1);

do k=1 to j;

nb_first_date=round(ranuni(1234)*100,11)+'01AUG2011'd;

nb_last_date=round(ranuni(2345)*100,11)+'01AUG2011'd;

output;

end;

run;

proc sql;

delete from _recover r

where exists ( select *

from nbcase nb

where r.customer_id=nb.customer_id

and ( r.nb_created_date < nb.nb_first_date

or r.nb_created_date > nb.nb_last_date )

);

quit;

Message was edited by: Matthew Kastin Reason: Previous posting was erroneous after testing.

SASJedi · Posted 08-24-2011 11:11 PM

The resource hog is the correlated subquery. Try to do it with an non-correlated subquery instead.

For example, if FriedEgg's data sets are a good simulation, try this SQL query instead:

proc sql stimer;
create table del as
select distinct r.customer_ID 
   from nbcase as nb
      , Recover as r
   where r.customer_id=nb.customer_id
     and ( r.nb_created_date < nb.nb_first_date
          or r.nb_created_date > nb.nb_last_date )
;
delete from recover r
where Customer_ID in (select Customer_ID
                  from del)
;
quit;

Even though I created a temp table, the 2 queries above consumed a total of 0.8 CPU seconds vice 16.5 CPU seconds. Elapsed time was 0.3 seconds vice 16 seconds for the original query (I have a quad core processor, so CPU time sometimes exceeds Elapsed Time when multi-threading).

With the constant re-execution of the correlated subquery (inner query ) in the original code, the process is probably I/O bound. The non-correlated sub query in the code above (inner query) executes once and passes the resulting list of values to the outer query. This is much less I/O intensive. YMMV but it's worth a shot.

Message was edited by: Mark Jordan due to spelling errors in the original entry.

Check out my Jedi SAS Tricks for SAS Users