I have a SAS SQL code which really takes a long time to run. Are there anybody can help me to make it efficient?
proc sql;
delete from recover as a
where exists (select * from nbcase as b where a.customer_id=b.customer_id and (a.nb_created_date < b.nb_first_date or a.nb_created_date > b.nb_last_date));
quit;
Thanks.
Can you provide some background information about what the fields in the table mean, and what the purpose of each table is? It's difficult to make any suggestions without understanding the problem.
I want to delete some records from Table recover where customer id matched the same id in table nbcase and nb_created _date < nb_first_date or nb_created_date > nb_last_date.
Both tables have duplicate customer ids and table nbcase unique on customer id and nb_first_date.
Ok, so I'm assuming the table nb_case is some kind of history for a customer, where each change is recorded with First_Date and Last_Date as the effective start/end dates for the other information on that row.
What is the relationship between Created Date and first/last, and what is the meaning of the rows in the Recover table?
It might also help to show some made up sample data for both tables.
This is just to provide sample data for your example as well as a duplicate of your SQL.
data recover(rename=(i=customer_id) sortedby=customer_id);
do i=1 to 10000;
nb_created_date=round(ranuni(1234)*100,11)+'01AUG2011'd;
output;
end;
run;
data nbcase(rename=(i=customer_id) sortedby=customer_id drop=j k);
do i=1 to 10000;
j=round(ranuni(1234)*10,1);
do k=1 to j;
nb_first_date=round(ranuni(1234)*100,11)+'01AUG2011'd;
nb_last_date=round(ranuni(2345)*100,11)+'01AUG2011'd;
output;
end;
end;
run;
proc sql;
delete from _recover r
where exists ( select *
from nbcase nb
where r.customer_id=nb.customer_id
and ( r.nb_created_date < nb.nb_first_date
or r.nb_created_date > nb.nb_last_date )
);
quit;
Message was edited by: Matthew Kastin Reason: Previous posting was erroneous after testing.
The resource hog is the correlated subquery. Try to do it with an non-correlated subquery instead.
For example, if FriedEgg's data sets are a good simulation, try this SQL query instead:
proc sql stimer; create table del as select distinct r.customer_ID from nbcase as nb , Recover as r where r.customer_id=nb.customer_id and ( r.nb_created_date < nb.nb_first_date or r.nb_created_date > nb.nb_last_date ) ; delete from recover r where Customer_ID in (select Customer_ID from del) ; quit;
Even though I created a temp table, the 2 queries above consumed a total of 0.8 CPU seconds vice 16.5 CPU seconds. Elapsed time was 0.3 seconds vice 16 seconds for the original query (I have a quad core processor, so CPU time sometimes exceeds Elapsed Time when multi-threading).
With the constant re-execution of the correlated subquery (inner query ) in the original code, the process is probably I/O bound. The non-correlated sub query in the code above (inner query) executes once and passes the resulting list of values to the outer query. This is much less I/O intensive. YMMV but it's worth a shot.
Message was edited by: Mark Jordan due to spelling errors in the original entry.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.