11-21-2012 04:35 PM
The dataset has 108412982 records, and the following query is taking a very long time to execute (more than 3 hours). Is there a better way to reduce the execution time? I'm running this on Unix.
create table email as
select a.GALC_ACCT_NO as ACCT_NO
,a.GALC_ALRT_EMAIL_ADDR_TX1 as email
from outdata.DMGADR_171212 a
inner join
(select b.GALC_ACCT_NO
,max(GALC_LST_UPDT_TS) as last_updt_dt
from outdata.DMGADR_171212 b
where b.GALC_ADDR_TYPE_CD='EML'
and b.GALC_ALRT_EMAIL_ADDR_TX1 is not null
group by b.GALC_ACCT_NO
) b on a.GALC_ACCT_NO=b.GALC_ACCT_NO and a.GALC_LST_UPDT_TS=b.last_updt_dt
where a.GALC_ADDR_TYPE_CD='EML'
and a.GALC_ALRT_EMAIL_ADDR_TX1 is not null;
11-21-2012 05:39 PM
When joining a table of this size onto a summarized version of itself, I doubt that you can make use of any index, unless your subset narrows down the result set quite a bit.
You could try to first do the subset, and then join the result set onto itself.
It seems you are searching for the last record per account? Or at least the transactions of the last day for each account?
Instead of a join with a subquery, you could try rewriting it using HAVING logic.
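A rough sketch of the subset-first idea in PROC SQL (the WORK table name is just an assumption; the filter conditions and column names are taken from the original query):

```sas
proc sql;
   /* Step 1: apply the subset once, so everything downstream
      works on a much smaller table */
   create table work.eml_subset as
   select GALC_ACCT_NO, GALC_ALRT_EMAIL_ADDR_TX1, GALC_LST_UPDT_TS
   from outdata.DMGADR_171212
   where GALC_ADDR_TYPE_CD='EML'
     and GALC_ALRT_EMAIL_ADDR_TX1 is not null;

   /* Step 2: join the reduced table onto its own per-account maximum */
   create table email as
   select a.GALC_ACCT_NO as ACCT_NO
        , a.GALC_ALRT_EMAIL_ADDR_TX1 as email
   from work.eml_subset a
   inner join
        ( select GALC_ACCT_NO
               , max(GALC_LST_UPDT_TS) as last_updt_dt
          from work.eml_subset
          group by GALC_ACCT_NO
        ) b
   on a.GALC_ACCT_NO = b.GALC_ACCT_NO
      and a.GALC_LST_UPDT_TS = b.last_updt_dt;
quit;
```

The subset is then scanned once instead of twice against the full 100M-row table, and the join itself touches far fewer rows.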
11-21-2012 09:00 PM
Good idea LinusH. That would be:
create table email as
select GALC_ACCT_NO as ACCT_NO
, GALC_ALRT_EMAIL_ADDR_TX1 as email
from outdata.DMGADR_171212
where GALC_ADDR_TYPE_CD='EML' and GALC_ALRT_EMAIL_ADDR_TX1 is not missing
group by GALC_ACCT_NO
having GALC_LST_UPDT_TS = max(GALC_LST_UPDT_TS);
11-22-2012 11:55 AM
You could try adding GALC_LST_UPDT_TS to the selected columns. Note that if several rows for a given ACCT_NO share the same GALC_LST_UPDT_TS value, equal to the maximum, they would all satisfy the HAVING condition and all be selected.
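If those duplicate timestamps are a concern, a sort-based alternative (a sketch, not from the thread; the WORK table name is an assumption) guarantees exactly one row per account:

```sas
/* Keep only 'EML' rows with a non-missing email, sorted so the
   newest timestamp within each account comes last */
proc sort data=outdata.DMGADR_171212
          (where=(GALC_ADDR_TYPE_CD='EML'
                  and GALC_ALRT_EMAIL_ADDR_TX1 is not missing))
          out=work.eml_sorted;
   by GALC_ACCT_NO GALC_LST_UPDT_TS;
run;

/* KEEP= is applied before RENAME=, so the old names are listed */
data email(keep=GALC_ACCT_NO GALC_ALRT_EMAIL_ADDR_TX1
           rename=(GALC_ACCT_NO=ACCT_NO GALC_ALRT_EMAIL_ADDR_TX1=email));
   set work.eml_sorted;
   by GALC_ACCT_NO;
   if last.GALC_ACCT_NO;   /* last = newest record for each account */
run;
```

The sort is expensive on 100M rows, but the WHERE= subset is applied before sorting, and LAST. processing picks exactly one winner per account even when timestamps tie.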