The dataset has 108,412,982 records, and the following query takes a very long time to execute (more than 3 hours). Is there a way to reduce the execution time? I'm running this on Unix.
create table email
as
select
a.GALC_ACCT_NO as ACCT_NO
,a.GALC_ALRT_EMAIL_ADDR_TX1 as email
from outdata.DMGADR_171212 a
join (
select
b.GALC_ACCT_NO
,max(GALC_LST_UPDT_TS) as last_updt_dt
from outdata.DMGADR_171212 b
where coalescec(b.GALC_ADDR_TYPE_CD,'')='EML'
and b.GALC_ALRT_EMAIL_ADDR_TX1 is not null
group by b.GALC_ACCT_NO
) b on a.GALC_ACCT_NO=b.GALC_ACCT_NO and a.GALC_LST_UPDT_TS=b.last_updt_dt
where a.GALC_ADDR_TYPE_CD='EML'
and a.GALC_ALRT_EMAIL_ADDR_TX1 is not null
;
The call to coalescec seems superfluous, since a missing value can never equal 'EML', and it might prevent some optimizations. What indexes exist on outdata.DMGADR_171212?
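One way to answer the index question, offered as a sketch rather than the only method: PROC CONTENTS lists any indexes on a table, and the DICTIONARY.INDEXES view can be queried from PROC SQL. The library and table names are taken from the query above; nothing here is specific to this site's setup.

```sas
/* PROC CONTENTS output includes an index listing when indexes exist. */
proc contents data=outdata.DMGADR_171212;
run;

proc sql;
  /* DICTIONARY.INDEXES stores library and member names in upper case. */
  select *
  from dictionary.indexes
  where libname = 'OUTDATA'
    and memname = 'DMGADR_171212';
quit;
```

If the second query returns no rows, the table has no indexes.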
PG
When joining a table of this size onto a summarized version of itself, I doubt that you can make use of any index (unless your subset narrows down the result set quite a bit).
You could try to first do the subset, and then join result set onto itself.
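A minimal sketch of that subset-first approach, assuming PROC SQL. The intermediate table name work.eml_subset is hypothetical; the columns and filters are copied from the original query. Whether it runs faster depends on how many rows the EML filter removes.

```sas
proc sql;
  /* Step 1: keep only the e-mail rows and only the columns needed. */
  create table work.eml_subset as
  select GALC_ACCT_NO
       , GALC_ALRT_EMAIL_ADDR_TX1
       , GALC_LST_UPDT_TS
  from outdata.DMGADR_171212
  where GALC_ADDR_TYPE_CD = 'EML'
    and GALC_ALRT_EMAIL_ADDR_TX1 is not null;

  /* Step 2: join the smaller table to its own per-account maximum. */
  create table email as
  select a.GALC_ACCT_NO as ACCT_NO
       , a.GALC_ALRT_EMAIL_ADDR_TX1 as email
  from work.eml_subset a
  join (
    select GALC_ACCT_NO
         , max(GALC_LST_UPDT_TS) as last_updt_dt
    from work.eml_subset
    group by GALC_ACCT_NO
  ) b
    on a.GALC_ACCT_NO = b.GALC_ACCT_NO
   and a.GALC_LST_UPDT_TS = b.last_updt_dt;
quit;
```

The large table is read once, and both sides of the join then work on the reduced set, so the filters do not need to be repeated.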
It seems you are looking for the last record for each account, or at least the transactions from the last day for each account?
Instead of a join with a subquery, you could try to rewrite it, using HAVING logic.
Good idea, LinusH. That would be:
select
GALC_ACCT_NO as ACCT_NO
, GALC_ALRT_EMAIL_ADDR_TX1 as email
from outdata.DMGADR_171212
where GALC_ADDR_TYPE_CD='EML' and GALC_ALRT_EMAIL_ADDR_TX1 is not missing
group by GALC_ACCT_NO
having GALC_LST_UPDT_TS = max(GALC_LST_UPDT_TS);
PG
This gives the following message: The query requires remerging summary statistics back with the original data.
I don't think that is a problem in your case. Take a look at: http://support.sas.com/kb/4/308.html
Yes, that's what is intended. Is it any faster?
It is. But I'm getting multiple records for acct_no, and I want to see why I'm getting multiple records based on the date.
You could try adding GALC_LST_UPDT_TS to the selected columns. If several rows for a given acct_no share the same GALC_LST_UPDT_TS and that value equals the max, they all satisfy the HAVING condition and are all selected.
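To see which accounts are affected, something like the following might help. It is a sketch that wraps the HAVING query in an inline view and counts rows per account at the latest timestamp; the column name n_rows is made up for illustration.

```sas
proc sql;
  /* List accounts where more than one e-mail row shares the latest timestamp. */
  select GALC_ACCT_NO
       , GALC_LST_UPDT_TS
       , count(*) as n_rows
  from (
    select GALC_ACCT_NO
         , GALC_LST_UPDT_TS
    from outdata.DMGADR_171212
    where GALC_ADDR_TYPE_CD = 'EML'
      and GALC_ALRT_EMAIL_ADDR_TX1 is not missing
    group by GALC_ACCT_NO
    having GALC_LST_UPDT_TS = max(GALC_LST_UPDT_TS)
  )
  group by GALC_ACCT_NO, GALC_LST_UPDT_TS
  having count(*) > 1;
quit;
```

If the tied rows turn out to be exact duplicates, select distinct in the main query would collapse them. If they carry different e-mail addresses, you would need some additional rule to pick one.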