The dataset has 108,412,982 records, and the following query takes a very long time to execute (more than 3 hours). Is there a way to reduce the execution time? I'm running this on Unix.
create table email
as
select
a.GALC_ACCT_NO as ACCT_NO
,a.GALC_ALRT_EMAIL_ADDR_TX1 as email
from outdata.DMGADR_171212 a
join (
select
b.GALC_ACCT_NO
,max(GALC_LST_UPDT_TS) as last_updt_dt
from outdata.DMGADR_171212 b
where coalescec(b.GALC_ADDR_TYPE_CD,'')='EML'
and b.GALC_ALRT_EMAIL_ADDR_TX1 is not null
group by b.GALC_ACCT_NO
) b on a.GALC_ACCT_NO=b.GALC_ACCT_NO and a.GALC_LST_UPDT_TS=b.last_updt_dt
where a.GALC_ADDR_TYPE_CD='EML'
and a.GALC_ALRT_EMAIL_ADDR_TX1 is not null
;
It seems the call to coalescec is superfluous and might prevent some optimizations: a missing value never equals 'EML', so comparing GALC_ADDR_TYPE_CD directly returns the same rows. What indexes exist on outdata.DMGADR_171212?
PG
When joining a table of this size onto a summarized version of itself, I doubt that you can make use of any index (unless your subset narrows down the result set quite a bit).
You could try to do the subset first, and then join the result set onto itself.
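A minimal sketch of that "subset first" idea, assuming the filtered rows fit comfortably in the WORK library. The table name work.eml_subset is hypothetical and not part of the original query; whether this is faster depends on how much the filter shrinks the data.

```sas
proc sql;
  /* Step 1: keep only the EML rows that have an email address.
     work.eml_subset is a hypothetical intermediate table name. */
  create table work.eml_subset as
  select GALC_ACCT_NO
       , GALC_ALRT_EMAIL_ADDR_TX1
       , GALC_LST_UPDT_TS
  from outdata.DMGADR_171212
  where GALC_ADDR_TYPE_CD = 'EML'
    and GALC_ALRT_EMAIL_ADDR_TX1 is not null;

  /* Step 2: join the subset to its own per-account latest timestamp. */
  create table email as
  select a.GALC_ACCT_NO as ACCT_NO
       , a.GALC_ALRT_EMAIL_ADDR_TX1 as email
  from work.eml_subset a
  join (
    select GALC_ACCT_NO
         , max(GALC_LST_UPDT_TS) as last_updt_dt
    from work.eml_subset
    group by GALC_ACCT_NO
  ) b
  on a.GALC_ACCT_NO = b.GALC_ACCT_NO
  and a.GALC_LST_UPDT_TS = b.last_updt_dt;
quit;
```

The large table is then scanned once, and both sides of the join read the smaller subset.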
It seems you are looking for the last record for each account? Or at least the transactions from the last day for each account?
Instead of a join with a subquery, you could try to rewrite it, using HAVING logic.
Good idea, LinusH. That would be:
select
GALC_ACCT_NO as ACCT_NO
, GALC_ALRT_EMAIL_ADDR_TX1 as email
from outdata.DMGADR_171212
where GALC_ADDR_TYPE_CD='EML' and GALC_ALRT_EMAIL_ADDR_TX1 is not missing
group by GALC_ACCT_NO
having GALC_LST_UPDT_TS = max(GALC_LST_UPDT_TS);
PG
This gives the following message: The query requires remerging summary statistics back with the original data.
I don't think that is a problem in your case. Take a look at: http://support.sas.com/kb/4/308.html
Yes, that's what is intended. Is it any faster?
It is, but I'm getting multiple records per acct_no, and I want to see why I'm getting multiple records based on the date.
You could try adding GALC_LST_UPDT_TS to the selected columns. If several rows for a given acct_no share the same GALC_LST_UPDT_TS and that value equals the maximum, they all satisfy the HAVING condition and are all selected.
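One way to check for such ties, as a rough sketch: wrap the HAVING query in an in-line view and count the rows per account at the latest timestamp. The output table name work.tie_check and the column name n_rows are hypothetical.

```sas
proc sql;
  /* Accounts with more than one EML row at their latest timestamp.
     work.tie_check and n_rows are hypothetical names. */
  create table work.tie_check as
  select GALC_ACCT_NO
       , GALC_LST_UPDT_TS
       , count(*) as n_rows
  from (
    /* Same filter and HAVING logic as the query above */
    select GALC_ACCT_NO
         , GALC_LST_UPDT_TS
    from outdata.DMGADR_171212
    where GALC_ADDR_TYPE_CD = 'EML'
      and GALC_ALRT_EMAIL_ADDR_TX1 is not missing
    group by GALC_ACCT_NO
    having GALC_LST_UPDT_TS = max(GALC_LST_UPDT_TS)
  ) as latest
  group by GALC_ACCT_NO, GALC_LST_UPDT_TS
  having count(*) > 1;
quit;
```

If work.tie_check comes back empty, the duplicates have another cause. If it has rows, each one is an account with several email records sharing the same latest timestamp.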