Optimize

Super Contributor
Posts: 673

The dataset has 108,412,982 records, and the following query takes a very long time to execute (more than 3 hours). Is there a way to reduce the execution time? I'm running this on Unix.

create table email as
select
        a.GALC_ACCT_NO as ACCT_NO
        ,a.GALC_ALRT_EMAIL_ADDR_TX1 as email
from outdata.DMGADR_171212 a
join (
        select
                b.GALC_ACCT_NO
                ,max(GALC_LST_UPDT_TS) as last_updt_dt
        from outdata.DMGADR_171212 b
        where coalescec(b.GALC_ADDR_TYPE_CD,'')='EML'
                and b.GALC_ALRT_EMAIL_ADDR_TX1 is not null
        group by b.GALC_ACCT_NO
) b on a.GALC_ACCT_NO=b.GALC_ACCT_NO and a.GALC_LST_UPDT_TS=b.last_updt_dt
where a.GALC_ADDR_TYPE_CD='EML'
        and a.GALC_ALRT_EMAIL_ADDR_TX1 is not null
;

Respected Advisor
Posts: 4,920

Re: Optimize

The call to coalescec seems superfluous and might prevent some optimizations. What indexes exist on outdata.DMGADR_171212?
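One way to answer the index question is sketched below. It assumes only the table name from the original post; PROC CONTENTS lists any indexes defined on the table, and OPTIONS MSGLEVEL=I asks SAS to write notes to the log about index usage when the query runs.

```sas
/* List the table's attributes; if indexes exist, the output includes */
/* a section describing them.                                          */
proc contents data=outdata.DMGADR_171212;
run;

/* Ask SAS to write informational notes to the log, including notes   */
/* about whether an index was used to process a WHERE clause.         */
options msglevel=i;
```

With MSGLEVEL=I set, rerunning the query and reading the log shows whether the WHERE clause was resolved through an index or by a full table scan.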

PG

PG
Super User
Posts: 5,427

Re: Optimize

When joining a table of this size onto a summarized version of itself, I doubt that you can make use of any index, unless your subset narrows down the result set quite a bit.

You could try to do the subset first, and then join the result set onto itself.
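A minimal sketch of that two-step approach, assuming the column names from the original query. The intermediate table name work.eml_subset is hypothetical, and the coalescec call is dropped because comparing a missing value with 'EML' is already false.

```sas
proc sql;
    /* Step 1: read the large table once, keeping only the email rows */
    /* and the three columns the final result needs.                   */
    create table work.eml_subset as
    select GALC_ACCT_NO,
           GALC_LST_UPDT_TS,
           GALC_ALRT_EMAIL_ADDR_TX1
    from outdata.DMGADR_171212
    where GALC_ADDR_TYPE_CD = 'EML'
      and GALC_ALRT_EMAIL_ADDR_TX1 is not null;

    /* Step 2: join the (smaller) subset to its own per-account       */
    /* maximum timestamp.                                              */
    create table email as
    select a.GALC_ACCT_NO as ACCT_NO,
           a.GALC_ALRT_EMAIL_ADDR_TX1 as email
    from work.eml_subset a
    inner join (select GALC_ACCT_NO,
                       max(GALC_LST_UPDT_TS) as last_updt_dt
                from work.eml_subset
                group by GALC_ACCT_NO) b
      on a.GALC_ACCT_NO = b.GALC_ACCT_NO
     and a.GALC_LST_UPDT_TS = b.last_updt_dt;
quit;
```

Whether this helps depends on how many of the rows are 'EML' rows: the smaller the subset, the less data the self-join has to process.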

It seems you are looking for the last record for each account? Or at least the transactions from the last day for each account?

Instead of a join with a subquery, you could try to rewrite it, using HAVING logic.

Data never sleeps
Respected Advisor
Posts: 4,920

Re: Optimize

Good idea, LinusH. That would be:

select
          GALC_ACCT_NO as ACCT_NO
        , GALC_ALRT_EMAIL_ADDR_TX1 as email
from outdata.DMGADR_171212
where GALC_ADDR_TYPE_CD='EML' and GALC_ALRT_EMAIL_ADDR_TX1 is not missing
group by GALC_ACCT_NO
having GALC_LST_UPDT_TS = max(GALC_LST_UPDT_TS);

PG

PG
Super Contributor
Posts: 673

Re: Optimize

This gives the following message: "The query requires remerging summary statistics back with the original data."

PROC Star
Posts: 7,471

Re: Optimize

I don't think that is a problem in your case. Take a look at: http://support.sas.com/kb/4/308.html

Respected Advisor
Posts: 4,920

Re: Optimize

Yes, that's what is intended. Is it any faster?

PG
Super Contributor
Posts: 673

Re: Optimize

It is, but I'm getting multiple records per acct_no, and I want to understand why the date condition returns more than one record.

Respected Advisor
Posts: 4,920

Re: Optimize

You could try adding GALC_LST_UPDT_TS to the selected columns. If several rows for a given acct_no share the same GALC_LST_UPDT_TS and that value equals the maximum, they all satisfy the HAVING condition and are all selected.
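To check whether ties explain the duplicates, something like the sketch below might help. It assumes the column names from the queries above; the names work.tied_accounts, work.email and work.email_one are hypothetical, and work.email stands for the result table with its ACCT_NO column.

```sas
proc sql;
    /* Accounts where more than one email row carries the latest      */
    /* timestamp; n_rows is the number of tied rows.                   */
    create table work.tied_accounts as
    select GALC_ACCT_NO,
           GALC_LST_UPDT_TS,
           count(*) as n_rows
    from (select GALC_ACCT_NO, GALC_LST_UPDT_TS
          from outdata.DMGADR_171212
          where GALC_ADDR_TYPE_CD = 'EML'
            and GALC_ALRT_EMAIL_ADDR_TX1 is not missing
          group by GALC_ACCT_NO
          having GALC_LST_UPDT_TS = max(GALC_LST_UPDT_TS)) as latest
    group by GALC_ACCT_NO, GALC_LST_UPDT_TS
    having count(*) > 1;
quit;

/* If exactly one row per account is required, NODUPKEY keeps a       */
/* single row per ACCT_NO. Which of the tied rows survives is not     */
/* controlled here, so add a tie-breaking BY variable if it matters.  */
proc sort data=work.email out=work.email_one nodupkey;
    by ACCT_NO;
run;
```

If work.tied_accounts comes back empty, the duplicates have another cause and the timestamp tie is not the explanation.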

PG