SASPhile
Quartz | Level 8

The dataset has 108,412,982 records, and the following query takes a very long time to execute (more than 3 hours). Is there a way to reduce the execution time? I'm running this on Unix.

```sas
proc sql;
  create table email as
  select a.GALC_ACCT_NO as ACCT_NO
        ,a.GALC_ALRT_EMAIL_ADDR_TX1 as email
  from outdata.DMGADR_171212 a
  join (
        /* latest update timestamp per account, email rows only */
        select b.GALC_ACCT_NO
              ,max(b.GALC_LST_UPDT_TS) as last_updt_dt
        from outdata.DMGADR_171212 b
        where coalescec(b.GALC_ADDR_TYPE_CD,'')='EML'
          and b.GALC_ALRT_EMAIL_ADDR_TX1 is not null
        group by b.GALC_ACCT_NO
       ) b
    /* keep the row(s) carrying that latest timestamp */
    on a.GALC_ACCT_NO=b.GALC_ACCT_NO
   and a.GALC_LST_UPDT_TS=b.last_updt_dt
  where a.GALC_ADDR_TYPE_CD='EML'
    and a.GALC_ALRT_EMAIL_ADDR_TX1 is not null
  ;
quit;
```

PGStats
Opal | Level 21

It seems like the call to coalescec is superfluous and might prevent some optimizations. What indexes exist on outdata.DMGADR_171212?
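The dictionary tables can answer the index question. A minimal sketch follows; so that it runs on its own, it builds a small hypothetical WORK table (`work.dmgadr_demo`, demo data only) with one simple index. Against the real data you would filter on `libname='OUTDATA'` and `memname='DMGADR_171212'` (dictionary values are stored in upper case). It also checks the COALESCEC point: a missing value can never equal 'EML', so `coalescec(GALC_ADDR_TYPE_CD,'')='EML'` keeps the same rows as `GALC_ADDR_TYPE_CD='EML'`.

```sas
/* Hypothetical stand-in for outdata.DMGADR_171212 (demo data only), */
/* created with a simple index on GALC_ACCT_NO                        */
data work.dmgadr_demo(index=(GALC_ACCT_NO));
  length GALC_ACCT_NO $10 GALC_ADDR_TYPE_CD $3 GALC_ALRT_EMAIL_ADDR_TX1 $40;
  input GALC_ACCT_NO $ GALC_ADDR_TYPE_CD $ GALC_ALRT_EMAIL_ADDR_TX1 $;
  datalines;
A1 EML a1@example.com
A2 . .
A3 PST .
;
run;

proc sql noprint;
  /* One row per indexed column; for the real table use          */
  /* libname='OUTDATA' and memname='DMGADR_171212' (upper case). */
  create table work.idx as
  select name, indxname, idxusage
  from dictionary.indexes
  where libname='WORK' and memname='DMGADR_DEMO';

  /* Row counts with and without the COALESCEC wrapper */
  select count(*) into :n_coalesce trimmed
  from work.dmgadr_demo
  where coalescec(GALC_ADDR_TYPE_CD,'')='EML';

  select count(*) into :n_plain trimmed
  from work.dmgadr_demo
  where GALC_ADDR_TYPE_CD='EML';
quit;

%put NOTE: coalescec filter=&n_coalesce rows, plain filter=&n_plain rows;
```

If the two counts always match, dropping the wrapper changes nothing in the result and leaves the optimizer a plain comparison to work with.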

PG

LinusH
Tourmaline | Level 20

When joining a table of this size onto a summarized version of itself, I doubt that you can make use of any index, unless your subset narrows down the result set quite a bit.

You could try doing the subset first, and then joining that result set onto itself.

It seems you are searching for the last record for each account? Or at least the transactions for the last day for each account?

Instead of a join with a subquery, you could try rewriting it using HAVING logic.
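A minimal sketch of the subset-first idea, assuming the WHERE filter is selective enough to shrink the data noticeably: read the big table once into a smaller WORK table, then compute the per-account maximum and join on that subset only. The demo table `work.dmgadr_demo`, the intermediate `work.eml_subset`, and the plain numbers used as timestamps are hypothetical stand-ins; whether this is actually faster on the real data would have to be measured.

```sas
/* Hypothetical demo data standing in for outdata.DMGADR_171212 */
data work.dmgadr_demo;
  length GALC_ACCT_NO $10 GALC_ADDR_TYPE_CD $3 GALC_ALRT_EMAIL_ADDR_TX1 $40;
  input GALC_ACCT_NO $ GALC_ADDR_TYPE_CD $ GALC_ALRT_EMAIL_ADDR_TX1 $
        GALC_LST_UPDT_TS;
  datalines;
A1 EML old@example.com 100
A1 EML new@example.com 200
A1 PST . 300
A2 EML a2@example.com 150
;
run;

proc sql;
  /* Step 1: scan the big table once, keeping only the rows of interest */
  create table work.eml_subset as
  select GALC_ACCT_NO, GALC_ALRT_EMAIL_ADDR_TX1, GALC_LST_UPDT_TS
  from work.dmgadr_demo
  where GALC_ADDR_TYPE_CD='EML' and GALC_ALRT_EMAIL_ADDR_TX1 is not missing;

  /* Step 2: join the smaller subset onto its own per-account maximum */
  create table work.email as
  select a.GALC_ACCT_NO as ACCT_NO,
         a.GALC_ALRT_EMAIL_ADDR_TX1 as email
  from work.eml_subset a
  inner join (select GALC_ACCT_NO, max(GALC_LST_UPDT_TS) as last_updt_dt
              from work.eml_subset
              group by GALC_ACCT_NO) b
    on a.GALC_ACCT_NO=b.GALC_ACCT_NO
   and a.GALC_LST_UPDT_TS=b.last_updt_dt;
quit;
```

On this demo data the PST row with the later timestamp is filtered out in step 1, so account A1 resolves to new@example.com.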
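If the goal really is the last record per account, a DATA step with BY-group processing is a common alternative to SQL; this is a swapped-in technique, not something proposed in the thread. A minimal sketch on hypothetical demo data (`work.dmgadr_demo`, `work.eml_sorted`): sort the filtered rows by account and timestamp, then keep `last.GALC_ACCT_NO`. Unlike the join or HAVING versions, this returns exactly one row per account; when timestamps tie, it keeps whichever tied row sorts last, which is arbitrary from a business point of view. Sorting 100 million rows is itself costly, so only a timing run on the real data can tell whether it helps.

```sas
/* Hypothetical demo data standing in for outdata.DMGADR_171212 */
data work.dmgadr_demo;
  length GALC_ACCT_NO $10 GALC_ADDR_TYPE_CD $3 GALC_ALRT_EMAIL_ADDR_TX1 $40;
  input GALC_ACCT_NO $ GALC_ADDR_TYPE_CD $ GALC_ALRT_EMAIL_ADDR_TX1 $
        GALC_LST_UPDT_TS;
  datalines;
A1 EML old@example.com 100
A1 EML new@example.com 200
A2 EML a2@example.com 150
A2 PST . 300
;
run;

/* Keep only email rows, sorted so each account's latest row comes last */
proc sort data=work.dmgadr_demo(where=(GALC_ADDR_TYPE_CD='EML'
                                       and not missing(GALC_ALRT_EMAIL_ADDR_TX1)))
          out=work.eml_sorted;
  by GALC_ACCT_NO GALC_LST_UPDT_TS;
run;

/* Output exactly one row per account: the last one in each BY group */
data work.email(keep=ACCT_NO email);
  length ACCT_NO $10 email $40;
  set work.eml_sorted;
  by GALC_ACCT_NO;
  if last.GALC_ACCT_NO;
  ACCT_NO = GALC_ACCT_NO;
  email   = GALC_ALRT_EMAIL_ADDR_TX1;
run;
```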

Data never sleeps
PGStats
Opal | Level 21

Good idea, LinusH. That would be:

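```sas
proc sql;
  create table email as
  select GALC_ACCT_NO as ACCT_NO
        ,GALC_ALRT_EMAIL_ADDR_TX1 as email
  from outdata.DMGADR_171212
  where GALC_ADDR_TYPE_CD='EML' and GALC_ALRT_EMAIL_ADDR_TX1 is not missing
  group by GALC_ACCT_NO
  /* max() is computed per account and remerged onto every row;   */
  /* only rows matching their account's latest timestamp are kept */
  having GALC_LST_UPDT_TS = max(GALC_LST_UPDT_TS);
quit;
```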

PG

SASPhile
Quartz | Level 8

This gives the message: "The query requires remerging summary statistics back with the original data."

art297
Opal | Level 21

I don't think that is a problem in your case. Take a look at: http://support.sas.com/kb/4/308.html

PGStats
Opal | Level 21

Yes, that's what is intended. Is it any faster?

PG
SASPhile
Quartz | Level 8

It is, but I'm getting multiple records per acct_no and want to see why I'm getting multiple records based on the date.

PGStats
Opal | Level 21

You could try adding GALC_LST_UPDT_TS to the selected columns. If several rows for a given acct_no share the same GALC_LST_UPDT_TS value and that value equals the maximum, they all satisfy the HAVING condition and are all selected.
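A minimal sketch of that check on hypothetical demo data (`work.dmgadr_demo`, with plain numbers standing in for timestamps): the HAVING query is run with the timestamp added to the output, and a second query lists the accounts whose maximum timestamp appears on more than one row. The table name `work.ties` is illustrative only.

```sas
/* Hypothetical demo data: account A1 has two rows tied on its max timestamp */
data work.dmgadr_demo;
  length GALC_ACCT_NO $10 GALC_ADDR_TYPE_CD $3 GALC_ALRT_EMAIL_ADDR_TX1 $40;
  input GALC_ACCT_NO $ GALC_ADDR_TYPE_CD $ GALC_ALRT_EMAIL_ADDR_TX1 $
        GALC_LST_UPDT_TS;
  datalines;
A1 EML first@example.com 200
A1 EML second@example.com 200
A1 EML old@example.com 100
A2 EML a2@example.com 150
;
run;

proc sql;
  /* The HAVING query, with the timestamp added to the selected columns */
  create table work.email as
  select GALC_ACCT_NO as ACCT_NO,
         GALC_ALRT_EMAIL_ADDR_TX1 as email,
         GALC_LST_UPDT_TS
  from work.dmgadr_demo
  where GALC_ADDR_TYPE_CD='EML' and GALC_ALRT_EMAIL_ADDR_TX1 is not missing
  group by GALC_ACCT_NO
  having GALC_LST_UPDT_TS = max(GALC_LST_UPDT_TS);

  /* Accounts that still have more than one row: ties on the max timestamp */
  create table work.ties as
  select ACCT_NO, GALC_LST_UPDT_TS, count(*) as n_rows
  from work.email
  group by ACCT_NO, GALC_LST_UPDT_TS
  having count(*) > 1;
quit;
```

Here A1 keeps both of its rows stamped 200, so it appears in `work.ties` with `n_rows` = 2, while A2 has a single latest row and does not.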

PG
