BookmarkSubscribeRSS Feed
nirupama1
Fluorite | Level 6

proc sql;

create table call as

select 

  input(cdf.case_id,best12.) as mbr_pgm_id ,

  count(distinct(af.actv_id)) as AttemptedCallsN , 

  cdf.TOUCH_DT_DIMID

from  hmaodm.cse_dtl_fact cdf

       left join  hmaodm.actv_fact af

       on cdf.case_id = af.case_id

       and  af.cse_src_sys_cd in ('CPM')

left join  HMAODM.ref_dim RDA

     on  af.actv_rsltn_otcome_dimid = rda.ref_dimid

     and  rda.ref_nm = 'activityResolutionOutcomeType'

left join HMAODM.CUST_SEG_DIM CSD

    on CDF.CUST_SEG_DIMID = CSD.CUST_SEG_DIMID 

 

 where cdf.qual_dt_dimid > 0

     and cdf.touch_dt_dimid  = .

     and rda.ref_cd in (&actv_typ_id.)                                                                                                                                        

     and RDA.ref_cd <> '44'                                                                                                                                                                 AND CSD.CUST_SEG_NBR IN (&user_defined_policy.)           

     AND (CDF.CSE_OPEN_DT_DIMID <= &end_yyyymmdd.            

     AND (CDF.CSE_CLOS_DT_DIMID = 99991231 or CDF.CSE_CLOS_DT_DIMID >= &beg_yyyymmdd.))

     AND CDF.MEDCR_IND_DIMID <>1

group by cdf.case_id,  cdf.TOUCH_DT_DIMID;

quit;

 

This multiple join is taking 50+ hours to run. Please help me to optimize it, so that it takes less execution time.

I tried to split the above query as below but the output does not match. Please help! It is urgent .

 

proc sql noprint;

create table call1 as

select

  cdf.case_id  ,

  af.actv_rsltn_otcome_dimid,

  count(distinct(actv_id)) as AttemptedCallsN

from  CRP_HMAODM.cse_dtl_fact cdf

      left join  CRP_HMAODM.actv_fact af

       on cdf.case_id = af.case_id

       and  af.cse_src_sys_cd in ('CPM')

where cdf.qual_dt_dimid > 0

     and cdf.touch_dt_dimid  IS NULL          

     AND (CDF.CSE_OPEN_DT_DIMID <= &end_yyyymmdd.            

     AND (CDF.CSE_CLOS_DT_DIMID = 99991231 or CDF.CSE_CLOS_DT_DIMID >= &beg_yyyymmdd.))

     AND CDF.MEDCR_IND_DIMID <>1

 

group by cdf.case_id,af.actv_rsltn_otcome_dimid

order by CDF.case_id

;

quit;

 

proc sql ;

create table call2 as

select distinct CALL1.*

from call1

left join  HMAODM.ref_dim RDA

       on  call1.actv_rsltn_otcome_dimid = rda.ref_dimid

      and  rda.ref_nm = 'activityResolutionOutcomeType'

where rda.ref_cd in (&actv_typ_id.)                                                                                                                                        

     AND RDA.ref_cd <> '44'          

order by call1.case_id

;

quit;

 

proc sql noprint;

create table call3 as

select distinct

CDF.case_id

from CRP_HMAODM.cse_dtl_fact cdf

left join CRP_HMAODM.CUST_SEG_DIM CSD  

      on CDF.CUST_SEG_DIMID = CSD.CUST_SEG_DIMID  

where

CSD.CUST_SEG_NBR IN (&user_defined_policy.)

order by CDF.case_id

;

quit;

 

 

proc sort data=call2;by case_id;RUN;

proc sort data=call3;by case_id;RUN;

 

 

data call23;

merge call2 (IN=A)

      call3 (IN=B)

;

By case_id;

If A THEN OUTPUT;

RUN;

 

19 REPLIES 19
ChrisNZ
Tourmaline | Level 20

1. It looks like the table rda is only used for filtering. Why not have an inner join?

2. It looks like table csd is not used at all.

3. What # of records do you anticipate from each table? Are they indexed?

4. Why not format your code so it can be read without getting a headache? Something like

create table CALL as
select 
    input(cdf.CASE_ID,best12.)  as mbr_pgm_id 
  , count(distinct(af.ACTV_ID)) as AttemptedCallsN 
  , cdf.TOUCH_DT_DIMID
from HMAODM.CSE_DTL_FACT                  cdf
       left join  
     HMAODM.ACTV_FACT                     af
       on  cdf.CASE_ID = af.CASE_ID
       and af.CSE_SRC_SYS_CD in ('CPM')
       left join 
    HMAODM.REF_DIM                        rda
      on  af.ACTV_RSLTN_OTCOME_DIMID = rda.REF_DIMID
      and rda.REF_NM                 = 'activityResolutionOutcomeType'
      left join 
    HMAODM.CUST_SEG_DIM                   csd
      on cdf.CUST_SEG_DIMID =  csd.CUST_SEG_DIMID 
where cdf.QUAL_DT_DIMID     >  0
  and cdf.TOUCH_DT_DIMID    =  .
  and rda.REF_CD            in (&actv_typ_id.)                                                                                                                                        
  and rda.REF_CD            <> '44'                                                                                                                                                                 AND CSD.CUST_SEG_NBR IN (&user_defined_policy.)           
  and cdf.CSE_OPEN_DT_DIMID <= &end_yyyymmdd.            
  and cdf.CSE_CLOS_DT_DIMID >= &beg_yyyymmdd.
  and cdf.MEDCR_IND_DIMID   <> 1
group by cdf.CASE_ID, cdf.TOUCH_DT_DIMID;
quit;

 

 

 

ChrisNZ
Tourmaline | Level 20

Oh my goodness, table csd  is  used. Except the AND clause is in column 200!

Do yourself a favour and format your code properly!

So it looks like table csd should be an inner join too?

So maybe try  

cdf  

  inner join

(most discriminating match, rda or csd?)

  inner join

(rda or csd)

  left join

af

 

 

 

 

Kurt_Bremser
Super User

Please supply some additional information.

Are these libraries local SAS libraries, or do you connect to a remote database?

How many observations do the datasets have, with regard to the subsets you do?

What are the attributes of the variables you pull (types, lengths)?

 

You also have stupid storage of dates. Fix that before you even continue.

 

This

input(cdf.case_id,best12.) as mbr_pgm_id

is also bad, keep id's in character.

nirupama1
Fluorite | Level 6
These are local sas libraries. I know it is not appropriately written. The conditions are fine, it generates the desired output.
The actv_fac table is very huge (records in millions).
There is only need to minimize time.
I think the issue is with joins. I tried multiple ways but the result is not the same as the original query.
nirupama1
Fluorite | Level 6
Cse_dtl_fact and actv_fac both are very huge tables
ChrisNZ
Tourmaline | Level 20

Have you tried inner joins?

What # of records do you anticipate from each table before and after the joins?

Are the tables sorted? indexed?

 

Kurt_Bremser
Super User

@nirupama1 wrote:
Cse_dtl_fact and actv_fac both are very huge tables

What you call huge might be a little flyspeck for someone else. Please give us the numbers. Not "millions", but 5 millions, 10 millions, and so on. And the sizes of the required variables.

e. g. how many observations of ACTV_FACT satisfy the condition

CSE_SRC_SYS_CD in ('CPM')

and what are the attributes of CASE_ID and ACTV_ID?

This will determine if and where we can use in-memory techniques.

nirupama1
Fluorite | Level 6
6 million observations fall under CSE_SRC_SYS_CD in ('CPM') along with where condition.
If I run the original query by simply removing join with the ref_dim table and where condition of ref_dim table, the output is 55k records.
nirupama1
Fluorite | Level 6
And this run within seconds but as soon as I join with ref_dim table, it executes in about 2days or more and output results in between13k-18k.
nirupama1
Fluorite | Level 6
I am not sure what exactly u mean by length....
These ids are of 9-10 characters length.
case_id=9charactets
actv_id =10 characters
actv_rsltn_otcome_dimid=maybe negative number or 5-6 digits

nirupama1
Fluorite | Level 6
Length
case_id=30
actv_id =30
actv_rsltn_otcome_dimid=8
Kurt_Bremser
Super User

So this table would need roughly 6 million * 70 bytes in memory, less than 500 MB.

Let's start with the first join:

data call;
set hmaodm.cse_dtl_fact;
if _n_ = 1
then do;
  length
    actv_id $30
    actv_rsltn_otcome_dimid 8
  ;
  declare hash af (dataset:"hmaodm.actv_fact (where=(cse_src_sys_cd in ('CPM')))");
  af.definekey("case_id");
  af.definedata("actv_id","actv_rsltn_otcome_dimid");
  af.definedone();
  call missing(actv_id,actv_rsltn_otcome_dimid);
end;
rc = af.find();
run;

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 19 replies
  • 4150 views
  • 1 like
  • 4 in conversation