proc sql;
create table call as
select
input(cdf.case_id,best12.) as mbr_pgm_id ,
count(distinct(af.actv_id)) as AttemptedCallsN ,
cdf.TOUCH_DT_DIMID
from hmaodm.cse_dtl_fact cdf
left join hmaodm.actv_fact af
on cdf.case_id = af.case_id
and af.cse_src_sys_cd in ('CPM')
left join HMAODM.ref_dim RDA
on af.actv_rsltn_otcome_dimid = rda.ref_dimid
and rda.ref_nm = 'activityResolutionOutcomeType'
left join HMAODM.CUST_SEG_DIM CSD
on CDF.CUST_SEG_DIMID = CSD.CUST_SEG_DIMID
where cdf.qual_dt_dimid > 0
and cdf.touch_dt_dimid = .
and rda.ref_cd in (&actv_typ_id.)
and RDA.ref_cd <> '44' AND CSD.CUST_SEG_NBR IN (&user_defined_policy.)
AND (CDF.CSE_OPEN_DT_DIMID <= &end_yyyymmdd.
AND (CDF.CSE_CLOS_DT_DIMID = 99991231 or CDF.CSE_CLOS_DT_DIMID >= &beg_yyyymmdd.))
AND CDF.MEDCR_IND_DIMID <>1
group by cdf.case_id, cdf.TOUCH_DT_DIMID;
quit;
This multiple join is taking 50+ hours to run. Please help me to optimize it, so that it takes less execution time.
I tried to split the above query as below but the output does not match. Please help! It is urgent .
proc sql noprint;
create table call1 as
select
cdf.case_id ,
af.actv_rsltn_otcome_dimid,
count(distinct(actv_id)) as AttemptedCallsN
from CRP_HMAODM.cse_dtl_fact cdf
left join CRP_HMAODM.actv_fact af
on cdf.case_id = af.case_id
and af.cse_src_sys_cd in ('CPM')
where cdf.qual_dt_dimid > 0
and cdf.touch_dt_dimid IS NULL
AND (CDF.CSE_OPEN_DT_DIMID <= &end_yyyymmdd.
AND (CDF.CSE_CLOS_DT_DIMID = 99991231 or CDF.CSE_CLOS_DT_DIMID >= &beg_yyyymmdd.))
AND CDF.MEDCR_IND_DIMID <>1
group by cdf.case_id,af.actv_rsltn_otcome_dimid
order by CDF.case_id
;
quit;
proc sql ;
create table call2 as
select distinct CALL1.*
from call1
left join HMAODM.ref_dim RDA
on call1.actv_rsltn_otcome_dimid = rda.ref_dimid
and rda.ref_nm = 'activityResolutionOutcomeType'
where rda.ref_cd in (&actv_typ_id.)
AND RDA.ref_cd <> '44'
order by call1.case_id
;
quit;
proc sql noprint;
create table call3 as
select distinct
CDF.case_id
from CRP_HMAODM.cse_dtl_fact cdf
left join CRP_HMAODM.CUST_SEG_DIM CSD
on CDF.CUST_SEG_DIMID = CSD.CUST_SEG_DIMID
where
CSD.CUST_SEG_NBR IN (&user_defined_policy.)
order by CDF.case_id
;
quit;
proc sort data=call2;by case_id;RUN;
proc sort data=call3;by case_id;RUN;
data call23;
merge call2 (IN=A)
call3 (IN=B)
;
By case_id;
If A THEN OUTPUT;
RUN;
1. It looks like the table rda is only used for filtering. Why not have an inner join?
2. It looks like table csd is not used at all.
3. What # of records do you anticipate from each table? Are they indexed?
4. Why not format your code so it can be read without getting a headache? Something like
create table CALL as
select
input(cdf.CASE_ID,best12.) as mbr_pgm_id
, count(distinct(af.ACTV_ID)) as AttemptedCallsN
, cdf.TOUCH_DT_DIMID
from HMAODM.CSE_DTL_FACT cdf
left join
HMAODM.ACTV_FACT af
on cdf.CASE_ID = af.CASE_ID
and af.CSE_SRC_SYS_CD in ('CPM')
left join
HMAODM.REF_DIM rda
on af.ACTV_RSLTN_OTCOME_DIMID = rda.REF_DIMID
and rda.REF_NM = 'activityResolutionOutcomeType'
left join
HMAODM.CUST_SEG_DIM csd
on cdf.CUST_SEG_DIMID = csd.CUST_SEG_DIMID
where cdf.QUAL_DT_DIMID > 0
and cdf.TOUCH_DT_DIMID = .
and rda.REF_CD in (&actv_typ_id.)
and rda.REF_CD <> '44' AND CSD.CUST_SEG_NBR IN (&user_defined_policy.)
and cdf.CSE_OPEN_DT_DIMID <= &end_yyyymmdd.
and cdf.CSE_CLOS_DT_DIMID >= &beg_yyyymmdd.
and cdf.MEDCR_IND_DIMID <> 1
group by cdf.CASE_ID, cdf.TOUCH_DT_DIMID;
quit;
Oh my goodness, table csd is used. Except the AND clause is in column 200!
Do yourself a favour and format your code properly!
So it looks like table csd should be an inner join too?
So maybe try
cdf
inner join
(most discriminating match, rda or csd?)
inner join
(rda or csd)
left join
af
Please supply some additional information.
Are these libraries local SAS libraries, or do you connect to a remote database?
How many observations do the datasets have, with regard to the subsets you do?
What are the attributes of the variables you pull (types, lengths)?
You also have stupid storage of dates. Fix that before you even continue.
This
input(cdf.case_id,best12.) as mbr_pgm_id
is also bad, keep id's in character.
Have you tried inner joins?
What # of records do you anticipate from each table before and after the joins?
Are the tables sorted? indexed?
@nirupama1 wrote:
Cse_dtl_fact and actv_fac both are very huge tables
What you call huge might be a little flyspeck for someone else. Please give us the numbers. Not "millions", but 5 millions, 10 millions, and so on. And the sizes of the required variables.
e. g. how many observations of ACTV_FACT satisfy the condition
CSE_SRC_SYS_CD in ('CPM')
and what are the attributes of CASE_ID and ACTV_ID?
This will determine if and where we can use in-memory techniques.
And what are the lengths of case_id, actv_id and actv_rsltn_otcome_dimid?
Maxim 3: Know Your Data.
Run PROC CONTENTS on your tables, so you get to know the types and other attributes of your columns.
So this table would need roughly 6 million * 70 bytes in memory, less than 500 MB.
Let's start with the first join:
data call;
set hmaodm.cse_dtl_fact;
if _n_ = 1
then do;
length
actv_id $30
actv_rsltn_otcome_dimid 8
;
declare hash af (dataset:"hmaodm.actv_fact (where=(cse_src_sys_cd in ('CPM')))");
af.definekey("case_id");
af.definedata("actv_id","actv_rsltn_otcome_dimid");
af.definedone();
call missing(actv_id,actv_rsltn_otcome_dimid);
end;
rc = af.find();
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.