Hi,
I have a sql query written by a previous programmer -
PROC SQL;
CREATE TABLE WORK.BASE_PGM AS SELECT DISTINCT
(COUNT (DISTINCT MEMBER_KEY_CHAR )) AS DEMO_PARTICIPATING_USERS
,PRCHSR_CD AS DEMO_PRCHSR_CODE
,CASE WHEN GDR IS NOT MISSING THEN GDR
ELSE 'UNKNOWN' END AS DEMO_GDR_GROUPING
,CASE WHEN ETHNICITY IS NOT MISSING THEN ETHNICITY
ELSE 'UNKNOWN' END AS DEMO_ETHN_GROUPING
,((TODAY()-DATEPART(DOB_DT))/365.25) AS DEMO_AGE
,CASE WHEN CALCULATED DEMO_AGE BETWEEN 18 AND 34.99 THEN '18 - 34'
WHEN CALCULATED DEMO_AGE BETWEEN 35 AND 44.99 THEN '35 - 44'
WHEN CALCULATED DEMO_AGE BETWEEN 45 AND 54.99 THEN '45 - 55'
WHEN CALCULATED DEMO_AGE >= 55 AND 44.99 THEN 'OVER 55'
ELSE 'UNKNOWN' END AS DEMO_AGE_GROUPING
,PGM_NM AS PGM_NM
,OPPORTUNITY_COMPLETED AS DETAIL_OPPORTUNITY_COMPLETED
,ACTION_ITEM_RESPONSE_COMPLETED AS DETAIL_ACTION_ITEM_COMPLETED
FROM F_MEMBER_PROGRAM_BASE_PGM AS F_MEMBER_PROGRAM_BASE
WHERE F_MEMBER_PROGRAM_BASE.PROGRAM_ENROLLED > 0
GROUP BY PRCHSR_CD,DOB_DT,GDR;
QUIT;
Does anyone know if there could be performance issues with calculating the age and using it again in a case statement? The report is taking excess time and my hunch is this query is taking more time than necessary.
Any thoughts/feedback will be helpful.
Thanks,
saspert.
I wouldn't think so, but your only other option would be to create and apply a format to the calculated demo_age. You could always extract a few records and compare the two methods.
I would think that your excessive time is due to something else.
There are so many factors that influence how a SQL proc or datastep runs that I find the quickest way to test it is try and see which affects the running time by commenting out parts and comparing the differences.
This isn't 100% by far, but it works pretty well.
Another possibility is the two distinct options in the code, they have to require a sort of some kind on the back end...
FWIW, I ran a VERY unscientific test and discovered that the way it was written was faster. However, in doing that, I noticed a couple of probable errors in your inherited code that you might want to fix.
WHEN CALCULATED DEMO_AGE BETWEEN 45 AND 54.99 THEN '45 - 55'
WHEN CALCULATED DEMO_AGE >= 55 AND 44.99 THEN 'OVER 55'
In the first of those lines, I would think that should be '45 - 54'
In the second of those line, the AND 44.99 doesn't make any sense and the resulting value should be 'OVER 54'
Anyhow, the code I ran was as follows:
data F_MEMBER_PROGRAM_BASE_PGM;
informat dob_dt anydtdtm21.;
input (PRCHSR_CD GDR) ($) DOB_DT;
member_key_char=1;
PRCHSR_CD=1;
ETHNICITY="1";
PGM_NM=1;
PROGRAM_ENROLLED=1;
OPPORTUNITY_COMPLETED=1;
ACTION_ITEM_RESPONSE_COMPLETED=1;
do i=1 to 10000;output;end;
cards;
1 1 2/20/1970:12:10:00
1 1 2/20/1970:12:10:00
2 2 2/20/1960:12:10:00
2 2 2/20/1960:12:10:00
3 3 2/20/1949:12:10:00
3 3 2/20/1949:12:10:00
;
PROC SQL;
CREATE TABLE WORK.BASE_PGM AS SELECT DISTINCT
(COUNT (DISTINCT MEMBER_KEY_CHAR )) AS DEMO_PARTICIPATING_USERS
,PRCHSR_CD AS DEMO_PRCHSR_CODE
,CASE WHEN GDR IS NOT MISSING THEN GDR
ELSE 'UNKNOWN' END AS DEMO_GDR_GROUPING
,CASE WHEN ETHNICITY IS NOT MISSING THEN ETHNICITY
ELSE 'UNKNOWN' END AS DEMO_ETHN_GROUPING
,((TODAY()-DATEPART(DOB_DT))/365.25) AS DEMO_AGE
,CASE WHEN CALCULATED DEMO_AGE BETWEEN 18 AND 34.99 THEN '18 - 34'
WHEN CALCULATED DEMO_AGE BETWEEN 35 AND 44.99 THEN '35 - 44'
WHEN CALCULATED DEMO_AGE BETWEEN 45 AND 54.99 THEN '45 - 55'
WHEN CALCULATED DEMO_AGE >= 55 THEN 'OVER 55'
ELSE 'UNKNOWN' END AS DEMO_AGE_GROUPING
,PGM_NM AS PGM_NM
,OPPORTUNITY_COMPLETED AS DETAIL_OPPORTUNITY_COMPLETED
,ACTION_ITEM_RESPONSE_COMPLETED AS DETAIL_ACTION_ITEM_COMPLETED
FROM F_MEMBER_PROGRAM_BASE_PGM AS F_MEMBER_PROGRAM_BASE
WHERE F_MEMBER_PROGRAM_BASE.PROGRAM_ENROLLED > 0
GROUP BY PRCHSR_CD,DOB_DT,GDR;
QUIT;
proc format;
value agegrp
18-34.99="18 - 34"
35-44.99="35 - 44"
45-54.99="18 - 55"
other="OVER 55";
run;
PROC SQL;
CREATE TABLE WORK.BASE_PGM AS SELECT DISTINCT
(COUNT (DISTINCT MEMBER_KEY_CHAR )) AS DEMO_PARTICIPATING_USERS
,PRCHSR_CD AS DEMO_PRCHSR_CODE
,CASE WHEN GDR IS NOT MISSING THEN GDR
ELSE 'UNKNOWN' END AS DEMO_GDR_GROUPING
,CASE WHEN ETHNICITY IS NOT MISSING THEN ETHNICITY
ELSE 'UNKNOWN' END AS DEMO_ETHN_GROUPING
,((TODAY()-DATEPART(DOB_DT))/365.25) AS DEMO_AGE
,put(((TODAY()-DATEPART(DOB_DT))/365.25),agegrp.) AS DEMO_AGE_GROUPING
,PGM_NM AS PGM_NM
,OPPORTUNITY_COMPLETED AS DETAIL_OPPORTUNITY_COMPLETED
,ACTION_ITEM_RESPONSE_COMPLETED AS DETAIL_ACTION_ITEM_COMPLETED
FROM F_MEMBER_PROGRAM_BASE_PGM AS F_MEMBER_PROGRAM_BASE
WHERE F_MEMBER_PROGRAM_BASE.PROGRAM_ENROLLED > 0
GROUP BY PRCHSR_CD,DOB_DT,GDR;
QUIT;
Thank you Reeza and Art for the feedback.
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.