DATA Step, Macro, Functions and more

Proc Sql Question

Reply
Super Contributor
Posts: 268

Proc Sql Question

Hi,

I have a sql query written by a previous programmer -

PROC SQL;
CREATE TABLE WORK.BASE_PGM AS SELECT  DISTINCT
(COUNT (DISTINCT MEMBER_KEY_CHAR )) AS DEMO_PARTICIPATING_USERS
,PRCHSR_CD AS DEMO_PRCHSR_CODE
,CASE WHEN GDR IS NOT MISSING THEN GDR
ELSE 'UNKNOWN' END AS DEMO_GDR_GROUPING
,CASE WHEN ETHNICITY IS NOT MISSING THEN ETHNICITY
ELSE 'UNKNOWN' END AS DEMO_ETHN_GROUPING
,((TODAY()-DATEPART(DOB_DT))/365.25) AS DEMO_AGE
,CASE WHEN CALCULATED DEMO_AGE BETWEEN 18 AND 34.99 THEN '18 - 34'
WHEN CALCULATED DEMO_AGE BETWEEN 35 AND 44.99 THEN '35 - 44'
WHEN CALCULATED DEMO_AGE BETWEEN 45 AND 54.99 THEN '45 - 55'
WHEN CALCULATED DEMO_AGE >= 55 AND 44.99 THEN 'OVER 55'
ELSE 'UNKNOWN' END AS DEMO_AGE_GROUPING 
,PGM_NM AS PGM_NM
,OPPORTUNITY_COMPLETED AS DETAIL_OPPORTUNITY_COMPLETED
,ACTION_ITEM_RESPONSE_COMPLETED AS DETAIL_ACTION_ITEM_COMPLETED
FROM F_MEMBER_PROGRAM_BASE_PGM AS F_MEMBER_PROGRAM_BASE
WHERE F_MEMBER_PROGRAM_BASE.PROGRAM_ENROLLED > 0
GROUP BY PRCHSR_CD,DOB_DT,GDR;
QUIT;

Does anyone know if there could be performance issues with calculating the age and using it again in a case statement? The report is taking excess time and my hunch is this query is taking more time than necessary.

Any thoughts/feedback will be helpful.

Thanks,
saspert.

PROC Star
Posts: 7,360

Proc Sql Question

I wouldn't think so, but your only other option would be to create and apply a format to the calculated demo_age.  You could always extract a few records and compare the two methods.

I would think that your excessive time is due to something else.

Super User
Posts: 17,784

Proc Sql Question

There are so many factors that influence how a SQL proc or datastep runs that I find the quickest way to test it is try and see which affects the running time by commenting out parts and comparing the differences.

This isn't 100% by far, but it works pretty well.

Another possibility is the two distinct options in the code, they have to require a sort of some kind on the back end...

PROC Star
Posts: 7,360

Proc Sql Question

FWIW, I ran a VERY unscientific test and discovered that the way it was written was faster.  However, in doing that, I noticed a couple of probable errors in your inherited code that you might want to fix.

WHEN CALCULATED DEMO_AGE BETWEEN 45 AND 54.99 THEN '45 - 55'

WHEN CALCULATED DEMO_AGE >= 55 AND 44.99 THEN 'OVER 55'

In the first of those lines, I would think that should be '45 - 54'

In the second of those line, the AND 44.99 doesn't make any sense and the resulting value should be 'OVER 54'

Anyhow, the code I ran was as follows:

data F_MEMBER_PROGRAM_BASE_PGM;

  informat dob_dt anydtdtm21.;

  input (PRCHSR_CD GDR) ($) DOB_DT;

  member_key_char=1;

  PRCHSR_CD=1;

  ETHNICITY="1";

  PGM_NM=1;

  PROGRAM_ENROLLED=1;

  OPPORTUNITY_COMPLETED=1;

  ACTION_ITEM_RESPONSE_COMPLETED=1;

  do i=1 to 10000;output;end;

  cards;

  1 1 2/20/1970:12:10:00

  1 1 2/20/1970:12:10:00

  2 2 2/20/1960:12:10:00

  2 2 2/20/1960:12:10:00

  3 3 2/20/1949:12:10:00

  3 3 2/20/1949:12:10:00

;

PROC SQL;

CREATE TABLE WORK.BASE_PGM AS SELECT  DISTINCT

(COUNT (DISTINCT MEMBER_KEY_CHAR )) AS DEMO_PARTICIPATING_USERS

,PRCHSR_CD AS DEMO_PRCHSR_CODE

,CASE WHEN GDR IS NOT MISSING THEN GDR

ELSE 'UNKNOWN' END AS DEMO_GDR_GROUPING

,CASE WHEN ETHNICITY IS NOT MISSING THEN ETHNICITY

ELSE 'UNKNOWN' END AS DEMO_ETHN_GROUPING

,((TODAY()-DATEPART(DOB_DT))/365.25) AS DEMO_AGE

,CASE WHEN CALCULATED DEMO_AGE BETWEEN 18 AND 34.99 THEN '18 - 34'

WHEN CALCULATED DEMO_AGE BETWEEN 35 AND 44.99 THEN '35 - 44'

WHEN CALCULATED DEMO_AGE BETWEEN 45 AND 54.99 THEN '45 - 55'

WHEN CALCULATED DEMO_AGE >= 55 THEN 'OVER 55'

ELSE 'UNKNOWN' END AS DEMO_AGE_GROUPING 

,PGM_NM AS PGM_NM

,OPPORTUNITY_COMPLETED AS DETAIL_OPPORTUNITY_COMPLETED

,ACTION_ITEM_RESPONSE_COMPLETED AS DETAIL_ACTION_ITEM_COMPLETED

FROM F_MEMBER_PROGRAM_BASE_PGM AS F_MEMBER_PROGRAM_BASE

WHERE F_MEMBER_PROGRAM_BASE.PROGRAM_ENROLLED > 0

GROUP BY PRCHSR_CD,DOB_DT,GDR;

QUIT;

proc format;

  value agegrp

  18-34.99="18 - 34"

  35-44.99="35 - 44"

  45-54.99="18 - 55"

  other="OVER 55";

run;

PROC SQL;

CREATE TABLE WORK.BASE_PGM AS SELECT  DISTINCT

(COUNT (DISTINCT MEMBER_KEY_CHAR )) AS DEMO_PARTICIPATING_USERS

,PRCHSR_CD AS DEMO_PRCHSR_CODE

,CASE WHEN GDR IS NOT MISSING THEN GDR

ELSE 'UNKNOWN' END AS DEMO_GDR_GROUPING

,CASE WHEN ETHNICITY IS NOT MISSING THEN ETHNICITY

ELSE 'UNKNOWN' END AS DEMO_ETHN_GROUPING

,((TODAY()-DATEPART(DOB_DT))/365.25) AS DEMO_AGE

,put(((TODAY()-DATEPART(DOB_DT))/365.25),agegrp.) AS DEMO_AGE_GROUPING

,PGM_NM AS PGM_NM

,OPPORTUNITY_COMPLETED AS DETAIL_OPPORTUNITY_COMPLETED

,ACTION_ITEM_RESPONSE_COMPLETED AS DETAIL_ACTION_ITEM_COMPLETED

FROM F_MEMBER_PROGRAM_BASE_PGM AS F_MEMBER_PROGRAM_BASE

WHERE F_MEMBER_PROGRAM_BASE.PROGRAM_ENROLLED > 0

GROUP BY PRCHSR_CD,DOB_DT,GDR;

QUIT;

Super Contributor
Posts: 268

Proc Sql Question

Thank you Reeza and Art for the feedback.

Ask a Question
Discussion stats
  • 4 replies
  • 155 views
  • 0 likes
  • 3 in conversation