BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
RandoDando
Pyrite | Level 9

Hello

I need some help writing a SQL query which will group by a variable while taking the max of something where something is true.  Sounds much simpler than it really is, so I'll jump right to my query...

 

proc sql;
create table INSPN_FY as
select distinct
a.HUB_Name,
b.PROP_ID,
b.PROP_NAME,
c.INSPN_FISC_YR,
e.FISC_YR as ASMT_FISC_YR,
c.INSPECTION_ID,
c.INSPECTION_CD,
c.INSPECTION_SCORE
from HUB_TBL a INNER JOIN PROPERTY_TBL b ON a.HUB_ID  = b.HUB_ID 
INNER JOIN 
INSPECTION_TBL c ON c.PROP_ID = b.PROP_ID
LEFT JOIN INSPECTION_ASSESSMENT d ON c.INSPECTION_ID = d.INSPECTION_ID
LEFT JOIN ASSESSMENT_TBL e on d.ASMT_ID  = e.ASMT_ID and d.GRP_ID = e.GRP_ID and d.VER_ID = e.VER_ID
where  e.FISC_YR <> . and 
c.INSPECTION_ID >=500000 and c.INSPECTION_ID <1000000 and
c.INSPECTION_CD in('RTN','IRA', 'VUR', 'AWP', 'VUI')
and c.PROP_ID is not null and
(c.INSPECTION_SCORE is not null OR c.INSPECTION_CD in( 'VUI'))  
and b.PROPERTY_UNIT_CNT > 0 
ORDER BY a.HUB_Name,
b.PROP_ID,
b.PROP_NAME,
c.INSPN_FISC_YR,
e.FISC_YR ;
quit;

I want to select inspections for properties by assessment FY, where the conditions in the WHERE clause are met.  However, some cases have 2 inspections in the same assessment year, one with a VUI status and one released, such as:

PROPERTY_ID

PROPERTY_UNIT_CNT

INSPN_FISC_YR

ASMT_FISC_YR

INSPECTION_ID

INSPECTION_CODE

XX999999

28

2016

2017

550505

RTN

XX999999

28

2017

2017

690908

VUI

 

And some have two released, such as:

PROPERTY_ID

PROPERTY_UNIT_CNT

INSPN_FISC_YR

ASMT_FISC_YR

INSPECTION_ID

INSPECTION_CD

XX999999

74

2016

2017

570707

RTN

XX999999

74

2017

2017

680808

RTN

 

In the first case, I want the RTN, while in the second case I want the MAX inspection ID.  However, if the only inspection for a FY is VUI, I still want it (only omit them if there is a duplicative released).

 

How can I do this in 1 SQL step?  I need it in 1 step because I plan to have this table automated in an ETL.  SQL only please.

1 ACCEPTED SOLUTION

Accepted Solutions
s_lassen
Meteorite | Level 14

The first thing I do when I see code like yours is to make it readable by

  1. Getting rid of the aliases. With a query as large as that, referring to tables as "a", "b", "c" etc. only obfuscates the whole thing. This is easily accomplished by a search an replace in any text editor.
  2. Doing a little bit of indentation.

After that, your code looks like this:

 

proc sql;
create table INSPN_FY as select distinct
  HUB_TBL.HUB_Name,
  PROPERTY_TBL.PROP_ID,
  PROPERTY_TBL.PROP_NAME,
  INSPECTION_TBL.INSPN_FISC_YR,
  ASSESSMENT_TBL.FISC_YR as ASMT_FISC_YR,
  INSPECTION_TBL.INSPECTION_ID,
  INSPECTION_TBL.INSPECTION_CD,
  INSPECTION_TBL.INSPECTION_SCORE
from HUB_TBL INNER JOIN 
     PROPERTY_TBL ON HUB_TBL.HUB_ID  = PROPERTY_TBL.HUB_ID INNER JOIN 
     INSPECTION_TBL ON INSPECTION_TBL.PROP_ID = PROPERTY_TBL.PROP_ID LEFT JOIN 
     INSPECTION_ASSESSMENT ON INSPECTION_TBL.INSPECTION_ID = INSPECTION_ASSESSMENT.INSPECTION_ID LEFT JOIN 
     ASSESSMENT_TBL on INSPECTION_ASSESSMENT.ASMT_ID  = ASSESSMENT_TBL.ASMT_ID 
      and INSPECTION_ASSESSMENT.GRP_ID = ASSESSMENT_TBL.GRP_ID 
      and INSPECTION_ASSESSMENT.VER_ID = ASSESSMENT_TBL.VER_ID
where ASSESSMENT_TBL.FISC_YR <> . 
  and INSPECTION_TBL.INSPECTION_ID >=500000 and INSPECTION_TBL.INSPECTION_ID <1000000 
  and INSPECTION_TBL.INSPECTION_CD in('RTN','IRA', 'VUR', 'AWP', 'VUI')
  and INSPECTION_TBL.PROP_ID is not null 
  and (INSPECTION_TBL.INSPECTION_SCORE is not null OR INSPECTION_TBL.INSPECTION_CD in( 'VUI'))  
  and PROPERTY_TBL.PROPERTY_UNIT_CNT > 0 
ORDER BY 
  HUB_TBL.HUB_Name,
  PROPERTY_TBL.PROP_ID,
  PROPERTY_TBL.PROP_NAME,
  INSPECTION_TBL.INSPN_FISC_YR,
  ASSESSMENT_TBL.FISC_YR ;
quit;

 

 

The first thing that becomes obvious is that the LEFT joins are silly, they suggest that you may want to have stuff that is not in the ASSESMENT_TBL, but you explicitly make a WHERE clause specifying that assessment fiscal year cannot be missing. So really, it is all inner join.

 

As you are not using any data from the link table, my next suggestion would be to put the assessment stuff into a subquery:

 

proc sql;
create table INSPN_FY as select distinct
  HUB_TBL.HUB_Name,
  PROPERTY_TBL.PROP_ID,
  PROPERTY_TBL.PROP_NAME,
  INSPECTION_TBL.INSPN_FISC_YR,
  ASSESSMENTS.FISC_YR as ASMT_FISC_YR,
  INSPECTION_TBL.INSPECTION_ID,
  INSPECTION_TBL.INSPECTION_CD,
  INSPECTION_TBL.INSPECTION_SCORE
from HUB_TBL INNER JOIN 
     PROPERTY_TBL ON HUB_TBL.HUB_ID  = PROPERTY_TBL.HUB_ID INNER JOIN 
     INSPECTION_TBL ON INSPECTION_TBL.PROP_ID = PROPERTY_TBL.PROP_ID JOIN 
     (select 
        INSPECTION_ASSESSMENT.INSPECTION_ID,
        ASSESSMENT_TBL.FISC_YR
      from INSPECTION_ASSESSMENT JOIN 
           ASSESSMENT_TBL on INSPECTION_ASSESSMENT.ASMT_ID  = ASSESSMENT_TBL.ASMT_ID 
             and INSPECTION_ASSESSMENT.GRP_ID = ASSESSMENT_TBL.GRP_ID 
             and INSPECTION_ASSESSMENT.VER_ID = ASSESSMENT_TBL.VER_ID
      where ASSESSMENT_TBL.FISC_YR <> . ) ASSESSMENTS on ASSESSMENTS.INSPECTION_ID=INSPECTION_TBL.INSPECTION_ID 
where INSPECTION_TBL.INSPECTION_ID >=500000 and INSPECTION_TBL.INSPECTION_ID <1000000 
  and INSPECTION_TBL.INSPECTION_CD in('RTN','IRA', 'VUR', 'AWP', 'VUI')
  and INSPECTION_TBL.PROP_ID is not null 
  and (INSPECTION_TBL.INSPECTION_SCORE is not null OR INSPECTION_TBL.INSPECTION_CD in( 'VUI'))  
  and PROPERTY_TBL.PROPERTY_UNIT_CNT > 0 
ORDER BY 
  HUB_TBL.HUB_Name,
  PROPERTY_TBL.PROP_ID,
  PROPERTY_TBL.PROP_NAME,
  INSPECTION_TBL.INSPN_FISC_YR,
  ASSESSMENT_TBL.FISC_YR ;
quit;

 

 

Now, what you want is the maximum PROPERTY_ID for each ASMT_FISC_YR, except that INSPECTION_CODE='RTN' takes precedence. If you want to calculate that in a single query, the easiest way is probably to create a dummy variable, which is the property id plus a large number for 'RTN', and just the id for the others:

case INSPECTION_TBL.INSPECTION_CD
  when 'RTN' then INSPECTION_TBL.INSPECTION_ID+1E8
  else INSPECTION_TBL.INSPECTION_ID
end

You can then use that as your criterion for selecting, e.g.:

create table INSPN_FY as select distinct
  HUB_TBL.HUB_Name,
  PROPERTY_TBL.PROP_ID,
  PROPERTY_TBL.PROP_NAME,
  INSPECTION_TBL.INSPN_FISC_YR,
  ASSESSMENTS.FISC_YR as ASMT_FISC_YR,
  INSPECTION_TBL.INSPECTION_ID,
  INSPECTION_TBL.INSPECTION_CD,
  INSPECTION_TBL.INSPECTION_SCORE
from HUB_TBL INNER JOIN 
     PROPERTY_TBL ON HUB_TBL.HUB_ID  = PROPERTY_TBL.HUB_ID INNER JOIN 
     INSPECTION_TBL ON INSPECTION_TBL.PROP_ID = PROPERTY_TBL.PROP_ID JOIN 
     (select 
        INSPECTION_ASSESSMENT.INSPECTION_ID,
        ASSESSMENT_TBL.FISC_YR
      from INSPECTION_ASSESSMENT JOIN 
           ASSESSMENT_TBL on INSPECTION_ASSESSMENT.ASMT_ID  = ASSESSMENT_TBL.ASMT_ID 
             and INSPECTION_ASSESSMENT.GRP_ID = ASSESSMENT_TBL.GRP_ID 
             and INSPECTION_ASSESSMENT.VER_ID = ASSESSMENT_TBL.VER_ID
      where ASSESSMENT_TBL.FISC_YR <> . ) ASSESSMENTS on ASSESSMENTS.INSPECTION_ID=INSPECTION_TBL.INSPECTION_ID 
where INSPECTION_TBL.INSPECTION_ID >=500000 and INSPECTION_TBL.INSPECTION_ID <1000000 
  and INSPECTION_TBL.INSPECTION_CD in('RTN','IRA', 'VUR', 'AWP', 'VUI')
  and INSPECTION_TBL.PROP_ID is not null 
  and (INSPECTION_TBL.INSPECTION_SCORE is not null OR INSPECTION_TBL.INSPECTION_CD in( 'VUI'))  
  and PROPERTY_TBL.PROPERTY_UNIT_CNT > 0 
group by 
  INSPECTION_TBL.INSPECTION_ID,
  ASSESSMENTS.FISC_YR
having case INSPECTION_TBL.INSPECTION_CD
  when 'RTN' then INSPECTION_TBL.INSPECTION_ID+1E8
  else INSPECTION_TBL.INSPECTION_ID
  end 
  =max(case INSPECTION_TBL.INSPECTION_CD
        when 'RTN' then INSPECTION_TBL.INSPECTION_ID+1E8
        else INSPECTION_TBL.INSPECTION_ID
       end)
ORDER BY 
  HUB_TBL.HUB_Name,
  PROPERTY_TBL.PROP_ID,
  PROPERTY_TBL.PROP_NAME,
  INSPECTION_TBL.INSPN_FISC_YR,
  ASSESSMENT_TBL.FISC_YR ;
quit;

It may be better, both logically and performance-wise, to calculate the criterion (and everything else) in a subquery, and then apply the GROUP BY, HAVING and ORDER BY to the results of that query. But I leave that as an exercise for the reader. And you may have to add some of the other columns to your GROUP BY.

View solution in original post

3 REPLIES 3
RichardDeVen
Barite | Level 11

A 5 table join is a lot to look at with out any data.

The first angle of attach is probably to change the join to table

INNER JOIN 
INSPECTION_TBL c 

to a join to subselect

INNER JOIN 
 ( SELECT ... stuff ...
   from INSPECTION_TBL 
   where ... correlation criteria ...
   group by ... 
   having ... criteria for selecting desired row ...
 ) as c

Complex ordering and selection rules may require the subselect have additional subselects itself.

s_lassen
Meteorite | Level 14

The first thing I do when I see code like yours is to make it readable by

  1. Getting rid of the aliases. With a query as large as that, referring to tables as "a", "b", "c" etc. only obfuscates the whole thing. This is easily accomplished by a search an replace in any text editor.
  2. Doing a little bit of indentation.

After that, your code looks like this:

 

proc sql;
create table INSPN_FY as select distinct
  HUB_TBL.HUB_Name,
  PROPERTY_TBL.PROP_ID,
  PROPERTY_TBL.PROP_NAME,
  INSPECTION_TBL.INSPN_FISC_YR,
  ASSESSMENT_TBL.FISC_YR as ASMT_FISC_YR,
  INSPECTION_TBL.INSPECTION_ID,
  INSPECTION_TBL.INSPECTION_CD,
  INSPECTION_TBL.INSPECTION_SCORE
from HUB_TBL INNER JOIN 
     PROPERTY_TBL ON HUB_TBL.HUB_ID  = PROPERTY_TBL.HUB_ID INNER JOIN 
     INSPECTION_TBL ON INSPECTION_TBL.PROP_ID = PROPERTY_TBL.PROP_ID LEFT JOIN 
     INSPECTION_ASSESSMENT ON INSPECTION_TBL.INSPECTION_ID = INSPECTION_ASSESSMENT.INSPECTION_ID LEFT JOIN 
     ASSESSMENT_TBL on INSPECTION_ASSESSMENT.ASMT_ID  = ASSESSMENT_TBL.ASMT_ID 
      and INSPECTION_ASSESSMENT.GRP_ID = ASSESSMENT_TBL.GRP_ID 
      and INSPECTION_ASSESSMENT.VER_ID = ASSESSMENT_TBL.VER_ID
where ASSESSMENT_TBL.FISC_YR <> . 
  and INSPECTION_TBL.INSPECTION_ID >=500000 and INSPECTION_TBL.INSPECTION_ID <1000000 
  and INSPECTION_TBL.INSPECTION_CD in('RTN','IRA', 'VUR', 'AWP', 'VUI')
  and INSPECTION_TBL.PROP_ID is not null 
  and (INSPECTION_TBL.INSPECTION_SCORE is not null OR INSPECTION_TBL.INSPECTION_CD in( 'VUI'))  
  and PROPERTY_TBL.PROPERTY_UNIT_CNT > 0 
ORDER BY 
  HUB_TBL.HUB_Name,
  PROPERTY_TBL.PROP_ID,
  PROPERTY_TBL.PROP_NAME,
  INSPECTION_TBL.INSPN_FISC_YR,
  ASSESSMENT_TBL.FISC_YR ;
quit;

 

 

The first thing that becomes obvious is that the LEFT joins are silly, they suggest that you may want to have stuff that is not in the ASSESMENT_TBL, but you explicitly make a WHERE clause specifying that assessment fiscal year cannot be missing. So really, it is all inner join.

 

As you are not using any data from the link table, my next suggestion would be to put the assessment stuff into a subquery:

 

proc sql;
create table INSPN_FY as select distinct
  HUB_TBL.HUB_Name,
  PROPERTY_TBL.PROP_ID,
  PROPERTY_TBL.PROP_NAME,
  INSPECTION_TBL.INSPN_FISC_YR,
  ASSESSMENTS.FISC_YR as ASMT_FISC_YR,
  INSPECTION_TBL.INSPECTION_ID,
  INSPECTION_TBL.INSPECTION_CD,
  INSPECTION_TBL.INSPECTION_SCORE
from HUB_TBL INNER JOIN 
     PROPERTY_TBL ON HUB_TBL.HUB_ID  = PROPERTY_TBL.HUB_ID INNER JOIN 
     INSPECTION_TBL ON INSPECTION_TBL.PROP_ID = PROPERTY_TBL.PROP_ID JOIN 
     (select 
        INSPECTION_ASSESSMENT.INSPECTION_ID,
        ASSESSMENT_TBL.FISC_YR
      from INSPECTION_ASSESSMENT JOIN 
           ASSESSMENT_TBL on INSPECTION_ASSESSMENT.ASMT_ID  = ASSESSMENT_TBL.ASMT_ID 
             and INSPECTION_ASSESSMENT.GRP_ID = ASSESSMENT_TBL.GRP_ID 
             and INSPECTION_ASSESSMENT.VER_ID = ASSESSMENT_TBL.VER_ID
      where ASSESSMENT_TBL.FISC_YR <> . ) ASSESSMENTS on ASSESSMENTS.INSPECTION_ID=INSPECTION_TBL.INSPECTION_ID 
where INSPECTION_TBL.INSPECTION_ID >=500000 and INSPECTION_TBL.INSPECTION_ID <1000000 
  and INSPECTION_TBL.INSPECTION_CD in('RTN','IRA', 'VUR', 'AWP', 'VUI')
  and INSPECTION_TBL.PROP_ID is not null 
  and (INSPECTION_TBL.INSPECTION_SCORE is not null OR INSPECTION_TBL.INSPECTION_CD in( 'VUI'))  
  and PROPERTY_TBL.PROPERTY_UNIT_CNT > 0 
ORDER BY 
  HUB_TBL.HUB_Name,
  PROPERTY_TBL.PROP_ID,
  PROPERTY_TBL.PROP_NAME,
  INSPECTION_TBL.INSPN_FISC_YR,
  ASSESSMENT_TBL.FISC_YR ;
quit;

 

 

Now, what you want is the maximum PROPERTY_ID for each ASMT_FISC_YR, except that INSPECTION_CODE='RTN' takes precedence. If you want to calculate that in a single query, the easiest way is probably to create a dummy variable, which is the property id plus a large number for 'RTN', and just the id for the others:

case INSPECTION_TBL.INSPECTION_CD
  when 'RTN' then INSPECTION_TBL.INSPECTION_ID+1E8
  else INSPECTION_TBL.INSPECTION_ID
end

You can then use that as your criterion for selecting, e.g.:

create table INSPN_FY as select distinct
  HUB_TBL.HUB_Name,
  PROPERTY_TBL.PROP_ID,
  PROPERTY_TBL.PROP_NAME,
  INSPECTION_TBL.INSPN_FISC_YR,
  ASSESSMENTS.FISC_YR as ASMT_FISC_YR,
  INSPECTION_TBL.INSPECTION_ID,
  INSPECTION_TBL.INSPECTION_CD,
  INSPECTION_TBL.INSPECTION_SCORE
from HUB_TBL INNER JOIN 
     PROPERTY_TBL ON HUB_TBL.HUB_ID  = PROPERTY_TBL.HUB_ID INNER JOIN 
     INSPECTION_TBL ON INSPECTION_TBL.PROP_ID = PROPERTY_TBL.PROP_ID JOIN 
     (select 
        INSPECTION_ASSESSMENT.INSPECTION_ID,
        ASSESSMENT_TBL.FISC_YR
      from INSPECTION_ASSESSMENT JOIN 
           ASSESSMENT_TBL on INSPECTION_ASSESSMENT.ASMT_ID  = ASSESSMENT_TBL.ASMT_ID 
             and INSPECTION_ASSESSMENT.GRP_ID = ASSESSMENT_TBL.GRP_ID 
             and INSPECTION_ASSESSMENT.VER_ID = ASSESSMENT_TBL.VER_ID
      where ASSESSMENT_TBL.FISC_YR <> . ) ASSESSMENTS on ASSESSMENTS.INSPECTION_ID=INSPECTION_TBL.INSPECTION_ID 
where INSPECTION_TBL.INSPECTION_ID >=500000 and INSPECTION_TBL.INSPECTION_ID <1000000 
  and INSPECTION_TBL.INSPECTION_CD in('RTN','IRA', 'VUR', 'AWP', 'VUI')
  and INSPECTION_TBL.PROP_ID is not null 
  and (INSPECTION_TBL.INSPECTION_SCORE is not null OR INSPECTION_TBL.INSPECTION_CD in( 'VUI'))  
  and PROPERTY_TBL.PROPERTY_UNIT_CNT > 0 
group by 
  INSPECTION_TBL.INSPECTION_ID,
  ASSESSMENTS.FISC_YR
having case INSPECTION_TBL.INSPECTION_CD
  when 'RTN' then INSPECTION_TBL.INSPECTION_ID+1E8
  else INSPECTION_TBL.INSPECTION_ID
  end 
  =max(case INSPECTION_TBL.INSPECTION_CD
        when 'RTN' then INSPECTION_TBL.INSPECTION_ID+1E8
        else INSPECTION_TBL.INSPECTION_ID
       end)
ORDER BY 
  HUB_TBL.HUB_Name,
  PROPERTY_TBL.PROP_ID,
  PROPERTY_TBL.PROP_NAME,
  INSPECTION_TBL.INSPN_FISC_YR,
  ASSESSMENT_TBL.FISC_YR ;
quit;

It may be better, both logically and performance-wise, to calculate the criterion (and everything else) in a subquery, and then apply the GROUP BY, HAVING and ORDER BY to the results of that query. But I leave that as an exercise for the reader. And you may have to add some of the other columns to your GROUP BY.

RandoDando
Pyrite | Level 9

Thank you.  This code has gone through numerous changes based on feedback from various SMEs, particularly with regard to the WHERE statement.  I tested your solution, and with some minor tweaks, I was able to get it to provide the results I needed.  I had never performed joins on subsets, so this was something I would never have thought of on my own.  

sas-innovate-white.png

Missed SAS Innovate in Orlando?

Catch the best of SAS Innovate 2025 — anytime, anywhere. Stream powerful keynotes, real-world demos, and game-changing insights from the world’s leading data and AI minds.

 

Register now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 1243 views
  • 0 likes
  • 3 in conversation