BookmarkSubscribeRSS Feed
David_Billa
Rhodochrosite | Level 12

Is there a way to combine the below three steps into one step? I just want to sum the two fields from two tables with group by and then I've to find the difference between those values.

 

Same calculation for both the variables ULTIMATE_PREMIUMS_1 and ULTIMATE_PREMIUMS_2 as I want to see same values in these two variables. Any help?

 

Instead of creating three WORK datasets I want to accomplish it in one dataset.

 

/*Premium - aggregate ULTIMATE_PREMIUMS*/
proc sql;
create table prem as select * ,sum(ULTIMATE_PREMIUMS) as prem_sum from have1
group by
      UNIT,
      BRANCH,
      SUB_SEGMENT,
      ;
quit;

/*History - aggregate PERIOD_VALUE*/
proc sql;
create table hist as select distinct * ,sum(PERIOD_VALUE) as hist_sum from have2
group by
      UNIT,
      BRANCH,
      SUB_SEGMENT,
      ;
quit;

/*Join Premium and History */

proc sql;
    create table want as
        select prem.*,sum(prem.prem_sum,- hist.hist_sum) as ULTIMATE_PREMIUMS_1 length = 8,
            sum(prem.prem_sum,- hist.hist_sum)  as ULTIMATE_PREMIUMS_2 length = 8,
        from
            prem as prem left join 
            hist as hist
            on
            (
            prem.UNIT = hist.UNIT
            and prem.BRANCH = hist.BRANCH
            and prem.SUB_SEGMENT = hist.SUB_SEGMENT
                )
            order by
                prem.UNIT,
                prem.BRANCH,
                prem.SUB_SEGMENT
    ;
quit;
7 REPLIES 7
Kurt_Bremser
Super User

Try this:

proc sql;
create table want as
  select
    coalesce (a.unit,b.unit) as unit,
    coalesce (a.branch,b.branch) as branch,
    coalesce (a.sub_segment,b.sub_segment) as sub_segment,
    a.prem_sum,
    b.hist_sum,
    sum(a.prem_sum, -b.hist_sum) as ultimate_premiums
  from (
    select
      unit,
      branch,
      sub_segment,
      sum(ultimate__premiums) as prem_sum
    from have1
    group by
      unit,
      branch,
      sub_segment
  ) a
  full join (
    select
      unit,
      branch,
      sub_segment,
      sum(period_value) as hist_sum
    from have2
    group by
      unit,
      branch,
      sub_segment
  ) b
  on
    a.unit = b.unit and
    a.branch = b.branch and
    a.sub_segment = b.sub_segment
;
quit;
David_Billa
Rhodochrosite | Level 12

Why coalesce function has been used in select clause? If I want to select
numeric variable,which function to use and how to tackle it?

 

Also why you have used full join? Shouldn't be left join?

Kurt_Bremser
Super User

@David_Billa wrote:

Why coalesce function has been used in select clause? If I want to select
numeric variable,which function to use and how to tackle it?

 

Also why you have used full join? Shouldn't be left join?


I used the full join to make sure the code catches cases where there are entries in have2 that don't exist in have1, and for such cases the COALESCE is needed.

If the left join is sufficient for you, you can also omit the COALESCE.

 

SQL COALESCE works for character and numeric variables.

David_Billa
Rhodochrosite | Level 12
As a alternative approach, can we handle the calculation with 'calculated'
keyword?
mkeintz
PROC Star

You could put them all in a single PROC SQL step, which in itself wouldn't reduce resources.  But you would likely improve efficiency greatly if use use create view instead of create table for prem and hist.   Just be sure to use the NOPRINT option on the proc sql statement - this allows proc sql to defer actual instantiation of the views until they are called for in the create table want statement:

 

proc sql noprint ;
   create view prem as .... ;
   create view hist as ... ;
   create table want as ...;
quit;

 

True, this doesn't reduce the amount of code, but it does avoid writing to disk (and re-reading from disk) of the prem and hist data sets.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
PGStats
Opal | Level 21

@mkeintz  Please explain further "this allows proc sql to defer actual instantiation of the views until they are called for". What do you mean by instantiation?

PG

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 1020 views
  • 1 like
  • 4 in conversation