Re: Aggregate and difference amount in one step

David_Billa · Posted 11-22-2020 11:06 AM

Is there a way to combine the below three steps into one step? I just want to sum the two fields from two tables with group by and then I've to find the difference between those values.

Same calculation for both the variables ULTIMATE_PREMIUMS_1 and ULTIMATE_PREMIUMS_2 as I want to see same values in these two variables. Any help?

Instead of creating three WORK datasets I want to accomplish it in one dataset.

/*Premium - aggregate ULTIMATE_PREMIUMS*/
proc sql;
create table prem as select * ,sum(ULTIMATE_PREMIUMS) as prem_sum from have1
group by
      UNIT,
      BRANCH,
      SUB_SEGMENT,
      ;
quit;

/*History - aggregate PERIOD_VALUE*/
proc sql;
create table hist as select distinct * ,sum(PERIOD_VALUE) as hist_sum from have2
group by
      UNIT,
      BRANCH,
      SUB_SEGMENT,
      ;
quit;

/*Join Premium and History */

proc sql;
    create table want as
        select prem.*,sum(prem.prem_sum,- hist.hist_sum) as ULTIMATE_PREMIUMS_1 length = 8,
            sum(prem.prem_sum,- hist.hist_sum)  as ULTIMATE_PREMIUMS_2 length = 8,
        from
            prem as prem left join 
            hist as hist
            on
            (
            prem.UNIT = hist.UNIT
            and prem.BRANCH = hist.BRANCH
            and prem.SUB_SEGMENT = hist.SUB_SEGMENT
                )
            order by
                prem.UNIT,
                prem.BRANCH,
                prem.SUB_SEGMENT
    ;
quit;

Kurt_Bremser · Posted 11-22-2020 01:53 PM

Try this:

proc sql;
create table want as
  select
    coalesce (a.unit,b.unit) as unit,
    coalesce (a.branch,b.branch) as branch,
    coalesce (a.sub_segment,b.sub_segment) as sub_segment,
    a.prem_sum,
    b.hist_sum,
    sum(a.prem_sum, -b.hist_sum) as ultimate_premiums
  from (
    select
      unit,
      branch,
      sub_segment,
      sum(ultimate__premiums) as prem_sum
    from have1
    group by
      unit,
      branch,
      sub_segment
  ) a
  full join (
    select
      unit,
      branch,
      sub_segment,
      sum(period_value) as hist_sum
    from have2
    group by
      unit,
      branch,
      sub_segment
  ) b
  on
    a.unit = b.unit and
    a.branch = b.branch and
    a.sub_segment = b.sub_segment
;
quit;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

David_Billa · Posted 11-22-2020 08:54 PM

Why coalesce function has been used in select clause? If I want to select
numeric variable,which function to use and how to tackle it?

Also why you have used full join? Shouldn't be left join?

Kurt_Bremser · Posted 11-22-2020 11:43 PM

@David_Billa wrote:

Why coalesce function has been used in select clause? If I want to select
numeric variable,which function to use and how to tackle it?

Also why you have used full join? Shouldn't be left join?

I used the full join to make sure the code catches cases where there are entries in have2 that don't exist in have1, and for such cases the COALESCE is needed.

If the left join is sufficient for you, you can also omit the COALESCE.

SQL COALESCE works for character and numeric variables.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

David_Billa · Posted 11-23-2020 12:28 AM

As a alternative approach, can we handle the calculation with 'calculated'
keyword?

Kurt_Bremser · Posted 11-23-2020 12:32 AM

@David_Billa wrote:
As a alternative approach, can we handle the calculation with 'calculated'
keyword?

If you have a different code idea, try it (Maxim 4).

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

mkeintz · Posted 11-22-2020 01:55 PM

You could put them all in a single PROC SQL step, which in itself wouldn't reduce resources. But you would likely improve efficiency greatly if use use create view instead of create table for prem and hist. Just be sure to use the NOPRINT option on the proc sql statement - this allows proc sql to defer actual instantiation of the views until they are called for in the create table want statement:

proc sql noprint ;
   create view prem as .... ;
   create view hist as ... ;
   create table want as ...;
quit;

True, this doesn't reduce the amount of code, but it does avoid writing to disk (and re-reading from disk) of the prem and hist data sets.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

PGStats · Posted 11-22-2020 03:44 PM

@mkeintz Please explain further "this allows proc sql to defer actual instantiation of the views until they are called for". What do you mean by instantiation?

PG

SAS Innovate 2025: Register Now

SAS Training: Just a Click Away