BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Batman
Quartz | Level 8

I'm trying to sum totals across datasets using proc sql.   Is there away to avoid the third select statement (i.e. incorporating it in to the first two?)

 

data a;

input x 2.;

datalines;

10

20

;

data b;

input x 2.;

datalines;

25 ;

 

proc sql;

create table c as select sum(x) as tot

from a

union

select sum(x) as tot

from b;

 

select sum(tot)

from c;

quit;

1 ACCEPTED SOLUTION

Accepted Solutions
ed_sas_member
Meteorite | Level 14

Hi @Batman 

 

You can do this

Proc sql;
Create table c as 
Select sum(x) as tot
From (select x from a union all select x from b);
Quit;

it is important to specify « all » to avoid to remove duplicate records from input tables 

View solution in original post

6 REPLIES 6
yabwon
Onyx | Level 15

Hi,

 

how about subquery?

 

data a;
input x 2.;
datalines;
10
20
;
run;

data b;
input x 2.;
datalines;
25 
;
run;

 

proc sql;
select sum(x) as tot
from
(
  select x
  from a

  union all

  select x
  from b
)
;
quit;
_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



ed_sas_member
Meteorite | Level 14

Hi @Batman 

 

You can do this

Proc sql;
Create table c as 
Select sum(x) as tot
From (select x from a union all select x from b);
Quit;

it is important to specify « all » to avoid to remove duplicate records from input tables 

novinosrin
Tourmaline | Level 20

Hi @Batman  I am afraid there is some glitch in either

 

 1. my comprehension of the requirement 

 2. Or the fix

 3. or the requirement itself

 

My understanding from what you wrote is 

1. You have 2 queries that involves a SET operator.

2. It appears the independent queries compute an independent sum of the the values of x. This would perfectly yield 1 record for each select clause i.e one for a A and two for B with two independent grand_totals

3. At this point, you merely want to compute the grand_total of the two grand_totals, or in other words sum of the 2 sums.

 

So the real and meaningful fix doesn't warrant an UNION ALL rather your existing query is indeed not wrong.  The fix is actually to get your two SELECT clauses that are combined using a SET operator in a subquery as mentioned by genius @yabwon for the reason the SQL processor needs a copy aka akin to FROM table/view to process the sum=>sum (of the sums). 

 

Of course doing one combined sum from an appended copy usng UNION ALL is essentially the right way albeit the slight tweak that is required  in your existing query to make it work after all is

select sum(x) as tot

from a

union

select sum(x) as tot

from b

/*to have the above as SUBQUERY below*/

proc sql;

create table c as 
select sum(tot) as g_tot
from
(select sum(x) as tot

from a

union

select sum(x) as tot

from b);
quit;

 I hope this clears the crack for others who perhaps weren't clear initially like me

yabwon
Onyx | Level 15

Hi @novinosrin ,

 

Your post got me thinking about performance (thanks 1e6 for the inspiration!)

 

So, what I did:

 

data a b c d;
  do _N_ = 1 to 1e8;
    x = 1;
    output;
  end; 
run;

options fullstimer msglevel=i;
resetline;
options ps=max ls=max;
proc sql feedback _method _tree;
select sum(tot) as tot
from
(
  select sum(x) as tot
  from a

  union all 

  select sum(x) as tot
  from b
)
;
quit;

proc sql feedback _method _tree;
select sum(x) as tot
from
(
  select x
  from c

  union all

  select x
  from d
)
;
quit;

Your idea of doing sum() in subquery ran half the time mine idea did:

71   data a b c d;
72     do _N_ = 1 to 1e8;
73       x = 1;
74       output;
75     end;
76   run;

NOTE: The data set WORK.A has 100000000 observations and 1 variables.
NOTE: The data set WORK.B has 100000000 observations and 1 variables.
NOTE: The data set WORK.C has 100000000 observations and 1 variables.
NOTE: The data set WORK.D has 100000000 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           19.85 seconds
      cpu time            19.78 seconds

NOTE: Statement transforms to:

        select SUM(tot) as tot
          from ( select SUM(A.x) as tot
                   from WORK.A
               )
                 union all
               ( select SUM(B.x) as tot
                   from WORK.B
               ));


NOTE: SQL execution methods chosen are:

      sqxslct
          sqxsumn
              sqxuall
                  sqxsumn
                      sqxsrc( WORK.A )
                  sqxsumn
                      sqxsrc( WORK.B )

NOTE: PROCEDURE SQL used (Total process time):
      real time           15.98 seconds
      user cpu time       14.34 seconds
      system cpu time     1.60 seconds
      memory              5649.09k
      OS Memory           27384.00k


NOTE: Statement transforms to:

        select SUM(x) as tot
          from ( select C.x
                   from WORK.C
               )
                 union all
               ( select D.x
                   from WORK.D
               ));


NOTE: SQL execution methods chosen are:

      sqxslct
          sqxsumn
              sqxuall
                  sqxsrc( WORK.C )
                  sqxsrc( WORK.D )

NOTE: PROCEDURE SQL used (Total process time):
      real time           31.27 seconds
      user cpu time       29.82 seconds
      system cpu time     1.54 seconds
      memory              5622.53k
      OS Memory           27384.00k

 

But I think we need `union all` in any case. There is one data setup which cause it:

 

data a;
input x 2.;
datalines;
10
20
;
run;

data b;
input x 2.;
datalines;
25
5 
;
run;

proc sql;
select sum(x) as tot
from a

union /* ALL */

select sum(x) as tot
from b
;
run;

Without `all` we will get 30, but we would like to have 60.

 

All the best

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



novinosrin
Tourmaline | Level 20

Good morning Cop @yabwon  Great catch. I didn't think of identical sum that caused the elimination of dup. Yes, UNION ALL will be required in cases where have sum(of identical sums).  Indeed union all is essentially the right way of doing it. I agree.

 

I can't thank you enough for your time and details. Much appreciate it bro!

 

Okay, that being said one funny workaround(turn on your sense of humor) with a dsn label making it unique Smiley Wink

 

data a;
input x 2.;
datalines;
10
20
;
run;

data b;
input x 2.;
datalines;
25
5 
;
run;

proc sql;
select sum(tot) as grand_tot
from
(select 'a' as dsn,sum(x) as tot
from a

union 

select 'b' as dsn, sum(x) as tot
from b)
;
quit;
grand_tot
60
yabwon
Onyx | Level 15
lovely 😄 😄
_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1307 views
  • 3 likes
  • 4 in conversation