BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Batman
Quartz | Level 8

Why are the totals from this query join too high?  The results should be 5, 3 and 2.

 

data a;
input x 1.;
datalines;
1
2
1
1
;

data b;
input x 1. y 1.;
datalines;
11
12
13
21
;

 

/*sum values of x for both data sets and calculate the difference*/

proc sql;
select sum(a.x) as atot, sum(b.x) as btot, (sum(a.x) - sum(b.x)) as diff
from a, b
where b.y > 1;
quit;

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

@Batman wrote:
Sorry, I should have been clearer. I just want to compare the totals for x for both datasets with the where filter for the values in data set b.

The Cartesian join is the reason why. That is the response to your question.

 

You need to create the SUM before joining the data and need to avoid creating a Cartesian join for your results.

A one-to-one record merge is quite often the place to use a data step merge. In SQL you would need to provide either 1) exactly one record in each subset or 2) a field in common to join on

One way:

proc sql;
   create table want as
   select asum,bsum, asum-bsum as dif
   from (select sum(x) as asum from a),
        (select sum(x) as bsum from b
         where y>1)
   ;
quit;

 

 

View solution in original post

3 REPLIES 3
ballardw
Super User

I suggest that you run this code and look at the result:

proc sql;
create table work.sum as
select a.x as ax, b.x as bx, b.y, sum(a.x) as atot, sum(b.x) as btot, (sum(a.x) - sum(b.x)) as diff
from a, b
where b.y > 1;
quit;

Then look up what "cartesian join" means.

I think you expected to get a row by row match of some sort but that is not what happens with

 

from a, b which generates a cartesian join.

Your log should show something like this:

NOTE: The execution of this query involves performing one or more Cartesian product joins that
      can not be optimized.

as an indication of what is going on.

Batman
Quartz | Level 8
Sorry, I should have been clearer. I just want to compare the totals for x for both datasets with the where filter for the values in data set b.
ballardw
Super User

@Batman wrote:
Sorry, I should have been clearer. I just want to compare the totals for x for both datasets with the where filter for the values in data set b.

The Cartesian join is the reason why. That is the response to your question.

 

You need to create the SUM before joining the data and need to avoid creating a Cartesian join for your results.

A one-to-one record merge is quite often the place to use a data step merge. In SQL you would need to provide either 1) exactly one record in each subset or 2) a field in common to join on

One way:

proc sql;
   create table want as
   select asum,bsum, asum-bsum as dif
   from (select sum(x) as asum from a),
        (select sum(x) as bsum from b
         where y>1)
   ;
quit;

 

 

SAS Innovate 2025: Register Today!

 

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 2043 views
  • 0 likes
  • 2 in conversation