BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Batman
Quartz | Level 8

Why are the totals from this query join too high?  The results should be 5, 3 and 2.

 

data a;
input x 1.;
datalines;
1
2
1
1
;

data b;
input x 1. y 1.;
datalines;
11
12
13
21
;

 

/*sum values of x for both data sets and calculate the difference*/

proc sql;
select sum(a.x) as atot, sum(b.x) as btot, (sum(a.x) - sum(b.x)) as diff
from a, b
where b.y > 1;
quit;

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

@Batman wrote:
Sorry, I should have been clearer. I just want to compare the totals for x for both datasets with the where filter for the values in data set b.

The Cartesian join is the reason why. That is the response to your question.

 

You need to create the SUM before joining the data and need to avoid creating a Cartesian join for your results.

A one-to-one record merge is quite often the place to use a data step merge. In SQL you would need to provide either 1) exactly one record in each subset or 2) a field in common to join on

One way:

proc sql;
   create table want as
   select asum,bsum, asum-bsum as dif
   from (select sum(x) as asum from a),
        (select sum(x) as bsum from b
         where y>1)
   ;
quit;

 

 

View solution in original post

3 REPLIES 3
ballardw
Super User

I suggest that you run this code and look at the result:

proc sql;
create table work.sum as
select a.x as ax, b.x as bx, b.y, sum(a.x) as atot, sum(b.x) as btot, (sum(a.x) - sum(b.x)) as diff
from a, b
where b.y > 1;
quit;

Then look up what "cartesian join" means.

I think you expected to get a row by row match of some sort but that is not what happens with

 

from a, b which generates a cartesian join.

Your log should show something like this:

NOTE: The execution of this query involves performing one or more Cartesian product joins that
      can not be optimized.

as an indication of what is going on.

Batman
Quartz | Level 8
Sorry, I should have been clearer. I just want to compare the totals for x for both datasets with the where filter for the values in data set b.
ballardw
Super User

@Batman wrote:
Sorry, I should have been clearer. I just want to compare the totals for x for both datasets with the where filter for the values in data set b.

The Cartesian join is the reason why. That is the response to your question.

 

You need to create the SUM before joining the data and need to avoid creating a Cartesian join for your results.

A one-to-one record merge is quite often the place to use a data step merge. In SQL you would need to provide either 1) exactly one record in each subset or 2) a field in common to join on

One way:

proc sql;
   create table want as
   select asum,bsum, asum-bsum as dif
   from (select sum(x) as asum from a),
        (select sum(x) as bsum from b
         where y>1)
   ;
quit;

 

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 1668 views
  • 0 likes
  • 2 in conversation