Why are the totals from this query join too high? The results should be 5, 3 and 2.
data a;
input x 1.;
datalines;
1
2
1
1
;
data b;
input x 1. y 1.;
datalines;
11
12
13
21
;
/*sum values of x for both data sets and calculate the difference*/
proc sql;
select sum(a.x) as atot, sum(b.x) as btot, (sum(a.x) - sum(b.x)) as diff
from a, b
where b.y > 1;
quit;
@Batman wrote:
Sorry, I should have been clearer. I just want to compare the totals for x for both datasets with the where filter for the values in data set b.
The Cartesian join is the reason why. That is the response to your question.
You need to create the SUM before joining the data and need to avoid creating a Cartesian join for your results.
A one-to-one record merge is quite often the place to use a data step merge. In SQL you would need to provide either 1) exactly one record in each subset or 2) a field in common to join on
One way:
proc sql; create table want as select asum,bsum, asum-bsum as dif from (select sum(x) as asum from a), (select sum(x) as bsum from b where y>1) ; quit;
I suggest that you run this code and look at the result:
proc sql; create table work.sum as select a.x as ax, b.x as bx, b.y, sum(a.x) as atot, sum(b.x) as btot, (sum(a.x) - sum(b.x)) as diff from a, b where b.y > 1; quit;
Then look up what "cartesian join" means.
I think you expected to get a row by row match of some sort but that is not what happens with
from a, b which generates a cartesian join.
Your log should show something like this:
NOTE: The execution of this query involves performing one or more Cartesian product joins that can not be optimized.
as an indication of what is going on.
@Batman wrote:
Sorry, I should have been clearer. I just want to compare the totals for x for both datasets with the where filter for the values in data set b.
The Cartesian join is the reason why. That is the response to your question.
You need to create the SUM before joining the data and need to avoid creating a Cartesian join for your results.
A one-to-one record merge is quite often the place to use a data step merge. In SQL you would need to provide either 1) exactly one record in each subset or 2) a field in common to join on
One way:
proc sql; create table want as select asum,bsum, asum-bsum as dif from (select sum(x) as asum from a), (select sum(x) as bsum from b where y>1) ; quit;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.