BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Batman
Quartz | Level 8

I'm learning about combining datasets vertically using SQL.   I'm trying to understand why the output includes a row where X=2 but no row where X=3.   Anyone have any idea?

 

data one;

input X 1. A $1.;

datalines;

 

1a

1a

1b

2c

3v

4e

6g

;

data two;

input X 1. B $1.;

datalines;

 

1x

2y

3z

3v

5w

;

run;

proc sql;

select *

from one

 

except

select *

from two;

 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

SQL is a set operation language.  In general a "SET" of values does not include duplicate values. So it makes no sense to have two exact duplicates be the result of such a set operation.

View solution in original post

7 REPLIES 7
novinosrin
Tourmaline | Level 20

Hello @Batman 

 

You are basically asking the SQL processor to exclude records that match two and only output the nonmatches from one

 

Also, since you didn't use ALL i.e except all, you are telling SQL to do a double pass to exclude duplicate records in the nonmatches from one.

 

HTH

 

In essence, this is what you are doing

 

Set Difference
Difference between sets is denoted by ‘A – B’, is the set containing elements of set A but not in B. i.e all elements of A except the element of B.

A-B

Above is the Venn Diagram of A-B.

ballardw
Super User

Since your only common variable between the data sets is X that is the only variable considered for the comparison. The variable A will not be compared to B for determining inclusion/exclusion with the Except operation.

 

Name B to A in the set Two and see what happens.

 

BTW, it is not a good idea to start your datalines with a blank unless you really truly actually want to have a row of missing values.

 

PGStats
Opal | Level 21

Contrary to what is stated above (sorry @ballardw), SQL set operations without the CORR keyword operate on columns by positions, not by name. So the comparison involves column A against column B. The names are taken from the first table involved. So 2c is kept because it is different from 2y but 3v is removed because it is present in table two.

 

Try EXCEPT CORR and see what happens.

PG
ballardw
Super User

@PGStats wrote:

Contrary to what is stated above (sorry @ballardw), SQL set operations without the CORR keyword operate on columns by positions, not by name. So the comparison involves column A against column B. The names are taken from the first table involved. So 2c is kept because it is different from 2y but 3v is removed because it is present in table two.

 

Try EXCEPT CORR and see what happens.


I sit corrected. Thank you.

 

 

Tom
Super User Tom
Super User

Because '2c' is in one and not in two, but the only value with X=3 in one is '3v' and that value is in two so it is deleted by EXCEPT.

Batman
Quartz | Level 8

Ok, that makes sense, why is the additional value of "1a" also dropped.

Tom
Super User Tom
Super User

SQL is a set operation language.  In general a "SET" of values does not include duplicate values. So it makes no sense to have two exact duplicates be the result of such a set operation.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 7 replies
  • 809 views
  • 0 likes
  • 5 in conversation