Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- Home
- /
- Programming
- /
- Programming
- /
- Issue with except operator in proc sql

Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

🔒 This topic is **solved** and **locked**.
Need further help from the community? Please
sign in and ask a **new** question.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Posted 11-18-2019 05:48 PM
(2480 views)

Hi ALL,

I have a proc sql step which looks like below.

proc sql;

create table test as

select compress(key) as key, total_amt , a_cd, b_cd

from table1

except

select compress(key) as key, total_amt , a_cd, b_cd

from table2;

quit;

I have two tables table1 and table2. I am getting weird result.

table1

key total_amt a_cd b_cd

9528553591287 209.20

9707589975114 0.00

table2

key total_amt a_cd b_cd

9528553591287 209.20

9707589975114 0.00

test

key total_amt a_cd b_cd

9528553591287 209.20

I believe I should have got 0 records with the above query but I am getting some rows although they have same values in table 1 and 2. The formats/lengths of the columns are different in two tables but I don't believe that is the cause of the issue.

table1:

total_amt format: 20.2

informat 20.2

numeric

length 8

a_cd format: $1

informat $1

b_cd format: $4

informat $4

table2:

total_amt format: 21.2

informat 21.2

numeric

length 8

a_cd format: $1

informat $1

b_cd format: $10

informat $10

Any idea why ,although the data values are same in both tables, I am getting results with the except operator?

Thanks in advance

Nick

1 ACCEPTED SOLUTION

Accepted Solutions

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Hi @nickspencer,

I support @ballardw's ROUND function approach. Note that there are 35 (!) different numeric values (i.e. internal binary representations) whose formatted value even in BEST*32.* format looks like 209.2. Deviations from the standard representation (which in this example is 406A266666666666 in HEX16. format) can occur easily when calculations are performed with non-integer values or data from different platforms or databases are imported.

Here are four examples of the 35 values mentioned above:

```
data numrep;
x0=209.20;
x1=208.80+0.40;
x2=337.28-128.08;
x3=273.22-64.02;
put (x:) (=hex16./);
run;
```

Result (under Windows):

x0=406A266666666666x1=406A266666666667x2=406A266666666665x3=406A266666666668

Needless to say, all four numbers should be exactly 209.20 from a mathematical point of view. But the tiny rounding errors due to numeric representation issues make them different for PROC SQL's EXCEPT operator (and any other SAS operator relying on exact internal equality, e.g., the condition 208.8+0.4 **ne** 209.2 is *true*!). The fact that common formats like 20.12 or BEST32. do not reveal the relevant decimal digits adds to the confusion. Use the *HEX16. format* to see the true internal values of your numeric variables and how they differ.

The ROUND function could also be used to clean the data so that numeric values have their expected standard internal representation. This would reduce the risk of getting such "weird results" in the future.

6 REPLIES 6

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

I tested your code and it works as expected with zero records in results

```
data one;
infile cards missover;
input key : $30. total_amt a_cd b_cd;
cards;
9528553591287 209.20
9707589975114 0.00
;
data two;
infile cards missover;
input key : $30. total_amt a_cd b_cd;
cards;
9528553591287 209.20
9707589975114 0.00
;
proc sql;
create table test as
select compress(key) as key, total_amt , a_cd, b_cd
from one
except
select compress(key) as key, total_amt , a_cd, b_cd
from two;
quit;
```

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Print both of your table Total_amt variables with a wider format like 20.12.

The display formats, both set to 2 decimal places could well be rounding off values that appear the same but have actual decimal differences.

Instead of relying on the format you might consider:

proc sql; create table test as select compress(key) as key, Round(total_amt,.01) as total_amt , a_cd, b_cd from table1 except select compress(key) as key, Round(total_amt,.01) as total_amt, a_cd, b_cd from table2; quit;

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Hi @nickspencer,

I support @ballardw's ROUND function approach. Note that there are 35 (!) different numeric values (i.e. internal binary representations) whose formatted value even in BEST*32.* format looks like 209.2. Deviations from the standard representation (which in this example is 406A266666666666 in HEX16. format) can occur easily when calculations are performed with non-integer values or data from different platforms or databases are imported.

Here are four examples of the 35 values mentioned above:

```
data numrep;
x0=209.20;
x1=208.80+0.40;
x2=337.28-128.08;
x3=273.22-64.02;
put (x:) (=hex16./);
run;
```

Result (under Windows):

x0=406A266666666666x1=406A266666666667x2=406A266666666665x3=406A266666666668

Needless to say, all four numbers should be exactly 209.20 from a mathematical point of view. But the tiny rounding errors due to numeric representation issues make them different for PROC SQL's EXCEPT operator (and any other SAS operator relying on exact internal equality, e.g., the condition 208.8+0.4 **ne** 209.2 is *true*!). The fact that common formats like 20.12 or BEST32. do not reveal the relevant decimal digits adds to the confusion. Use the *HEX16. format* to see the true internal values of your numeric variables and how they differ.

The ROUND function could also be used to clean the data so that numeric values have their expected standard internal representation. This would reduce the risk of getting such "weird results" in the future.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

@FreelanceReinhard & @ballardw This makes sense. One of the value is coming from a teradata table and has decimal(18,2) format. I like your idea of rounding off. Does it possess any risk with except operator and comparing with existing dataset value? Or converting it to character using cats() function while using the except operator should work ? Which solution will be more efficient ?

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

When doing comparisons with numbers in SAS, using the round() function is almost imperative, as you might find that 200.40 is smaller or greater than 200.40 😉

There is no danger in using this function, especially not if you use a cutoff value according to the significant digits (in your case .01)

.01 will take care of the artifacts while keeping everything important intact.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

For numeric values which are potentially used in calculations (and hence are "truly" numeric, unlike numeric IDs etc.) I prefer rounding to conversion to character. When the goal is just "data cleaning," i.e., getting rid of artifacts originating in numeric representation issues, I often use small rounding units such as 1E-8 as the second argument of the ROUND function. If the data are known to have at most two decimal places, rounding unit .01 will have the same effect. The larger rounding unit (.01) is also preferable if very large values like 100 millions and beyond are to be rounded because numeric representation error grows with the order of (absolute) magnitude of the numeric values.

Example:

```
data _null_;
x=123456789.01;
y=123456789.02;
if round(x+0.01, 1e-7)=y then put 'OK';
d=x+0.01-y;
put d best12.;
run;
```

Result (under Windows):

OK 1.4901161E-8

With rounding unit 1E-8 the "OK" would not occur because the rounding error to be removed is greater than that (see variable d), but anything between 1E-7 and 1E-2 works.

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. **Registration is now open through August 30th**. Visit the SAS Hackathon homepage.

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.