BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
nickspencer
Obsidian | Level 7

Hi ALL,

 

I have a proc sql step which looks like below.

 

proc sql;
create table test as
select compress(key) as key, total_amt , a_cd, b_cd
  from table1
except
select compress(key) as key, total_amt , a_cd, b_cd
  from table2;
quit;

 

I have two tables table1 and table2. I am getting weird result.

 

table1
key                       total_amt    a_cd   b_cd
9528553591287      209.20
9707589975114         0.00  
  

 

 

 

table2
key                      total_amt    a_cd   b_cd
9528553591287     209.20
9707589975114         0.00  


test
key                       total_amt    a_cd   b_cd
9528553591287       209.20 

 

 

I believe I should have got 0 records with the above query but I am getting some rows although they have same values in table 1 and 2. The formats/lengths of the columns are different in two tables but I don't believe that is the cause of the issue.

 

table1:

total_amt        format: 20.2     

                       informat 20.2 

                       numeric

                       length   8

a_cd              format: $1     

                       informat $1

b_cd                format: $4     

                       informat $4

 

table2:

total_amt        format: 21.2     

                       informat 21.2 

                       numeric

                       length   8

a_cd              format: $1     

                       informat $1

b_cd                format: $10     

                       informat $10

 

Any idea why ,although the data values are same in both tables, I am getting results with the except operator?

 

Thanks in advance

Nick

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

Hi @nickspencer,

 

I support @ballardw's ROUND function approach. Note that there are 35 (!) different numeric values (i.e. internal binary representations) whose formatted value even in BEST32. format looks like 209.2. Deviations from the standard representation (which in this example is 406A266666666666 in HEX16. format) can occur easily when calculations are performed with non-integer values or data from different platforms or databases are imported.

 

Here are four examples of the 35 values mentioned above:

data numrep;
x0=209.20;
x1=208.80+0.40;
x2=337.28-128.08;
x3=273.22-64.02;
put (x:) (=hex16./);
run;

Result (under Windows):

x0=406A266666666666
x1=406A266666666667
x2=406A266666666665
x3=406A266666666668

Needless to say, all four numbers should be exactly 209.20 from a mathematical point of view. But the tiny rounding errors due to numeric representation issues make them different for PROC SQL's EXCEPT operator (and any other SAS operator relying on exact internal equality, e.g., the condition 208.8+0.4 ne 209.2 is true!). The fact that common formats like 20.12 or BEST32. do not reveal the relevant decimal digits adds to the confusion. Use the HEX16. format to see the true internal values of your numeric variables and how they differ.

 

The ROUND function could also be used to clean the data so that numeric values have their expected standard internal representation. This would reduce the risk of getting such "weird results" in the future.

View solution in original post

6 REPLIES 6
novinosrin
Tourmaline | Level 20

I tested your code and it works as expected with zero records in results

 


data one;
infile cards missover;
input key : $30.                       total_amt    a_cd   b_cd;
cards;
9528553591287      209.20
9707589975114         0.00  
;

data two;
infile cards missover;
input key : $30.                       total_amt    a_cd   b_cd;
cards;
9528553591287     209.20
9707589975114         0.00  
;
proc sql;
create table test as
select compress(key) as key, total_amt , a_cd, b_cd
  from one
except
select compress(key) as key, total_amt , a_cd, b_cd
  from two;
quit;
ballardw
Super User

Print both of your table Total_amt variables with a wider format like 20.12.

 

The display formats, both set to 2 decimal places could well be rounding off values that appear the same but have actual decimal differences.

 

Instead of relying on the format you might consider:

 

proc sql;
   create table test as
   select compress(key) as key, Round(total_amt,.01) as total_amt , a_cd, b_cd
     from table1
   except
   select compress(key) as key, Round(total_amt,.01) as total_amt, a_cd, b_cd
     from table2;
quit;
FreelanceReinh
Jade | Level 19

Hi @nickspencer,

 

I support @ballardw's ROUND function approach. Note that there are 35 (!) different numeric values (i.e. internal binary representations) whose formatted value even in BEST32. format looks like 209.2. Deviations from the standard representation (which in this example is 406A266666666666 in HEX16. format) can occur easily when calculations are performed with non-integer values or data from different platforms or databases are imported.

 

Here are four examples of the 35 values mentioned above:

data numrep;
x0=209.20;
x1=208.80+0.40;
x2=337.28-128.08;
x3=273.22-64.02;
put (x:) (=hex16./);
run;

Result (under Windows):

x0=406A266666666666
x1=406A266666666667
x2=406A266666666665
x3=406A266666666668

Needless to say, all four numbers should be exactly 209.20 from a mathematical point of view. But the tiny rounding errors due to numeric representation issues make them different for PROC SQL's EXCEPT operator (and any other SAS operator relying on exact internal equality, e.g., the condition 208.8+0.4 ne 209.2 is true!). The fact that common formats like 20.12 or BEST32. do not reveal the relevant decimal digits adds to the confusion. Use the HEX16. format to see the true internal values of your numeric variables and how they differ.

 

The ROUND function could also be used to clean the data so that numeric values have their expected standard internal representation. This would reduce the risk of getting such "weird results" in the future.

nickspencer
Obsidian | Level 7
@FreelanceReinhard & @ballardw This makes sense. One of the value is coming from a teradata table and has decimal(18,2) format. I like your idea of rounding off. Does it possess any risk with except operator and comparing with existing dataset value? Or converting it to character using cats() function while using the except operator should work ? Which solution will be more efficient ?
Kurt_Bremser
Super User

When doing comparisons with numbers in SAS, using the round() function is almost imperative, as you might find that 200.40 is smaller or greater than 200.40 😉

There is no danger in using this function, especially not if you use a cutoff value according to the significant digits (in your case .01)

.01 will take care of the artifacts while keeping everything important intact.

FreelanceReinh
Jade | Level 19

For numeric values which are potentially used in calculations (and hence are "truly" numeric, unlike numeric IDs etc.) I prefer rounding to conversion to character. When the goal is just "data cleaning," i.e., getting rid of artifacts originating in numeric representation issues, I often use small rounding units such as 1E-8 as the second argument of the ROUND function. If the data are known to have at most two decimal places, rounding unit .01 will have the same effect. The larger rounding unit (.01) is also preferable if very large values like 100 millions and beyond are to be rounded because numeric representation error grows with the order of (absolute) magnitude of the numeric values.

 

Example:

data _null_;
x=123456789.01;
y=123456789.02;
if round(x+0.01, 1e-7)=y then put 'OK';
d=x+0.01-y;
put d best12.;
run;

Result (under Windows):

OK
1.4901161E-8

With rounding unit 1E-8 the "OK" would not occur because the rounding error to be removed is greater than that (see variable d), but anything between 1E-7 and 1E-2 works.

 

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 2070 views
  • 3 likes
  • 5 in conversation