BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Count
Obsidian | Level 7

Hi,

 

Can anyone suggest reasons why my PROC SQL WHERE statement isn't recognising all variable values?

 

I'm using Base SAS v9.4.

 

I'm trying to create a new table from an old table using PROC SQL. I have a WHERE statement that requires the variable SLK_B to equal 1.38 in order to be included in the new table.

 

While I can see that this value is definitely present in the variable SLK_B in the old table (both visually and by doing a PROC FREQ), it does not seem to be being picked up when creating the new table. Some values work in the WHERE statement and return the accurate number of rows, and others (like 1.38) don't. I have checked the dataset and variable names are accurate.

 

SLK_B is a numeric variable, length = 8, format = BEST12., informat = 12.

 

My PROC SQL is as follows and returns 0 observations despite me being able to see that the value is present:

proc sql; 
create table new as
select *
from old
where slk_b = 1.38;
quit;

Thanks for your time!

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

This is probably due to rounding error. Your values are not exactly equal to 1.38. try something like

 

where slk_b between 1.3799 and 1.3801

 

or

 

where round(slk_b*100) = 138

PG

View solution in original post

4 REPLIES 4
PGStats
Opal | Level 21

This is probably due to rounding error. Your values are not exactly equal to 1.38. try something like

 

where slk_b between 1.3799 and 1.3801

 

or

 

where round(slk_b*100) = 138

PG
Count
Obsidian | Level 7
Ugh - of course! Thank you and apologies. I had assumed because of how I had created the variables that it could only possible be two decimal places, but the raw data set/original variables must not of been.
FreelanceReinh
Jade | Level 19

@Count wrote:
I had assumed because of how I had created the variables that it could only possible be two decimal places, ...

 

Hi @Count,

 

Numeric representation error is often the root cause of such problems.

 

Example:

data test;
v=1.38;                  /* (1) see explanations below */
w=6.9;                   /* (2) */
x=w/5;                   /* (3) */
r=round(w/5, 1e-9);      /* (4) */
put x best32.-l;         /* (5) */
put x 32.30;             /* (6) */
if x ne 1.38 then put "Surprised?";
put (x v r) (/ =hex16.); /* (7) */
d=x-v;                   /* (8) */
put / d=;                /* (9) */ 
run;

Result in the log (SAS 9.4M5 on Windows 7):

1.38
1.380000000000000000000000000000
Surprised?

x=3FF6147AE147AE15
v=3FF6147AE147AE14
r=3FF6147AE147AE14

d=2.220446E-16
  1. Variable V contains unavoidable numeric representation error because 1.38 has infinitely many digits in the binary system, whereas the internal (binary) representation in computers' memory has, of course, only finitely many bits.
  2. The same is true for variable W with its value 6.9 (=5*1.38, mathematically).
  3. Variable X contains an additional rounding error incurred by doing a calculation involving a value (W=6.9) contaminated with numeric representation error.
  4. The ROUND function, with a suitable rounding unit such as 1E-9, corrects that additional rounding error. (It cannot correct the remaining numeric representation error, but this is acceptable.) It is therefore very useful in dealing with numeric values with only a few decimal places.
  5. As you've noticed, the rounding error is invisible in many common display formats, which adds to the surprise.
  6. The enormous precision suggested by some formats is an illusion. In fact, these formats do some rounding. The value stored in variable X is (mathematically) 1.380000000000000115463... So, the zeros displayed above by format 32.30 starting at the 16th decimal are actually wrong and misleading.
  7. The HEX16. and BINARY64. formats reveal the true content of numeric variables, their internal representation. The difference between X and V in the last hexadecimal digit (and the last binary digit) is clearly visible as is the equality of V and the rounded value R. Note the repeating sequence "147AE" of hex digits (1.38 is a periodic fraction in the binary and hexadecimal number systems) and that hex digit 7 (<16/2) is correctly rounded down in V and R after the last digit (4), but not in X.
  8. Calculating the difference helps to see, even with the default format, why X and V are unequal.
  9. The difference 2.2204...E-16 = 2**-52 is the place value of the least significant bit of the internal representation (of both X and V).

 

Count
Obsidian | Level 7

Thank you @FreelanceReinh . I feel a bit sick that I was ignorant of this! For now, I've used the ROUND function to two decimals throughout my data sets and now I'm off to read more on the issue!

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 4 replies
  • 2067 views
  • 2 likes
  • 3 in conversation