Solved: Re: Why is my PROC SQL WHERE statement not recognising all variable va...

Count · Posted 07-02-2019 11:07 PM

Hi,

Can anyone suggest reasons why my PROC SQL WHERE statement isn't recognising all variable values?

I'm using Base SAS v9.4.

I'm trying to create a new table from an old table using PROC SQL. I have a WHERE statement that requires the variable SLK_B to equal 1.38 in order to be included in the new table.

While I can see that this value is definitely present in the variable SLK_B in the old table (both visually and by doing a PROC FREQ), it does not seem to be being picked up when creating the new table. Some values work in the WHERE statement and return the accurate number of rows, and others (like 1.38) don't. I have checked the dataset and variable names are accurate.

SLK_B is a numeric variable, length = 8, format = BEST12., informat = 12.

My PROC SQL is as follows and returns 0 observations despite me being able to see that the value is present:

proc sql; 
create table new as
select *
from old
where slk_b = 1.38;
quit;

Thanks for your time!

PGStats · Posted 07-02-2019 11:14 PM

This is probably due to rounding error. Your values are not exactly equal to 1.38. try something like

where slk_b between 1.3799 and 1.3801

or

where round(slk_b*100) = 138

PG

View solution in original post

PGStats · Posted 07-02-2019 11:14 PM

This is probably due to rounding error. Your values are not exactly equal to 1.38. try something like

where slk_b between 1.3799 and 1.3801

or

where round(slk_b*100) = 138

PG

Count · Posted 07-02-2019 11:20 PM

Ugh - of course! Thank you and apologies. I had assumed because of how I had created the variables that it could only possible be two decimal places, but the raw data set/original variables must not of been.

FreelanceReinh · Posted 07-03-2019 06:11 AM

@Count wrote:
I had assumed because of how I had created the variables that it could only possible be two decimal places, ...

Hi @Count,

Numeric representation error is often the root cause of such problems.

Example:

data test;
v=1.38;                  /* (1) see explanations below */
w=6.9;                   /* (2) */
x=w/5;                   /* (3) */
r=round(w/5, 1e-9);      /* (4) */
put x best32.-l;         /* (5) */
put x 32.30;             /* (6) */
if x ne 1.38 then put "Surprised?";
put (x v r) (/ =hex16.); /* (7) */
d=x-v;                   /* (8) */
put / d=;                /* (9) */ 
run;

Result in the log (SAS 9.4M5 on Windows 7):

1.38
1.380000000000000000000000000000
Surprised?

x=3FF6147AE147AE15
v=3FF6147AE147AE14
r=3FF6147AE147AE14

d=2.220446E-16

Variable V contains unavoidable numeric representation error because 1.38 has infinitely many digits in the binary system, whereas the internal (binary) representation in computers' memory has, of course, only finitely many bits.
The same is true for variable W with its value 6.9 (=5*1.38, mathematically).
Variable X contains an additional rounding error incurred by doing a calculation involving a value (W=6.9) contaminated with numeric representation error.
The ROUND function, with a suitable rounding unit such as 1E-9, corrects that additional rounding error. (It cannot correct the remaining numeric representation error, but this is acceptable.) It is therefore very useful in dealing with numeric values with only a few decimal places.
As you've noticed, the rounding error is invisible in many common display formats, which adds to the surprise.
The enormous precision suggested by some formats is an illusion. In fact, these formats do some rounding. The value stored in variable X is (mathematically) 1.380000000000000115463... So, the zeros displayed above by format 32.30 starting at the 16th decimal are actually wrong and misleading.
The HEX16. and BINARY64. formats reveal the true content of numeric variables, their internal representation. The difference between X and V in the last hexadecimal digit (and the last binary digit) is clearly visible as is the equality of V and the rounded value R. Note the repeating sequence "147AE" of hex digits (1.38 is a periodic fraction in the binary and hexadecimal number systems) and that hex digit 7 (<16/2) is correctly rounded down in V and R after the last digit (4), but not in X.
Calculating the difference helps to see, even with the default format, why X and V are unequal.
The difference 2.2204...E-16 = 2**-52 is the place value of the least significant bit of the internal representation (of both X and V).

Count · Posted 07-03-2019 07:26 PM

Thank you @FreelanceReinh . I feel a bit sick that I was ignorant of this! For now, I've used the ROUND function to two decimals throughout my data sets and now I'm off to read more on the issue!

Why is my PROC SQL WHERE statement not recognising all variable values?

Re: Why is my PROC SQL WHERE statement not recognising all variable values?

Re: Why is my PROC SQL WHERE statement not recognising all variable values?

Re: Why is my PROC SQL WHERE statement not recognising all variable values?

Re: Why is my PROC SQL WHERE statement not recognising all variable values?

Re: Why is my PROC SQL WHERE statement not recognising all variable values?

Why is my PROC SQL WHERE statement not recognising all variable values?

Re: Why is my PROC SQL WHERE statement not recognising all variable values?

Re: Why is my PROC SQL WHERE statement not recognising all variable values?

Re: Why is my PROC SQL WHERE statement not recognising all variable values?

Re: Why is my PROC SQL WHERE statement not recognising all variable values?

Re: Why is my PROC SQL WHERE statement not recognising all variable values?

SAS Innovate 2025: Call for Content