Re: Wacky results creating decimal values

SAShole1 · Posted 07-14-2022 03:38 PM

Can anyone explain this? 7.6 is changing to 7.599999994 for y and z, but 7.5 translates fine. I'm assuming it has something to do with the default format of best12, but it doesn't make sense to me that it would change the value like that. If you use a put statement, it looks fine. You have to open the dataset to see it. Proc print shows yet another value.

data test;
length t1 t2 t3 5.;
x="7.5";
y="7.6";
z=7.6;

t1=x;
t2=y;
t3=z;

put _all_;
run;

proc print; run;

ballardw · Posted 07-14-2022 03:48 PM

Automatic conversions from text to numeric, i.e. where you take no action to control the behavior, can use all sorts of implied rules. You also run into the issue that base-10 numbers cannot always be stored exactly in base-2 (binary) storage, and that the precision depends on how much storage is used. You have complicated this by limiting storage to 5 bytes instead of the default 8, which is likely having more impact than the implied format.

Consider:

data test;
length t1 t2  5 t3 8;
y="7.6";
t1=y;
t2=Input(y,3.);
t3=y;

format t1 - t3 16.14;
run;

You will likely see that t1 and t2 show the value you reported, whereas t3, with a length of 8, shows something much closer to 7.6.
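To make the conversion explicit rather than implied, a minimal sketch along the following lines may help. The dataset name test_explicit and the variable name t are made up for illustration, and the 8. informat is just one reasonable choice. Because there is no LENGTH statement, t keeps the default length of 8 bytes.

```sas
data test_explicit;            /* hypothetical dataset name */
  y = "7.6";
  t = input(y, 8.);            /* explicit character-to-numeric conversion */
  format t 16.14;              /* show enough decimals to reveal any deviation */
  put t= hex16.;               /* HEX16. writes the stored floating-point bytes to the log */
run;
```

Opening the dataset afterwards lets you compare t against the 5-byte variables in the step above.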

SAShole1 · Posted 07-14-2022 03:59 PM

In my example, it also happened with numeric to numeric. So it takes more memory to store 7.6 vs 7.5? SAS converts it into a value with more digits, but the value is less and it takes less space to store that value, because base10 is stored as base2. Is that what you're saying?

FreelanceReinh · Posted 07-14-2022 05:20 PM

@SAShole1 wrote:
So it takes more memory to store 7.6 vs 7.5?

Indeed:

Written in the binary system, t1=7.5 and t2=7.6 look like this:

t1=111.1
t2=111.10011001100110011001100110011001100110011001100110011001100110011001100110011001...

where t2, unlike t1, is a periodic fraction, repeating the 4-digit pattern "1001" infinitely often. With the default length of 8 bytes (=64 bits) for numeric variables (under Windows or Unix), 53 of the infinitely many bits can be stored. (See Numerical Accuracy in SAS Software for details.) Everything after the 53rd digit is rounded off, causing an unavoidable rounding error ("numeric representation error") of about 3.55E-16 in this example. This means that even with maximum precision (8 bytes) the stored number is actually not 7.6, but 7.5999999999999996447286321199499070644378662109375.

Since you defined a length of only 5 bytes (=40 bits), 24 more binary digits are cut off (so only 29 of the 53 significant bits remain), thus increasing the rounding error by a factor of about 16.8 million (!) to 5.96E-9. Now the stored number is really 7.5999999940395355224609375, and the deviation from 7.6 is large enough to be visible even with common formats such as BEST12. Even worse, it's also large enough to let an IF condition like t2=7.6 fail!

120   data have;
121   length t2 5;
122   t2=7.6;
123   run;

NOTE: The data set WORK.HAVE has 1 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           0.12 seconds
      cpu time            0.12 seconds


124
125   data dontwant;
126   set have;
127   if t2 ne 7.6 then put 'Surprised?';
128   put t2 binary64.;
129   x=7.6;
130   put x binary64.;
131   run;

Surprised?
0100000000011110011001100110011001100110000000000000000000000000
0100000000011110011001100110011001100110011001100110011001100110
NOTE: There were 1 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.DONTWANT has 1 observations and 2 variables.

So, better use a length of 8 bytes for non-integer numeric values in order to keep rounding errors to a minimum. Only a small minority of non-integer numbers (such as 7.5) can be stored in variables of length <8 without incurring rounding errors (and certain risks remain even for those numbers).
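If a variable has already been stored with a shortened length, one possible workaround (my assumption, not something proposed earlier in this thread) is to round to the intended precision before comparing. The sketch below reuses the HAVE dataset from the log above. The rounding unit 0.1 assumes the values are meant to have one decimal place.

```sas
data have;
  length t2 5;                 /* shortened length, as in the original question */
  t2 = 7.6;
run;

data _null_;
  set have;
  /* the value read back was cut to 5 bytes, so a raw comparison is unreliable */
  if t2 ne 7.6 then put 'Raw comparison fails.';
  /* rounding to one decimal place before comparing is one way to work around it */
  if round(t2, 0.1) = 7.6 then put 'Comparison after ROUND succeeds.';
run;
```

This only masks the problem at comparison time. Keeping the default length of 8 avoids the extra loss of precision in the first place.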

SAShole1 · Posted 07-14-2022 04:53 PM

Oh and thank you by the way 🙂 I'll admit my eyes tend to glaze over in discussions about storage math.

Tom · Posted 07-14-2022 08:26 PM

The issue is this statement.

length t1 t2 t3 5.;

There are two mistakes there.

First there is no need to include a period in the length. Variable lengths can only be specified in integers so they do not need decimal points.

Second is that you almost never want to store numeric variables with less than the full 8 bytes needed to store the 64 bit floating point values SAS uses for numbers.

Perhaps the first error led to the second? Perhaps you meant to say you wanted the values to PRINT with 5 characters? In that case you would want to use a FORMAT statement. In a FORMAT statement you do need the period because that is how, in SAS syntax, you distinguish a format specification from a variable name.

Since numbers are stored using binary representation, values like 7.6 cannot be exactly represented. Because you told it to throw away the last 3 bytes (24 bits) of precision needed to represent the number, the values read back in from the stored dataset (which I assume is what that last photograph is trying to show) will not match the original used in the first data step.
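If the goal was display width rather than storage size, a sketch of the original step with the LENGTH statement dropped and a FORMAT statement added could look like this. The 5.1 format (width 5, one decimal place) is my guess at the intended display, and the explicit INPUT calls are an optional extra rather than part of the fix.

```sas
data test;
  /* no LENGTH statement: t1-t3 keep the default 8 bytes */
  x = "7.5";
  y = "7.6";
  z = 7.6;

  t1 = input(x, 8.);           /* explicit conversion instead of the automatic one */
  t2 = input(y, 8.);
  t3 = z;

  format t1 t2 t3 5.1;         /* controls how the values print, not how they are stored */
run;

proc print data=test; run;
```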
