Solved: Datetime creating phantom decimals

minfante · Posted 03-02-2016 04:54 PM

Hi,

In base SAS 9.3, when combining date and time variables using dhms, the result creates phantom decimal places at the tail end of the datetime value. For example:

Time = 9:58:56.733220000 (format time18.9)

Date = 16APR2015

Datetime = dhms(date,0,0,time)

outputted datetime: 16APR2015:09:58:56.733220100 (format datetime28.9)

Interestingly, if I format Datetime variable as a number (e.g. 28.9), I don’t get the rounding issue: 1744797536.733220000. So I think it must be something to do with how SAS converts numbers to datetime.

Any suggestions are welcome. Thanks in advance!

Matt

FreelanceReinh · Posted 03-02-2016 07:40 PM

Hi Matt,

You have encountered a numeric representation issue. This is not specific to datetime values, but a (well understood) phenomenon with numeric values which are stored as binary floating-point representations (but displayed in the decimal system).

The conversion from SAS date and time to datetime values is straightforward: datetime=date*86400+time. Therefore, it is not surprising that performing this calculation with the numbers representing date and time in your example ('16APR2015'd=20194, '9:58:56.73322't=35936.73322) yields exactly the same result as obtained using the DHMS function (all calculations were done with SAS 9.4 on Windows):

data _null_;
time='9:58:56.73322't;
date='16APR2015'd;
datetime = dhms(date,0,0,time);
dt=20194*86400+35936.73322;
put datetime hex16.;
put dt hex16.;
run;
/* Result:
41D9FFDFD82EED14
41D9FFDFD82EED14
*/

I've used the HEX16. format to display the values, because the usual numeric formats (such as the w.d format, e.g. 28.9) do some rounding (!), so in general they don't show the number exactly that is stored internally.

If you calculate the second of the datetime value, you will get the same "phantom decimals" as are displayed in format datetime28.9. (Below I use datetime32.13 to show even more decimals.) For demonstration, I've calculated that value also from the time value and entered it directly.

data _null_;
time='9:58:56.73322't;
date='16APR2015'd;
datetime = dhms(date,0,0,time);
sec=second(datetime);
s=1744797536.73322-1744797480;
st=second(time);
stn=35936.73322-35880;
sx=56.73322;
put datetime datetime32.13;
put (sec s st stn sx) (16.13 /);
put (sec s st stn sx) (hex16. /);
run;
/* Result:
16APR2015:09:58:56.7332201004028
56.7332201004028
56.7332201004028
56.7332200000019
56.7332200000019
56.7332200000000
404C5DDA28000000
404C5DDA28000000
404C5DDA27286400
404C5DDA27286400
404C5DDA272862F6
*/

As you can see, the results from purely numeric calculation are identical to those obtained by using the SECOND function. However, there are three (slightly) different values for the seconds, depending on whether they were derived from the datetime value, derived from the time value or entered directly.

Reason: The orders of magnitude of 1744797536.73322, 35936.73322 and 56.73322 are very different. Therefore, the precision of these values varies considerably (i.e., the value of the least significant bit in the internal 64-bit floating-point representation differs by several powers of 2 between the three numbers). The more bits are required to store the integer part of the number, the fewer bits are available to store the fractional part. And the fractional part is the difficult one! The (mathematically) exact binary representation of 0.73322 has infinitely many (binary) digits. Hence, there is always a discrepancy between the internal value and the exact decimal value. For the large datetime value this numeric representation error is greater than for the time value, for which in turn it is greater than for the comparably small seconds value (56.73322).

I'll be happy to add more details tomorrow (CET).

View solution in original post

FreelanceReinh · Posted 03-02-2016 07:40 PM

Hi Matt,

You have encountered a numeric representation issue. This is not specific to datetime values, but a (well understood) phenomenon with numeric values which are stored as binary floating-point representations (but displayed in the decimal system).

The conversion from SAS date and time to datetime values is straightforward: datetime=date*86400+time. Therefore, it is not surprising that performing this calculation with the numbers representing date and time in your example ('16APR2015'd=20194, '9:58:56.73322't=35936.73322) yields exactly the same result as obtained using the DHMS function (all calculations were done with SAS 9.4 on Windows):

data _null_;
time='9:58:56.73322't;
date='16APR2015'd;
datetime = dhms(date,0,0,time);
dt=20194*86400+35936.73322;
put datetime hex16.;
put dt hex16.;
run;
/* Result:
41D9FFDFD82EED14
41D9FFDFD82EED14
*/

I've used the HEX16. format to display the values, because the usual numeric formats (such as the w.d format, e.g. 28.9) do some rounding (!), so in general they don't show the number exactly that is stored internally.

If you calculate the second of the datetime value, you will get the same "phantom decimals" as are displayed in format datetime28.9. (Below I use datetime32.13 to show even more decimals.) For demonstration, I've calculated that value also from the time value and entered it directly.

data _null_;
time='9:58:56.73322't;
date='16APR2015'd;
datetime = dhms(date,0,0,time);
sec=second(datetime);
s=1744797536.73322-1744797480;
st=second(time);
stn=35936.73322-35880;
sx=56.73322;
put datetime datetime32.13;
put (sec s st stn sx) (16.13 /);
put (sec s st stn sx) (hex16. /);
run;
/* Result:
16APR2015:09:58:56.7332201004028
56.7332201004028
56.7332201004028
56.7332200000019
56.7332200000019
56.7332200000000
404C5DDA28000000
404C5DDA28000000
404C5DDA27286400
404C5DDA27286400
404C5DDA272862F6
*/

As you can see, the results from purely numeric calculation are identical to those obtained by using the SECOND function. However, there are three (slightly) different values for the seconds, depending on whether they were derived from the datetime value, derived from the time value or entered directly.

Reason: The orders of magnitude of 1744797536.73322, 35936.73322 and 56.73322 are very different. Therefore, the precision of these values varies considerably (i.e., the value of the least significant bit in the internal 64-bit floating-point representation differs by several powers of 2 between the three numbers). The more bits are required to store the integer part of the number, the fewer bits are available to store the fractional part. And the fractional part is the difficult one! The (mathematically) exact binary representation of 0.73322 has infinitely many (binary) digits. Hence, there is always a discrepancy between the internal value and the exact decimal value. For the large datetime value this numeric representation error is greater than for the time value, for which in turn it is greater than for the comparably small seconds value (56.73322).

I'll be happy to add more details tomorrow (CET).

FreelanceReinh · Posted 03-03-2016 01:53 PM

Hi Matt,

As promised, please find below more details on your numeric representation issue:

Let's compare four numbers with decimal places in the binary system and their 64-bit binary floating-point representations, as which SAS stores them internally on Windows and Unix systems, and check what these numbers look like when converted back to the decimal system.

First, a quick look at the internal representations (again, revealed by the HEX16. format):

data _null_;
X1 =          0.73322; /* the fractional part of your datetime value */
X2 =         56.73322; /* the seconds of your datetime value */
X3 =      35936.73322; /* the time part of your datetime value */
X4 = 1744797536.73322; /* your datetime value */
put (X:) (= hex16. /);
run;
/* Result:
X1=3FE77689CA18BD66
X2=404C5DDA272862F6
X3=40E18C177689CA19
X4=41D9FFDFD82EED14
*/

(Edit: colored the above results.)

Now, the detailed comparison:

decimal: 0.73322
binary:  0.10111011101101000100111001010000110001011110101100110001001... (infinite sequence of 0s and 1s)
internal: exponent (hex): 3FE
          mantissa (bin): 0111011101101000100111001010000110001011110101100110, last bit (place value 2^-53) rounded down 
          mantissa (hex): 77689CA18BD66
converted back to decimal: 0.73321999999999998287... (numeric representation error approx. -1.7E-17)

decimal:    56.73322
binary: 111000.10111011101101000100111001010000110001011110101100110001001...
internal: exponent (hex): 404
          mantissa (bin): 1100010111011101101000100111001010000110001011110110, last bit (place value 2^-47) rounded up
          mantissa (hex): C5DDA272862F6
converted back to decimal: 56.73322000000000286945... (numeric representation error approx. 2.9E-15)

decimal:           35936.73322
binary: 1000110001100000.10111011101101000100111001010000110001011110101100110001001...
internal: exponent (hex): 40E
          mantissa (bin): 0001100011000001011101110110100010011100101000011001, last bit (place value 2^-37) rounded up
          mantissa (hex): 18C177689CA19
converted back to decimal: 35936.73322000000189291313... (numeric representation error approx. 1.9E-12)

decimal:                     1744797536.73322
binary: 1100111111111110111111101100000.10111011101101000100111001010000110001011110101100110001001...
internal: exponent (hex): 41D
          mantissa (bin): 1001111111111101111111011000001011101110110100010100, last bit (place value 2^-22) rounded up
          mantissa (hex): 9FFDFD82EED14
converted back to decimal: 1744797536.73322010040283203125 (numeric representation error approx. 1.0E-7)

The fractional part of the four numbers is always the same, but the number of bits required for the integer part increases as we move from X1 via X2 and X3 to X4. For a numeric variable of (default) length 8 bytes there are always 8*8=64 bits of storage space available. These 64 bits must accommodate both the integer part and the fractional part (as far as possible).

Looking at the four binary mantissas we see: The larger the integer part gets, the more bits from the fractional part we lose (red).

In particular, by adding the date part 86400*20194=1744761600 to the time value X3 (in order to obtain the datetime value X4) we rounded a sequence of 37 binary digits to only 22, which means a substantial loss of information.

The most interesting part is to see what happens if we subtract the largest multiple of 60 less than X4, i.e. 1744797480, from X4, in order to obtain the seconds from the datetime value. Since 1744797480 is almost as big as 1744797536, the integer parts cancel out to a large extent, both in the decimal and in the binary system. In the latter, the length of the integer part decreases from 31 to only 6 digits (see binary representations of X4 and X2 above).

Similarly, the internal binary floating-point representation of the result would actually require 31-6=25 bits less than what is needed for X4. However, the storage location is always 64 bits long. The 25 bits saved cannot be left blank. Ideally, these should be filled with the corresponding binary digits from X2. But this information is not available in the current calculation: even X3 contained only 15 of those 25 bits, and these 15 bits were lost in the transition from X3 to X4 (as described above). As the final resort, SAS fills all 25 "empty" bits with zeros, as shown below.

decimal: 1744797536.73322
        -1744797480
        =        56.73322

binary: 1100111111111110111111101100000.10111011101101000100111001010000110001011110101100110001001...
       -1100111111111110111111100101000
       =                         111000.10111011101101000100111001010000110001011110101100110001001...
internal result:
  exponent (hex): 404
  mantissa (bin): 1100010111011101101000101000000000000000000000000000, last 25 bits padded with zeros
  mantissa (hex): C5DDA28000000
converted back to decimal: 56.73322010040283203125 (This combines the absolute num. repr. error of X4 with
the much smaller number X2, thus pushing the relative error from order of magnitude 10^-17 to 10^-9.)

Even with "long" formats such as 23.20 SAS does not show all 20 decimals of the latter number, but only 13 (and pads the rest with zeros), although the information is available. This rounding is wise, because the place value of the least significant bit in the internal representation is 2^-47=7.105...E-15. This means that hardly one more digit could be justified by the precision of the internal binary representation.

So, finally we see that the "phantom decimals" you observed are not unpredictable random digits, but the result of reproducible calculations with binary floating-point numbers.

Here are two references on the internal representation of numbers in SAS and on strategies how to deal with numeric representation issues:

Datetime creating phantom decimals

Re: Datetime creating phantom decimals

Re: Datetime creating phantom decimals

Re: Datetime creating phantom decimals

Datetime creating phantom decimals

Re: Datetime creating phantom decimals

Re: Datetime creating phantom decimals

Re: Datetime creating phantom decimals

SAS Innovate 2025: Call for Content