turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- Data Management
- /
- Forum
- /
- Datetime creating phantom decimals

Topic Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

03-02-2016 04:54 PM

Hi,

In base SAS 9.3, when combining date and time variables using dhms, the result creates phantom decimal places at the tail end of the datetime value. For example:

Time = 9:58:56.733220000 (format time18.9)

Date = 16APR2015

Datetime = dhms(date,0,0,time)

outputted datetime: 16APR2015:09:58:56.733220100 (format datetime28.9)

Interestingly, if I format Datetime variable as a number (e.g. 28.9), I don’t get the rounding issue: 1744797536.733220000. So I think it must be something to do with how SAS converts numbers to datetime.

Any suggestions are welcome. Thanks in advance!

Matt

Accepted Solutions

Solution

03-02-2016
08:25 PM

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to minfante

03-02-2016 07:40 PM

Hi Matt,

You have encountered a numeric representation issue. This is not specific to datetime values, but a (well understood) phenomenon with numeric values which are stored as binary floating-point representations (but displayed in the decimal system).

The conversion from SAS date and time to datetime values is straightforward: datetime=date*86400+time. Therefore, it is not surprising that performing this calculation with the numbers representing date and time in your example ('16APR2015'd=20194, '9:58:56.73322't=35936.73322) yields exactly the same result as obtained using the DHMS function (all calculations were done with SAS 9.4 on Windows):

```
data _null_;
time='9:58:56.73322't;
date='16APR2015'd;
datetime = dhms(date,0,0,time);
dt=20194*86400+35936.73322;
put datetime hex16.;
put dt hex16.;
run;
/* Result:
41D9FFDFD82EED14
41D9FFDFD82EED14
*/
```

I've used the HEX16. format to display the values, because the usual numeric formats (such as the *w*.*d* format, e.g. 28.9) do some rounding (!), so in general they don't show the number exactly that is stored internally.

If you calculate the *second* of the datetime value, you will get *the same* "phantom decimals" as are displayed in format datetime28.9. (Below I use datetime32.13 to show even more decimals.) For demonstration, I've calculated that value also from the time value and entered it directly.

```
data _null_;
time='9:58:56.73322't;
date='16APR2015'd;
datetime = dhms(date,0,0,time);
sec=second(datetime);
s=1744797536.73322-1744797480;
st=second(time);
stn=35936.73322-35880;
sx=56.73322;
put datetime datetime32.13;
put (sec s st stn sx) (16.13 /);
put (sec s st stn sx) (hex16. /);
run;
/* Result:
16APR2015:09:58:56.7332201004028
56.7332201004028
56.7332201004028
56.7332200000019
56.7332200000019
56.7332200000000
404C5DDA28000000
404C5DDA28000000
404C5DDA27286400
404C5DDA27286400
404C5DDA272862F6
*/
```

As you can see, the results from purely numeric calculation are identical to those obtained by using the SECOND function. However, there are three (slightly) different values for the seconds, depending on whether they were derived from the datetime value, derived from the time value or entered directly.

Reason: The orders of magnitude of 1744797536.73322, 35936.73322 and 56.73322 are very different. Therefore, the precision of these values varies considerably (i.e., the value of the least significant bit in the internal 64-bit floating-point representation differs by several powers of 2 between the three numbers). The more bits are required to store the integer part of the number, the fewer bits are available to store the fractional part. And the fractional part is the difficult one! The (mathematically) exact binary representation of 0.73322 has *infinitely* many (binary) digits. Hence, there is *always* a discrepancy between the internal value and the exact decimal value. For the large datetime value this numeric representation error is greater than for the time value, for which in turn it is greater than for the comparably small seconds value (56.73322).

I'll be happy to add more details tomorrow (CET).

All Replies

Solution

03-02-2016
08:25 PM

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to minfante

03-02-2016 07:40 PM

Hi Matt,

You have encountered a numeric representation issue. This is not specific to datetime values, but a (well understood) phenomenon with numeric values which are stored as binary floating-point representations (but displayed in the decimal system).

The conversion from SAS date and time to datetime values is straightforward: datetime=date*86400+time. Therefore, it is not surprising that performing this calculation with the numbers representing date and time in your example ('16APR2015'd=20194, '9:58:56.73322't=35936.73322) yields exactly the same result as obtained using the DHMS function (all calculations were done with SAS 9.4 on Windows):

```
data _null_;
time='9:58:56.73322't;
date='16APR2015'd;
datetime = dhms(date,0,0,time);
dt=20194*86400+35936.73322;
put datetime hex16.;
put dt hex16.;
run;
/* Result:
41D9FFDFD82EED14
41D9FFDFD82EED14
*/
```

I've used the HEX16. format to display the values, because the usual numeric formats (such as the *w*.*d* format, e.g. 28.9) do some rounding (!), so in general they don't show the number exactly that is stored internally.

If you calculate the *second* of the datetime value, you will get *the same* "phantom decimals" as are displayed in format datetime28.9. (Below I use datetime32.13 to show even more decimals.) For demonstration, I've calculated that value also from the time value and entered it directly.

```
data _null_;
time='9:58:56.73322't;
date='16APR2015'd;
datetime = dhms(date,0,0,time);
sec=second(datetime);
s=1744797536.73322-1744797480;
st=second(time);
stn=35936.73322-35880;
sx=56.73322;
put datetime datetime32.13;
put (sec s st stn sx) (16.13 /);
put (sec s st stn sx) (hex16. /);
run;
/* Result:
16APR2015:09:58:56.7332201004028
56.7332201004028
56.7332201004028
56.7332200000019
56.7332200000019
56.7332200000000
404C5DDA28000000
404C5DDA28000000
404C5DDA27286400
404C5DDA27286400
404C5DDA272862F6
*/
```

As you can see, the results from purely numeric calculation are identical to those obtained by using the SECOND function. However, there are three (slightly) different values for the seconds, depending on whether they were derived from the datetime value, derived from the time value or entered directly.

Reason: The orders of magnitude of 1744797536.73322, 35936.73322 and 56.73322 are very different. Therefore, the precision of these values varies considerably (i.e., the value of the least significant bit in the internal 64-bit floating-point representation differs by several powers of 2 between the three numbers). The more bits are required to store the integer part of the number, the fewer bits are available to store the fractional part. And the fractional part is the difficult one! The (mathematically) exact binary representation of 0.73322 has *infinitely* many (binary) digits. Hence, there is *always* a discrepancy between the internal value and the exact decimal value. For the large datetime value this numeric representation error is greater than for the time value, for which in turn it is greater than for the comparably small seconds value (56.73322).

I'll be happy to add more details tomorrow (CET).

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to minfante

03-03-2016 01:53 PM - edited 03-03-2016 01:58 PM

Hi Matt,

As promised, please find below more details on your numeric representation issue:

Let's compare four numbers with decimal places in the binary system and their 64-bit binary floating-point representations, as which SAS stores them internally on Windows and Unix systems, and check what these numbers look like when converted back to the decimal system.

First, a quick look at the internal representations (again, revealed by the HEX16. format):

data _null_; X1 = 0.73322; /* the fractional part of your datetime value */ X2 = 56.73322; /* the seconds of your datetime value */ X3 = 35936.73322; /* the time part of your datetime value */ X4 = 1744797536.73322; /* your datetime value */ put (X:) (= hex16. /); run; /* Result: X1=3FE77689CA18BD66 X2=404C5DDA272862F6 X3=40E18C177689CA19 X4=41D9FFDFD82EED14 */

(Edit: colored the above results.)

Now, the detailed comparison:

decimal: 0.73322 binary: 0.10111011101101000100111001010000110001011110101100110001001... (infinite sequence of 0s and 1s) internal: exponent (hex): 3FE mantissa (bin): 0111011101101000100111001010000110001011110101100110, last bit (place value 2^-53) rounded down mantissa (hex): 77689CA18BD66 converted back to decimal: 0.73321999999999998287... (numeric representation error approx. -1.7E-17) decimal: 56.73322 binary: 111000.10111011101101000100111001010000110001011110101100110001001... internal: exponent (hex): 404 mantissa (bin): 1100010111011101101000100111001010000110001011110110, last bit (place value 2^-47) rounded up mantissa (hex): C5DDA272862F6 converted back to decimal: 56.73322000000000286945... (numeric representation error approx. 2.9E-15) decimal: 35936.73322 binary: 1000110001100000.10111011101101000100111001010000110001011110101100110001001... internal: exponent (hex): 40E mantissa (bin): 0001100011000001011101110110100010011100101000011001, last bit (place value 2^-37) rounded up mantissa (hex): 18C177689CA19 converted back to decimal: 35936.73322000000189291313... (numeric representation error approx. 1.9E-12) decimal: 1744797536.73322 binary: 1100111111111110111111101100000.10111011101101000100111001010000110001011110101100110001001... internal: exponent (hex): 41D mantissa (bin): 1001111111111101111111011000001011101110110100010100, last bit (place value 2^-22) rounded up mantissa (hex): 9FFDFD82EED14 converted back to decimal: 1744797536.73322010040283203125 (numeric representation error approx. 1.0E-7)

The fractional part of the four numbers is always the same, but the number of bits required for the integer part increases as we move from X1 via X2 and X3 to X4. For a numeric variable of (default) length 8 bytes there are always 8*8=64 bits of storage space available. These 64 bits must accommodate both the integer part and the fractional part (as far as possible).

Looking at the four binary mantissas we see: The larger the integer part gets, the more bits from the fractional part we lose (red).

In particular, by adding the date part 86400*20194=1744761600 to the time value X3 (in order to obtain the datetime value X4) we rounded a sequence of 37 binary digits to only 22, which means a substantial loss of information.

The most interesting part is to see what happens if we subtract the largest multiple of 60 less than X4, i.e. 1744797480, from X4, in order to obtain the seconds from the datetime value. Since 1744797480 is almost as big as 1744797536, the integer parts cancel out to a large extent, both in the decimal and in the binary system. In the latter, the length of the integer part decreases from 31 to only 6 digits (see binary representations of X4 and X2 above).

Similarly, the internal binary floating-point representation of the result would actually require 31-6=25 bits less than what is needed for X4. However, the storage location is always 64 bits long. The 25 bits saved cannot be left blank. Ideally, these should be filled with the corresponding binary digits from X2. But this information is not available in the current calculation: even X3 contained only 15 of those 25 bits, and these 15 bits were lost in the transition from X3 to X4 (as described above). As the final resort, SAS fills all 25 "empty" bits with zeros, as shown below.

decimal: 1744797536.73322 -1744797480 = 56.73322 binary: 1100111111111110111111101100000.10111011101101000100111001010000110001011110101100110001001... -1100111111111110111111100101000 = 111000.10111011101101000100111001010000110001011110101100110001001... internal result: exponent (hex): 404 mantissa (bin): 1100010111011101101000101000000000000000000000000000, last 25 bits padded with zeros mantissa (hex): C5DDA28000000 converted back to decimal: 56.73322010040283203125 (This combines theabsolutenum. repr. error of X4 with the much smaller number X2, thus pushing therelativeerror from order of magnitude 10^-17 to 10^-9.)

Even with "long" formats such as 23.20 SAS does not show all 20 decimals of the latter number, but only 13 (and pads the rest with zeros), although the information is available. This rounding is wise, because the place value of the least significant bit in the internal representation is 2^-47=7.105...E-15. This means that hardly one more digit could be justified by the precision of the internal binary representation.

**So, finally we see that the "phantom decimals" you observed are not unpredictable random digits, but the result of reproducible calculations with binary floating-point numbers.**

Here are two references on the internal representation of numbers in SAS and on strategies how to deal with numeric representation issues: