Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- Home
- /
- Programming
- /
- Programming
- /
- Re: I meet with a problem of data values in do loops

Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

🔒 This topic is **solved** and **locked**.
Need further help from the community? Please
sign in and ask a **new** question.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Posted 10-11-2020 11:45 AM
(1398 views)

Hi All, I generate a dataset named tmp1 using the following do loops, based on which I wanted to generate another dataset named tmp2. Although it is obvious that tmp2 would have one record, it did not.

Could anyone tell me what's wrong? Thank you!

__data tmp1;__

__do i=0.025 to 0.250 by 0.025;____output;____end;____run;__

__data tmp2;____set tmp1;____if i eq 0.2;____run;__

1 ACCEPTED SOLUTION

Accepted Solutions

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

This is a problem of machine precision, where digital computers cannot represent some numbers exactly, and so the value in the data set is the number you expect within plus or minus some very small value epsilon.

So, you can handle it this way:

```
data tmp2;
set tmp1;
if fuzz(i-0.2)=0;
run;
```

or this way

```
data tmp2;
set tmp1;
i=round(i,0.001);
if i eq 0.2;
run;
```

--

Paige Miller

Paige Miller

14 REPLIES 14

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

This is a problem of machine precision, where digital computers cannot represent some numbers exactly, and so the value in the data set is the number you expect within plus or minus some very small value epsilon.

So, you can handle it this way:

```
data tmp2;
set tmp1;
if fuzz(i-0.2)=0;
run;
```

or this way

```
data tmp2;
set tmp1;
i=round(i,0.001);
if i eq 0.2;
run;
```

--

Paige Miller

Paige Miller

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

@PaigeMiller is right. Looping around floating points is a headache in many other programming languages as well. An easy fix would be to loop around integers, maintaining the same logic:

```
data tmp1;
do i=1 to 10 by 1; /*just multiple all numbers by 40 to keep it nice and simple*/
i_=i/40; /*preserve value of i you originally wanted*/
output;
end;
run;
data tmp2;
set tmp1;
if i eq 0.2*40;
run;
```

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

@vellad wrote:

@PaigeMiller is right. Looping around floating points is a headache in many other programming languages as well. An easy fix would be to loop around integers, maintaining the same logic:

`data tmp1; do i=1 to 10 by 1; /*just multiple all numbers by 40 to keep it nice and simple*/ i_=i/40; /*preserve value of i you originally wanted*/ output; end; run; data tmp2; set tmp1; if i eq 0.2*40; run;`

Just to gild the lily: Converting to integer works, but more generally you just need to convert such that both the starting value and the increment are representable as powers of 2 (including non-positive powers representing 1, 1/2, 1/4, 1/8, etc.).

So converting as below would work as well. Any further change to integer would be no more than floating the radix point - i.e. modifying the binary exponent. Of course, it's not necessarily any more intuitive.

```
data tmp1;
do i=0.25 to 2.5 by 0.25;
i_=i/10;
output;
end;
run;
data tmp2;
set tmp1;
if i eq 0.2*10;
run;
```

--------------------------

The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for

Allow PROC SORT to output multiple datasets

--------------------------

The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for

Allow PROC SORT to output multiple datasets

--------------------------

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Its neat. For Neo, definitely more intuitive 😉

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Hi @vellad,

I like the approach of using an integer index variable and use it regularly in similar situations. However, I would not use a selection criterion like i eq 0.2*40 where a calculation involves an operand (0.2) which is known to contain numeric representation error (although it works in this particular example). I would be much more confident that a calculation involving only integer operands of moderate size (like i_=i/40 in your case) yields a result which can be safely used in comparisons (such as i_ eq 0.2). Of course, the safest option is to avoid decimal fractions altogether (as in i eq 8).

Warning example:

```
data test;
do i=1 to 99;
x=i/100;
a=(x=input(cats('0.',put(i,z2.)),8.)); /* testing e.g. x=0.07 */
b=(i=x*100); /* testing e.g. i=0.07*100 */
if ~a | ~b then output;
end;
run;
proc print data=test;
run;
```

Result (Windows SAS 9.4):

Obs i x a b 1 7 0.07 1 0 2 14 0.14 1 0 3 28 0.28 1 0 4 29 0.29 1 0 5 55 0.55 1 0 6 56 0.56 1 0 7 57 0.57 1 0 8 58 0.58 1 0

That is, e.g., the Boolean expression 7/100=0.07 & 0.07*100~=7 is *true* in SAS under Windows.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

My assertion was:

@FreelanceReinh wrote:

(...) the Boolean expression 7/100=0.07 & 0.07*100~=7 is

truein SAS under Windows.

Your code first computes 7/100 and rounds the result to two decimals, which doesn't change anything (as mentioned in my previous post), so x=0.07, and then the result of x*100 -- a number close to, but not equal to 7 in SAS under Windows -- is stored in variable y. This, in turn, is compared to variable i, which contains the exact value 7. The IF condition is met and the OUTPUT statement produces one observation in dataset test, thus confirming my assertion. The reason for the inequality is a rounding error occurring in the multiplication of 0.07 -- which has no exact internal binary representation (i.e., variable x already contains a rounding error, called *numeric representation error*) -- with 100.

__Edit:__

The decimal fraction 0.07, converted to the binary system (and with some line breaks inserted), looks like this:

0.0001000111101011100001 01000111101011100001 01000111101011100001 ...

The 20-binary-digit pattern "01000111101011100001" is repeated *infinitely many times* ("periodic fraction").

The internal 64-bit floating-point representation that SAS under Windows uses to store 0.07 in an 8-byte numeric variable is based on the above binary representation. However, only 52 of the 64 bits (rather than infinitely many) are available for the binary digits following the first "1". The remaining binary digits are rounded (in this particular example: rounded *up*), as can be seen with the BINARY64. format:

```
data _null_;
x=0.07;
put x binary64.;
run;
```

Result:

0011111110110001111010111000010100011110101110000101000111101100

Thanks to the periodicity, we know that the last four bits, **1100**, resulted from rounding 1011100001... to **1100**000000... Apart from this, the 52 "mantissa bits" (in bold face above) are copied from the periodic binary representation and we recognize the repeating 20-binary-digit pattern (highlighted in blue).

Unlike most decimal fractions, integers (of moderate size) have exact 64-bit floating-point binary representations, ending in many zeros without rounding. Let's have a look at these representations of 100 and 7:

```
data _null_;
x=100; y=7;
put (x y)(=binary64./);
run;
```

Result:

x=0100000001011001000000000000000000000000000000000000000000000000y=0100000000011100000000000000000000000000000000000000000000000000

The 52 mantissa bits reflect the digits after the first "1" of the binary representations of 100 and 7, which are 1100100 and 111, respectively, padded with trailing zeros.

Now, the multiplication 0.07*100 (using the binary floating-point representations) amounts to multiplying the two numbers below:

1.000111101011100001* 2**-4 1.0100011110101110000101000111101100100100* 2**6

It's not difficult to do this multiplication by hand. Up to the factor 2**-4 * 2**6 = 2**2 = 4, the result can be obtained as the sum (of shifted binary numbers)

1.0001111010111000010100011110101110000101000111101100 + 0.10001111010111000010100011110101110000101000111101100+ 0.000100011110101110000101000111101011100001010001111011001.11000000000000000000000000000000000000000000000000001100

The latter number has a mantissa of 56 bits, hence it must be rounded to 52 bits in order to fit into an 8-byte numeric variable. As a consequence, the sequence of the least significant bits, ...01100 is rounded up to ...1(0000). So, for SAS under Windows the result of the multiplication is

1.1100000000000000000000000000000000000000000000000001* 2**2

written in floating-point representation:

0100000000011100000000000000000000000000000000000000000000000001

Indeed, this matches the result obtained with the BINARY64. format:

```
data _null_;
y=0.07*100;
put y binary64.;
run;
```

So, the difference between the result stored in variable y and the exact value 7 (see internal representation further above) is 1 * 2**-50 = 8.881784...E-16, as is confirmed by SAS:

455 data _null_; 456 d=0.07*100-7; 457 put d; 458 run; 8.881784E-16

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

I've just extended my previous post to demonstrate how the result of 0.07*100 (deviating from 7) comes about. If you perform similar DATA steps and hand calculations for 0.7*10 and 0.007*1000, you'll see why there are no issues with these two cases. (I've checked only the first case, which is relatively easy.) Note that the similarities between the three pairs of *decimal* factors considered here does *not *translate to similarities in the *binary* system, i.e., the patterns of 0s and 1s are very different. For example, the lengths of the periods in the periodic binary representations of 0.7 and 0.007 are 4 and 100, respectively, not 20 as in the case of 0.07.

Using the TRUNC function with a second argument 7 (or 6, 5, ...) you truncate the 8 (or 16, 24, ...) least significant bits of the internal representation. In the example of 0.07*100 these bits, of course, include the rounding error 1 * 2**-50 (the "red 1" in the 52nd mantissa bit), so you get rid of it. If the results do not contain rounding errors in the internal representation (which happens to be true for 0.7*10 and 0.007*1000), the TRUNC function doesn't change that. This explains your observations.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Thank you for your step-by-step illustration. It's really helpful. I had never thought it in this way.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Wow, thanks so much for the detailed response! I got it now and it is indeed not that counter intuitive how 0.07 is different from 0.7 and 0.007, once you go down to the bit level.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Really helpful solutions! Thank you!

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

I agree and will try to use intergers in do loops in my future works. Thank you!

**Available on demand!**

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.