turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- Base SAS Programming
- /
- Weird output when combining two data sets!

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

03-08-2016 11:00 PM

Hi all,

I'm testing a simple program by merging two datasets in SAS Studio. Here is my code:

```
data one(drop=i);
do i=1 to 100;
x=log10(i);
output;
end;
run;
data two;
input x y;
cards;
0 3
1 4
2 5
;
run;
data together;
merge one(in=o) two(in=t);
by x;
if o and t;
run;
proc print;
run;
```

If you run this code, you get the expected output, which is basically the CARDS/DATALINES section of data two. However, if I change the increment of the DO loop in data one from 1 to 0.1, i.e.

```
data one(drop=i);
do i=1 to 100 by 0.1;
x=log10(i);
output;
end;
run;
data two;
input x y;
cards;
0 3
1 4
2 5
;
run;
data together;
merge one(in=o) two(in=t);
by x;
if o and t;
run;
proc print;
run;
```

then I get an unexpected/unexplainable output which is only "0 3". Could anyone tell me why I get this strange output and how to fix this issue? Thanks a lot!

Accepted Solutions

Solution

03-10-2016
03:34 PM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to expertyejin

03-09-2016 07:19 AM

Hi @expertyejin,

The primary issue with your code is the DO loop with step size 0.1. This is one of the "classic" examples of producing numeric representation issues and it is described in Numerical Accuracy in SAS Software.

I've written several detailed posts on this topic during the past couple of months. Here is a link to the search results for "numeric representation", restricted to my posts:

https://communities.sas.com/t5/forums/searchpage/tab/message?filter=labels%2CauthorId&q=%22numeric+r...

Among these, this posting describes in particular what's "wrong" with 0.1. (I've just seen that @KurtBremser has posted a link to a similar explanation on an external website.)

As @KurtBremser mentioned, not only 0.1, but many decimal fractions introduce numeric representation errors into calculations. Please see the bulleted list in this post for a quantification of this statement.

Moreover, for similar reasons it is risky to merge datasets using BY variables whose values are not necessarily integers (such as logarithms). In your example you rely (successfully) on mathematical facts such as log10(100)=2. However, there are many mathematical identities (e.g. 0.1+0.7=0.8 or 1.00E-5=1.000E-5) which are *not* true in SAS (platform dependent, though) due to numeric representation issues. Further surprising examples can be found in this post.

In many cases the ROUND function can be used to cope with these issues (e.g. round(0.1+0.7, 1e-9)=0.8).

A DO loop should use an integer increment. The decimal fractions can be calculated in the body of the loop:

```
data one(drop=i);
do i=10 to 1000;
x=log10(i/10);
output;
end;
run;
```

All Replies

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to expertyejin

03-09-2016 01:02 AM

This looks like a problem with the precision of the BY variables. LOG10(10) doesn't *exactly* equal 1.

You might like to try the ROUND function on the log transformed variable.

Norman.

Norman.

SAS 9.4 (TS1M4) X64_7PRO WIN 6.1.7601 Service Pack 1 Workstation

SAS 9.4 (TS1M4) X64_7PRO WIN 6.1.7601 Service Pack 1 Workstation

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to expertyejin

03-09-2016 04:11 AM

To verify your precision problem, just add

`format x best32.;`

to your first data step and take a look at observation 91 of dataset one.

As soon as you have fractions with numbers represented in real format, you get small errors when doing calculations, unless you use fractions that represent easily (and finitely) in binary.

IOW, adding 0.1 10 times is different from adding 1, as the binary representation of 0.1 (base 10) looks kind of funny, while 1 (base 10) is simply 1 (base 2).

---------------------------------------------------------------------------------------------

Maxims of Maximally Efficient SAS Programmers

Maxims of Maximally Efficient SAS Programmers

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to KurtBremser

03-09-2016 04:19 AM

Even more revealing is this:

```
data one;
format x i best32.;
do i=1 to 100 by 0.1;
x=log10(i);
output;
end;
run;
```

Just look at the seemingly crazy development of i, especially after obs 747

---------------------------------------------------------------------------------------------

Maxims of Maximally Efficient SAS Programmers

Maxims of Maximally Efficient SAS Programmers

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to KurtBremser

03-09-2016 04:34 AM

This is very interesting - and worrying!

Why doesn't SAS throw up an ERROR (or at least a NOTE) when i is not an integer?

Norman.

Norman.

SAS 9.4 (TS1M4) X64_7PRO WIN 6.1.7601 Service Pack 1 Workstation

SAS 9.4 (TS1M4) X64_7PRO WIN 6.1.7601 Service Pack 1 Workstation

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Norman21

03-09-2016 04:47 AM

Because SAS does not know "integer". A number in SAS is always stored in 8 byte real, and even if you enter a number that is considered an integer, it will be stored as mantissa an exponent. As soon as you need more precision than the mantissa can hold, you get rounding errors.

Decimal fractions need so many digits in the (binary) mantissa that you very quickly get those rounding errors.

And why should SAS expect integers when you already introduced fractions by adding

by 0.1;

?

---------------------------------------------------------------------------------------------

Maxims of Maximally Efficient SAS Programmers

Maxims of Maximally Efficient SAS Programmers

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to KurtBremser

03-09-2016 04:51 AM

OK, but this doesn't explain why the error described in the OP only occurs in the first example.

Norman.

Norman.

SAS 9.4 (TS1M4) X64_7PRO WIN 6.1.7601 Service Pack 1 Workstation

SAS 9.4 (TS1M4) X64_7PRO WIN 6.1.7601 Service Pack 1 Workstation

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Norman21

03-09-2016 05:04 AM

Norman21 wrote:

OK, but this doesn't explain why the error described in the OP only occurs in the first example.

Norman.

Look at this:

why-0-point-1-does-not-exist-in-floating-point/

Maxims of Maximally Efficient SAS Programmers

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Norman21

03-09-2016 04:48 AM

I have another beauty for you.

Why does this work as expected?

```
data one;
format x i best32.;
do i=1 to 100 by 0.25;
x=log10(i);
output;
end;
run;
data two;
input x y;
cards;
0 3
1 4
2 5
;
run;
data together;
merge one(in=o) two(in=t);
by x;
if o and t;
run;
proc print;
run;
```

Maxims of Maximally Efficient SAS Programmers

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to KurtBremser

03-09-2016 04:53 AM

!?!

Yet another reason for SAS to provide a NOTE (at least) if the increment is not an "integer".

Norman.

Norman.

SAS 9.4 (TS1M4) X64_7PRO WIN 6.1.7601 Service Pack 1 Workstation

SAS 9.4 (TS1M4) X64_7PRO WIN 6.1.7601 Service Pack 1 Workstation

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Norman21

03-09-2016 05:01 AM

SAS assumes that you know what you are doing when you specifiy a non-integer increment. The "by 0.1" didn't appear there by accident.

I am very happy that SAS does not treat me like somebody who left her/his brain at home.

Maxims of Maximally Efficient SAS Programmers

Solution

03-10-2016
03:34 PM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to expertyejin

03-09-2016 07:19 AM

Hi @expertyejin,

The primary issue with your code is the DO loop with step size 0.1. This is one of the "classic" examples of producing numeric representation issues and it is described in Numerical Accuracy in SAS Software.

I've written several detailed posts on this topic during the past couple of months. Here is a link to the search results for "numeric representation", restricted to my posts:

https://communities.sas.com/t5/forums/searchpage/tab/message?filter=labels%2CauthorId&q=%22numeric+r...

Among these, this posting describes in particular what's "wrong" with 0.1. (I've just seen that @KurtBremser has posted a link to a similar explanation on an external website.)

As @KurtBremser mentioned, not only 0.1, but many decimal fractions introduce numeric representation errors into calculations. Please see the bulleted list in this post for a quantification of this statement.

Moreover, for similar reasons it is risky to merge datasets using BY variables whose values are not necessarily integers (such as logarithms). In your example you rely (successfully) on mathematical facts such as log10(100)=2. However, there are many mathematical identities (e.g. 0.1+0.7=0.8 or 1.00E-5=1.000E-5) which are *not* true in SAS (platform dependent, though) due to numeric representation issues. Further surprising examples can be found in this post.

In many cases the ROUND function can be used to cope with these issues (e.g. round(0.1+0.7, 1e-9)=0.8).

A DO loop should use an integer increment. The decimal fractions can be calculated in the body of the loop:

```
data one(drop=i);
do i=10 to 1000;
x=log10(i/10);
output;
end;
run;
```

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to expertyejin

03-10-2016 03:44 PM

Thank you all for your great help and advice!! And I've considered myself an advanced learner when I'm even stumbled on such a basic concept (sigh...)