DATA Step, Macro, Functions and more

Weird output when combining two data sets!

Accepted Solution Solved
Reply
Contributor
Posts: 26
Accepted Solution

Weird output when combining two data sets!

Hi all,

 

I'm testing a simple program by merging two datasets in SAS Studio. Here is my code:

 

 

data one(drop=i);
do i=1 to 100;
x=log10(i);
output;
end;
run;

data two;
input x y;
cards;
0 3
1 4
2 5
;
run;

data together;
merge one(in=o) two(in=t);
by x;
if o and t;
run;

proc print;
run;

If you run this code, you get the expected output, which is basically the CARDS/DATALINES section of data two. However, if I change the increment of the DO loop in data one from 1 to 0.1, i.e.

 

data one(drop=i);
do i=1 to 100 by 0.1;
x=log10(i);
output;
end;
run;

data two;
input x y;
cards;
0 3
1 4
2 5
;
run;

data together;
merge one(in=o) two(in=t);
by x;
if o and t;
run;

proc print;
run;

then I get an unexpected/unexplainable output which is only "0 3". Could anyone tell me why I get this strange output and how to fix this issue? Thanks a lot!


Accepted Solutions
Solution
‎03-10-2016 03:34 PM
Trusted Advisor
Posts: 1,115

Re: Weird output when combining two data sets!

Hi @expertyejin,

 

The primary issue with your code is the DO loop with step size 0.1. This is one of the "classic" examples of producing numeric representation issues and it is described in Numerical Accuracy in SAS Software.

 

I've written several detailed posts on this topic during the past couple of months. Here is a link to the search results for "numeric representation", restricted to my posts:
https://communities.sas.com/t5/forums/searchpage/tab/message?filter=labels%2CauthorId&q=%22numeric+r...

 

Among these, this posting describes in particular what's "wrong" with 0.1. (I've just seen that @KurtBremser has posted a link to a similar explanation on an external website.)

 

As @KurtBremser mentioned, not only 0.1, but many decimal fractions introduce numeric representation errors into calculations. Please see the bulleted list in this post for a quantification of this statement.

 

Moreover, for similar reasons it is risky to merge datasets using BY variables whose values are not necessarily integers (such as logarithms). In your example you rely (successfully) on mathematical facts such as log10(100)=2. However, there are many mathematical identities (e.g. 0.1+0.7=0.8 or 1.00E-5=1.000E-5) which are not true in SAS (platform dependent, though) due to numeric representation issues. Further surprising examples can be found in this post.

 

In many cases the ROUND function can be used to cope with these issues (e.g. round(0.1+0.7, 1e-9)=0.8).

 

A DO loop should use an integer increment. The decimal fractions can be calculated in the body of the loop:

data one(drop=i);
do i=10 to 1000;
  x=log10(i/10);
  output;
end;
run;

 

View solution in original post


All Replies
Contributor
Posts: 62

Re: Weird output when combining two data sets!

This looks like a problem with the precision of the BY variables. LOG10(10) doesn't exactly equal 1.

 

You might like to try the ROUND function on the log transformed variable.

 

Norman.

Norman.
SAS 9.4 (TS1M0) X64_7PRO WIN 6.1.7601 Service Pack 1 Workstation

Super User
Posts: 6,936

Re: Weird output when combining two data sets!

To verify your precision problem, just add

format x best32.;

to your first data step and take a look at observation 91 of dataset one.

As soon as you have fractions with numbers represented in real format, you get small errors when doing calculations, unless you use fractions that represent easily (and finitely) in binary.

IOW, adding 0.1 10 times is different from adding 1, as the binary representation of 0.1 (base 10) looks kind of funny, while 1 (base 10) is simply 1 (base 2).

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Super User
Posts: 6,936

Re: Weird output when combining two data sets!

Even more revealing is this:

data one;
format x i best32.;
do i=1 to 100 by 0.1;
x=log10(i);
output;
end;
run;

Just look at the seemingly crazy development of i, especially after obs 747

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Contributor
Posts: 62

Re: Weird output when combining two data sets!

This is very interesting - and worrying!

 

Why doesn't SAS throw up an ERROR (or at least a NOTE) when i is not an integer?

 

Norman.

Norman.
SAS 9.4 (TS1M0) X64_7PRO WIN 6.1.7601 Service Pack 1 Workstation

Super User
Posts: 6,936

Re: Weird output when combining two data sets!

Because SAS does not know "integer". A number in SAS is always stored in 8 byte real, and even if you enter a number that is considered an integer, it will be stored as mantissa an exponent. As soon as you need more precision than the mantissa can hold, you get rounding errors.

Decimal fractions need so many digits in the (binary) mantissa that you very quickly get those rounding errors.

And why should SAS expect integers when you already introduced fractions by adding

by 0.1;

?

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Contributor
Posts: 62

Re: Weird output when combining two data sets!

OK, but this doesn't explain why the error described in the OP only occurs in the first example.

 

Norman.

Norman.
SAS 9.4 (TS1M0) X64_7PRO WIN 6.1.7601 Service Pack 1 Workstation

Super User
Posts: 6,936

Re: Weird output when combining two data sets!


Norman21 wrote:

OK, but this doesn't explain why the error described in the OP only occurs in the first example.

 

Norman.


Look at this:

why-0-point-1-does-not-exist-in-floating-point/

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Super User
Posts: 6,936

Re: Weird output when combining two data sets!

I have another beauty for you.

Why does this work as expected?

data one;
format x i best32.;
do i=1 to 100 by 0.25;
x=log10(i);
output;
end;
run;

data two;
input x y;
cards;
0 3
1 4
2 5
;
run;

data together;
merge one(in=o) two(in=t);
by x;
if o and t;
run;

proc print;
run;
---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Contributor
Posts: 62

Re: Weird output when combining two data sets!

!?!

 

Yet another reason for SAS to provide a NOTE (at least) if the increment is not an "integer".

 

Norman.

Norman.
SAS 9.4 (TS1M0) X64_7PRO WIN 6.1.7601 Service Pack 1 Workstation

Super User
Posts: 6,936

Re: Weird output when combining two data sets!

SAS assumes that you know what you are doing when you specifiy a non-integer increment. The "by 0.1" didn't appear there by accident.

 

I am very happy that SAS does not treat me like somebody who left her/his brain at home. Smiley Wink

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Solution
‎03-10-2016 03:34 PM
Trusted Advisor
Posts: 1,115

Re: Weird output when combining two data sets!

Hi @expertyejin,

 

The primary issue with your code is the DO loop with step size 0.1. This is one of the "classic" examples of producing numeric representation issues and it is described in Numerical Accuracy in SAS Software.

 

I've written several detailed posts on this topic during the past couple of months. Here is a link to the search results for "numeric representation", restricted to my posts:
https://communities.sas.com/t5/forums/searchpage/tab/message?filter=labels%2CauthorId&q=%22numeric+r...

 

Among these, this posting describes in particular what's "wrong" with 0.1. (I've just seen that @KurtBremser has posted a link to a similar explanation on an external website.)

 

As @KurtBremser mentioned, not only 0.1, but many decimal fractions introduce numeric representation errors into calculations. Please see the bulleted list in this post for a quantification of this statement.

 

Moreover, for similar reasons it is risky to merge datasets using BY variables whose values are not necessarily integers (such as logarithms). In your example you rely (successfully) on mathematical facts such as log10(100)=2. However, there are many mathematical identities (e.g. 0.1+0.7=0.8 or 1.00E-5=1.000E-5) which are not true in SAS (platform dependent, though) due to numeric representation issues. Further surprising examples can be found in this post.

 

In many cases the ROUND function can be used to cope with these issues (e.g. round(0.1+0.7, 1e-9)=0.8).

 

A DO loop should use an integer increment. The decimal fractions can be calculated in the body of the loop:

data one(drop=i);
do i=10 to 1000;
  x=log10(i/10);
  output;
end;
run;

 

Contributor
Posts: 26

Re: Weird output when combining two data sets!

Thank you all for your great help and advice!! And I've considered myself an advanced learner when I'm even stumbled on such a basic concept (sigh...) 

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 12 replies
  • 446 views
  • 7 likes
  • 4 in conversation