BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
expertyejin
Obsidian | Level 7

Hi all,

 

I'm testing a simple program by merging two datasets in SAS Studio. Here is my code:

 

 

data one(drop=i);
do i=1 to 100;
x=log10(i);
output;
end;
run;

data two;
input x y;
cards;
0 3
1 4
2 5
;
run;

data together;
merge one(in=o) two(in=t);
by x;
if o and t;
run;

proc print;
run;

If you run this code, you get the expected output, which is basically the CARDS/DATALINES section of data two. However, if I change the increment of the DO loop in data one from 1 to 0.1, i.e.

 

data one(drop=i);
do i=1 to 100 by 0.1;
x=log10(i);
output;
end;
run;

data two;
input x y;
cards;
0 3
1 4
2 5
;
run;

data together;
merge one(in=o) two(in=t);
by x;
if o and t;
run;

proc print;
run;

then I get an unexpected/unexplainable output which is only "0 3". Could anyone tell me why I get this strange output and how to fix this issue? Thanks a lot!

1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

Hi @expertyejin,

 

The primary issue with your code is the DO loop with step size 0.1. This is one of the "classic" examples of producing numeric representation issues and it is described in Numerical Accuracy in SAS Software.

 

I've written several detailed posts on this topic during the past couple of months. Here is a link to the search results for "numeric representation", restricted to my posts:
https://communities.sas.com/t5/forums/searchpage/tab/message?filter=labels%2CauthorId&q=%22numeric+r...

 

Among these, this posting describes in particular what's "wrong" with 0.1. (I've just seen that @Kurt_Bremser has posted a link to a similar explanation on an external website.)

 

As @Kurt_Bremser mentioned, not only 0.1, but many decimal fractions introduce numeric representation errors into calculations. Please see the bulleted list in this post for a quantification of this statement.

 

Moreover, for similar reasons it is risky to merge datasets using BY variables whose values are not necessarily integers (such as logarithms). In your example you rely (successfully) on mathematical facts such as log10(100)=2. However, there are many mathematical identities (e.g. 0.1+0.7=0.8 or 1.00E-5=1.000E-5) which are not true in SAS (platform dependent, though) due to numeric representation issues. Further surprising examples can be found in this post.

 

In many cases the ROUND function can be used to cope with these issues (e.g. round(0.1+0.7, 1e-9)=0.8).

 

A DO loop should use an integer increment. The decimal fractions can be calculated in the body of the loop:

data one(drop=i);
do i=10 to 1000;
  x=log10(i/10);
  output;
end;
run;

 

View solution in original post

12 REPLIES 12
Norman21
Lapis Lazuli | Level 10

This looks like a problem with the precision of the BY variables. LOG10(10) doesn't exactly equal 1.

 

You might like to try the ROUND function on the log transformed variable.

 

Norman.

Norman.
SAS 9.4 (TS1M6) X64_10PRO WIN 10.0.17763 Workstation

Kurt_Bremser
Super User

To verify your precision problem, just add

format x best32.;

to your first data step and take a look at observation 91 of dataset one.

As soon as you have fractions with numbers represented in real format, you get small errors when doing calculations, unless you use fractions that represent easily (and finitely) in binary.

IOW, adding 0.1 10 times is different from adding 1, as the binary representation of 0.1 (base 10) looks kind of funny, while 1 (base 10) is simply 1 (base 2).

Kurt_Bremser
Super User

Even more revealing is this:

data one;
format x i best32.;
do i=1 to 100 by 0.1;
x=log10(i);
output;
end;
run;

Just look at the seemingly crazy development of i, especially after obs 747

Norman21
Lapis Lazuli | Level 10

This is very interesting - and worrying!

 

Why doesn't SAS throw up an ERROR (or at least a NOTE) when i is not an integer?

 

Norman.

Norman.
SAS 9.4 (TS1M6) X64_10PRO WIN 10.0.17763 Workstation

Kurt_Bremser
Super User

Because SAS does not know "integer". A number in SAS is always stored in 8 byte real, and even if you enter a number that is considered an integer, it will be stored as mantissa an exponent. As soon as you need more precision than the mantissa can hold, you get rounding errors.

Decimal fractions need so many digits in the (binary) mantissa that you very quickly get those rounding errors.

And why should SAS expect integers when you already introduced fractions by adding

by 0.1;

?

Norman21
Lapis Lazuli | Level 10

OK, but this doesn't explain why the error described in the OP only occurs in the first example.

 

Norman.

Norman.
SAS 9.4 (TS1M6) X64_10PRO WIN 10.0.17763 Workstation

Kurt_Bremser
Super User

I have another beauty for you.

Why does this work as expected?

data one;
format x i best32.;
do i=1 to 100 by 0.25;
x=log10(i);
output;
end;
run;

data two;
input x y;
cards;
0 3
1 4
2 5
;
run;

data together;
merge one(in=o) two(in=t);
by x;
if o and t;
run;

proc print;
run;
Norman21
Lapis Lazuli | Level 10

!?!

 

Yet another reason for SAS to provide a NOTE (at least) if the increment is not an "integer".

 

Norman.

Norman.
SAS 9.4 (TS1M6) X64_10PRO WIN 10.0.17763 Workstation

Kurt_Bremser
Super User

SAS assumes that you know what you are doing when you specifiy a non-integer increment. The "by 0.1" didn't appear there by accident.

 

I am very happy that SAS does not treat me like somebody who left her/his brain at home. 😉

FreelanceReinh
Jade | Level 19

Hi @expertyejin,

 

The primary issue with your code is the DO loop with step size 0.1. This is one of the "classic" examples of producing numeric representation issues and it is described in Numerical Accuracy in SAS Software.

 

I've written several detailed posts on this topic during the past couple of months. Here is a link to the search results for "numeric representation", restricted to my posts:
https://communities.sas.com/t5/forums/searchpage/tab/message?filter=labels%2CauthorId&q=%22numeric+r...

 

Among these, this posting describes in particular what's "wrong" with 0.1. (I've just seen that @Kurt_Bremser has posted a link to a similar explanation on an external website.)

 

As @Kurt_Bremser mentioned, not only 0.1, but many decimal fractions introduce numeric representation errors into calculations. Please see the bulleted list in this post for a quantification of this statement.

 

Moreover, for similar reasons it is risky to merge datasets using BY variables whose values are not necessarily integers (such as logarithms). In your example you rely (successfully) on mathematical facts such as log10(100)=2. However, there are many mathematical identities (e.g. 0.1+0.7=0.8 or 1.00E-5=1.000E-5) which are not true in SAS (platform dependent, though) due to numeric representation issues. Further surprising examples can be found in this post.

 

In many cases the ROUND function can be used to cope with these issues (e.g. round(0.1+0.7, 1e-9)=0.8).

 

A DO loop should use an integer increment. The decimal fractions can be calculated in the body of the loop:

data one(drop=i);
do i=10 to 1000;
  x=log10(i/10);
  output;
end;
run;

 

expertyejin
Obsidian | Level 7

Thank you all for your great help and advice!! And I've considered myself an advanced learner when I'm even stumbled on such a basic concept (sigh...) 

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 12 replies
  • 2299 views
  • 7 likes
  • 4 in conversation