BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
leehsin
Quartz | Level 8

Hi, I had two data sets and encountered an issue that the value in a data set extracted from Snowflake are slightly different to the original value stored in SAS data set. See the example: 

   DATA_SF      DATA_Ori
COST_AMT     COST_AMT              Diff.            %Diff
          44.48                44.48      7.11E-15      1.60E-14
            5.1                    5.1        8.88E-16      1.74E-14

 

My questions: What could be the reason for this different? Suggestion?

 

Thank you very much!

1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

Hi @leehsin,

 

I would make sure that the values are correct if they are processed later on. Otherwise these small rounding errors (7.11E-15 etc.) can cause unwanted issues in all sorts of comparisons, including IF or WHERE conditions, assignment of user-defined format categories, merging and sorting. Similar errors can arise from calculations, even within SAS.

 

Example:

data data_ori;
input cost_amt;
cards;
44.48
5.1
;

data data_sf;
set data_ori;
cost_amt=cost_amt*449/449;
run;

proc compare data=data_sf c=data_ori;
run;

Result (using Windows SAS 9.4M5):

           ||       Base    Compare
       Obs ||   cost_amt   cost_amt      Diff.     % Diff
 ________  ||  _________  _________  _________  _________
           ||
        1  ||    44.4800    44.4800  7.105E-15  1.597E-14
        2  ||     5.1000     5.1000  8.882E-16  1.742E-14

In both cases the calculation (seemingly a multiplication by 1) has reduced the result by one unit of the least significant bit in the internal binary floating-point representation, namely 2**-47=7.105...E-15 in the case of 44.48 and 2**-50=8.8817...E-16 for 5.1. You can notice the errors in the HEX16. format:

proc print data=data_sf;
format cost_amt hex16.;
run;

Result:

Obs            cost_amt

 1     40463D70A3D70A3C
 2     4014666666666665

Even without seeing the corresponding representations of the original values (in data_ori) it's fairly obvious that the correct last hex digits should be D and 6, respectively (in view of the repeating digit patterns). So, look at a few of your values in HEX16. format to find out where the errors have occurred.

 

Besides calculations, "moving [data] across application and/or architecture borders" is a common way of introducing this type of rounding (or numeric representation) errors, as mentioned by Kurt_Bremser. I would follow ballardw's advice and correct the errors (in SAS) by applying the ROUND function with an appropriate rounding unit (which depends a bit on your data).

 

Example:

data want;
set data_sf;
cost_amt=round(cost_amt,1e-7);
run;

 

View solution in original post

12 REPLIES 12
Reeza
Super User

Numeric precision - basically computers aren't perfect and can't track decimals exactly after a certain point.

https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/lepg/p0dv87zb3bnse6n1mqo360be70qr.htm

 

Is a difference of 0.0000000000000711 of significance to you?

 


@leehsin wrote:

Hi, I had two data sets and encountered an issue that the value in a data set extracted from Snowflake are slightly different to the original value stored in SAS data set. See the example: 

   DATA_SF      DATA_Ori
COST_AMT     COST_AMT              Diff.            %Diff
          44.48                44.48      7.11E-15      1.60E-14
            5.1                    5.1        8.88E-16      1.74E-14

 

My questions: What could be the reason for this different? Suggestion?

 

Thank you very much!


 

 

leehsin
Quartz | Level 8
Thanks, Reeze!

Here any difference is my concern to find a solution, regardless the significance.
Kurt_Bremser
Super User

You will often find such differences when moving across application and/or architecture borders. We found differences between SAS on the mainframe and SAS on UNIX/Windows simply because the MF uses more bits in the mantissa.

So, once you verified how the differences originate and that they are statistically insignificant (which you have done), live with them.

ballardw
Super User

E-15 range of differences likely points to precision of different machines as very likely issue.

 

If you are actually only using two decimal places then round or truncate the data to the required precision you use.

 

 

leehsin
Quartz | Level 8
Thanks Ballardw!

The values start with 44.48, 5.1 in a SAS dataset. The dataset was uploaded to Snowflake. Then a request from a user asks to extract data from the Snowflake for another task, and the values change in the new dataset. Though it is acceptable in this case, will this kind of change accumulates by time and becomes an issue for a small value? I am seeking a way that I can get the unchanged value in the extraction data, if this way technically exists.
leehsin
Quartz | Level 8
Thanks Ballardw !

The values start with 44.48, 5.1 in a SAS dataset. The dataset was uploaded to Snowflake. Then a request from a user asks to extract data from the Snowflake for another task, and the values change in the new dataset. Though it is acceptable in this case, will this kind of change accumulates by time and becomes an issue for a small value? I am seeking a way that I can get the unchanged value in the extraction data, if this way technically exists.
Reeza
Super User
There is no change in the data. The small error is negligible and something you'll find with all computers. If you check the snowflake data against the original data source you'll likely find the same differences. Computers typically store data with a set precision of up to 16 decimal points and you're differences are after that point. What is the precision requirement for your data?
FreelanceReinh
Jade | Level 19

Hi @leehsin,

 

I would make sure that the values are correct if they are processed later on. Otherwise these small rounding errors (7.11E-15 etc.) can cause unwanted issues in all sorts of comparisons, including IF or WHERE conditions, assignment of user-defined format categories, merging and sorting. Similar errors can arise from calculations, even within SAS.

 

Example:

data data_ori;
input cost_amt;
cards;
44.48
5.1
;

data data_sf;
set data_ori;
cost_amt=cost_amt*449/449;
run;

proc compare data=data_sf c=data_ori;
run;

Result (using Windows SAS 9.4M5):

           ||       Base    Compare
       Obs ||   cost_amt   cost_amt      Diff.     % Diff
 ________  ||  _________  _________  _________  _________
           ||
        1  ||    44.4800    44.4800  7.105E-15  1.597E-14
        2  ||     5.1000     5.1000  8.882E-16  1.742E-14

In both cases the calculation (seemingly a multiplication by 1) has reduced the result by one unit of the least significant bit in the internal binary floating-point representation, namely 2**-47=7.105...E-15 in the case of 44.48 and 2**-50=8.8817...E-16 for 5.1. You can notice the errors in the HEX16. format:

proc print data=data_sf;
format cost_amt hex16.;
run;

Result:

Obs            cost_amt

 1     40463D70A3D70A3C
 2     4014666666666665

Even without seeing the corresponding representations of the original values (in data_ori) it's fairly obvious that the correct last hex digits should be D and 6, respectively (in view of the repeating digit patterns). So, look at a few of your values in HEX16. format to find out where the errors have occurred.

 

Besides calculations, "moving [data] across application and/or architecture borders" is a common way of introducing this type of rounding (or numeric representation) errors, as mentioned by Kurt_Bremser. I would follow ballardw's advice and correct the errors (in SAS) by applying the ROUND function with an appropriate rounding unit (which depends a bit on your data).

 

Example:

data want;
set data_sf;
cost_amt=round(cost_amt,1e-7);
run;

 

leehsin
Quartz | Level 8
Thank you, FreelanceReinhard! Great explanation!
Actually I did use round function to the concerned variables in both data when making comparisons, but this is the aftermath solution when the issues appear. If I could prevent this kind of issue from happening at the beginning when I extract a data from Snowflake, I would be happy not going back to repeat the process by applying the solution because this issue is not a real issue actually but may give you a surprise for having a overwhelming long list of data inconsistency, and repeating a process sometime is a very time consuming step in a project having many steps.
Ksharp
Super User

Try options METHOD= and FUZZ= :

 

data data_ori;
input cost_amt;
cards;
44.48
5.1
;

data data_sf;
set data_ori;
cost_amt=cost_amt*449/449;
run;

proc compare data=data_sf c=data_ori method=relative fuzz=1E-10;
run;

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 12 replies
  • 3204 views
  • 5 likes
  • 6 in conversation