BookmarkSubscribeRSS Feed
ammirabile
Calcite | Level 5

Hi All,

 

I'm trying to create a process that exports data to a txt file with a trail record with some information which includes the check sum of a particular variable.

The process works fine, but we are having some issues with decimal places.

Sample code below:

RETAIN CHKSUM;
      CHKSUM = SUM(ROUND(CHKSUM,0.01),ROUND(&CHKSUM_VAR,0.01));
      IF LAST THEN DO;
         NOBS = _N_;
         PUT CHKSUM; 
      END;

The txt file is later on imported in Azure using spark and they have some controls where they sum one particular variable of the file and compare with the check sum value.

This process works for some files but not for others, the import fails because the check sum is not matching with the sum of the variable in the file and the only explanation is the decimal precision in SAS.

We tried several different ways of doing it and they all end up with a slightly different number in the check sum and issues with the import in Azure.

I'm almost convinced that this can't be fixed given that SAS stores data in binary format which can cause some precision issues like this.

 

Would be good to have different opinions on this as I think I've exhausted my ideas on how to fix this.

 

PS: Sorry if I explained the problem badly.

 

 

4 REPLIES 4
Astounding
PROC Star
Try using integers:

Chksum + round(100 * &chksum_var);

For comparison purposes it would be best to leave the values in this form. But if you have to, divide the final result by 100.
Patrick
Opal | Level 21

Never used it myself yet but if you're on SAS9.4M6 or later then some of the new hashing functions could eventually do the job for you. The receiving system would need to have the same algorithm of course.

 

Approaches using floating point processing under different environment are always deemed to some challenges with precision. I would have thought though that the rounding to two decimals would avoid such issues.

 

...and you do transfer the file in binary mode - right?

ballardw
Super User

You may also be running into a timing of calculation issue.

 

Perhaps you may test the difference in results of using some variant of

RETAIN CHKSUM chksum2;
      CHKSUM = SUM(ROUND(CHKSUM,0.01),ROUND(&CHKSUM_VAR,0.01));
   chksum2 = round(sum(chksum2,&chksum_var),0.01);
     IF LAST THEN DO;
         NOBS = _N_;
         PUT CHKSUM; 
      END;

Also, test one of your sets that fails examining the results of the chcksum for each observation. That may show places where the the rounding is contributing small round up or round down errors that the checking program detects.

Or show the entire data step. Your check sum is using a rounded version of the other variable. Is the output for &chksum_var left in the unrounded values for the exported data? Then I would expect a difference.

s_lassen
Meteorite | Level 14

What is the format of the CHKSUM variable on Azure, and what kind of numbers do you get?

 

The precision of numbers is SAS is about 15 decimals, if some of your output variables have more than that, you are probably going to have rounding errors whatever you do (in standard SAS, that is).

 

If you have less than 15 decimals in your checksum, the idea of converting to integers may help, because SAS numbers are normally represented as 64-bit base-2 floating points (6 bytes for the mantissa and 2 bytes for the exponent), except on the mainframe, where is is 64-bit base 16 (7 bytes for the mantissa, but the exponent only 1 byte and is base 16). In any case there may be the occasional rounding error when converting numbers with 2 decimals from base 2 (or base 16) to base 10.

 

If you have more than 15 significant decimals in your checksum, I see two possibilities.

 

Either load the basic data to Azure, and let them do the summing, e.g.:

Libname Azure <connection string to library on Azure>;
proc sql:
  connect to sqlsrv <connection string to library on Azure> as Azure;
  execute by Azure (create table tempload (column definitions, without the cheksum column));
quit;

proc append base=Azure.tempload data=have;
run;

proc sql;
  connect to sqlsrv <connection string to library on Azure> as Azure;
  execute by Azure(
      insert into <output table(column names)> 
      (select *,<summary expression over group variables> as checksum 
       from tempload)
    );
quit;

proc delete data=Azure.tempload;
run;
  

 

The other is to use PROC DS2 (where you can declare a variable with more precision than you can in normal SAS) for the checksum calculation, and then load from there. I do not have enough experience with DS2 to suggest a code example, but I think it may work.

 

 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 586 views
  • 1 like
  • 5 in conversation