Hi All,
I'm trying to create a process that exports data to a txt file with a trailer record containing some information, including the checksum of a particular variable.
The process works fine, but we are having some issues with decimal places.
Sample code below:
RETAIN CHKSUM;
CHKSUM = SUM(ROUND(CHKSUM, 0.01), ROUND(&CHKSUM_VAR, 0.01));
IF LAST THEN DO;
   NOBS = _N_;
   PUT CHKSUM;
END;
The txt file is later imported into Azure using Spark, and they have controls there that sum one particular variable in the file and compare the result with the checksum value.
This process works for some files but not for others: the import fails because the checksum does not match the sum of the variable in the file, and the only explanation I can find is decimal precision in SAS.
We tried several different approaches, and they all end up with a slightly different checksum and the same import issues in Azure.
I'm almost convinced this can't be fixed, given that SAS stores numbers in binary floating point, which can cause precision issues like this.
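As a quick illustration of what I mean (a minimal sketch, not the production code): summing 0.1 ten times in a DATA step does not give exactly 1, because 0.1 has no exact binary representation.
data _null_;
   x = 0;
   do i = 1 to 10;
      x + 0.1;            /* accumulate ten tenths */
   end;
   equal = (x = 1);       /* 0: the binary sum is not exactly 1 */
   put x= 18.16 equal=;
run;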
It would be good to have other opinions on this, as I think I've exhausted my ideas on how to fix it.
PS: Sorry if I explained the problem badly.
Chksum + round(100 * &chksum_var);
For comparison purposes it would be best to leave the values in this form. But if you have to, divide the final result by 100.
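A minimal sketch of that idea as a complete step (the dataset name have is a placeholder, and LAST assumes end=last on the SET statement): accumulate in whole cents, so every addition is exact integer arithmetic, and scale back only once at the very end.
data _null_;
   set have end=last;                  /* 'have' is a placeholder name */
   chksum + round(100 * &chksum_var);  /* sum statement: exact integer cents */
   if last then do;
      chksum = chksum / 100;           /* divide only once, at the very end */
      put chksum;
   end;
run;
Integers are exact in SAS doubles up to 2**53, so this stays exact as long as the total in cents stays below about 9e15.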
I've never used them myself yet, but if you're on SAS 9.4M6 or later, some of the new hashing functions could possibly do the job for you. The receiving system would need to use the same algorithm, of course.
Approaches using floating-point processing under different environments are always prone to precision challenges. I would have thought, though, that rounding to two decimals would avoid such issues.
...and you do transfer the file in binary mode - right?
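If it helps, here is a minimal sketch of what such a control could look like with the HASHING function (9.4M6 or later). The dataset name have and the BEST32. formatting are assumptions, and the receiving side would have to replicate the exact same byte sequence:
data _null_;
   set have end=last;
   length digest $64;
   retain digest;
   /* chain a SHA-256 digest across records: hash(previous digest || value) */
   digest = hashing('SHA256', cats(digest, put(&chksum_var, best32.)));
   if last then put 'Control value: ' digest;
run;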
You may also be running into a timing-of-calculation issue.
Perhaps you could test the difference in results using some variant of
RETAIN CHKSUM CHKSUM2;
CHKSUM  = SUM(ROUND(CHKSUM, 0.01), ROUND(&CHKSUM_VAR, 0.01));
CHKSUM2 = ROUND(SUM(CHKSUM2, &CHKSUM_VAR), 0.01);
IF LAST THEN DO;
   NOBS = _N_;
   PUT CHKSUM CHKSUM2;
END;
Also, take one of your sets that fails and examine the result of the checksum for each observation. That may show places where the rounding contributes small round-up or round-down errors that the checking program detects.
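A minimal sketch of that per-observation check (the dataset name have is a placeholder):
data check;
   set have end=last;
   retain chksum chksum2 0;
   chksum  = sum(round(chksum, 0.01), round(&chksum_var, 0.01));
   chksum2 = round(sum(chksum2, &chksum_var), 0.01);
   diff = chksum - chksum2;   /* nonzero rows show exactly where the two drift */
   if last then put chksum= chksum2= diff=;
run;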
Or show the entire data step. Your checksum uses a rounded version of the other variable. Is &chksum_var written out unrounded in the exported data? Then I would expect a difference.
What is the format of the CHKSUM variable on Azure, and what kind of numbers do you get?
The precision of numbers in SAS is about 15 significant digits. If some of your output variables have more than that, you are probably going to get rounding errors whatever you do (in standard SAS, that is).
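A quick demonstration of that limit (a minimal sketch; the printed value assumes the usual 64-bit IEEE doubles on Windows/Unix SAS):
data _null_;
   x = 12345678901234567;   /* 17 significant digits */
   put x= 20.;              /* prints x=12345678901234568 - the last digit is off */
run;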
If you have fewer than 15 significant digits in your checksum, the idea of converting to integers may help. SAS numbers are normally represented as 64-bit base-2 (IEEE 754) floating point: a sign bit, an 11-bit exponent, and a 52-bit mantissa. The exception is the mainframe, where the representation is 64-bit base 16: a sign bit, a 7-bit base-16 exponent, and a 56-bit mantissa. In either case there may be the occasional rounding error when converting numbers with 2 decimals between base 2 (or base 16) and base 10.
If you have more than 15 significant digits in your checksum, I see two possibilities.
Either load the basic data to Azure, and let them do the summing, e.g.:
libname Azure <connection string to library on Azure>;

proc sql;
   connect to sqlsvr <connection string to library on Azure> as Azure;
   execute (
      create table tempload (column definitions, without the checksum column)
   ) by Azure;
quit;

proc append base=Azure.tempload data=have;
run;

proc sql;
   connect to sqlsvr <connection string to library on Azure> as Azure;
   execute (
      insert into <output table (column names)>
      select *, <summary expression over group variables> as checksum
      from tempload
   ) by Azure;
quit;

proc delete data=Azure.tempload;
run;
The other is to use PROC DS2 (where you can declare variables with more precision than in normal SAS) for the checksum calculation, and then load from there. I do not have much experience with DS2, but I think it may work.
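For what it's worth, a rough, untested sketch of what that could look like, using a DS2 fixed-point DECIMAL accumulator; the dataset name have, the variable amount, and the DECIMAL(30,2) precision are all placeholders to adapt:
proc ds2;
data _null_;
   dcl decimal(30,2) chksum;   /* fixed-point accumulator, exact to 2 decimals */

   method init();
      chksum = 0;              /* start from zero, not null */
   end;

   method run();
      set work.have;           /* 'have' and 'amount' are placeholder names */
      chksum = chksum + amount;
   end;

   method term();
      put chksum;              /* trailer value after the last row */
   end;
enddata;
run;
quit;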