07-21-2017 04:43 PM
I've noticed that the sort order can slightly alter the results of a proc means, even if the data is exactly the same. Has anyone else experienced this? I've bypassed the issue by rounding everything before formatting my results from the proc means output, but I'm very curious to know why SAS's internal processing can give different results depending on the sort order. Test data below to show what I mean. (Using SAS 9.4)
* Data set sorted by month, year, pressure;
data test1;
  set sashelp.enso;
run;
proc sort data=test1;
  by month year pressure;
run;

* Same data set sorted by pressure, month, year;
data test2;
  set sashelp.enso;
run;
proc sort data=test2;
  by pressure month year;
run;

* Proc Means on the first sort order;
proc means data=test1 noprint;
  var year;
  output n=XN1 mean=XMEAN1 std=XSTD1 out=_result1;
run;

* Proc Means on the second sort order;
proc means data=test2 noprint;
  var year;
  output n=XN1 mean=XMEAN1 std=XSTD1 out=_result2;
run;

** compare **;
proc compare base=_result1 compare=_result2;
run;

** standardized output **;
data _check1;
  length display $15;
  set _result1;
  display = strip(put(XMEAN1, 10.2));
run;
data _check2;
  length display $15;
  set _result2;
  display = strip(put(XMEAN1, 10.2));
run;

** compare **;
proc compare base=_check1 compare=_check2;
run;
07-21-2017 05:02 PM
This is another example of numeric precision and the storage of decimals in a binary system. The order in which values are added to the accumulated sum affects which bits, beyond what the precision of the storage allows, get lost. Note that most of the values of Year in that data set apparently represent some form of repeating decimal, such as 1 and 1/3 stored as 1.333333333333 (which has already lost some information if that is the case, as the value truncated at a specific point is not the same as 1/3).
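To see the order effect in isolation, here is a minimal sketch that adds the same three values in two different orders. The variable names are made up for illustration, and it assumes a platform where SAS stores numerics as IEEE 754 doubles (Windows or Unix; mainframe floating point may behave differently).

```sas
data _null_;
  /* Same three values, added in two different orders */
  x = 0.1; y = 0.2; z = 0.3;
  sum_fwd = x + y + z;   /* (0.1 + 0.2) + 0.3 */
  sum_rev = z + y + x;   /* (0.3 + 0.2) + 0.1 */
  diff = sum_fwd - sum_rev;

  /* HEX16. shows the stored bit pattern, so a last-bit difference is visible */
  put sum_fwd= hex16. sum_rev= hex16.;
  put diff= e12.;

  if sum_fwd = sum_rev then put 'Sums are identical';
  else put 'Sums differ';
run;
```

On an IEEE 754 machine I would expect the two sums to differ in the last bit, which is the same kind of difference proc means shows when the input order changes.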
Which is one reason proc compare has the FUZZ option, for deciding "how close do you want results to be before you consider them equal."
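As a rough sketch of how a tolerance could be applied to the two result data sets from the original post: the thresholds below are arbitrary assumptions, not recommended values. As far as I understand the options, FUZZ= mainly affects how very small values and differences are reported, while CRITERION= together with METHOD= sets the tolerance for judging values equal, so it may be worth trying both.

```sas
/* Treat values as equal when they differ by less than an absolute tolerance. */
/* 1e-9 is an arbitrary choice for illustration.                              */
proc compare base=_result1 compare=_result2
             method=absolute criterion=1e-9;
run;

/* Report differences smaller than the FUZZ= threshold as zero or blank. */
/* FUZZ= takes a value between 0 and 1; 1e-9 is again an arbitrary pick. */
proc compare base=_result1 compare=_result2 fuzz=1e-9;
run;
```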
07-21-2017 06:40 PM
The differences are not meaningful. They're 0.0000000000014 or along those lines.
If differences of this size are meaningful, you're going to have issues with any software, because numerical precision is an issue with anything that uses a binary representation of numbers, i.e. computers today. At least until quantum computing is more real.
That's not to say that you can't work around these obstacles if needed. There's an entire section in the SAS documentation dedicated to numerical precision.
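One such workaround is the rounding approach mentioned in the original post, applied to the statistics themselves rather than to a formatted string. A minimal sketch, assuming the _result1 and _result2 data sets from the test code above; the output data set names and the 1e-9 rounding unit are arbitrary choices for illustration.

```sas
/* Round the statistics to a fixed unit before comparing them. */
data _result1_r;
  set _result1;
  XMEAN1 = round(XMEAN1, 1e-9);
  XSTD1  = round(XSTD1, 1e-9);
run;

data _result2_r;
  set _result2;
  XMEAN1 = round(XMEAN1, 1e-9);
  XSTD1  = round(XSTD1, 1e-9);
run;

proc compare base=_result1_r compare=_result2_r;
run;
```

Rounding is not a guarantee: two values that sit on either side of a rounding boundary can still round differently, so pick a unit well above the noise and well below the precision you actually need.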