Hi, all,
I'm using 9.3 and can't understand why I'm getting the following results for this code for a PROC FORMAT using the inequality operator pair, -< (up to but not including...) :
PROC FORMAT;
VALUE CRLIMIT
. = 'Test not performed'
Low -<3.5 = 'Normal/acceptable (Up to/ not including 3.5)'
3.5 -<6.0 = 'Elevated [3.5- not including 6.0)'
6.0 -<20.0 = 'Unacceptable [6.0- not including 20.0)'
20.0- HIGH = 'Possible entry error [20 and above]';
RUN;
DATA ONE;
do i = 0 to 21 by 0.1;
j= i;
OUTPUT;
end;
RUN;
PROC PRINT DATA = ONE;
VAR i j;
FORMAT j CRLIMIT.;
RUN;
And the output (editted) shows the following:
OBS i j
. . .
34 3.3 Normal/acceptable (Up to/ not including 3.5)
35 3.4 Normal/acceptable (Up to/ not including 3.5)
36 3.5 Elevated [3.5- not including 6.0)
37 3.6 Elevated [3.5- not including 6.0)
. . .
60 5.9 Elevated [3.5- not including 6.0)
61 6.0 Elevated [3.5- not including 6.0)
62 6.1 Unacceptable [6.0- not including 20.0)
. . .
200 19.9 Unacceptable [6.0- not including 20.0)
201 20.0 Unacceptable [6.0- not including 20.0)
202 20.1 Possible entry error [20 and above]
The formatted results for observation 35 and 36 are correct, but not for observations 60 and 201.
This worked OK in 9.2 but is messing up in 9.3. Any clues or explanation? Did I code the format ranges correctly?
Thanks!
streamfisher,
It's likely you are running into a precision issue, caused by the DO loop. Incrementing by 0.1 doesn't give you an exact value in a binary system. Try this DO loop instead, and see if the problem is still there:
do i=6, 20;
This will give you 6 exactly, and 20 exactly, and should test properly using your format.
Good luck.
There is nothing wrong with Proc Format. The issue you're observing is caused by numeric precision and how computer store values. Run below code and look at the result and you will understand.
There have been several discussions with explanations around numeric precision here in the forums and there are also several SAS docs and papers about it, eg: http://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a000695157.htm
(you'll find more by Googling "site:support.sas.com numeric precision").
The question one could ask: Should the Do Loop round the incremented value for cases where the value being incremented does not have a higher precision than the increment? Not sure if there has been a change of how a do loop increments between SAS9.2 and SAS9.3 - but I rather doubt it.
PROC FORMAT;
   VALUE CRLIMIT
      . = 'Test not performed'
      Low -<3.5 = 'Normal/acceptable (Up to/ not including 3.5)'
      3.5 -<6.0 = 'Elevated [3.5- not including 6.0)'
      6.0 -<20.0 = 'Unacceptable [6.0- not including 20.0)'
      20.0- HIGH =  'Possible entry error [20 and above]';
RUN;
DATA ONE;
   do i = 0 to 21 by 0.1;
      j= round(i,0.01);
      k= i;
      format k best32.;
      OUTPUT;
   end;
RUN;
PROC PRINT DATA = ONE;
   VAR k i j;
   FORMAT j CRLIMIT.;
RUN;
Thanks to Patrick and Astounding for the quick and helpful responses. I'll use the ROUND function and add that to my teaching slides.
I originally posted, and since deleted, a totally incorrect response. I agree with my colleagues that you are facing a numeric precision problem.
One way around that it is to fuzz the number, but use your proc format statement exactly as you had written it. e.g.:
DATA ONE;
do i = 0 to 21 by 0.1;
j= fuzz(i);
OUTPUT;
end;
RUN;
I am positive the other posters are correct about the problem. Another option is the FUZZ option on the VALUE statement in PROC FORMAT, but in your situation I don't think you want to use it. Right now, your format is acting correctly, it's your data values that are flawed.
Tom
Tom,
I've never used the fuzz option in proc format but, since you mentioned it, decided to play around with it a bit.
Can you explain why you would suggest not using it? The following appears to work correctly, but I've been bitten by numeric precision often enough that I wouldn't recommend it unless someone can verify that it is producing the right results for the right reasons:
PROC FORMAT;
VALUE CRLIMIT (fuzz=0.0000000001)
. = 'Test not performed'
Low -3.499999999 = 'Normal/acceptable (Up to/ not including 3.5)'
3.5 -5.999999999 = 'Elevated [3.5- not including 6.0)'
6.0 -19.99999999 = 'Unacceptable [6.0- not including 20.0)'
20.0- HIGH = 'Possible entry error [20 and above]';
RUN;
DATA ONE;
do i = 0 to 21 by 0.1;
j= i;
OUTPUT;
end;
RUN;
PROC PRINT DATA = ONE;
VAR i j;
FORMAT j CRLIMIT.;
RUN;
Hi, Art
My thinking was that since the OP's boundaries are integers, which are represented exactly in SAS numeric values, he would be better off not using FUZZ; 6 exactly or 20 exactly will always fall into the correct format range, but if he uses FUZZ 5 point something or 19 point something might incorrectly fall into the 6 or 20 range. The original post concerned a problem with the data values, nothing to do with the formats.
However, your asking the question is forcing me to do some de-lazied thinking about this problem. First of all, I discovered that if you don't specify FUZZ, SAS uses a default of 1E−12. Then I modified your test slightly to the following (j, k, and l are reducing amounts of fuzz):
PROC FORMAT;
VALUE CRLIMIT_FZ (fuzz=0.0000000001)
. = 'Test not performed'
Low -<3.5 = 'Normal/acceptable (Up to/ not including 3.5)'
3.5 -<6 = 'Elevated [3.5- not including 6.0)'
6 -<20 = 'Unacceptable [6.0- not including 20.0)'
20 - HIGH = 'Possible entry error [20 and above]';
VALUE CRLIMIT_NOFZ
. = 'Test not performed'
Low -<3.5 = 'Normal/acceptable (Up to/ not including 3.5)'
3.5 -<6 = 'Elevated [3.5- not including 6.0)'
6 -<20 = 'Unacceptable [6.0- not including 20.0)'
20 - HIGH = 'Possible entry error [20 and above]';
VALUE CRLIMIT_FZZERO (fuzz=0)
. = 'Test not performed'
Low -<3.5 = 'Normal/acceptable (Up to/ not including 3.5)'
3.5 -<6 = 'Elevated [3.5- not including 6.0)'
6 -<20 = 'Unacceptable [6.0- not including 20.0)'
20 - HIGH = 'Possible entry error [20 and above]';
RUN;
DATA ONE;
i = 19; j= i; k = i; l = i; OUTPUT;
i = 19.9999999999995; j= i; k = i; l = i; OUTPUT;
i = 19.9999999999996; j= i; k = i; l = i; OUTPUT;
i = 19.9999999999997; j= i; k = i; l = i; OUTPUT;
i = 19.9999999999998; j= i; k = i; l = i; OUTPUT;
i = 19.9999999999999; j= i; k = i; l = i; OUTPUT;
i = 20; j= i; k = i; l = i; OUTPUT;
i = 20.0000000000001; j= i; k = i; l = i; OUTPUT;
i = 21; j= i; k = i; l = i; OUTPUT;
RUN;
PROC PRINT DATA = ONE;
VAR i j k l;
FORMAT i BEST16. j CRLIMIT_FZ. k CRLIMIT_NOFZ. l CRLIMIT_FZZERO.;
RUN;
which produces the following results:
Obs i j k l
1 19 Unacceptable [6.0- not including 20.0) Unacceptable [6.0- not including 20.0) Unacceptable [6.0- not including 20.0)
2 19.9999999999995 Unacceptable [6.0- not including 20.0) Unacceptable [6.0- not including 20.0) Unacceptable [6.0- not including 20.0)
3 19.9999999999996 Unacceptable [6.0- not including 20.0) Unacceptable [6.0- not including 20.0) Unacceptable [6.0- not including 20.0)
4 19.9999999999997 Unacceptable [6.0- not including 20.0) Unacceptable [6.0- not including 20.0) Unacceptable [6.0- not including 20.0)
5 19.9999999999998 Unacceptable [6.0- not including 20.0) Unacceptable [6.0- not including 20.0) Unacceptable [6.0- not including 20.0)
6 19.9999999999999 Unacceptable [6.0- not including 20.0) Unacceptable [6.0- not including 20.0) Unacceptable [6.0- not including 20.0)
7 20 Possible entry error [20 and above] Possible entry error [20 and above] Possible entry error [20 and above]
8 20.0000000000001 Unacceptable [6.0- not including 20.0) Unacceptable [6.0- not including 20.0) Possible entry error [20 and above]
9 21 Possible entry error [20 and above] Possible entry error [20 and above] Possible entry error [20 and above]
An erroneous result for obs 8. I'm now going to suggest that in formats using the -< or <- notation, an explicit FUZZ=0 should be used, as otherwise the ranges overlap.
Thanks for the question; this was really interesting!
Tom
Just to finish this one off, I corresponded with SAS Tech Support. It turns out that as the example above shows, if there is possible overlap in the ranges due to the FUZZ value, the results can be unpredictable. In cases of continuous ranges, FUZZ=0 is the best choice.
Tom
I've run into this problem myself and it causes major reconciliation problems when you use "less than" or "greater than" ranges. My take on it is that a combination of default FUZZ and "less than" for example can have values that are genuinely just less than the boundary being arbitrarily moved up into the next highest range. In my case I was doing financial reporting and the impact when you are dealing with billions of dollars is extremely material! As Tom has recommended using FUZZ = 0 resolves the problem.
I don't think that it does! With fuzz=0 6.0 outputs as the wrong category. The only combination I found that worked correctly was:
PROC FORMAT;
VALUE CRLIMIT (fuzz=0.0000000001)
. = 'Test not performed'
Low -3.499999999 = 'Normal/acceptable (Up to/ not including 3.5)'
3.5 -5.999999999 = 'Elevated [3.5- not including 6.0)'
6.0 -19.99999999 = 'Unacceptable [6.0- not including 20.0)'
20.0- HIGH = 'Possible entry error [20 and above]';
RUN;
DATA ONE;
do i = 0 to 21 by 0.1;
j= i;
OUTPUT;
end;
RUN;
hI,
Is this not the same Fuzz used by Tom in the above post and got a wrong result at obs 8.????
Regards
No, I don't think so. The original question was about integers that couldn't be accurately identified as floating point numbers.
Tom's example was using extreme fractions that pushed the numeric precision limitations.
Interesting - just as well I'm not trying to format 6.0!
Try this though - another way of fixing it:
PROC FORMAT;
VALUE CRLIMIT (fuzz=0)
. = 'Test not performed'
Low -3.499999999 = 'Normal/acceptable (Up to/ not including 3.5)'
3.5 -5.999999999 = 'Elevated [3.5- not including 6.0)'
6.0 -19.99999999 = 'Unacceptable [6.0- not including 20.0)'
20.0- HIGH = 'Possible entry error [20 and above]';
RUN;
DATA ONE;
do i = 0 to 21 by 0.1;
j= round(i,0.1);
OUTPUT;
end;
RUN;
proc print data = one;
format j crlimit.;
run;
 
No, I don't think there's any difference. From the attached code, it appears that numbers between around 4.7 and 18.8 can't accurately represent an "x.1" in floating point, therefore the 5.9, 6.0, and 6.1 are problems, while the 3.4, 3.5, 3.6 and 19.9, 20.0, and 20.1 are fine. Yes, the problem is that the .1 loop increment causes problems, but it's compounded by the uncertain results of using a non-zero fuzz.
Here's your example, Art, with some additional formats:
PROC FORMAT;
VALUE fmta (fuzz=0)
. = 'Missing'
Low -3.499999999 = '0 - not including 3.5'
3.5 -5.999999999 = '3.5 - not including 6'
6.0 -19.99999999 = '6.0 - not including 20'
20.0- HIGH = '20 and above';
VALUE fmtb
. = 'Missing'
Low -3.499999999 = '0 - not including 3.5'
3.5 -5.999999999 = '3.5 - not including 6'
6.0 -19.99999999 = '6.0 - not including 20'
20.0- HIGH = '20 and above';
VALUE fmtc (fuzz=0.0000000001)
. = 'Missing'
Low -3.499999999 = '0 - not including 3.5'
3.5 -5.999999999 = '3.5 - not including 6'
6.0 -19.99999999 = '6.0 - not including 20'
20.0- HIGH = '20 and above';
RUN;
DATA ONE;
do i = 0 to 21 by 0.1;
j= i;
k= i;
l= i;
format i best19.15;
format j fmta.;
format k fmtb.;
format l fmtc.;
OUTPUT;
end;
RUN;
Note that the "nominal" 6.0 is a problem with the fuzz=0, because it is actually less than the integer 6.
Now in another example using the "less than but not equal to" format syntax:
PROC FORMAT;
VALUE fmta (fuzz=0)
. = 'Missing'
Low -< 3.5 = '0 - not including 3.5'
3.5 -< 6 = '3.5 - not including 6'
6 -< 20 = '6.0 - not including 20'
20 - HIGH = '20 and above';
VALUE fmtb
. = 'Missing'
Low -< 3.5 = '0 - not including 3.5'
3.5 -< 6 = '3.5 - not including 6'
6 -< 20 = '6.0 - not including 20'
20 - HIGH = '20 and above';
VALUE fmtc (fuzz=0.0000000001)
. = 'Missing'
Low -< 3.5 = '0 - not including 3.5'
3.5 -< 6 = '3.5 - not including 6'
6 -< 20 = '6.0 - not including 20'
20 - HIGH = '20 and above';
RUN;
DATA ONE;
do i = 0 to 21 by 0.1;
j= i;
k= i;
l= i;
format i best19.15;
format j fmta.;
format k fmtb.;
format l fmtc.;
OUTPUT;
end;
RUN;
The 20 results are problematic with the fuzzed formats.
It's my contention that this format syntax combined with fuzz=0 is the only combination that accurately represents all of the numbers. But I'm sure this isn't over!
Tom
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
