Re: Strange results with format value ranges. Any help, please?

streamfisher · Posted 10-08-2012 10:34 AM

Hi, all,

I'm using 9.3 and can't understand why I'm getting the following results for this code for a PROC FORMAT using the inequality operator pair, -< (up to but not including...) :

PROC FORMAT;

VALUE CRLIMIT

. = 'Test not performed'

Low -<3.5 = 'Normal/acceptable (Up to/ not including 3.5)'

3.5 -<6.0 = 'Elevated [3.5- not including 6.0)'

6.0 -<20.0 = 'Unacceptable [6.0- not including 20.0)'

20.0- HIGH = 'Possible entry error [20 and above]';

RUN;

DATA ONE;

do i = 0 to 21 by 0.1;

j= i;

OUTPUT;

end;

RUN;

PROC PRINT DATA = ONE;

VAR i j;

FORMAT j CRLIMIT.;

RUN;

And the output (editted) shows the following:

OBS i j

. . .

34 3.3 Normal/acceptable (Up to/ not including 3.5)

35 3.4 Normal/acceptable (Up to/ not including 3.5)

36 3.5 Elevated [3.5- not including 6.0)

37 3.6 Elevated [3.5- not including 6.0)

. . .

60 5.9 Elevated [3.5- not including 6.0)

61 6.0 Elevated [3.5- not including 6.0)

62 6.1 Unacceptable [6.0- not including 20.0)

. . .

200 19.9 Unacceptable [6.0- not including 20.0)

201 20.0 Unacceptable [6.0- not including 20.0)

202 20.1 Possible entry error [20 and above]

The formatted results for observation 35 and 36 are correct, but not for observations 60 and 201.

This worked OK in 9.2 but is messing up in 9.3. Any clues or explanation? Did I code the format ranges correctly?

Thanks!

Astounding · Posted 10-08-2012 11:12 AM

streamfisher,

It's likely you are running into a precision issue, caused by the DO loop. Incrementing by 0.1 doesn't give you an exact value in a binary system. Try this DO loop instead, and see if the problem is still there:

do i=6, 20;

This will give you 6 exactly, and 20 exactly, and should test properly using your format.

Good luck.

Patrick · Posted 10-08-2012 11:14 AM

There is nothing wrong with Proc Format. The issue you're observing is caused by numeric precision and how computer store values. Run below code and look at the result and you will understand.

There have been several discussions with explanations around numeric precision here in the forums and there are also several SAS docs and papers about it, eg: http://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a000695157.htm

(you'll find more by Googling "site:support.sas.com numeric precision").

The question one could ask: Should the Do Loop round the incremented value for cases where the value being incremented does not have a higher precision than the increment? Not sure if there has been a change of how a do loop increments between SAS9.2 and SAS9.3 - but I rather doubt it.

PROC FORMAT;
   VALUE CRLIMIT
      . = 'Test not performed'
      Low -<3.5 = 'Normal/acceptable (Up to/ not including 3.5)'
      3.5 -<6.0 = 'Elevated [3.5- not including 6.0)'
      6.0 -<20.0 = 'Unacceptable [6.0- not including 20.0)'
      20.0- HIGH = 'Possible entry error [20 and above]';
RUN;

DATA ONE;
   do i = 0 to 21 by 0.1;
      j= round(i,0.01);
      k= i;
      format k best32.;
      OUTPUT;
   end;
RUN;

PROC PRINT DATA = ONE;
VAR k i j;
FORMAT j CRLIMIT.;
RUN;

streamfisher · Posted 10-08-2012 12:09 PM

Thanks to Patrick and Astounding for the quick and helpful responses. I'll use the ROUND function and add that to my teaching slides.

art297 · Posted 10-08-2012 11:54 AM

I originally posted, and since deleted, a totally incorrect response. I agree with my colleagues that you are facing a numeric precision problem.

One way around that it is to fuzz the number, but use your proc format statement exactly as you had written it. e.g.:

DATA ONE;

do i = 0 to 21 by 0.1;

j= fuzz(i);

OUTPUT;

end;

RUN;

TomKari · Posted 10-08-2012 12:05 PM

I am positive the other posters are correct about the problem. Another option is the FUZZ option on the VALUE statement in PROC FORMAT, but in your situation I don't think you want to use it. Right now, your format is acting correctly, it's your data values that are flawed.

Tom

art297 · Posted 10-08-2012 12:27 PM

Tom,

I've never used the fuzz option in proc format but, since you mentioned it, decided to play around with it a bit.

Can you explain why you would suggest not using it? The following appears to work correctly, but I've been bitten by numeric precision often enough that I wouldn't recommend it unless someone can verify that it is producing the right results for the right reasons:

PROC FORMAT;

VALUE CRLIMIT (fuzz=0.0000000001)

. = 'Test not performed'

Low -3.499999999 = 'Normal/acceptable (Up to/ not including 3.5)'

3.5 -5.999999999 = 'Elevated [3.5- not including 6.0)'

6.0 -19.99999999 = 'Unacceptable [6.0- not including 20.0)'

20.0- HIGH = 'Possible entry error [20 and above]';

RUN;

DATA ONE;

do i = 0 to 21 by 0.1;

j= i;

OUTPUT;

end;

RUN;

PROC PRINT DATA = ONE;

VAR i j;

FORMAT j CRLIMIT.;

RUN;

TomKari · Posted 10-08-2012 01:38 PM

Hi, Art

My thinking was that since the OP's boundaries are integers, which are represented exactly in SAS numeric values, he would be better off not using FUZZ; 6 exactly or 20 exactly will always fall into the correct format range, but if he uses FUZZ 5 point something or 19 point something might incorrectly fall into the 6 or 20 range. The original post concerned a problem with the data values, nothing to do with the formats.

However, your asking the question is forcing me to do some de-lazied thinking about this problem. First of all, I discovered that if you don't specify FUZZ, SAS uses a default of 1E−12. Then I modified your test slightly to the following (j, k, and l are reducing amounts of fuzz):

PROC FORMAT;

VALUE CRLIMIT_FZ (fuzz=0.0000000001)

. = 'Test not performed'

Low -<3.5 = 'Normal/acceptable (Up to/ not including 3.5)'

3.5 -<6 = 'Elevated [3.5- not including 6.0)'

6 -<20 = 'Unacceptable [6.0- not including 20.0)'

20 - HIGH = 'Possible entry error [20 and above]';

VALUE CRLIMIT_NOFZ

. = 'Test not performed'

Low -<3.5 = 'Normal/acceptable (Up to/ not including 3.5)'

3.5 -<6 = 'Elevated [3.5- not including 6.0)'

6 -<20 = 'Unacceptable [6.0- not including 20.0)'

20 - HIGH = 'Possible entry error [20 and above]';

VALUE CRLIMIT_FZZERO (fuzz=0)

. = 'Test not performed'

Low -<3.5 = 'Normal/acceptable (Up to/ not including 3.5)'

3.5 -<6 = 'Elevated [3.5- not including 6.0)'

6 -<20 = 'Unacceptable [6.0- not including 20.0)'

20 - HIGH = 'Possible entry error [20 and above]';

RUN;

DATA ONE;

i = 19; j= i; k = i; l = i; OUTPUT;

i = 19.9999999999995; j= i; k = i; l = i; OUTPUT;

i = 19.9999999999996; j= i; k = i; l = i; OUTPUT;

i = 19.9999999999997; j= i; k = i; l = i; OUTPUT;

i = 19.9999999999998; j= i; k = i; l = i; OUTPUT;

i = 19.9999999999999; j= i; k = i; l = i; OUTPUT;

i = 20; j= i; k = i; l = i; OUTPUT;

i = 20.0000000000001; j= i; k = i; l = i; OUTPUT;

i = 21; j= i; k = i; l = i; OUTPUT;

RUN;

PROC PRINT DATA = ONE;

VAR i j k l;

FORMAT i BEST16. j CRLIMIT_FZ. k CRLIMIT_NOFZ. l CRLIMIT_FZZERO.;

RUN;

which produces the following results:

Obs i j k l

1 19 Unacceptable [6.0- not including 20.0) Unacceptable [6.0- not including 20.0) Unacceptable [6.0- not including 20.0)

2 19.9999999999995 Unacceptable [6.0- not including 20.0) Unacceptable [6.0- not including 20.0) Unacceptable [6.0- not including 20.0)

3 19.9999999999996 Unacceptable [6.0- not including 20.0) Unacceptable [6.0- not including 20.0) Unacceptable [6.0- not including 20.0)

4 19.9999999999997 Unacceptable [6.0- not including 20.0) Unacceptable [6.0- not including 20.0) Unacceptable [6.0- not including 20.0)

5 19.9999999999998 Unacceptable [6.0- not including 20.0) Unacceptable [6.0- not including 20.0) Unacceptable [6.0- not including 20.0)

6 19.9999999999999 Unacceptable [6.0- not including 20.0) Unacceptable [6.0- not including 20.0) Unacceptable [6.0- not including 20.0)

7 20 Possible entry error [20 and above] Possible entry error [20 and above] Possible entry error [20 and above]

8 20.0000000000001 Unacceptable [6.0- not including 20.0) Unacceptable [6.0- not including 20.0) Possible entry error [20 and above]

9 21 Possible entry error [20 and above] Possible entry error [20 and above] Possible entry error [20 and above]

An erroneous result for obs 8. I'm now going to suggest that in formats using the -< or <- notation, an explicit FUZZ=0 should be used, as otherwise the ranges overlap.

Thanks for the question; this was really interesting!

Tom

TomKari · Posted 10-16-2012 10:51 AM

Just to finish this one off, I corresponded with SAS Tech Support. It turns out that as the example above shows, if there is possible overlap in the ranges due to the FUZZ value, the results can be unpredictable. In cases of continuous ranges, FUZZ=0 is the best choice.

Tom

SASKiwi · Posted 10-17-2012 03:44 PM

I've run into this problem myself and it causes major reconciliation problems when you use "less than" or "greater than" ranges. My take on it is that a combination of default FUZZ and "less than" for example can have values that are genuinely just less than the boundary being arbitrarily moved up into the next highest range. In my case I was doing financial reporting and the impact when you are dealing with billions of dollars is extremely material! As Tom has recommended using FUZZ = 0 resolves the problem.

art297 · Posted 10-17-2012 05:38 PM

I don't think that it does! With fuzz=0 6.0 outputs as the wrong category. The only combination I found that worked correctly was:

PROC FORMAT;

VALUE CRLIMIT (fuzz=0.0000000001)

. = 'Test not performed'

Low -3.499999999 = 'Normal/acceptable (Up to/ not including 3.5)'

3.5 -5.999999999 = 'Elevated [3.5- not including 6.0)'

6.0 -19.99999999 = 'Unacceptable [6.0- not including 20.0)'

20.0- HIGH = 'Possible entry error [20 and above]';

RUN;

DATA ONE;

do i = 0 to 21 by 0.1;

j= i;

OUTPUT;

end;

RUN;

robertrao · Posted 10-17-2012 07:01 PM

hI,

Is this not the same Fuzz used by Tom in the above post and got a wrong result at obs 8.????

Regards

art297 · Posted 10-17-2012 07:09 PM

No, I don't think so. The original question was about integers that couldn't be accurately identified as floating point numbers.

Tom's example was using extreme fractions that pushed the numeric precision limitations.

SASKiwi · Posted 10-17-2012 07:58 PM

Interesting - just as well I'm not trying to format 6.0!

Try this though - another way of fixing it:

PROC FORMAT;

VALUE CRLIMIT (fuzz=0)

. = 'Test not performed'

Low -3.499999999 = 'Normal/acceptable (Up to/ not including 3.5)'

3.5 -5.999999999 = 'Elevated [3.5- not including 6.0)'

6.0 -19.99999999 = 'Unacceptable [6.0- not including 20.0)'

20.0- HIGH = 'Possible entry error [20 and above]';

RUN;

DATA ONE;

do i = 0 to 21 by 0.1;

j= round(i,0.1);

OUTPUT;

end;

RUN;

proc print data = one;

format j crlimit.;

run;

TomKari · Posted 10-17-2012 09:55 PM

No, I don't think there's any difference. From the attached code, it appears that numbers between around 4.7 and 18.8 can't accurately represent an "x.1" in floating point, therefore the 5.9, 6.0, and 6.1 are problems, while the 3.4, 3.5, 3.6 and 19.9, 20.0, and 20.1 are fine. Yes, the problem is that the .1 loop increment causes problems, but it's compounded by the uncertain results of using a non-zero fuzz.

Here's your example, Art, with some additional formats:

PROC FORMAT;

VALUE fmta (fuzz=0)

. = 'Missing'

Low -3.499999999 = '0 - not including 3.5'

3.5 -5.999999999 = '3.5 - not including 6'

6.0 -19.99999999 = '6.0 - not including 20'

20.0- HIGH = '20 and above';

VALUE fmtb

. = 'Missing'

Low -3.499999999 = '0 - not including 3.5'

3.5 -5.999999999 = '3.5 - not including 6'

6.0 -19.99999999 = '6.0 - not including 20'

20.0- HIGH = '20 and above';

VALUE fmtc (fuzz=0.0000000001)

. = 'Missing'

Low -3.499999999 = '0 - not including 3.5'

3.5 -5.999999999 = '3.5 - not including 6'

6.0 -19.99999999 = '6.0 - not including 20'

20.0- HIGH = '20 and above';

RUN;

DATA ONE;

do i = 0 to 21 by 0.1;

j= i;

k= i;

l= i;

format i best19.15;

format j fmta.;

format k fmtb.;

format l fmtc.;

OUTPUT;

end;

RUN;

Note that the "nominal" 6.0 is a problem with the fuzz=0, because it is actually less than the integer 6.

Now in another example using the "less than but not equal to" format syntax:

PROC FORMAT;

VALUE fmta (fuzz=0)

. = 'Missing'

Low -< 3.5 = '0 - not including 3.5'

3.5 -< 6 = '3.5 - not including 6'

6 -< 20 = '6.0 - not including 20'

20 - HIGH = '20 and above';

VALUE fmtb

. = 'Missing'

Low -< 3.5 = '0 - not including 3.5'

3.5 -< 6 = '3.5 - not including 6'

6 -< 20 = '6.0 - not including 20'

20 - HIGH = '20 and above';

VALUE fmtc (fuzz=0.0000000001)

. = 'Missing'

Low -< 3.5 = '0 - not including 3.5'

3.5 -< 6 = '3.5 - not including 6'

6 -< 20 = '6.0 - not including 20'

20 - HIGH = '20 and above';

RUN;

DATA ONE;

do i = 0 to 21 by 0.1;

j= i;

k= i;

l= i;

format i best19.15;

format j fmta.;

format k fmtb.;

format l fmtc.;

OUTPUT;

end;

RUN;

The 20 results are problematic with the fuzzed formats.

It's my contention that this format syntax combined with fuzz=0 is the only combination that accurately represents all of the numbers. But I'm sure this isn't over!

Tom

Ready to join fellow brilliant minds for the SAS Hackathon?

Classroom Training Available!