Hello
What is the way to write 1/3 (one third) in proc format?
proc format;
value Fmt
-2,0-<1/3='A'
1/3-<2/3='B'
2/3-1='C'
-1,1<-high='D'
;
Run;
There is no way to write 1/3 exactly. You need to use digits and a decimal point, and there is no exact representation. You can approximate, for example:
0.33333333333 - < 0.66666666667 = 'B'
Also, you may be able to apply the FUZZ option to eliminate gaps between your ranges for A, B, and C.
There is no way to write 1/3 exactly. You need to use digits and a decimal point, and there is no exact representation. You can approximate, for example:
0.33333333333 - < 0.66666666667 = 'B'
Also, you may be able to apply the FUZZ option to eliminate gaps between your ranges for A, B, and C.
You can make your own CNTLIN style dataset and pass it to PROC FORMAT.
data thirds;
fmtname='FMT';
type='N';
hlo=' ';
sexcl='N';
eexcl='Y';
start=-2; end=1/3; label='A'; output;
start=end; end=2/3; label='B'; output;
eexcl='N';start=end; end='1'; label='C'; output;
sexcl='Y';hlo='H';start=end; end=.; label='D'; output;
run;
proc format cntlin=thirds ;
run;
Or you can use some macro code to generate a reasonable approximation of 1/3 as text.
proc format fmtlib cntlout=formats2;
value fmt
-2-< %sysfunc(putn(1/3,best15.))='A'
%sysfunc(putn(1/3,best15.))-<%sysfunc(putn(2/3,best15.))='B'
%sysfunc(putn(2/3,best15.))-1='C'
1<-high='D'
;
run;
For some strange reason the BEST16 format generates 2/3 with the last digit as 6 instead of 7. Probably something to do with precision of storing digital fractions in binary floating format. So using BEST15. as the format there will be minor difference in the definitions of the two formats if you compare the definitions written to a CNTLOUT dataset.
The COMPARE Procedure Comparison of WORK.FORMATS with WORK.FORMATS2 (Method=EXACT) Values Comparison Summary Number of Variables Compared with All Observations Equal: 19. Number of Variables Compared with Some Observations Unequal: 2. Total Number of Values which Compare Unequal: 4. Maximum Difference: 0. Variables with Unequal Values Variable Type Len Label Ndif MaxDif START CHAR 16 Starting value for format 2 END CHAR 16 Ending value for format 2 Value Comparison Results for Variables __________________________________________________________ || Starting value for format || Base Value Compare Value Obs || START START ________ || ________________ ________________ || 2 || 0.33333333333333 0.3333333333333 3 || 0.66666666666667 0.6666666666667 __________________________________________________________
Thanks, @Tom. Interesting case. The CNTLIN style dataset achieves the maximum possible accuracy by literally using 1/3 and 2/3 in the definition. I would add
fuzz=0;
as recommended in the documentation of the FUZZ= option. Otherwise, fuzzing (with the default fuzz=1E-12) would interfere with the intended format definition close to the interval borders.
The exact same format could be created with @Astounding's approach:
proc format;
value fmt (fuzz=0)
-2 -< 0.3333333333333333 = 'A'
0.3333333333333333 -< 0.6666666666666666 = 'B'
0.6666666666666666 - 1 = 'C'
1 <- high = 'D'
;
run;
but the correct choice of the numeric literals is crucial (and possibly platform-dependent!). Windows SAS 9.4M5 (regardless of system option DECIMALCONV) yields:
792 data _null_; 793 x=1/3; 794 y=2/3; 795 put (x y) (=hex16.); 796 x=0.3333333333333333; 797 y=0.6666666666666666; 798 put (x y) (=hex16.); 799 run; x=3FD5555555555555 y=3FE5555555555555 x=3FD5555555555555 y=3FE5555555555555
Using any fewer than the above sixteen (!) decimal 3s or 6s would create different internal representations (and hence different formats), as would rounding y to 0.6666666666666667. I think none of the common numeric formats (such as w.d, BESTw., Ew., etc.) would create exactly those literals.
It's tricky that neither FMTLIB output nor the CNTLOUT dataset are precise enough to reveal the difference between the exact format (as defined above) and a format defined with slightly different literals. In the case of the CNTLOUT dataset (which uses character variables START and END of length 16 containing rounded decimal representations of the fractions in question) this means that feeding the dataset back with the CNTLIN option creates a different format! See the log below:
896 data _null_; 897 x=1/3; 898 y=2/3; 899 put (x y) (=fmt.); 900 run; x=B y=C NOTE: DATA statement used (Total process time): real time 0.10 seconds cpu time 0.10 seconds 901 proc format cntlout=fmtds; 902 select fmt; 903 run; NOTE: PROCEDURE FORMAT used (Total process time): real time 0.06 seconds cpu time 0.06 seconds NOTE: The data set WORK.FMTDS has 4 observations and 21 variables. 904 proc format cntlin=fmtds; NOTE: Format FMT is already on the library WORK.FORMATS. NOTE: Format FMT has been output. 905 run; NOTE: PROCEDURE FORMAT used (Total process time): real time 0.07 seconds cpu time 0.07 seconds NOTE: There were 4 observations read from the data set WORK.FMTDS. 906 data _null_; 907 x=1/3; 908 y=2/3; 909 put (x y) (=fmt.); 910 run; x=B y=B
In practice, special care should be taken that the values to be formatted are using the appropriate internal representations to ensure correct classification of borderline cases.
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.