Hello @CEdward,
So you have a SAS dataset (let's assume it is WORK.HAVE) with a user-defined format being permanently associated with variable Income and for some reason you want to know the internal values behind the labels "High," etc.
Common approaches include:
PROC FORMAT using the (default) FMTLIB option. This will provide you with the most complete information because it is not limited to your current dataset. For example, it might reveal that there's actually a fourth category "Very high" that just doesn't occur in dataset HAVE. If you know the name of the format (say, INCFMT.) and that it is stored in the format catalog WORK.FORMATS, the code is very simple:
proc format;
select incfmt;
run;
The output shows internal and formatted values side by side:
----------------------------------------------------------------------------
| FORMAT NAME: INCFMT LENGTH: 6 NUMBER OF VALUES: 3 |
| MIN LENGTH: 1 MAX LENGTH: 40 DEFAULT LENGTH: 6 FUZZ: STD |
|--------------------------------------------------------------------------|
|START |END |LABEL (VER. V7|V8 14DEC2020:10:40:20)|
|----------------+----------------+----------------------------------------|
| 1| 1|Low |
| 2| 2|Medium |
| 3| 3|High |
----------------------------------------------------------------------------
The format name can be found in PROC CONTENTS output for dataset HAVE. In certain situations it can be more tricky to find out the format catalog, though: A complex project environment might involve several format catalogs which could even contain different formats with the same name. Then you would need to look at the value of the FMTSEARCH option, determine the first format catalog in the search order that contains the format and use the LIB= option of the PROC FORMAT statement above to specify it. Or resort to one of the other approaches.
PROC FREQ with a FORMAT statement that temporarily deactivates the format.
proc freq data=have;
format _all_; /* or, more specifically: format income; */
tables income;
run;
Unlike the first approach, this step processes the whole dataset, which may be costly if HAVE is very large. Other disadvantages are: The output will not show the formatted values (so you may want to repeat the step without the FORMAT statement) and the output can be very long and unwieldy if variable Income turns out to be continuous (with lots of distinct values such as 65,432.10). The latter two drawbacks are avoided by the third approach.
PROC MEANS with a CLASS statement.
proc means data=have;
class income;
var income;
run;
This approach is limited to numeric variables. Due to the CLASS statement, the formatted values will form the categories for which the (default) statistics are computed from the internal values. If Income is really categorical and each formatted value corresponds to only one internal value, mean, minimum and maximum will be equal to that internal value.
Avoid the need for internal values by using the VVALUE (or VVALUEX) function. For example, the subset of all observations in HAVE with "High" Income can be selected without knowing the underlying internal value (or range):
data highinc;
set have;
if vvalue(income)='High';
run;
... View more