Hello forum members,
This is a rather trivial task, but I didn't see how I would be able to do so. How would I see the levels of a categorical label without their labels?
For example, say I had an income variable with 3 levels and I wanted to know the 'number' that corresponds to each'. How would I do this?
Income
High
Medium
Low
Income
3
2
1
Hello @CEdward,
So you have a SAS dataset (let's assume it is WORK.HAVE) with a user-defined format being permanently associated with variable Income and for some reason you want to know the internal values behind the labels "High," etc.
Common approaches include:
proc format;
select incfmt;
run;
The output shows internal and formatted values side by side:
---------------------------------------------------------------------------- | FORMAT NAME: INCFMT LENGTH: 6 NUMBER OF VALUES: 3 | | MIN LENGTH: 1 MAX LENGTH: 40 DEFAULT LENGTH: 6 FUZZ: STD | |--------------------------------------------------------------------------| |START |END |LABEL (VER. V7|V8 14DEC2020:10:40:20)| |----------------+----------------+----------------------------------------| | 1| 1|Low | | 2| 2|Medium | | 3| 3|High | ----------------------------------------------------------------------------The format name can be found in PROC CONTENTS output for dataset HAVE. In certain situations it can be more tricky to find out the format catalog, though: A complex project environment might involve several format catalogs which could even contain different formats with the same name. Then you would need to look at the value of the FMTSEARCH option, determine the first format catalog in the search order that contains the format and use the LIB= option of the PROC FORMAT statement above to specify it. Or resort to one of the other approaches.
proc freq data=have;
format _all_; /* or, more specifically: format income; */
tables income;
run;
Unlike the first approach, this step processes the whole dataset, which may be costly if HAVE is very large. Other disadvantages are: The output will not show the formatted values (so you may want to repeat the step without the FORMAT statement) and the output can be very long and unwieldy if variable Income turns out to be continuous (with lots of distinct values such as 65,432.10). The latter two drawbacks are avoided by the third approach.proc means data=have;
class income;
var income;
run;
This approach is limited to numeric variables. Due to the CLASS statement, the formatted values will form the categories for which the (default) statistics are computed from the internal values. If Income is really categorical and each formatted value corresponds to only one internal value, mean, minimum and maximum will be equal to that internal value.data highinc;
set have;
if vvalue(income)='High';
run;
You can include the information in a format:
proc format;
value IncomeFmt
1 = "Low (1)"
2 = "Medium (2)"
3 = "High (3)"
other = " ";
run;
You have left out a lot of details. Is this "categorical" variable numeric or character?
If character then how to you want numbers assigned to the values? Do you just want to sort the values alphabetically and then assign them numbers?
If numeric then I assume there is a format attached to the variable. To see the raw numbers just remove the format. Or you could look at this tool for generating new versions of the formats where the label includes the raw value. https://github.com/sasutils/macros/blob/master/cfmtgen.sas
Hello @CEdward,
So you have a SAS dataset (let's assume it is WORK.HAVE) with a user-defined format being permanently associated with variable Income and for some reason you want to know the internal values behind the labels "High," etc.
Common approaches include:
proc format;
select incfmt;
run;
The output shows internal and formatted values side by side:
---------------------------------------------------------------------------- | FORMAT NAME: INCFMT LENGTH: 6 NUMBER OF VALUES: 3 | | MIN LENGTH: 1 MAX LENGTH: 40 DEFAULT LENGTH: 6 FUZZ: STD | |--------------------------------------------------------------------------| |START |END |LABEL (VER. V7|V8 14DEC2020:10:40:20)| |----------------+----------------+----------------------------------------| | 1| 1|Low | | 2| 2|Medium | | 3| 3|High | ----------------------------------------------------------------------------The format name can be found in PROC CONTENTS output for dataset HAVE. In certain situations it can be more tricky to find out the format catalog, though: A complex project environment might involve several format catalogs which could even contain different formats with the same name. Then you would need to look at the value of the FMTSEARCH option, determine the first format catalog in the search order that contains the format and use the LIB= option of the PROC FORMAT statement above to specify it. Or resort to one of the other approaches.
proc freq data=have;
format _all_; /* or, more specifically: format income; */
tables income;
run;
Unlike the first approach, this step processes the whole dataset, which may be costly if HAVE is very large. Other disadvantages are: The output will not show the formatted values (so you may want to repeat the step without the FORMAT statement) and the output can be very long and unwieldy if variable Income turns out to be continuous (with lots of distinct values such as 65,432.10). The latter two drawbacks are avoided by the third approach.proc means data=have;
class income;
var income;
run;
This approach is limited to numeric variables. Due to the CLASS statement, the formatted values will form the categories for which the (default) statistics are computed from the internal values. If Income is really categorical and each formatted value corresponds to only one internal value, mean, minimum and maximum will be equal to that internal value.data highinc;
set have;
if vvalue(income)='High';
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.