BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
CEdward
Fluorite | Level 6

Hello forum members,

 

This is a rather trivial task, but I didn't see how I would be able to do so. How would I see the levels of a categorical label without their labels? 

For example, say I had an income variable with 3 levels and I wanted to know the 'number' that corresponds to each'. How would I do this?

 

Income

High

Medium 

Low

 

Income 

3

2

1

 

1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

Hello @CEdward,

 

So you have a SAS dataset (let's assume it is WORK.HAVE) with a user-defined format being permanently associated with variable Income and for some reason you want to know the internal values behind the labels "High," etc.

 

Common approaches include:

  1. PROC FORMAT using the (default) FMTLIB option.
    This will provide you with the most complete information because it is not limited to your current dataset. For example, it might reveal that there's actually a fourth category "Very high" that just doesn't occur in dataset HAVE. If you know the name of the format (say, INCFMT.) and that it is stored in the format catalog WORK.FORMATS, the code is very simple:
    proc format;
    select incfmt;
    run;
    The output shows internal and formatted values side by side:
    ----------------------------------------------------------------------------
    |       FORMAT NAME: INCFMT   LENGTH:    6   NUMBER OF VALUES:    3        |
    |   MIN LENGTH:   1  MAX LENGTH:  40  DEFAULT LENGTH:   6  FUZZ: STD       |
    |--------------------------------------------------------------------------|
    |START           |END             |LABEL  (VER. V7|V8   14DEC2020:10:40:20)|
    |----------------+----------------+----------------------------------------|
    |               1|               1|Low                                     |
    |               2|               2|Medium                                  |
    |               3|               3|High                                    |
    ----------------------------------------------------------------------------
    The format name can be found in PROC CONTENTS output for dataset HAVE. In certain situations it can be more tricky to find out the format catalog, though: A complex project environment might involve several format catalogs which could even contain different formats with the same name. Then you would need to look at the value of the FMTSEARCH option, determine the first format catalog in the search order that contains the format and use the LIB= option of the PROC FORMAT statement above to specify it. Or resort to one of the other approaches.

  2. PROC FREQ with a FORMAT statement that temporarily deactivates the format.
    proc freq data=have;
    format _all_; /* or, more specifically: format income; */
    tables income;
    run;
    Unlike the first approach, this step processes the whole dataset, which may be costly if HAVE is very large. Other disadvantages are: The output will not show the formatted values (so you may want to repeat the step without the FORMAT statement) and the output can be very long and unwieldy if variable Income turns out to be continuous (with lots of distinct values such as 65,432.10). The latter two drawbacks are avoided by the third approach.

  3. PROC MEANS with a CLASS statement.
    proc means data=have;
    class income;
    var income;
    run;
    This approach is limited to numeric variables. Due to the CLASS statement, the formatted values will form the categories for which the (default) statistics are computed from the internal values. If Income is really categorical and each formatted value corresponds to only one internal value, mean, minimum and maximum will be equal to that internal value.

  4. Avoid the need for internal values by using the VVALUE (or VVALUEX) function.
    For example, the subset of all observations in HAVE with "High" Income can be selected without knowing the underlying internal value (or range):
    data highinc;
    set have;
    if vvalue(income)='High';
    run;

View solution in original post

3 REPLIES 3
Norman21
Lapis Lazuli | Level 10

You can include the information in a format:

 

proc format;
value IncomeFmt
      1 = "Low (1)"
      2 = "Medium (2)" 
      3 = "High (3)"
      other = " ";
run;
Norman.
SAS 9.4 (TS1M6) X64_10PRO WIN 10.0.17763 Workstation

Tom
Super User Tom
Super User

You have left out a lot of details.  Is this "categorical" variable numeric or character? 

If character then how to you want numbers assigned to the values? Do you just want to sort the values alphabetically and then assign them numbers?

 

If numeric then I assume there is a format attached to the variable.  To see the raw numbers just remove the format.  Or you could look at this tool for generating new versions of the formats where the label includes the raw value.  https://github.com/sasutils/macros/blob/master/cfmtgen.sas

 

FreelanceReinh
Jade | Level 19

Hello @CEdward,

 

So you have a SAS dataset (let's assume it is WORK.HAVE) with a user-defined format being permanently associated with variable Income and for some reason you want to know the internal values behind the labels "High," etc.

 

Common approaches include:

  1. PROC FORMAT using the (default) FMTLIB option.
    This will provide you with the most complete information because it is not limited to your current dataset. For example, it might reveal that there's actually a fourth category "Very high" that just doesn't occur in dataset HAVE. If you know the name of the format (say, INCFMT.) and that it is stored in the format catalog WORK.FORMATS, the code is very simple:
    proc format;
    select incfmt;
    run;
    The output shows internal and formatted values side by side:
    ----------------------------------------------------------------------------
    |       FORMAT NAME: INCFMT   LENGTH:    6   NUMBER OF VALUES:    3        |
    |   MIN LENGTH:   1  MAX LENGTH:  40  DEFAULT LENGTH:   6  FUZZ: STD       |
    |--------------------------------------------------------------------------|
    |START           |END             |LABEL  (VER. V7|V8   14DEC2020:10:40:20)|
    |----------------+----------------+----------------------------------------|
    |               1|               1|Low                                     |
    |               2|               2|Medium                                  |
    |               3|               3|High                                    |
    ----------------------------------------------------------------------------
    The format name can be found in PROC CONTENTS output for dataset HAVE. In certain situations it can be more tricky to find out the format catalog, though: A complex project environment might involve several format catalogs which could even contain different formats with the same name. Then you would need to look at the value of the FMTSEARCH option, determine the first format catalog in the search order that contains the format and use the LIB= option of the PROC FORMAT statement above to specify it. Or resort to one of the other approaches.

  2. PROC FREQ with a FORMAT statement that temporarily deactivates the format.
    proc freq data=have;
    format _all_; /* or, more specifically: format income; */
    tables income;
    run;
    Unlike the first approach, this step processes the whole dataset, which may be costly if HAVE is very large. Other disadvantages are: The output will not show the formatted values (so you may want to repeat the step without the FORMAT statement) and the output can be very long and unwieldy if variable Income turns out to be continuous (with lots of distinct values such as 65,432.10). The latter two drawbacks are avoided by the third approach.

  3. PROC MEANS with a CLASS statement.
    proc means data=have;
    class income;
    var income;
    run;
    This approach is limited to numeric variables. Due to the CLASS statement, the formatted values will form the categories for which the (default) statistics are computed from the internal values. If Income is really categorical and each formatted value corresponds to only one internal value, mean, minimum and maximum will be equal to that internal value.

  4. Avoid the need for internal values by using the VVALUE (or VVALUEX) function.
    For example, the subset of all observations in HAVE with "High" Income can be selected without knowing the underlying internal value (or range):
    data highinc;
    set have;
    if vvalue(income)='High';
    run;

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 5154 views
  • 1 like
  • 4 in conversation