Hi @qwertyzx and welcome to the SAS Support Communities!
@qwertyzx wrote:
I could also create this as a numeric variable with labels rather than as a character variable -- is that the preferred way?
In my experience this is the most common approach to this problem. In PROC FREQ (and several other procedures) you can use the ORDER= option to control the order of levels. But not all of the four values of this option are equally suitable:
ORDER=DATA depends on the order of observations in the input dataset (or view). In your example "Other" would need to occur after at least one appearance of each of the remaining levels. In practice, however, many datasets are sorted by key variables and you wouldn't want to change the sort order just for reporting purposes.
ORDER=FORMATTED requires that the formatted values have the desired order. Since "Other" is unlikely to be the last label in alphabetical order (as you've experienced), this option doesn't help you either.
ORDER=FREQ is also useless in your case, because "Other" is not necessarily the least frequent category.
ORDER=INTERNAL is the only remaining option and also the default in PROC FREQ (i.e., you don't need to specify it explicitly). So, if the internal values are defined so that the value corresponding to "Other" is the largest value (for a numeric variable) or comes last alphabetically (for a character variable), you're done. PROC FREQ displays (and groups by) formatted values by default, and you're free to assign virtually arbitrary format labels to the internal values. Additional benefits: the internal values can be much shorter than the formatted values (which saves disk space), the format labels can be changed without touching the data (which facilitates maintenance), and you can define different formats for different purposes (and apply them only in the PROC step).
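A minimal sketch of the numeric-variable-plus-format approach. The format name CATFMT, the dataset HAVE, the variable CATEGORY and the labels are made up for illustration and are not taken from your data; the only important point is that "Other" gets the largest internal value.

```sas
/* Hypothetical format: "Other" is assigned the largest code (99) */
proc format;
  value catfmt
    1  = 'Apples'
    2  = 'Zucchini'
    99 = 'Other';
run;

/* Hypothetical test data: numeric codes in arbitrary order */
data have;
  input category @@;
  format category catfmt.;
  datalines;
99 2 1 1 99 2 1
;

/* ORDER=INTERNAL is the default in PROC FREQ, so no ORDER= option is needed */
proc freq data=have;
  tables category;
run;
```

Because the levels are ordered by the internal values 1, 2, 99, "Other" should appear last in the table even though "Zucchini" follows it alphabetically.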
If a procedure has a different default for the ORDER= option (e.g., PROC REPORT uses ORDER=FORMATTED), just specify ORDER=INTERNAL where appropriate.
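For PROC REPORT this could look like the following sketch. It uses SASHELP.CLASS only because that dataset ships with SAS; the format name $SEXLBL and its labels are assumptions made up for this example.

```sas
/* Hypothetical format whose labels sort differently from the internal values */
proc format;
  value $sexlbl
    'F' = 'Women'
    'M' = 'Men';
run;

proc report data=sashelp.class;
  column sex n;
  /* ORDER=INTERNAL overrides PROC REPORT's default of ORDER=FORMATTED */
  define sex / group order=internal format=$sexlbl. 'Sex';
  define n   / 'Count';
run;
```

With ORDER=INTERNAL the rows follow the internal values 'F', 'M' (i.e., "Women" before "Men"); with the default ORDER=FORMATTED, "Men" would come first.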
As mentioned, the internal values can be either numeric or character, but the (alphabetical) sort order of character values has its pitfalls (e.g., "2" > "10"), so numeric values (preferably integers) are often easier to use.
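A quick way to see this pitfall is a small DATA step that only writes to the log (just an illustration, nothing specific to your data):

```sas
data _null_;
  /* Character values are compared character by character: '2' > '1' */
  if '2' > '10' then put 'Character: "2" sorts after "10"';
  /* Numeric values are compared by magnitude */
  if 2 < 10 then put 'Numeric: 2 sorts before 10';
run;
```

Both conditions are true, so both messages are written to the log. If you do use character codes, leading zeros (e.g., "02", "10") are one way to avoid this.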