I asked a similar question in a different thread for a single variable proc freq table here, but was curious if it could also be done with a multi-variable analysis.
Essentially, the title of the thread: is it possible to order a column in a multi-variable proc freq table by a separate variable not being analyzed? For example, take the following data:
| County_Name | Yes_No | County_Code |
| A | Yes | 1 |
| A | Yes | 1 |
| A | Yes | 1 |
| A | No | 1 |
| B | No | 2 |
| B | No | 2 |
| B | No | 2 |
| C | Yes | 5 |
| C | Yes | 5 |
| C | Yes | 5 |
| C | Yes | 5 |
| C | No | 5 |
| D | No | 4 |
| D | No | 4 |
| D | Yes | 4 |
| E | Yes | 3 |
| E | Yes | 3 |
| E | No | 3 |
| E | No | 3 |
| F | Yes | 6 |
| F | Yes | 6 |
| F | No | 6 |
| F | No | 6 |
Normally, if I made a proc freq table to count the frequency of yes/no's for each county, as so:
proc freq data=data;
tables County_Name*Yes_No;
run;
I would get an output table similar to the following:
| County_Name | Freq | ||
| Yes | No | Total | |
| A | 3 | 1 | 4 |
| B | 0 | 3 | 3 |
| C | 4 | 1 | 5 |
| D | 1 | 2 | 3 |
| E | 2 | 2 | 4 |
| F | 2 | 2 | 4 |
However, is it possible to produce a similar table, but have the County_Name variable ordered according to its corresponding County_Code value? So, a table that would look like this instead:
| County_Name | Freq | ||
| Yes | No | Total | |
| A | 3 | 1 | 4 |
| B | 0 | 3 | 3 |
| E | 2 | 2 | 4 |
| D | 1 | 2 | 3 |
| C | 4 | 1 | 5 |
| F | 2 | 2 | 4 |
I tried to do something similar to the single-variable solution, but instead got a larger cross-tabulation that gave a separate row for each combination, instead of having the yes/no frequency in their own columns. Something like this:
proc freq data=data;
tables County_Code*County_Name*Yes_No / list;
run;
| County_Code | County_Name | Yes_No | Freq |
| 1 | A | Yes | 3 |
| 1 | A | No | 1 |
| 2 | B | Yes | 0 |
| 2 | B | No | 3 |
| 3 | C | Yes | 4 |
| 3 | C | No | 1 |
| 4 | D | Yes | 1 |
| 4 | D | No | 2 |
| 5 | E | Yes | 2 |
| 5 | E | No | 2 |
| 6 | F | Yes | 2 |
| 6 | F | No | 2 |
If you provide some example data in the form of a working data set you might get working code.
One way would be to use Proc Format to create a custom format for the county code variable and apply the format to the code variable in proc freq.
Here is a brief example:
Proc format; value county_code 1 = 'A' 2 = 'B' 5 = 'C' 4 = 'D' 3 = 'E' 6 = 'F' ; run; data example; input County_code yes_no $; datalines; 1 Yes 1 No 3 Yes 5 No 5 Yes 2 Yes 6 No ; Proc freq data=example; table county_code*yes_no; format county_code county_code. ; run;
I didn't bother to put the county name variable in the example data set because the code value is all that is needed.
The format name does not have to be the same as the variable it just makes it easier to keep track of. Note the use of the period at the end of the format name in the format statement. That is how SAS knows that is the format and not a variable.
Formats are a very powerful tool because additional formats can be used with the same data set and variable to get different analysis. For example with something like a county code variable I could create a format that assigns the codes to something like geographic management regions, my state has 6 or 7 depending on whether you are working with Public Health Districts or Education regions for example. Or by some other characteristic such as population size group or density, mean income or just about anything. Then the only change to an analysis is using the format. Proc format can use a data set with codes and such descriptions as well to avoid writing a lot of code such as for Texas which has over 250 counties.
If the codes are character and not numeric the format name in the proc format code needs be prefaced with a $ and when used in a format statement. This is needed because you could have separate formats for a numeric and character values.
data have; infile cards expandtabs truncover; input (County_Name Yes_No County_Code) ($); cards; A Yes 1 A Yes 1 A Yes 1 A No 1 B No 2 B No 2 B No 2 C Yes 5 C Yes 5 C Yes 5 C Yes 5 C No 5 D No 4 D No 4 D Yes 4 E Yes 3 E Yes 3 E No 3 E No 3 F Yes 6 F Yes 6 F No 6 F No 6 ; proc sort data=have ; by County_Code; run; proc freq data=have order=data; table County_Name*Yes_No/norow nocol nopercent; run;
Dive into keynotes, announcements and breakthroughs on demand.
Explore Now →Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.