Calcite | Level 5

## Frequency tabulations for multiple-choice variables (20 possible options, 3-5 max)

Hi, I am a SAS beginner. I have some data that I need to tabulate: frequency, N and percent tabulations, by another variable (location). The data is from a multiple-choice survey question - there are 20 different multiple choice options, and respondents can choose up to 3 options. I'd like to be able to tabulate the frequency of each option for Location A, Location B, Location C, etc. and calculate the percentage, with the N being the number of people who responded to that question. So I wouldn't want to tabulate rows where none of the option choices have been selected. --> (Can I accomplish this by creating a new variable that concatenates variables of all 20 variables, and then exclude rows with 20 0s?)

Location_1 Popn_1 Popn_2 Popn_3 Popn_4.....

CityA 1 0 0 0
CityB 0 0 0 0
CityC 0 0 0 0
CityB 0 0 0 0
CityB 0 1 0 0
CityC 0 0 0 0
CityD 0 1 0 0
CityE 0 0 0 0
CityF 0 0 0 0
CityE 1 0 0 0
CityD 1 0 0 0
CityC 1 0 0 0

........

I ran into the same problem as above, was trying this test code:

`proc tabulate data=cath.testdata2;	class location_1;	var popn_1-popn_3;	table popn_1='Everyone'	      popn_2='Child'	      popn_3='Youth',              (n='Count' pctn='Percent');		  	title 'Frequency and Counts for Population Choices';run;`

I know that my data has: 141 records where no option is selected; 496 records in total.

Output was:

Frequency and Counts for Population Choices

Count PercentEveryoneChildrenYouth

 417 100 417 100 417 100

Additionally I would also like to tabulate, for each location, frequency by each variable against the other. (so the number who have selected popn_1 AND popn_2, etc.) (but after solving the above)

I'd like something that looks like this:

 Count Percent N=355 1: Checked 1: Checked CityA Everyone 5 2% Children 7 Youth City B Everyone 15 4.23% Children … … Youth … …

Thanks any help provided

9 REPLIES 9
Tourmaline | Level 20

## Re: Frequency tabulations for multiple-choice variables (20 possible options, 3-5 max)

Please provide a full bur reduced set of test data (like a dozen rows) as a data step, and the expects outputs.

Super User

## Re: Frequency tabulations for multiple-choice variables (20 possible options, 3-5 max)

If you have 1/0 coded values then using the variable as a VAR type in tabulate then SUM is the number of 1 values and Mean would be the percent.

So maybe something like:

```proc tabulate data=cath.testdata2;
class location_1;
var popn_1-popn_3;
table popn_1='Everyone'
popn_2='Child'
popn_3='Youth',
(Sum='Count'*f=best5. mean='Percent'*f=percent8.1);
table Location_1 * (popn_1='Everyone'
popn_2='Child'
popn_3='Youth'),
(Sum='Count'*f=best5. mean='Percent'*f=percent8.1);

title 'Frequency and Counts for Population Choices';
run;```
Calcite | Level 5

## Re: Frequency tabulations for multiple-choice variables (20 possible options, 3-5 max)

Thank you. I do not understand the numbers the output gives me.

I have my code like this:

proc tabulate data=cath.binarydata;
class location sectorcollapsed;
var popn1-popn20;
table (location='Location'),
(popn1-popn20)*(Sum='Count'*f=best5. mean='Percent'*f=percent8.1 N='Total responses      received per question')

/box="Populations Served by Location";

I am getting confused about what the numbers mean:

When SAS calculates 'mean', what is the N it is calculating this out of? It uses the same N, it seems for each variable (as you can see below). Where does it get this N value from? I just want to make sure this is correctly giving me the percentage of responses who chose that option out of the total who responded to the question. Is this correct?

Here is a sample of my data:

 popn1 popn2 popn3 popn4 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0

And here is a sample of my output:

 PopulationsServed byLocation popn1 popn2 Location Count Percent N (Totalresponses per question) Count Percent N(Total responses received per question) Location 11 84.6% 13 2 15.4% 13 Location1 Location2 11 42.3% 26 8 30.8% 26 Location3 28 65.1% 43 9 20.9% 43

Calcite | Level 5

## Re: Frequency tabulations for multiple-choice variables (20 possible options, 3-5 max)

Hi @ballardw, I had a subsequent question - I tried the code you had provided me, but I'm not understanding where the N for the mean comes from. Does it divide by the total number of responses received? I thought the N would be all of the people who answered the question, but it doesn't quite add up like this. As an example, in one of the locations, there were 149 total people in that location, and 19 did not provide an answer at all - so the N here should be 130, but it is giving me a value of 134 in the output. The mean is calculated out of 134. I tried to provide a reply here but I think I did something incorrect, and the posts were deleted.
Super User

## Re: Frequency tabulations for multiple-choice variables (20 possible options, 3-5 max)

I don't remember you asking for an "N",rather "th the N being the number of people who responded to that question" where 1 indicated they responded. If that is not what the 1 in your data means, then please describe otherwise. The SUM is the number of people that answered 1.

Did the code I showed as an example do what was intended for the limited example of data you provided? If so, then there is something else going on with your actual data that you need to provide.

So, you need to provide some actual example data with the " did not provide an answer at all " and what the expected result would be. It would also help to show the actual code your ran. Best way to do this is to copy from the LOG the code and any notes related to the procedure and then paste that into a code box. That way there is not question as to what was actually executed.

Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the </> icon or attached as text to show exactly what you have and that we can test code against.

With Proc Tabulate you must be careful the values of your class variables. If any of the class variables are missing then the record is excluded.

Calcite | Level 5

## Re: Frequency tabulations for multiple-choice variables (20 possible options, 3-5 max)

Thanks Ballard.

Here is my code:

```proc tabulate data=cath.binarydata;
where location in ("location1", "location2", "location3", "location4", "location5", "location6");
class location sectorcollapsed;
var popn1-popn20;
table (location='Location'),
(popn1-popn20)*(Sum='Count'*f=best5. mean='Percent'*f=percent8.1 N='Total responses received per question')
/box="Populations Served by Location";```

Here is a sample of my data: attached in data.xlsx

And here is a sample output: sampleoutput.xlsx

In order to determine how many people didn't answer the question, I filtered for only the blanks on each option for each location in Excel. They didn't add up.

I didn't quite understand what value it provides when I use the N statement in the code. That number (as you can see in the sample output) remains the same for each location across the board, and is the number by which the sum is divided in order to get the mean.

The sums work perfectly, but I don't understand what the mean is out of for each class value.

Super User

## Re: Frequency tabulations for multiple-choice variables (20 possible options, 3-5 max)

I don't open XLSX files from unknown sources.

Many users here don't want to download Excel files because of virus potential, others have such things blocked by security software. Also if you give us Excel we have to create a SAS data set and due to the non-existent constraints on Excel data cells the result we end up with may not have variables of the same type (numeric or character) and even values.

Why do you include Sectorcollapsed as a class variable when it is not used?

The mean of a 0/1 coded value is the percentage of 1's provided.

Consider

0 0 0 1 1 0 0 0 0 0. That is 2 / 10 or 20%, which is the (SUM of 1s)/(number of values) or a mean. The mean would have a numeric value of 0.2 so we use a Percent format to get "nice" looking values such as 20%.

"Didn't answer the question" might be the NMISS statistic if that means there is no value at all, i.e. missing. That would indicate the number of missing values.

Of course when you "filter in Excel" the results don't match. You do not show any corresponding "filter" in the SAS code. You don't even clearly describe how you "filtered" in Excel. So I can't make a suggestion.

Calcite | Level 5

## Re: Frequency tabulations for multiple-choice variables (20 possible options, 3-5 max)

I use the class sectorcollapsed in another table (I only included the code to create the population table in the question). The code for that one is similar.

I've included the data and the output as an image. I only included the first 3 population variables here. As you can see, the N stays consistent across for each location. For location 6, 13 people responded to the question. For Location 5, 26 people responded to the question. (I checked this on Excel by taking the total from each location and filtering out the rows where the sum across the 20 variables was 0 (ie no selection of an option had been made).   However, for Location 1 and 2 at the bottom: the values were off (ie the N using the filtering method on Excel for location 1 was 130 responses in total, not including those rows where the sum across 20 rows was 0; and for location 2, it was 39 - but the Ns for the SAS outputs are 134 and 44 respectively.

Calcite | Level 5

## Re: Frequency tabulations for multiple-choice variables (20 possible options, 3-5 max)

@ballardw I think it will work the way I expect it to if I delete the 0s in all cells for all the rows that sum up to 0 (ie nothing was selected for that question). *Edit - it worked when I did this

Discussion stats
• 9 replies
• 1190 views
• 0 likes
• 3 in conversation