Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- Home
- /
- Programming
- /
- Programming
- /
- Frequency tabulations for multiple-choice variables (20 possible optio...

Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Posted 11-12-2020 07:07 PM
(1189 views)

Hi, I am a SAS beginner. I have some data that I need to tabulate: frequency, N and percent tabulations, by another variable (location). The data is from a multiple-choice survey question - there are 20 different multiple choice options, and respondents can choose up to 3 options. I'd like to be able to tabulate the frequency of each option for Location A, Location B, Location C, etc. and calculate the percentage, with the N being the number of people who responded to that question. So I wouldn't want to tabulate rows where none of the option choices have been selected. --> (Can I accomplish this by creating a new variable that concatenates variables of all 20 variables, and then exclude rows with 20 0s?)

Location_1 Popn_1 Popn_2 Popn_3 Popn_4.....

CityA 1 0 0 0

CityB 0 0 0 0

CityC 0 0 0 0

CityB 0 0 0 0

CityB 0 1 0 0

CityC 0 0 0 0

CityD 0 1 0 0

CityE 0 0 0 0

CityF 0 0 0 0

CityE 1 0 0 0

CityD 1 0 0 0

CityC 1 0 0 0

........

I was trying to understand this https://support.sas.com/documentation/cdl/en/proc/61895/HTML/default/viewer.htm#a002473759.htm, and also https://communities.sas.com/t5/SAS-Programming/Analyzing-Multiple-Response-Variables/td-p/485261

I ran into the same problem as above, was trying this test code:

proc tabulate data=cath.testdata2;

class location_1;

var popn_1-popn_3;

table popn_1='Everyone'

popn_2='Child'

popn_3='Youth',

(n='Count' pctn='Percent');

title 'Frequency and Counts for Population Choices';

run;

I know that my data has: 141 records where no option is selected; 496 records in total.

Output was:

Frequency and Counts for Population Choices

Count PercentEveryoneChildrenYouth

417 | 100.00 |

417 | 100.00 |

417 | 100.00 |

Additionally I would also like to tabulate, for each location, frequency by each variable against the other. (so the number who have selected popn_1 AND popn_2, etc.) (but after solving the above)

I'd like something that looks like this:

Count | Percent | N=355 | ||

1: Checked | 1: Checked | |||

CityA | Everyone | 5 | 2% | |

Children | 7 | |||

Youth | ||||

City B | Everyone | 15 | 4.23% | |

Children | … | … | ||

Youth | … | … |

Thanks any help provided

9 REPLIES 9

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

If you have 1/0 coded values then using the variable as a VAR type in tabulate then SUM is the number of 1 values and Mean would be the percent.

So maybe something like:

proc tabulate data=cath.testdata2; class location_1; var popn_1-popn_3; table popn_1='Everyone' popn_2='Child' popn_3='Youth', (Sum='Count'*f=best5. mean='Percent'*f=percent8.1); table Location_1 * (popn_1='Everyone' popn_2='Child' popn_3='Youth'), (Sum='Count'*f=best5. mean='Percent'*f=percent8.1); title 'Frequency and Counts for Population Choices'; run;

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Thank you. I do not understand the numbers the output gives me.

I have my code like this:

proc tabulate data=cath.binarydata;

class location sectorcollapsed;

var popn1-popn20;

table (location='Location'),

(popn1-popn20)*(Sum='Count'*f=best5. mean='Percent'*f=percent8.1 N='Total responses received per question')

/box="Populations Served by Location";

I am getting confused about what the numbers mean:

**When SAS calculates 'mean', what is the N it is calculating this out of? It uses the same N, it seems for each variable (as you can see below).** Where does it get this N value from? I just want to make sure this is correctly giving me the percentage of responses who chose that option out of the total who responded to the question. Is this correct?

Here is a sample of my data:

popn1 | popn2 | popn3 | popn4 |

1 | 0 | 0 | 0 |

0 | 0 | 0 | 0 |

0 | 0 | 0 | 0 |

0 | 0 | 0 | 0 |

0 | 1 | 0 | 0 |

0 | 0 | 0 | 0 |

1 | 0 | 0 | 0 |

0 | 1 | 0 | 0 |

0 | 0 | 0 | 0 |

And here is a sample of my output:

Populations Served by Location | popn1 | popn2 | ||||

Location | Count | Percent | N (Total responses per question) | Count | Percent | N(Total responses received per question) |

Location | 11 | 84.6% | 13 | 2 | 15.4% | 13 |

Location1 | ||||||

Location2 | 11 | 42.3% | 26 | 8 | 30.8% | 26 |

Location3 | 28 | 65.1% | 43 | 9 | 20.9% | 43 |

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Hi @ballardw, I had a subsequent question - I tried the code you had provided me, but I'm not understanding where the N for the mean comes from. Does it divide by the total number of responses received? I thought the N would be all of the people who answered the question, but it doesn't quite add up like this. As an example, in one of the locations, there were 149 total people in that location, and 19 did not provide an answer at all - so the N here should be 130, but it is giving me a value of 134 in the output. The mean is calculated out of 134. I tried to provide a reply here but I think I did something incorrect, and the posts were deleted.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

I don't remember you asking for an "N",rather "th the N being the number of people who responded to that question" where 1 indicated they responded. If that is not what the 1 in your data means, then please describe otherwise. The SUM is the number of people that answered 1.

Did the code I showed as an example do what was intended for the limited example of data you provided? If so, then there is something else going on with your actual data that you need to provide.

So, you need to provide some actual example data with the " did not provide an answer at all " and what the expected result would be. It would also help to show the actual code your ran. Best way to do this is to copy from the LOG the code and any notes related to the procedure and then paste that into a code box. That way there is not question as to what was actually executed.

Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the </> icon or attached as text to show exactly what you have and that we can test code against.

With Proc Tabulate you **must** be careful the values of your class variables. If any of the class variables are missing then the record is excluded.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Thanks Ballard.

Here is my code:

proc tabulate data=cath.binarydata; where location in ("location1", "location2", "location3", "location4", "location5", "location6"); class location sectorcollapsed; var popn1-popn20; table (location='Location'), (popn1-popn20)*(Sum='Count'*f=best5. mean='Percent'*f=percent8.1 N='Total responses received per question') /box="Populations Served by Location";

Here is a sample of my data: attached in data.xlsx

And here is a sample output: sampleoutput.xlsx

In order to determine how many people didn't answer the question, I filtered for only the blanks on each option for each location in Excel. They didn't add up.

I didn't quite understand what value it provides when I use the N statement in the code. That number (as you can see in the sample output) remains the same for each location across the board, and is the number by which the sum is divided in order to get the mean.

The sums work perfectly, but I don't understand what the mean is out of for each class value.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

I don't open XLSX files from unknown sources.

Many users here don't want to download Excel files because of virus potential, others have such things blocked by security software. Also if you give us Excel we have to create a SAS data set and due to the non-existent constraints on Excel data cells the result we end up with may not have variables of the same type (numeric or character) and even values.

Why do you include Sectorcollapsed as a class variable when it is not used?

The mean of a 0/1 coded value is the percentage of 1's provided.

Consider

0 0 0 1 1 0 0 0 0 0. That is 2 / 10 or 20%, which is the (SUM of 1s)/(number of values) or a mean. The mean would have a numeric value of 0.2 so we use a Percent format to get "nice" looking values such as 20%.

"Didn't answer the question" might be the NMISS statistic if that means there is no value at all, i.e. missing. That would indicate the number of missing values.

Of course when you "filter in Excel" the results don't match. You do not show any corresponding "filter" in the SAS code. You don't even clearly describe how you "filtered" in Excel. So I can't make a suggestion.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

I use the class sectorcollapsed in another table (I only included the code to create the population table in the question). The code for that one is similar.

I've included the data and the output as an image. I only included the first 3 population variables here. As you can see, the N stays consistent across for each location. For location 6, 13 people responded to the question. For Location 5, 26 people responded to the question. (I checked this on Excel by taking the total from each location and filtering out the rows where the sum across the 20 variables was 0 (ie no selection of an option had been made). However, for Location 1 and 2 at the bottom: the values were off (ie the N using the filtering method on Excel for location 1 was 130 responses in total, not including those rows where the sum across 20 rows was 0; and for location 2, it was 39 - but the Ns for the SAS outputs are 134 and 44 respectively.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

**Available on demand!**

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.