Hi, I am a SAS beginner. I have some data that I need to tabulate: frequency, N and percent tabulations, by another variable (location). The data is from a multiple-choice survey question - there are 20 different multiple choice options, and respondents can choose up to 3 options. I'd like to be able to tabulate the frequency of each option for Location A, Location B, Location C, etc. and calculate the percentage, with the N being the number of people who responded to that question. So I wouldn't want to tabulate rows where none of the option choices have been selected. --> (Can I accomplish this by creating a new variable that concatenates variables of all 20 variables, and then exclude rows with 20 0s?)
Location_1 Popn_1 Popn_2 Popn_3 Popn_4.....
CityA 1 0 0 0
CityB 0 0 0 0
CityC 0 0 0 0
CityB 0 0 0 0
CityB 0 1 0 0
CityC 0 0 0 0
CityD 0 1 0 0
CityE 0 0 0 0
CityF 0 0 0 0
CityE 1 0 0 0
CityD 1 0 0 0
CityC 1 0 0 0
........
I was trying to understand this https://support.sas.com/documentation/cdl/en/proc/61895/HTML/default/viewer.htm#a002473759.htm, and also https://communities.sas.com/t5/SAS-Programming/Analyzing-Multiple-Response-Variables/td-p/485261
I ran into the same problem as above, was trying this test code:
proc tabulate data=cath.testdata2;
class location_1;
var popn_1-popn_3;
table popn_1='Everyone'
popn_2='Child'
popn_3='Youth',
(n='Count' pctn='Percent');
title 'Frequency and Counts for Population Choices';
run;
I know that my data has: 141 records where no option is selected; 496 records in total.
Output was:
Frequency and Counts for Population Choices
Count PercentEveryoneChildrenYouth
417 | 100.00 |
417 | 100.00 |
417 | 100.00 |
Additionally I would also like to tabulate, for each location, frequency by each variable against the other. (so the number who have selected popn_1 AND popn_2, etc.) (but after solving the above)
I'd like something that looks like this:
Count | Percent | N=355 | ||
1: Checked | 1: Checked | |||
CityA | Everyone | 5 | 2% | |
Children | 7 | |||
Youth | ||||
City B | Everyone | 15 | 4.23% | |
Children | … | … | ||
Youth | … | … |
Thanks any help provided
Please provide a full bur reduced set of test data (like a dozen rows) as a data step, and the expects outputs.
If you have 1/0 coded values then using the variable as a VAR type in tabulate then SUM is the number of 1 values and Mean would be the percent.
So maybe something like:
proc tabulate data=cath.testdata2; class location_1; var popn_1-popn_3; table popn_1='Everyone' popn_2='Child' popn_3='Youth', (Sum='Count'*f=best5. mean='Percent'*f=percent8.1); table Location_1 * (popn_1='Everyone' popn_2='Child' popn_3='Youth'), (Sum='Count'*f=best5. mean='Percent'*f=percent8.1); title 'Frequency and Counts for Population Choices'; run;
Thank you. I do not understand the numbers the output gives me.
I have my code like this:
proc tabulate data=cath.binarydata;
class location sectorcollapsed;
var popn1-popn20;
table (location='Location'),
(popn1-popn20)*(Sum='Count'*f=best5. mean='Percent'*f=percent8.1 N='Total responses received per question')
/box="Populations Served by Location";
I am getting confused about what the numbers mean:
When SAS calculates 'mean', what is the N it is calculating this out of? It uses the same N, it seems for each variable (as you can see below). Where does it get this N value from? I just want to make sure this is correctly giving me the percentage of responses who chose that option out of the total who responded to the question. Is this correct?
Here is a sample of my data:
popn1 | popn2 | popn3 | popn4 |
1 | 0 | 0 | 0 |
0 | 0 | 0 | 0 |
0 | 0 | 0 | 0 |
0 | 0 | 0 | 0 |
0 | 1 | 0 | 0 |
0 | 0 | 0 | 0 |
1 | 0 | 0 | 0 |
0 | 1 | 0 | 0 |
0 | 0 | 0 | 0 |
And here is a sample of my output:
Populations Served by Location | popn1 | popn2 | ||||
Location | Count | Percent | N (Total responses per question) | Count | Percent | N(Total responses received per question) |
Location | 11 | 84.6% | 13 | 2 | 15.4% | 13 |
Location1 | ||||||
Location2 | 11 | 42.3% | 26 | 8 | 30.8% | 26 |
Location3 | 28 | 65.1% | 43 | 9 | 20.9% | 43 |
I don't remember you asking for an "N",rather "th the N being the number of people who responded to that question" where 1 indicated they responded. If that is not what the 1 in your data means, then please describe otherwise. The SUM is the number of people that answered 1.
Did the code I showed as an example do what was intended for the limited example of data you provided? If so, then there is something else going on with your actual data that you need to provide.
So, you need to provide some actual example data with the " did not provide an answer at all " and what the expected result would be. It would also help to show the actual code your ran. Best way to do this is to copy from the LOG the code and any notes related to the procedure and then paste that into a code box. That way there is not question as to what was actually executed.
Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the </> icon or attached as text to show exactly what you have and that we can test code against.
With Proc Tabulate you must be careful the values of your class variables. If any of the class variables are missing then the record is excluded.
Thanks Ballard.
Here is my code:
proc tabulate data=cath.binarydata; where location in ("location1", "location2", "location3", "location4", "location5", "location6"); class location sectorcollapsed; var popn1-popn20; table (location='Location'), (popn1-popn20)*(Sum='Count'*f=best5. mean='Percent'*f=percent8.1 N='Total responses received per question') /box="Populations Served by Location";
Here is a sample of my data: attached in data.xlsx
And here is a sample output: sampleoutput.xlsx
In order to determine how many people didn't answer the question, I filtered for only the blanks on each option for each location in Excel. They didn't add up.
I didn't quite understand what value it provides when I use the N statement in the code. That number (as you can see in the sample output) remains the same for each location across the board, and is the number by which the sum is divided in order to get the mean.
The sums work perfectly, but I don't understand what the mean is out of for each class value.
I don't open XLSX files from unknown sources.
Many users here don't want to download Excel files because of virus potential, others have such things blocked by security software. Also if you give us Excel we have to create a SAS data set and due to the non-existent constraints on Excel data cells the result we end up with may not have variables of the same type (numeric or character) and even values.
Why do you include Sectorcollapsed as a class variable when it is not used?
The mean of a 0/1 coded value is the percentage of 1's provided.
Consider
0 0 0 1 1 0 0 0 0 0. That is 2 / 10 or 20%, which is the (SUM of 1s)/(number of values) or a mean. The mean would have a numeric value of 0.2 so we use a Percent format to get "nice" looking values such as 20%.
"Didn't answer the question" might be the NMISS statistic if that means there is no value at all, i.e. missing. That would indicate the number of missing values.
Of course when you "filter in Excel" the results don't match. You do not show any corresponding "filter" in the SAS code. You don't even clearly describe how you "filtered" in Excel. So I can't make a suggestion.
I use the class sectorcollapsed in another table (I only included the code to create the population table in the question). The code for that one is similar.
I've included the data and the output as an image. I only included the first 3 population variables here. As you can see, the N stays consistent across for each location. For location 6, 13 people responded to the question. For Location 5, 26 people responded to the question. (I checked this on Excel by taking the total from each location and filtering out the rows where the sum across the 20 variables was 0 (ie no selection of an option had been made). However, for Location 1 and 2 at the bottom: the values were off (ie the N using the filtering method on Excel for location 1 was 130 responses in total, not including those rows where the sum across 20 rows was 0; and for location 2, it was 39 - but the Ns for the SAS outputs are 134 and 44 respectively.
@ballardw I think it will work the way I expect it to if I delete the 0s in all cells for all the rows that sum up to 0 (ie nothing was selected for that question). *Edit - it worked when I did this
Catch the best of SAS Innovate 2025 — anytime, anywhere. Stream powerful keynotes, real-world demos, and game-changing insights from the world’s leading data and AI minds.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.