BookmarkSubscribeRSS Feed
Pre1
Calcite | Level 5

Hi, I am a SAS beginner. I have some data that I need to tabulate: frequency, N and percent tabulations, by another variable (location). The data is from a multiple-choice survey question - there are 20 different multiple choice options, and respondents can choose up to 3 options. I'd like to be able to tabulate the frequency of each option for Location A, Location B, Location C, etc. and calculate the percentage, with the N being the number of people who responded to that question. So I wouldn't want to tabulate rows where none of the option choices have been selected. --> (Can I accomplish this by creating a new variable that concatenates variables of all 20 variables, and then exclude rows with 20 0s?)

 

 

Location_1 Popn_1 Popn_2 Popn_3 Popn_4.....

CityA 1 0 0 0
CityB 0 0 0 0
CityC 0 0 0 0
CityB 0 0 0 0
CityB 0 1 0 0
CityC 0 0 0 0
CityD 0 1 0 0
CityE 0 0 0 0
CityF 0 0 0 0
CityE 1 0 0 0
CityD 1 0 0 0
CityC 1 0 0 0

........

I was trying to understand this https://support.sas.com/documentation/cdl/en/proc/61895/HTML/default/viewer.htm#a002473759.htm, and also https://communities.sas.com/t5/SAS-Programming/Analyzing-Multiple-Response-Variables/td-p/485261

 

I ran into the same problem as above, was trying this test code:

proc tabulate data=cath.testdata2;
class location_1;
var popn_1-popn_3;
table popn_1='Everyone'
popn_2='Child'
popn_3='Youth',
(n='Count' pctn='Percent');

title 'Frequency and Counts for Population Choices';
run;

 I know that my data has: 141 records where no option is selected; 496 records in total. 

 

Output was: 

Frequency and Counts for Population Choices

   Count PercentEveryoneChildrenYouth

417100.00
417100.00
417100.00

 

 

Additionally I would also like to tabulate, for each location, frequency by each variable against the other. (so the number who have selected popn_1 AND popn_2, etc.) (but after solving the above)

 

I'd like something that looks like this:

  CountPercentN=355
  1: Checked1: Checked 
CityAEveryone52% 
 Children7  
 Youth   
City BEveryone154.23% 
 Children 
 Youth 

 

Thanks any help provided

9 REPLIES 9
ChrisNZ
Tourmaline | Level 20

Please provide a full bur reduced set of test data (like a dozen rows) as a data step, and the expects outputs.

ballardw
Super User

If you have 1/0 coded values then using the variable as a VAR type in tabulate then SUM is the number of 1 values and Mean would be the percent.

So maybe something like:

 

proc tabulate data=cath.testdata2;
	class location_1;
	var popn_1-popn_3;
	table popn_1='Everyone'
	      popn_2='Child'
	      popn_3='Youth',
              (Sum='Count'*f=best5. mean='Percent'*f=percent8.1);
	table Location_1 * (popn_1='Everyone'
	      popn_2='Child'
	      popn_3='Youth'),
              (Sum='Count'*f=best5. mean='Percent'*f=percent8.1);
		  
	title 'Frequency and Counts for Population Choices';
run;
Pre1
Calcite | Level 5

Thank you. I do not understand the numbers the output gives me. 

 

I have my code like this:

    proc tabulate data=cath.binarydata;
    class location sectorcollapsed;
    var popn1-popn20;
    table (location='Location'),
    (popn1-popn20)*(Sum='Count'*f=best5. mean='Percent'*f=percent8.1 N='Total responses      received per question')

    /box="Populations Served by Location";

 

I am getting confused about what the numbers mean:

When SAS calculates 'mean', what is the N it is calculating this out of? It uses the same N, it seems for each variable (as you can see below). Where does it get this N value from? I just want to make sure this is correctly giving me the percentage of responses who chose that option out of the total who responded to the question. Is this correct?

Here is a sample of my data:

 

popn1popn2popn3popn4
1000
0000
0000
0000
0100
0000
1000
0100
0000

And here is a sample of my output:

Populations
Served by
Location
popn1  popn2  
LocationCountPercent

N (Total

responses per 

question)

CountPercent N(Total responses received per question)
Location1184.6%13215.4%13
Location1      
Location21142.3%26830.8%26
Location32865.1%43920.9%43

 

Pre1
Calcite | Level 5
Hi @ballardw, I had a subsequent question - I tried the code you had provided me, but I'm not understanding where the N for the mean comes from. Does it divide by the total number of responses received? I thought the N would be all of the people who answered the question, but it doesn't quite add up like this. As an example, in one of the locations, there were 149 total people in that location, and 19 did not provide an answer at all - so the N here should be 130, but it is giving me a value of 134 in the output. The mean is calculated out of 134. I tried to provide a reply here but I think I did something incorrect, and the posts were deleted.
ballardw
Super User

I don't remember you asking for an "N",rather "th the N being the number of people who responded to that question" where 1 indicated they responded. If that is not what the 1 in your data means, then please describe otherwise. The SUM is the number of people that answered 1.

 

Did the code I showed as an example do what was intended for the limited example of data you provided? If so, then there is something else going on with your actual data that you need to provide.

 

So, you need to provide some actual example data with the " did not provide an answer at all " and what the expected result would be. It would also help to show the actual code your ran. Best way to do this is to copy from the LOG the code and any notes related to the procedure and then paste that into a code box. That way there is not question as to what was actually executed.

 

Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the </> icon or attached as text to show exactly what you have and that we can test code against.

 

With Proc Tabulate you must be careful the values of your class variables. If any of the class variables are missing then the record is excluded.

 

 

Pre1
Calcite | Level 5

@ballardw 

 

Thanks Ballard.

 

Here is my code: 

proc tabulate data=cath.binarydata;
	where location in ("location1", "location2", "location3", "location4", "location5", "location6");
	class location sectorcollapsed;
	var popn1-popn20;
	table (location='Location'), 
				(popn1-popn20)*(Sum='Count'*f=best5. mean='Percent'*f=percent8.1 N='Total responses received per question')
				/box="Populations Served by Location";

Here is a sample of my data: attached in data.xlsx

 

And here is a sample output: sampleoutput.xlsx

 

In order to determine how many people didn't answer the question, I filtered for only the blanks on each option for each location in Excel. They didn't add up.

 

I didn't quite understand what value it provides when I use the N statement in the code. That number (as you can see in the sample output) remains the same for each location across the board, and is the number by which the sum is divided in order to get the mean. 

 

The sums work perfectly, but I don't understand what the mean is out of for each class value. 

ballardw
Super User

I don't open XLSX files from unknown sources.

Many users here don't want to download Excel files because of virus potential, others have such things blocked by security software. Also if you give us Excel we have to create a SAS data set and due to the non-existent constraints on Excel data cells the result we end up with may not have variables of the same type (numeric or character) and even values.

 

Why do you include Sectorcollapsed as a class variable when it is not used?

 

The mean of a 0/1 coded value is the percentage of 1's provided.

Consider

0 0 0 1 1 0 0 0 0 0. That is 2 / 10 or 20%, which is the (SUM of 1s)/(number of values) or a mean. The mean would have a numeric value of 0.2 so we use a Percent format to get "nice" looking values such as 20%.

 

"Didn't answer the question" might be the NMISS statistic if that means there is no value at all, i.e. missing. That would indicate the number of missing values.

 

Of course when you "filter in Excel" the results don't match. You do not show any corresponding "filter" in the SAS code. You don't even clearly describe how you "filtered" in Excel. So I can't make a suggestion.

 

 

Pre1
Calcite | Level 5

I use the class sectorcollapsed in another table (I only included the code to create the population table in the question). The code for that one is similar.

 

I've included the data and the output as an image. I only included the first 3 population variables here. As you can see, the N stays consistent across for each location. For location 6, 13 people responded to the question. For Location 5, 26 people responded to the question. (I checked this on Excel by taking the total from each location and filtering out the rows where the sum across the 20 variables was 0 (ie no selection of an option had been made).   However, for Location 1 and 2 at the bottom: the values were off (ie the N using the filtering method on Excel for location 1 was 130 responses in total, not including those rows where the sum across 20 rows was 0; and for location 2, it was 39 - but the Ns for the SAS outputs are 134 and 44 respectively. 

 

Pre1_1-1606165700799.png

 

Pre1_0-1606165644859.png

 

Pre1
Calcite | Level 5

@ballardw I think it will work the way I expect it to if I delete the 0s in all cells for all the rows that sum up to 0 (ie nothing was selected for that question). *Edit - it worked when I did this 

sas-innovate-white.png

Missed SAS Innovate in Orlando?

Catch the best of SAS Innovate 2025 — anytime, anywhere. Stream powerful keynotes, real-world demos, and game-changing insights from the world’s leading data and AI minds.

 

Register now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 2548 views
  • 0 likes
  • 3 in conversation