Frequency tabulations for multiple-choice variables (20 possible optio...

Pre1 · Posted 11-12-2020 07:07 PM

Hi, I am a SAS beginner. I have some data that I need to tabulate: frequency, N and percent tabulations, by another variable (location). The data is from a multiple-choice survey question - there are 20 different multiple choice options, and respondents can choose up to 3 options. I'd like to be able to tabulate the frequency of each option for Location A, Location B, Location C, etc. and calculate the percentage, with the N being the number of people who responded to that question. So I wouldn't want to tabulate rows where none of the option choices have been selected. --> (Can I accomplish this by creating a new variable that concatenates variables of all 20 variables, and then exclude rows with 20 0s?)

Location_1 Popn_1 Popn_2 Popn_3 Popn_4.....

CityA 1 0 0 0
CityB 0 0 0 0
CityC 0 0 0 0
CityB 0 0 0 0
CityB 0 1 0 0
CityC 0 0 0 0
CityD 0 1 0 0
CityE 0 0 0 0
CityF 0 0 0 0
CityE 1 0 0 0
CityD 1 0 0 0
CityC 1 0 0 0

........

I was trying to understand this https://support.sas.com/documentation/cdl/en/proc/61895/HTML/default/viewer.htm#a002473759.htm, and also https://communities.sas.com/t5/SAS-Programming/Analyzing-Multiple-Response-Variables/td-p/485261

I ran into the same problem as above, was trying this test code:

proc tabulate data=cath.testdata2;
	class location_1;
	var popn_1-popn_3;
	table popn_1='Everyone'
	      popn_2='Child'
	      popn_3='Youth',
              (n='Count' pctn='Percent');
		  
	title 'Frequency and Counts for Population Choices';
run;

I know that my data has: 141 records where no option is selected; 496 records in total.

Output was:

Frequency and Counts for Population Choices

Count PercentEveryoneChildrenYouth

417	100.00
417	100.00
417	100.00

Additionally I would also like to tabulate, for each location, frequency by each variable against the other. (so the number who have selected popn_1 AND popn_2, etc.) (but after solving the above)

I'd like something that looks like this:

		Count	Percent	N=355
		1: Checked	1: Checked
CityA	Everyone	5	2%
	Children	7
	Youth
City B	Everyone	15	4.23%
	Children	…	…
	Youth	…	…

Thanks any help provided

ChrisNZ · Posted 11-13-2020 12:06 AM

Please provide a full bur reduced set of test data (like a dozen rows) as a data step, and the expects outputs.

High-Performance SAS Coding - Third Edition

ballardw · Posted 11-13-2020 04:28 AM

If you have 1/0 coded values then using the variable as a VAR type in tabulate then SUM is the number of 1 values and Mean would be the percent.

So maybe something like:

proc tabulate data=cath.testdata2;
	class location_1;
	var popn_1-popn_3;
	table popn_1='Everyone'
	      popn_2='Child'
	      popn_3='Youth',
              (Sum='Count'*f=best5. mean='Percent'*f=percent8.1);
	table Location_1 * (popn_1='Everyone'
	      popn_2='Child'
	      popn_3='Youth'),
              (Sum='Count'*f=best5. mean='Percent'*f=percent8.1);
		  
	title 'Frequency and Counts for Population Choices';
run;

Pre1 · Posted 11-23-2020 01:58 PM

Thank you. I do not understand the numbers the output gives me.

I have my code like this:

proc tabulate data=cath.binarydata;
class location sectorcollapsed;
var popn1-popn20;
table (location='Location'),
(popn1-popn20)*(Sum='Count'*f=best5. mean='Percent'*f=percent8.1 N='Total responses received per question')

/box="Populations Served by Location";

I am getting confused about what the numbers mean:

When SAS calculates 'mean', what is the N it is calculating this out of? It uses the same N, it seems for each variable (as you can see below). Where does it get this N value from? I just want to make sure this is correctly giving me the percentage of responses who chose that option out of the total who responded to the question. Is this correct?

Here is a sample of my data:

popn1	popn2	popn3	popn4
1	0	0	0
0	0	0	0
0	0	0	0
0	0	0	0
0	1	0	0
0	0	0	0
1	0	0	0
0	1	0	0
0	0	0	0

And here is a sample of my output:

Populations Served by Location	popn1			popn2
Location	Count	Percent	N (Total responses per question)	Count	Percent	N(Total responses received per question)
Location	11	84.6%	13	2	15.4%	13
Location1
Location2	11	42.3%	26	8	30.8%	26
Location3	28	65.1%	43	9	20.9%	43

Pre1 · Posted 11-23-2020 03:12 PM

Hi @ballardw, I had a subsequent question - I tried the code you had provided me, but I'm not understanding where the N for the mean comes from. Does it divide by the total number of responses received? I thought the N would be all of the people who answered the question, but it doesn't quite add up like this. As an example, in one of the locations, there were 149 total people in that location, and 19 did not provide an answer at all - so the N here should be 130, but it is giving me a value of 134 in the output. The mean is calculated out of 134. I tried to provide a reply here but I think I did something incorrect, and the posts were deleted.

ballardw · Posted 11-23-2020 03:28 PM

I don't remember you asking for an "N",rather "th the N being the number of people who responded to that question" where 1 indicated they responded. If that is not what the 1 in your data means, then please describe otherwise. The SUM is the number of people that answered 1.

Did the code I showed as an example do what was intended for the limited example of data you provided? If so, then there is something else going on with your actual data that you need to provide.

So, you need to provide some actual example data with the " did not provide an answer at all " and what the expected result would be. It would also help to show the actual code your ran. Best way to do this is to copy from the LOG the code and any notes related to the procedure and then paste that into a code box. That way there is not question as to what was actually executed.

Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the </> icon or attached as text to show exactly what you have and that we can test code against.

With Proc Tabulate you must be careful the values of your class variables. If any of the class variables are missing then the record is excluded.

Pre1 · Posted 11-23-2020 03:44 PM

@ballardw

Thanks Ballard.

Here is my code:

proc tabulate data=cath.binarydata;
	where location in ("location1", "location2", "location3", "location4", "location5", "location6");
	class location sectorcollapsed;
	var popn1-popn20;
	table (location='Location'), 
				(popn1-popn20)*(Sum='Count'*f=best5. mean='Percent'*f=percent8.1 N='Total responses received per question')
				/box="Populations Served by Location";

Here is a sample of my data: attached in data.xlsx

And here is a sample output: sampleoutput.xlsx

In order to determine how many people didn't answer the question, I filtered for only the blanks on each option for each location in Excel. They didn't add up.

I didn't quite understand what value it provides when I use the N statement in the code. That number (as you can see in the sample output) remains the same for each location across the board, and is the number by which the sum is divided in order to get the mean.

The sums work perfectly, but I don't understand what the mean is out of for each class value.

ballardw · Posted 11-23-2020 03:59 PM

I don't open XLSX files from unknown sources.

Many users here don't want to download Excel files because of virus potential, others have such things blocked by security software. Also if you give us Excel we have to create a SAS data set and due to the non-existent constraints on Excel data cells the result we end up with may not have variables of the same type (numeric or character) and even values.

Why do you include Sectorcollapsed as a class variable when it is not used?

The mean of a 0/1 coded value is the percentage of 1's provided.

Consider

0 0 0 1 1 0 0 0 0 0. That is 2 / 10 or 20%, which is the (SUM of 1s)/(number of values) or a mean. The mean would have a numeric value of 0.2 so we use a Percent format to get "nice" looking values such as 20%.

"Didn't answer the question" might be the NMISS statistic if that means there is no value at all, i.e. missing. That would indicate the number of missing values.

Of course when you "filter in Excel" the results don't match. You do not show any corresponding "filter" in the SAS code. You don't even clearly describe how you "filtered" in Excel. So I can't make a suggestion.

Pre1 · Posted 11-23-2020 04:23 PM

I use the class sectorcollapsed in another table (I only included the code to create the population table in the question). The code for that one is similar.

I've included the data and the output as an image. I only included the first 3 population variables here. As you can see, the N stays consistent across for each location. For location 6, 13 people responded to the question. For Location 5, 26 people responded to the question. (I checked this on Excel by taking the total from each location and filtering out the rows where the sum across the 20 variables was 0 (ie no selection of an option had been made). However, for Location 1 and 2 at the bottom: the values were off (ie the N using the filtering method on Excel for location 1 was 130 responses in total, not including those rows where the sum across 20 rows was 0; and for location 2, it was 39 - but the Ns for the SAS outputs are 134 and 44 respectively.

Pre1 · Posted 11-23-2020 05:21 PM

@ballardw I think it will work the way I expect it to if I delete the 0s in all cells for all the rows that sum up to 0 (ie nothing was selected for that question). *Edit - it worked when I did this

Frequency tabulations for multiple-choice variables (20 possible options, 3-5 max)

Re: Frequency tabulations for multiple-choice variables (20 possible options, 3-5 max)

Re: Frequency tabulations for multiple-choice variables (20 possible options, 3-5 max)

Re: Frequency tabulations for multiple-choice variables (20 possible options, 3-5 max)

Re: Frequency tabulations for multiple-choice variables (20 possible options, 3-5 max)

Re: Frequency tabulations for multiple-choice variables (20 possible options, 3-5 max)

Re: Frequency tabulations for multiple-choice variables (20 possible options, 3-5 max)

Re: Frequency tabulations for multiple-choice variables (20 possible options, 3-5 max)

Re: Frequency tabulations for multiple-choice variables (20 possible options, 3-5 max)

Re: Frequency tabulations for multiple-choice variables (20 possible options, 3-5 max)

Registration is open

SAS Training: Just a Click Away