mszommer
Obsidian | Level 7

Hello,

I would like to analyze a question, which is similar in nature to the one listed below:

With whom did you verify the need for the new practices you recommended? (Check the three most important)

A. other extension agents

B. farmers/producers

C. researchers

D. TV/radio

E. newspaper, pamphlets, bulletin

The above question is obtained from this article ---> https://www.joe.org/joe/2000june/tt2.php

SAS procedures are said to have been used to analyze the data; however, the article does not give any details.

 

How can I go about analyzing such a question, where I would like to study the combination of responses? I do not wish to consider each option as a variable (multiple dichotomies). In my case, the respondent needs to select 5 from 12 given options.

 

Any help would be more than appreciated.

 

Regards,

MS

11 REPLIES
PaigeMiller
Diamond | Level 26

Some of us will not click on links to unknown web sites, as they can be a security threat.

 

Please explain exactly what analysis was performed on this data. If the analysis has a name, please provide that.

--
Paige Miller
mszommer
Obsidian | Level 7
Hello Paige,

The article did not mention any statistical procedure(s), except for the information that SAS procedures were used.

MS
PaigeMiller
Diamond | Level 26

But "analyzing" can mean a bazillion different things. What specific question will be answered by "analyzing" the data?

--
Paige Miller
SteveDenham
Jade | Level 19

I can't say for certain, but I would wager at least a small sum of money that PROC FREQ was used, and one-way tables were done for each question to tabulate the responses, and perhaps look for deviations from random selection.  Any more than that would depend on what the authors had to say about their results.

 

SteveDenham

mszommer
Obsidian | Level 7

Thank you for your efforts to understand the issue.

 

Like I mentioned earlier, the only information provided in the article is 'The statistical procedures from SAS(R) were used in all data analysis.'

 

I would like to analyze the frequencies of item combinations. In the example stated above, the three-item combinations would be ABC, ABD, ABE, ACE, BCD, BCE, etc. I plan to retain the combinations with relatively high frequencies as the 'three most important' influences. Would that be the right approach? What statistical method(s) can I use? Is there a procedure in SAS for it (multiple response)? Also, I have other categorical variables (age, gender, personas, etc.), and I would like to see whether the influences differ across them (the general top-3 important influences, then the top 3 by age, the top 3 by gender, etc.).

 

Regards

MS

 

ballardw
Super User


You can get counts of combinations easily from PROC FREQ. Here is a small example using a data set you should already have (sashelp.class), so you can test the code:

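proc freq data=sashelp.class noprint;
   /* one output row per observed sex*age combination */
   tables sex*age / list out=freqs;
run;

/* most frequent combinations first */
proc sort data=freqs;
   by descending count;
run;

proc print data=freqs;
run;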

So you could place all of your combination variables on the TABLES statement. Since you have not provided any example data, I can't tell whether missing values are involved; if they are, you may need the MISSING option on the TABLES statement so that those levels are counted as well.

The OUT= option writes one data set per TABLES statement, so to get a sortable data set for each of your other variables you would need a separate TABLES statement for each one (that variable added to the combination variables), each with its own OUT= data set.

 

The PROC FREQ option ORDER=FREQ won't quite give the order you might want: it orders the levels of each variable in the TABLES statement separately, so the rows of a multi-variable table do not come out in descending order of the overall count. Sorting the OUT= data set by descending COUNT does.
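
If the responses turn out to be stored as one 0/1 variable per option, the counting idea above might look like the sketch below. Everything in it is an assumption for illustration: the data set name survey, the variables id, gender and item1-item12, and the four made-up rows are not from the real survey. Concatenating the twelve flags into one key makes each distinct set of five choices a single level, so a one-way table counts whole combinations.

```sas
/* Toy data, invented for illustration only: 12 options, 5 chosen per row */
data survey;
   input id gender $ item1-item12;
   datalines;
1 F 1 1 1 1 1 0 0 0 0 0 0 0
2 M 1 1 1 1 1 0 0 0 0 0 0 0
3 F 1 1 1 0 0 1 1 0 0 0 0 0
4 M 0 0 0 0 0 0 0 1 1 1 1 1
;
run;

/* Build one key per respondent, e.g. '111110000000' */
data combos;
   set survey;
   length combo $12;
   combo = cats(of item1-item12);
run;

/* One TABLES statement (and one OUT= data set) per question of interest */
proc freq data=combos noprint;
   tables combo        / out=combo_freqs;
   tables gender*combo / out=combo_by_gender;
run;

/* Most frequent combinations first */
proc sort data=combo_freqs;
   by descending count;
run;

proc print data=combo_freqs;
run;
```

The same sort can be applied to combo_by_gender with BY gender DESCENDING count to rank combinations within each group.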

 

ballardw
Super User

How many records do you have to work with?

How consistent were the respondents about choosing exactly 5 items?

How are your variables coded? (Example data in the form of a data step is helpful here.)

What specific questions are supposed to be answered by your analysis?

Is the order in which the values were recorded part of the analysis (which was picked first, second, etc.), or just the presence of a selection?

mszommer
Obsidian | Level 7

I have about 15900 records to work with.
All the respondents have chosen 5 items (they couldn't have proceeded unless they chose 5 items)
Each item has been saved as a separate dichotomous variable (sample data attached)
I have mentioned the aim of the analysis in my comment above
There is no ranking - it doesn't matter which item was picked first

SteveDenham
Jade | Level 19

I know enough about this to blow up me and everyone nearby, but suppose I was told I had one hour to figure out a plan, one week to write code and one month to interpret results.

 

With 12 choices available, and requiring 5 answers, I see this is not a "choose all that apply" type of survey. That tells me there are 792 unique combinations of 5 answers to the 12 choices. That makes some kind of tabulation worthwhile, to see how many of those combinations actually show up in the data.  
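
The count of 792 is the number of ways to choose 5 items out of 12 when order does not matter. As a quick check, a minimal sketch using the COMB function (the variable name n_sets is arbitrary):

```sas
/* Number of unordered 5-item subsets of 12 options */
data _null_;
   n_sets = comb(12, 5);
   put n_sets=;   /* writes n_sets=792 to the log */
run;
```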

 

I would probably use PROC FREQ for this (although SURVEYFREQ may be more appropriate at some stage).  The cross tabulation of question (1 to 12)  by chosen (Y or N)  would provide insight.  So armed with this, we address the issue of which 3 of the 12 are most important.  Now comes the question - "What constitutes most important?" I am going to assume it is the most common 3. That yields 220 possible triad combinations.  Again, inspection of the results will likely be your best bet (unless you have Enterprise Miner and want to set up some trees).
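
One way to tabulate the triads, sketched under assumptions: the data set survey and the 0/1 variables item1-item12 are hypothetical names, and the four rows are invented toy data. Each respondent who picked 5 items contributes 10 of the 220 possible triads, so the percentages PROC FREQ reports are shares of triads, not of respondents.

```sas
/* Toy data, invented for illustration only: 12 options, 5 chosen per row */
data survey;
   input id item1-item12;
   datalines;
1 1 1 1 1 1 0 0 0 0 0 0 0
2 1 1 1 1 1 0 0 0 0 0 0 0
3 1 1 1 0 0 1 1 0 0 0 0 0
4 0 0 0 0 0 0 0 1 1 1 1 1
;
run;

/* Write one row per (respondent, triad of chosen items) */
data triads;
   set survey;
   array item{12} item1-item12;
   length triad $8;               /* longest label is '10-11-12' */
   do i = 1 to 10;
      do j = i + 1 to 11;
         do k = j + 1 to 12;
            if item{i} = 1 and item{j} = 1 and item{k} = 1 then do;
               triad = catx('-', i, j, k);
               output;
            end;
         end;
      end;
   end;
   keep id triad;
run;

/* One-way table of triads, most frequent first */
proc freq data=triads order=freq;
   tables triad / out=triad_freqs;
run;
```

With a single variable on the TABLES statement, ORDER=FREQ lists the levels by descending count, so the most common triads appear at the top of the table.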

 

Then comes the hard part in my mind - some sort of multinomial approach, using the other categorical variables.  That sounds to me like a Cochran-Mantel-Haenszel analysis that tests for differences after accounting for the stratifying variables (your other categorical variables).  Again, PROC FREQ is likely to be the best tool, although PROC CATMOD may be useful. Categorical clustering may also be a possibility.
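
A Cochran-Mantel-Haenszel request in PROC FREQ could be sketched as below. This is only an illustration under stated assumptions: the data are simulated, and the names survey, gender, agegrp and item1 are hypothetical. It tests one item at a time (here, whether choosing item 1 is associated with gender after stratifying by age group) and does not account for the dependence among items that comes from forcing exactly 5 selections.

```sas
/* Simulated data, for illustration only */
data survey;
   call streaminit(2024);
   length gender $1 agegrp $8;
   do id = 1 to 400;
      gender = ifc(rand('bernoulli', 0.5), 'F', 'M');
      agegrp = ifc(rand('bernoulli', 0.5), 'under40', '40plus');
      item1  = rand('bernoulli', 0.4);   /* 1 = item chosen */
      output;
   end;
run;

/* Stratum variable first, then row and column variables */
proc freq data=survey;
   tables agegrp*gender*item1 / cmh;
run;
```

In a three-way table request the first variable defines the strata, so the CMH statistics here are for gender by item1, adjusted for agegrp.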

 

Look through the documentation for these procedures, especially paying attention to the examples.  One or more may be a close analogy to what you might want to do.

 

The approach that I would try to avoid is any kind of multinomial regression.  With so many levels of response, the explanatory variables are not likely to separate categories well.

 

SteveDenham

sld
Rhodochrosite | Level 12

You might be looking for "multiple-response categorical variable" analysis.

 

This paper discusses the topic, and you might find an example SAS program here. There is an R program and a vignette here, but it has not been updated for several years.

 

Alas, I know the methodology exists, and that's about all I know about it. Good luck!

 

mszommer
Obsidian | Level 7
Thanks, sld. I do not have access to the full paper that you recommended, but I gather from the abstract that it discusses methods of analyzing 'choose all that apply' survey questions. I need to analyze 'choose 3 out of 8' and 'choose 5 out of 12' items.

Thank you SteveDenham and ballardw for taking the time to help me out. I will check out your suggestions.

