Radwan
Quartz | Level 8

Hello everyone,

I have data from a questionnaire in which some questions are multiple choice, meaning a single answer can hold several values. How can I deal with this, especially when I need to know the frequency of each choice?

Thanks in advance.

Untitled.png

ACCEPTED SOLUTION

RW9
Diamond | Level 26

I don't know whether you use CDISC models, but they show how questionnaire data should be modelled. Each row of data holds one response:

Questionnaire     Question   Response

That way you can run proc freq on each question separately, apply multilabel formats so the responses are categorised correctly, and so on. This is how you should be modelling your data anyway; it is done that way for a reason!

 

For example, I have questions 1, 2 and 3.

My data looks like:

Question    Response
1           1
2           0
3           1

To group questions 1 and 3 together I can use a multilabel format:

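/* A multilabel format lets one value fall under several overlapping labels */
proc format;
  value tmp (multilabel)
    1="1"
    2="2"
    3="3"
    1,3="1,3";
run;

/* One row per question; result holds the Response column shown above */
data have;
  question=1; result=1; output;
  question=2; result=0; output;
  question=3; result=1; output;
run;

/* The mlf option makes the class variable honour the multilabel format */
proc means data=have;
  class question / mlf;
  var result;
  output out=want n=n;
  format question tmp.;
run;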

 

 

 

 


6 REPLIES
Kurt_Bremser
Super User

Please supply example data in usable form (a data step with datalines), so we have something to test code against.

Also give an example of the expected output from that example data.
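For anyone unsure what "usable form" means here, a minimal sketch is below. The data set name have, the variables id and q22, and every value are invented for illustration; they are not the poster's actual data.

```sas
/* Hypothetical example data: names and values are made up. */
/* q22 holds the selected choices as a comma-separated string. */
data have;
  infile datalines truncover;
  input id q22 :$20.;
  datalines;
1 5,2
2 3
3 5,2,1
4 2,5
;
run;

/* Print it to confirm the rows were read as intended. */
proc print data=have;
run;
```

Posting data this way lets others copy, paste and run it directly instead of retyping values from a screenshot.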

Radwan
Quartz | Level 8

When I run proc means, the variables with multiple responses do not appear.

RW9
Diamond | Level 26

This is a common issue; it occurs when people insist on keeping more than one data item in a single variable. Expand your data so that each observation contains one data element per data item, then run your procedures against it. You can always reshape for output later.

So 1,5,2,6,7 would become:

1
5
2
6
7
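One possible way to do that expansion is sketched below, assuming the choices arrive as a comma-separated character variable. The names have, long, id, q22 and choice, and all the values, are made up for illustration.

```sas
/* Hypothetical input: one row per respondent, choices in a string. */
data have;
  infile datalines truncover;
  input id q22 :$20.;
  datalines;
1 5,2
2 3
3 5,2,1
4 2,5
;
run;

/* Expand to one row per selected choice. */
/* Commas and blanks are both treated as delimiters. */
data long;
  set have;
  do i = 1 to countw(q22, ', ');
    choice = input(scan(q22, i, ', '), 8.);
    output;
  end;
  drop i q22;
run;

/* Frequency of each choice across all respondents. */
proc freq data=long;
  tables choice;
run;
```

With these invented rows, choices 2 and 5 are each counted three times and choices 1 and 3 once each. Note that the percentages proc freq reports are out of all selections, not out of respondents.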

 

Radwan
Quartz | Level 8

You mean I have to separate the question into sub-questions, one for each choice in the question?

ballardw
Super User

My only somewhat tongue-in-cheek answer would be: refer to the analysis plan that was drawn up before the data collection occurred.

Any data collection should support analysis, which means a plan should be in place for how each component is to be used in the analysis after collection. If you don't know how you are going to use a data item, then I would ask: why collect it?

Some things to consider with multiple-choice answers:

Does the order of responses make a difference? For example, is selecting response 5 before response 2, as in your first and third rows, to be treated any differently than 2 before 5, as in the fourth row?

Are single responses, such as the second row, different from when a choice appears in combination with others?

Are specific combinations occurring together of particular interest?

Is the number of responses important? Possibly the main point of interest?

Or how many times any specific response occurred, regardless of order?

Additional considerations are how this variable is to be used against or with other variables. Is Q22 going to be used to "predict" variable Q23 (or another)?

This is not exhaustive but gives some idea of the things you can get involved with. Many of these will require splitting out into additional variables, such as Q22_1 = 1 when the value 1 was selected and 0 when not. Or creating another variable to hold the count of responses, or, if specific combinations are of interest, a variable such as Q22_123 = 1 when all (or possibly any) of responses 1, 2 and 3 are selected and 0 otherwise, or the number of 1, 2 and 3 when present and 0 otherwise.
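A rough sketch of those derived variables follows, assuming Q22 is stored as a comma-separated string and that the choices are coded 1 to 7. Both are assumptions, and the data set names, the id variable and all values are invented.

```sas
/* Hypothetical input: names and values are made up. */
data have;
  infile datalines truncover;
  input id q22 :$20.;
  datalines;
1 5,2
2 3
3 5,2,1
4 2,5
5 1,2,3
;
run;

data flags;
  set have;
  /* Assumes choices are coded 1 to 7; adjust to the real code list. */
  array flag[7] q22_1-q22_7;
  do k = 1 to dim(flag);
    /* 1 if choice k appears as a whole word in q22, else 0 */
    flag[k] = (findw(q22, strip(put(k, 8.)), ', ') > 0);
  end;
  /* Number of choices selected by this respondent */
  q22_n = countw(q22, ', ');
  /* 1 only when choices 1, 2 and 3 were all selected */
  q22_123 = (q22_1 and q22_2 and q22_3);
  drop k;
run;

/* SUM gives how many respondents picked each choice; MEAN the share. */
proc means data=flags n sum mean;
  var q22_1-q22_7 q22_n q22_123;
run;
```

Because each flag is 0 or 1, order of selection is discarded here; if order matters, it would need separate variables.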

 

What question(s) are the responses to Q22 supposed to answer? That is what the analysis plan should cover as well as specific measures or tests to be used.

 

 

