BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
EricVanceMartin
Obsidian | Level 7

I have dataset with a many survey responses and many variables. One variable a string variable named Q3 in which the values look like this:

1,2

1,3,12

1,2,3,12,7,8,9

2,3,9

... (yes, the values are not entirely in order, though this is probably not a big deal)

 

I would like to parse this into a set of binary dummy variables, one for each possible value in Q_3, say Q3_1, Q3_2, Q3_3, ... Q3_12. For example, the values of Q3_9 for the data above would be:

0

0

1

1

 

I've played with some do loops using the SCAN function, but I have not been able to make it work. Can't find a forum question or help doc that addresses quite this situation. 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

The trimming is so that trailing spaces stored in the original variable do not cause false negatives. You could also just include space as one of the delimiters.

data want ;
  set have;
  array q3_ [12] ;
  do index=1 to dim(q3_);
    q3_[index]=0 ne indexw(q3,cats(index),' ,');
  end;
  drop index;
run;

The ARRAY statement will create 12 variables named Q3_1 to Q3_12.

The CATS() function will convert the numeric index into a string.

The INDEXW() or FINDW() function will locate that string in the list.

The comparison will check whether or not it was found.

 

View solution in original post

11 REPLIES 11
Astounding
PROC Star

The programming is easier than it seems.  However, it is possible that your data never includes some possible answers and it would be up to you to supply a complete list.  Are you prepared to do that?

EricVanceMartin
Obsidian | Level 7

Thanks for your reply. I'm not sure I understand what you mean. The term "value" here is a little ambiguous. It could mean the numbers between the commas, or it could mean any combination of these in a comma-delimited list.

 

In any case, the values are from from the back-end of a web-based survey instrument (which I did not design, or I would not be in this situation). I know what the possible values are. I have several survey items set up like this, each with a different number of possible responses. 

Astounding
PROC Star

Just to clarify, you would need to know the number of possible survey responses for each question.  

 

Can we assume that the survey question answers are integers that run from 1 to as many as needed?

 

There will be an array statement for each question, along the lines of:

 

array q3_ {12};

 

This statement would create Q3_1 through Q3_12.  So it is necessary to know for each question how many variables must be created.

EricVanceMartin
Obsidian | Level 7

Yes, I know the number and range of numbers that represent the possible responses for each item.

Tom
Super User Tom
Super User

Try the FINDW() function.

data want ;
  set have;
  array q3_ [12] ;
  do index=1 to dim(q3_);
    q3_[index]=0 ne findw(q3,cats(index),',','t');
  end;
  drop index;
run;
Obs   q3               q3_1   q3_2   q3_3   q3_4   q3_5   q3_6   q3_7   q3_8   q3_9   q3_10   q3_11   q3_12

 1    1,2                1      1      0      0      0      0      0      0      0      0       0       0
 2    1,3,12             1      0      1      0      0      0      0      0      0      0       0       1
 3    1,2,3,12,7,8,9     1      1      1      0      0      0      1      1      1      0       0       1
 4    2,3,9              0      1      1      0      0      0      0      0      1      0       0       0
 5                       0      0      0      0      0      0      0      0      0      0       0       0
EricVanceMartin
Obsidian | Level 7

This looks exactly like it! I will try with my real data tomorrow! Thanks!

EricVanceMartin
Obsidian | Level 7

This appears to have worked perfectly!

 

There's a whole lotta trimmin' goin' on in there, but if I read the code right, the main action is findw. Can you tell me if I understand this? 

 

Basically it creates a variable name and assigns a value based on a logical evaluation of the statement:

 

0 ne [position of index in the string]

 

...so if the index does not exist in the string, findw returns 0, 0 ne 0 evaluates false, and the value 0--for false--is assigned to the variable.

 

...but if the index does exist, findw returns a number greater than 0, 0 ne [not 0] evaluates true, and the value 1--for true--is assigned to the variable.

 

...? Thanks again!

 

 

Tom
Super User Tom
Super User

The trimming is so that trailing spaces stored in the original variable do not cause false negatives. You could also just include space as one of the delimiters.

data want ;
  set have;
  array q3_ [12] ;
  do index=1 to dim(q3_);
    q3_[index]=0 ne indexw(q3,cats(index),' ,');
  end;
  drop index;
run;

The ARRAY statement will create 12 variables named Q3_1 to Q3_12.

The CATS() function will convert the numeric index into a string.

The INDEXW() or FINDW() function will locate that string in the list.

The comparison will check whether or not it was found.

 

ballardw
Super User

Do you have access to the software that collected the data? Several of the survey data collection programs I have used have options for different layouts of export data sets. A common one is for multiple response questions to have the output as indicators such as

Q3_1 = 1 if the first response is chosen or Q3_2=0 if the second response choice is NOT chosen.

 

My first choice would be to go the collection software and see about such options.

A second question for you in your "dummy  variables" is do you need to capture the ORDER that responses were chosen? That may also be available as an option from the collection software.

 

By all the responses, as @Astounding requested, we would need to have a list for every single variable what the exact choices might be. Your example for Q3 shows values: 1, 2, 3, 7, 8, 9, 12. Do values 4,5,6,10,11 possibly exist? How about 13,14, 15?? and possibly special values like 99 for "refused to answer"? Every single possible code that could appear for Q3 is needed to code properly. And for every single other question that has such a list.

 

I have worked with survey data for over 25 years and from the example you show I would never assume that because you have 3 and 7 in the data that codes of 4,5, and 6 a valid responses. I have seen questions get reused over years with changing codes due to the responses of previous years and seen multiple response lists with as many as 60 such codes. So I would want every single value that is expected explicitly stated before recoding values. But also, as mentioned above, I would start with the software export options to save a lot of potential headaches.

 

 

EricVanceMartin
Obsidian | Level 7

Thanks for your thoughtful reply.

 

As I said above, I know all the possible response values for every item. If an unnecessary variable were to be generated, I can easily delete or ignore it. 

 

You're right that I could have played with the survey software (it's Qualtrics). I didn't think of that. I generally take any opportunity to learn new SAS skills. In any case, this instrument (which is reused periodically) is on the block for revisions soon, and a cleanup of these ridiculous multiple-response items is definitely part of that!

 

Tom's solution using findw works perfectly.

EricVanceMartin
Obsidian | Level 7

Also, I see that you are right that it would obviously be better to feed a discrete and correct list of the values to a script. In this case, because we are going to revise the instrument, I hope I never have to use this code again! Also, there is a check on spurious variables, in that the codebook--which I have to make myself--will not contain those values, and I will see this when I apply labels and such. Thanks again for your thoughts!!

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 2601 views
  • 0 likes
  • 4 in conversation