BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
accintron
Obsidian | Level 7

I have survey data I'm trying to process and have the situation that a few of the questions ask a person filling out the form to "select all that apply". The result is a meaningless collection of text in one column (variable) such as the following:

 

Question on the survey is: Check all that apply... how was the HSS helpful?

Connecting me with resources

Helping me with feeding issues

Providing me with information about my child's development 

 

If a person selects all answers my variable text looks like the following and is useless: 

"Connecting me with resources Helping me with feeding issues Providing me with information about my child's development"

 

Is there a way to search the variable text and create a new (dummy) variable from the identification of key words? Or is a better way to think about processing this and creating a new variable? Thanks in advance!! 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

@accintron wrote:

I have survey data I'm trying to process and have the situation that a few of the questions ask a person filling out the form to "select all that apply". The result is a meaningless collection of text in one column (variable) such as the following:

 

Question on the survey is: Check all that apply... how was the HSS helpful?

Connecting me with resources

Helping me with feeding issues

Providing me with information about my child's development 

 

If a person selects all answers my variable text looks like the following and is useless: 

"Connecting me with resources Helping me with feeding issues Providing me with information about my child's development"

 

Is there a way to search the variable text and create a new (dummy) variable from the identification of key words? Or is a better way to think about processing this and creating a new variable? Thanks in advance!! 

 

 


You will likely be better off creating a separate variable that is coded 1(present) or 0(not present) using the entire text of the responses as provided. An example plus something that does not occur in your example but uses likely keywords in a different sense.

data example;
   string= "Connecting me with resources Helping me with feeding issues Providing me with information about my child's development";

   ConnectResources= index(string,"Connecting me with resources")>0;
   HelpFeed        = index(string,"Helping me with feeding issues")>0;
   ChildDevelop    = index(string,"Providing me with information about my child's development")>0 ;
   FeedingResources= index(string,"Connecting me with feeding resources")>0;
run;

SAS will return a numeric value of 1 for true and 0 for false. The index function returns information about where in the string your search text returns or 0 if not found. So comparing the index result > 0 returns 1.

I would also apply a label to the variables with the Text of the response value as that will make nicer reports.

 

I suggest using the 1/0 coding as then you can use statistics like Sum to get the number of True/Yes results or Mean to get the percent true.

There are some nice things that can be done on a per respondent basis like using sum/mean to get the respondents profile for a question(group), Range to tell if all the responses are the same (range=0), Any Yes (max = 1) , any No (min=0).

 

View solution in original post

12 REPLIES 12
PaigeMiller
Diamond | Level 26

If the survey, for example, allows choices A B C or multiple choices such as BC or ABC.

 

Or you could have variable A to be a 1 or zero, variable B to be a 1 or zero, variable C to be a 1 or zero. Not knowing the full details of your situation and planned analyses, I would prefer this one (but other methods might work depending on situation)

 

But in order to provide code, we'd need to know how your data is arriving, and what you plan to do with it.

--
Paige Miller
accintron
Obsidian | Level 7

Thanks for your response Paige. When a respondent checks multiple responses, all the answers are currently output to the dataset into one column as running text like:

        

"Connecting me with resources (e.g., housing assistance, diapers, employment, etc.) Helping me with feeding issues (e.g., breastfeeding support) Providing me with information about my childs development Helping me with concerns I have about my child Being available to
me in between my childs doctor visits"
 
I want to just be able to run frequencies for each response, counting the number of occurences, so I think dummy variables 0,1 would be best, but need to have SAS be able to read the text and create a variable from that.
 
Tom
Super User Tom
Super User

@accintron wrote:

Thanks for your response Paige. When a respondent checks multiple responses, all the answers are currently output to the dataset into one column as running text like:

        

"Connecting me with resources (e.g., housing assistance, diapers, employment, etc.) Helping me with feeding issues (e.g., breastfeeding support) Providing me with information about my childs development Helping me with concerns I have about my child Being available to
me in between my childs doctor visits"
 
I want to just be able to run frequencies for each response, counting the number of occurences, so I think dummy variables 0,1 would be best, but need to have SAS be able to read the text and create a variable from that.
 

Does the system insert anything between the values (perhaps there is TAB or CR or LF there you are not showing)? If so you could scan the list to break into the components.  In that case you might be able to write a generic program that could convert any of these type of multi-response questions into multiple variables.

 

Otherwise you will need to know the possible values to be able to make the new variables. Some thing like:

data want;
  set have ;
  q1_resource = 0<indexw(q1,'Connecting me with resources (e.g., housing assistance, diapers, employment, etc.) ');
  q1_feeding = 0<indexw(q1,'Helping me with feeding issues (e.g., breastfeeding support)');
  ....
run;

NOTE; If there is no delimiter then watch out to make sure some choices are not sub strings of other possible choices.  That is why i used INDEXW() instead of INDEX() in the example above as it will reduce the risk of mismatching when you have choices like:  FATHER, GRANDFATHER, ...

 

If you have a metadata source that lists the possible answers to a question then you should be able to write a program that generates the code you need from that metadata.

ballardw
Super User

@accintron wrote:

I have survey data I'm trying to process and have the situation that a few of the questions ask a person filling out the form to "select all that apply". The result is a meaningless collection of text in one column (variable) such as the following:

 

Question on the survey is: Check all that apply... how was the HSS helpful?

Connecting me with resources

Helping me with feeding issues

Providing me with information about my child's development 

 

If a person selects all answers my variable text looks like the following and is useless: 

"Connecting me with resources Helping me with feeding issues Providing me with information about my child's development"

 

Is there a way to search the variable text and create a new (dummy) variable from the identification of key words? Or is a better way to think about processing this and creating a new variable? Thanks in advance!! 

 

 


You will likely be better off creating a separate variable that is coded 1(present) or 0(not present) using the entire text of the responses as provided. An example plus something that does not occur in your example but uses likely keywords in a different sense.

data example;
   string= "Connecting me with resources Helping me with feeding issues Providing me with information about my child's development";

   ConnectResources= index(string,"Connecting me with resources")>0;
   HelpFeed        = index(string,"Helping me with feeding issues")>0;
   ChildDevelop    = index(string,"Providing me with information about my child's development")>0 ;
   FeedingResources= index(string,"Connecting me with feeding resources")>0;
run;

SAS will return a numeric value of 1 for true and 0 for false. The index function returns information about where in the string your search text returns or 0 if not found. So comparing the index result > 0 returns 1.

I would also apply a label to the variables with the Text of the response value as that will make nicer reports.

 

I suggest using the 1/0 coding as then you can use statistics like Sum to get the number of True/Yes results or Mean to get the percent true.

There are some nice things that can be done on a per respondent basis like using sum/mean to get the respondents profile for a question(group), Range to tell if all the responses are the same (range=0), Any Yes (max = 1) , any No (min=0).

 

accintron
Obsidian | Level 7

Creating the dummy variables is what I had in mind...

I've adopted your code but it does not seem to be capturing the responses and creating variables appropriately...

 

This is my actual code with all the possible responses a respondent can select:

 

data test;
set site3.parent;

 

ConnectResources = index(Q20_HSS_Helpful, "Connecting me with resources (e.g., housing assistance, diapers, employment, etc.)")>0;
HelpFeed = index(Q20_HSS_Helpful, "Helping me with feeding issues (e.g., breastfeeding support)")>0;
ChildDevelop = index(Q20_HSS_Helpful, "Providing me with information about my childs development")>0 ;
ChildCareInfo = index(Q20_HSS_Helpful, "Providing me with information about how to care for my child")>0;
HelpChildConcerns = index(Q20_HSS_Helpful, "Helping me with concerns I have about my child")>0;
HelpSelfFamilyConcerns = index(Q20_HSS_Helpful, "Helping me with concerns I have about myself or my family")>0;
AvailableBtwnVisits = index(Q20_HSS_Helpful, "Being available to me in between my childs doctor visits")>0;

 

run;

 

Here is the resulting columns in the table:

image.png

Tom
Super User Tom
Super User

Hard to tell what is in that text variable from Polaroids of the data.

Try copy and paste.  Try printing the value using $HEX format so you can tell if those are actual spaces (hexcode '20'x) between the words or something else.

Try using FINDW() function as it has options to do case insensitive search.

accintron
Obsidian | Level 7

Ok, thank you for the feedback. I can't figure out the hexformat thing, it's giving me lots of problems for something that is probably so simple, but thanks for the suggestion anyway. I'm going to have to give this coding issue a break and come back to it later. Thanks for your suggestions.

accintron
Obsidian | Level 7

I finally came back to this. The issue I was having was punctuation wasn't allowing responses to be captured appropriately so I ended up using only the portion of the response needed to uniquely identify it to create new variables.

 

For instance, instead of: 
ChildDevelop    = index(string,"Providing me with information about my child's development")>0 ;

I did:
ChildDevelop = index(string,"Providing me with information about my")>0 ;

 

Thanks for your help!

 

ballardw
Super User

Exporting the data to a CSV file, since appears that you may have had this as a spreadsheet, may help. Then you can copy and paste text from the actual values.

 

I note that you apparently have at least one value that is , likely, replacing a curly ' (not the simple programing single quote) to a " |" or similar. I suspect the text expected would be "child's" but because of any number of places the quote became a "smart quote" and so some text may not search as expected.

 

Note that in SAS you can use different quotes of single and double as needed such as "Some text with a single quote's" to control interpretation. And if that gets difficult because of mixed quotes in the text you can search for quotes by doubling them:

data example;
   x="Some text with ' in it";
   y="Some text with ' and "" in it";
   z='Some text with '' in the middle';
run;

And I do sympathize. I have had to parse out data collected for "fillable" PDF forms that provided should "check all data".

 

 

Patrick
Opal | Level 21

You know what answers are possible (as text) so you can just search for this text in your variable to determine if it has been given or not (using the find() function for example).

 

How you then best store the data depends on how you plan to analyse it. You could create a code for any possible combination and just store the code, or you create a variable array with a variable per possible answer and with values 0 and 1.

ballardw
Super User

Something else to consider if you have access or are working with a good contractor is the actual data export.

 

Depending on the survey software used you may be able to ask for multiple response variables to be exported from the collection software as either dummy coded (some sort of true/false selected/not selected ) for each possible response or other one variable per possible response value.

 

I know that several Survey collection packages I used had different options for how the data was exported and often the default wasn't always the most useful.

For example one package with multiple response variables would default to the order selected, which is useful for some tasks (i.e, how many respondents picked choice A first) but a PITA to read as any column could have a specific value. So that often would entail reading the data and then creating extra variables for the "did this response get chosen" for each possible response to do frequencies as you need.

 

 

accintron
Obsidian | Level 7

Thank you, that's a good point. I'll start by asking the ladies who use the survey software more frequently to see how they handle the select all answers questions and go from there. Thansk for all of your help!

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 12 replies
  • 1655 views
  • 0 likes
  • 5 in conversation