I have survey data I'm trying to process and have the situation that a few of the questions ask a person filling out the form to "select all that apply". The result is a meaningless collection of text in one column (variable) such as the following:
Question on the survey is: Check all that apply... how was the HSS helpful?
Connecting me with resources
Helping me with feeding issues
Providing me with information about my child's development
If a person selects all answers my variable text looks like the following and is useless:
"Connecting me with resources Helping me with feeding issues Providing me with information about my child's development"
Is there a way to search the variable text and create a new (dummy) variable from the identification of key words? Or is a better way to think about processing this and creating a new variable? Thanks in advance!!
@accintron wrote:
I have survey data I'm trying to process and have the situation that a few of the questions ask a person filling out the form to "select all that apply". The result is a meaningless collection of text in one column (variable) such as the following:
Question on the survey is: Check all that apply... how was the HSS helpful?
Connecting me with resources
Helping me with feeding issues
Providing me with information about my child's development
If a person selects all answers my variable text looks like the following and is useless:
"Connecting me with resources Helping me with feeding issues Providing me with information about my child's development"
Is there a way to search the variable text and create a new (dummy) variable from the identification of key words? Or is a better way to think about processing this and creating a new variable? Thanks in advance!!
You will likely be better off creating a separate variable that is coded 1(present) or 0(not present) using the entire text of the responses as provided. An example plus something that does not occur in your example but uses likely keywords in a different sense.
data example; string= "Connecting me with resources Helping me with feeding issues Providing me with information about my child's development"; ConnectResources= index(string,"Connecting me with resources")>0; HelpFeed = index(string,"Helping me with feeding issues")>0; ChildDevelop = index(string,"Providing me with information about my child's development")>0 ; FeedingResources= index(string,"Connecting me with feeding resources")>0; run;
SAS will return a numeric value of 1 for true and 0 for false. The index function returns information about where in the string your search text returns or 0 if not found. So comparing the index result > 0 returns 1.
I would also apply a label to the variables with the Text of the response value as that will make nicer reports.
I suggest using the 1/0 coding as then you can use statistics like Sum to get the number of True/Yes results or Mean to get the percent true.
There are some nice things that can be done on a per respondent basis like using sum/mean to get the respondents profile for a question(group), Range to tell if all the responses are the same (range=0), Any Yes (max = 1) , any No (min=0).
If the survey, for example, allows choices A B C or multiple choices such as BC or ABC.
Or you could have variable A to be a 1 or zero, variable B to be a 1 or zero, variable C to be a 1 or zero. Not knowing the full details of your situation and planned analyses, I would prefer this one (but other methods might work depending on situation)
But in order to provide code, we'd need to know how your data is arriving, and what you plan to do with it.
Thanks for your response Paige. When a respondent checks multiple responses, all the answers are currently output to the dataset into one column as running text like:
@accintron wrote:
Thanks for your response Paige. When a respondent checks multiple responses, all the answers are currently output to the dataset into one column as running text like:
"Connecting me with resources (e.g., housing assistance, diapers, employment, etc.) Helping me with feeding issues (e.g., breastfeeding support) Providing me with information about my childs development Helping me with concerns I have about my child Being available tome in between my childs doctor visits"I want to just be able to run frequencies for each response, counting the number of occurences, so I think dummy variables 0,1 would be best, but need to have SAS be able to read the text and create a variable from that.
Does the system insert anything between the values (perhaps there is TAB or CR or LF there you are not showing)? If so you could scan the list to break into the components. In that case you might be able to write a generic program that could convert any of these type of multi-response questions into multiple variables.
Otherwise you will need to know the possible values to be able to make the new variables. Some thing like:
data want;
set have ;
q1_resource = 0<indexw(q1,'Connecting me with resources (e.g., housing assistance, diapers, employment, etc.) ');
q1_feeding = 0<indexw(q1,'Helping me with feeding issues (e.g., breastfeeding support)');
....
run;
NOTE; If there is no delimiter then watch out to make sure some choices are not sub strings of other possible choices. That is why i used INDEXW() instead of INDEX() in the example above as it will reduce the risk of mismatching when you have choices like: FATHER, GRANDFATHER, ...
If you have a metadata source that lists the possible answers to a question then you should be able to write a program that generates the code you need from that metadata.
@accintron wrote:
I have survey data I'm trying to process and have the situation that a few of the questions ask a person filling out the form to "select all that apply". The result is a meaningless collection of text in one column (variable) such as the following:
Question on the survey is: Check all that apply... how was the HSS helpful?
Connecting me with resources
Helping me with feeding issues
Providing me with information about my child's development
If a person selects all answers my variable text looks like the following and is useless:
"Connecting me with resources Helping me with feeding issues Providing me with information about my child's development"
Is there a way to search the variable text and create a new (dummy) variable from the identification of key words? Or is a better way to think about processing this and creating a new variable? Thanks in advance!!
You will likely be better off creating a separate variable that is coded 1(present) or 0(not present) using the entire text of the responses as provided. An example plus something that does not occur in your example but uses likely keywords in a different sense.
data example; string= "Connecting me with resources Helping me with feeding issues Providing me with information about my child's development"; ConnectResources= index(string,"Connecting me with resources")>0; HelpFeed = index(string,"Helping me with feeding issues")>0; ChildDevelop = index(string,"Providing me with information about my child's development")>0 ; FeedingResources= index(string,"Connecting me with feeding resources")>0; run;
SAS will return a numeric value of 1 for true and 0 for false. The index function returns information about where in the string your search text returns or 0 if not found. So comparing the index result > 0 returns 1.
I would also apply a label to the variables with the Text of the response value as that will make nicer reports.
I suggest using the 1/0 coding as then you can use statistics like Sum to get the number of True/Yes results or Mean to get the percent true.
There are some nice things that can be done on a per respondent basis like using sum/mean to get the respondents profile for a question(group), Range to tell if all the responses are the same (range=0), Any Yes (max = 1) , any No (min=0).
Creating the dummy variables is what I had in mind...
I've adopted your code but it does not seem to be capturing the responses and creating variables appropriately...
This is my actual code with all the possible responses a respondent can select:
data test;
set site3.parent;
ConnectResources = index(Q20_HSS_Helpful, "Connecting me with resources (e.g., housing assistance, diapers, employment, etc.)")>0;
HelpFeed = index(Q20_HSS_Helpful, "Helping me with feeding issues (e.g., breastfeeding support)")>0;
ChildDevelop = index(Q20_HSS_Helpful, "Providing me with information about my childs development")>0 ;
ChildCareInfo = index(Q20_HSS_Helpful, "Providing me with information about how to care for my child")>0;
HelpChildConcerns = index(Q20_HSS_Helpful, "Helping me with concerns I have about my child")>0;
HelpSelfFamilyConcerns = index(Q20_HSS_Helpful, "Helping me with concerns I have about myself or my family")>0;
AvailableBtwnVisits = index(Q20_HSS_Helpful, "Being available to me in between my childs doctor visits")>0;
run;
Here is the resulting columns in the table:
Hard to tell what is in that text variable from Polaroids of the data.
Try copy and paste. Try printing the value using $HEX format so you can tell if those are actual spaces (hexcode '20'x) between the words or something else.
Try using FINDW() function as it has options to do case insensitive search.
Ok, thank you for the feedback. I can't figure out the hexformat thing, it's giving me lots of problems for something that is probably so simple, but thanks for the suggestion anyway. I'm going to have to give this coding issue a break and come back to it later. Thanks for your suggestions.
I finally came back to this. The issue I was having was punctuation wasn't allowing responses to be captured appropriately so I ended up using only the portion of the response needed to uniquely identify it to create new variables.
For instance, instead of: ChildDevelop = index(string,"Providing me with information about my child's development")>0 ;
I did:
ChildDevelop = index(string,"Providing me with information about my")>0 ;
Thanks for your help!
Exporting the data to a CSV file, since appears that you may have had this as a spreadsheet, may help. Then you can copy and paste text from the actual values.
I note that you apparently have at least one value that is , likely, replacing a curly ' (not the simple programing single quote) to a " |" or similar. I suspect the text expected would be "child's" but because of any number of places the quote became a "smart quote" and so some text may not search as expected.
Note that in SAS you can use different quotes of single and double as needed such as "Some text with a single quote's" to control interpretation. And if that gets difficult because of mixed quotes in the text you can search for quotes by doubling them:
data example; x="Some text with ' in it"; y="Some text with ' and "" in it"; z='Some text with '' in the middle'; run;
And I do sympathize. I have had to parse out data collected for "fillable" PDF forms that provided should "check all data".
You know what answers are possible (as text) so you can just search for this text in your variable to determine if it has been given or not (using the find() function for example).
How you then best store the data depends on how you plan to analyse it. You could create a code for any possible combination and just store the code, or you create a variable array with a variable per possible answer and with values 0 and 1.
Something else to consider if you have access or are working with a good contractor is the actual data export.
Depending on the survey software used you may be able to ask for multiple response variables to be exported from the collection software as either dummy coded (some sort of true/false selected/not selected ) for each possible response or other one variable per possible response value.
I know that several Survey collection packages I used had different options for how the data was exported and often the default wasn't always the most useful.
For example one package with multiple response variables would default to the order selected, which is useful for some tasks (i.e, how many respondents picked choice A first) but a PITA to read as any column could have a specific value. So that often would entail reading the data and then creating extra variables for the "did this response get chosen" for each possible response to do frequencies as you need.
Thank you, that's a good point. I'll start by asking the ladies who use the survey software more frequently to see how they handle the select all answers questions and go from there. Thansk for all of your help!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.