jag07g
Fluorite | Level 6

Hello,

 

I have a "check all that apply" variable where respondents can choose multiple responses to the question "What vegetables, if any, did you purchase this week at the grocery store?" with response categories consisting of "lettuce (coded 1)," "tomatoes (coded 2),"  etc. There are 12 different response categories and respondents can check any, all, or a combination of the 12 different categories. When I run a frequency on the raw variable, there are 144 different combinations of responses that were given. I have attached a screenshot of some of the response categories. For example, someone in the response category 102 selected both lettuce and tomatoes. 

 

vegpic.JPG

 

How can I analyze this variable to list the percentage of respondents who answered tomatoes, the percentage of respondents who answered lettuce, and so on relatively simply? My initial thought was to create 12 new variables like the code below, but this would be very tedious with the number of response categories. Is there a different way this could be analyzed? Thanks for any help!

lettuce=.;
if vegQ02 in (1,102, 103,....) then lettuce=1;
if vegQ02 in (2,203, 206....) then lettuce=2;

 

2 REPLIES
Kurt_Bremser
Super User

I do not see a variable for "response category".

Please do NOT supply data in pictures. Post a data step with datalines so we can easily create the dataset with copy/paste and submit. Help us to help you.

ballardw
Super User

The first thing I would do is go back to the software that collected the data and look at the options for how the data is exported.

Several data collection packages I have used have options for exporting data such as this. You may find that there is an option to have that question's values exported as a set of dichotomous variables, one per response category (0/1 coded, where 1 indicates selected), or as a set of variables that record the order in which responses were "checked".

 

From what I see, if you have a value like 1211 you do not know whether you have two responses (12 and 11) or three (1, 2 and 11). I base this statement on the shown value of 1110. Since 110 is apparently not a possible single value, I have to parse 1110 as 11 and 10, which means the responses are stored in selection order, not value order. So you cannot tell which responses 1211 contains.

You might have had other information in the raw file such that some values had leading 0, so 01, 02 instead of 1 2 which might make your shown value of 102 (which appears to be 10 and 2) originally a text value of 0102, which would allow parsing (having done so with a similar field with 45 categories).

So examine your source data, and if the values have leading zeroes for the 01 type values then re-read the data as text so you keep the leading 0. Then you can parse the value from left to right, two characters at a time, into single variables. You would also be able to parse "lettuce" as

 

lettuce = (index(vegQ02,'01')>0 and mod(index(vegQ02,'01'),2)=1);

which would have a 1 when the string '01' occurs at an "odd" starting position (1, 3, 5, etc.). The odd-position test avoids false matches in values like 1012, where '01' straddles the codes 10 and 12:

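data example;
   /* read the codes as text so leading zeroes are kept; */
   /* $28. leaves room for 14 two-digit codes            */
   input vegq02 :$28.;
   /* 1 if '01' first occurs at an odd position (1, 3, 5, ...), else 0 */
   lettuce = (index(vegQ02,'01')>0 and mod(index(vegQ02,'01'),2)=1);
datalines;
01
0201
040201
1011
;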
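One caveat with the INDEX test: INDEX returns only the first occurrence, so a value such as 101101 (codes 10, 11, 01) finds '01' at position 2 and reports 0 even though lettuce was selected. A minimal sketch of a chunk-by-chunk check that avoids this, assuming the value is text made of two-character codes (the dataset name example2 and the variable pos are made up for illustration):

```sas
data example2;
   input vegq02 :$28.;
   lettuce = 0;
   /* compare each two-character code against '01' */
   do pos = 1 to length(vegq02) by 2;
      if substr(vegq02, pos, 2) = '01' then lettuce = 1;
   end;
   drop pos;
datalines;
01
0201
1012
101101
;
```

With these four rows, lettuce comes out as 1, 1, 0 and 1.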

You really have to explain what your second line of IF code was supposed to do.

BTW it is a much better coding scheme to use 1/0 for yes/no, true/false, etc. than 1/2. With a 1/0 scheme the Sum is the count of Yes/True/Present or what have you, and the Mean is the proportion of Yes values (multiply by 100 for a percentage).

If you look at multiple variables with the 1/0 coding scheme, such as Sum(lettuce,tomato,carrot), you get the number of responses marked Yes. Range=0 tells you they were all the same choice, max=1 indicates at least one was chosen, and min=0 indicates at least one choice was not made. With a 1/2 type coding you pretty much have to test each value and accumulate the results.
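As a small illustration of the 1/0 idea, here is a sketch with made-up data (the dataset flags, the variable n_selected and the four rows are invented for the example, not taken from the survey):

```sas
data flags;
   input lettuce tomato carrot;
   /* number of vegetables checked by this respondent */
   n_selected = sum(lettuce, tomato, carrot);
datalines;
1 0 0
1 1 0
0 0 0
1 1 1
;

/* Sum = count of Yes, Mean = proportion of Yes */
proc means data=flags n sum mean maxdec=2;
   var lettuce tomato carrot;
run;
```

For these rows the sums are 3, 2 and 1 and the means are 0.75, 0.50 and 0.25, i.e. 75%, 50% and 25% of respondents.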

 

Another approach would be to parse the values with an array instead of a bunch of If/then/else:

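data example;
   input vegq02 :$28.;
   array r(14);
   /* reset the flags on every row; initial values on the */
   /* ARRAY statement would be retained across rows       */
   do j = 1 to dim(r);
      r[j] = 0;
   end;
   /* step through the value two characters at a time */
   do i = 1 to length(vegq02) by 2;
      r[input(substr(vegq02,i,2),f2.)] = 1;
   end;
   drop i j;
datalines;
01
0201
040201
1011
;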

You could assign names like "lettuce", "tomato", "carrot", etc. to the array elements if you want. Or just assign labels to r1 to r14.
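Putting the pieces together, the following sketch labels the flags and summarizes them. It assumes the values were re-read as text with two-digit codes; the dataset name veg is made up, and only the two labels given in the question (1 = lettuce, 2 = tomatoes) are filled in:

```sas
data veg;
   input vegq02 :$28.;
   array r(14);
   /* start every row with all flags at 0 */
   do j = 1 to dim(r);
      r[j] = 0;
   end;
   /* set the flag for each two-digit code in the value */
   do i = 1 to length(vegq02) by 2;
      r[input(substr(vegq02, i, 2), 2.)] = 1;
   end;
   label r1 = 'Lettuce'
         r2 = 'Tomatoes';
   drop i j;
datalines;
01
0201
040201
1011
;

/* Mean of each 0/1 flag = share of respondents who checked it */
proc means data=veg n sum mean maxdec=3;
   var r1-r14;
run;
```

The Mean column then gives the proportion of respondents who selected each vegetable, which is the per-category percentage the original question asks for.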
