BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
zimcom
Pyrite | Level 9

Dear community,

 

I have a survey questionnaire, one of the questions is to ask people to pick their most favourite fruit(s) from the list below (select all that apply)

 

apple

pear

orange

grape

watermelon

banana

 

The data ended like below:

pid

fruit1

fruit2

fruit3

fruit4

fruit5

fruit6

1

apple

 

 

 

 

 

2

 

pear

 

 

 

 

3

 

 

orange

 

 

 

4

 

 

orange

 

 

 

5

 

 

 

grape

 

 

6

 

 

 

 

watermelon

 

7

 

pear

 

 

 

banana

8

apple

 

 

grape

 

 

9

 

 

orange

 

 

 

10

apple

 

 

 

 

banana

11

 

 

 

 

watermelon

 

12

 

 

orange

 

 

 

13

 

 

orange

 

 

 

14

apple

 

 

grape

 

 

15

 

 

 

 

watermelon

 

16

apple

 

 

 

 

banana

17

 

 

 

grape

 

 

18

apple

 

 

 

 

 

19

 

 

 

 

 

 

20

apple

 

 

 

 

 

 

Since this is one question in the dataset and I would think it should be presented as one variable (e.g., fruit), a few observations do have multiple selections though.

 

My question is how I should consolidate the data into one column (e.g., fruit)

 

Thanks

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

Personally I would be much more tempted to create normalized values such as "apple" with a 1/0 coding for selected/not selected.

A SUM of the Apple variable would indicate the number of respondents that selected apple, mean would the percentage of respondent that selected apples. You could also reduce that to a fake binary character value that would contain all of the information.

 

data want;
   set have;
   array fq fruit1-fruit6;
   array fv apple pear orange grape watermelon banana;
   /* set all the "standarized" variables to 0. 0 means
      not selected
   */
   do i= 1 to dim(fv);
      fv[i]=0;
   end;
   do i= 1 to dim(fq);
      index= whichc(fq[i],'apple', 'pear', 'orange', 'grape', 'watermelon', 'banana');
      if index>0 then fv[index]=1;
   end;
   fruitlist= catt(of fv(*));

run;

A 1 in the first position of fruitlist means that apple was selected, a 0 apple was not selected. A consistent form, has all of the data.

 

You could even create a custom format to indicate exactly which combination of selected values exists.

View solution in original post

12 REPLIES 12
PaigeMiller
Diamond | Level 26

The answer depends on how you want to handle people who have two or more fruits chosen. Please tell us what you want to do in that situation.

--
Paige Miller
zimcom
Pyrite | Level 9

I want to capture both, like 'apple and pear' under the fruit variable, if both 'apple' and 'pear' have been selected.

or duplicate the row with one observation as 'apple' and one observation as 'pear', if there is more than one choice has been selected fro the same person. I also want to know if there is any other way to consolidate the data

 

 

Thank you! 

novinosrin
Tourmaline | Level 20

if it's survey data, I would keep it like

 

data want;

set have;

array t(*) fruit:;

selected_fruit=coalescec(of t(*));

run;

novinosrin
Tourmaline | Level 20

if it is more than one choice, I'm afraid you need to transpose. Either way, a long data set is always better than wide i think

zimcom
Pyrite | Level 9

@novinosrin

Thanks, this is great!

But I also want to learn how to keep both values 

 

Thanks 

gamotte
Rhodochrosite | Level 12

Hello,

 

data have;
    infile cards dlm=',' dsd;
	length fruit1-fruit6 $10;
    input id fruit1-fruit6 ;
cards;
1  , apple ,      ,        ,       ,            ,         
2  ,       , pear ,        ,       ,            ,         
3  ,       ,      , orange ,       ,            ,         
4  ,       ,      , orange ,       ,            ,         
5  ,       ,      ,        , grape ,            ,         
6  ,       ,      ,        ,       , watermelon ,         
7  ,       , pear ,        ,       ,            , banana  
8  , apple ,      ,        , grape ,            ,         
9  ,       ,      , orange ,       ,            ,         
10 , apple ,      ,        ,       ,            , banana  
11 ,       ,      ,        ,       , watermelon ,         
12 ,       ,      , orange ,       ,            ,         
13 ,       ,      , orange ,       ,            ,         
14 , apple ,      ,        , grape ,            ,         
15 ,       ,      ,        ,       , watermelon ,         
16 , apple ,      ,        ,       ,            , banana  
17 ,       ,      ,        , grape ,            ,         
18 , apple ,      ,        ,       ,            ,         
19 ,       ,      ,        ,       ,            ,         
20 , apple ,      ,        ,       ,            ,         
run;

data want;
    set have;

    keep id fruit order;

    array fruits(*) fruit:;


    order=1;
    do i=1 to dim(fruits);
        if not missing(fruits(i)) then do;
            fruit=fruits(i);
            output;
            order=order+1;
        end;
    end;
run;
novinosrin
Tourmaline | Level 20

Yes, I saw that note, multiple choices would mean multiple records per id by virtue of transpose

 

 

 

data fruit;
infile cards truncover;
input pid (fruit1 fruit2 fruit3 fruit4 fruit5 fruit6) ($);
cards;
1 apple orange
2 . pear
;

data want;
set fruit;
array t fruit:;
do _i_=whichc(coalescec(of t(*)),of t(*)) to dim(t);
if t(_i_)>' ' then do; seleted_fruit=t(_i_);output;end;
end;
keep pid seleted_fruit;
run;

novinosrin
Tourmaline | Level 20

or

 

proc transpose data=fruit out=f(where=(col1 is not missing));
by pid;
var fruit:;
run;
ballardw
Super User

Personally I would be much more tempted to create normalized values such as "apple" with a 1/0 coding for selected/not selected.

A SUM of the Apple variable would indicate the number of respondents that selected apple, mean would the percentage of respondent that selected apples. You could also reduce that to a fake binary character value that would contain all of the information.

 

data want;
   set have;
   array fq fruit1-fruit6;
   array fv apple pear orange grape watermelon banana;
   /* set all the "standarized" variables to 0. 0 means
      not selected
   */
   do i= 1 to dim(fv);
      fv[i]=0;
   end;
   do i= 1 to dim(fq);
      index= whichc(fq[i],'apple', 'pear', 'orange', 'grape', 'watermelon', 'banana');
      if index>0 then fv[index]=1;
   end;
   fruitlist= catt(of fv(*));

run;

A 1 in the first position of fruitlist means that apple was selected, a 0 apple was not selected. A consistent form, has all of the data.

 

You could even create a custom format to indicate exactly which combination of selected values exists.

zimcom
Pyrite | Level 9

@ballardw

@novinosrin

@gamotte

@PaigeMiller

 

Hi All, 

I can't appreciated more for the help and support I got from this forum. 

Your support really helped me to build up my confidence to continue the journey in learning SAS.

 

Thank you all !!! 

jdwaterman91
Obsidian | Level 7
Data Want;
Set Have;

FavFruit = Catx(' and ', of fruit1-fruit6);

Keep Id FavFruit;

Run;

 

Does this accomplish what you were trying to accomplish? 

zimcom
Pyrite | Level 9

the code works perfect!

 

Big THANKS to you!

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 12 replies
  • 2555 views
  • 6 likes
  • 6 in conversation