Hi all,
I have 5 variables (bev_1-bev_5) that are open text responses. I am trying to create a new variable for each bev_1-bev_5 which classifies the responses in each column as either 1 or 2. The first new variable that I want to create, called product_type should have 2 levels: if "Coca-Cola" or "Pepsi" are found in bev_1-bev_5 then the the variable product should =1. If "Hamburger" is found in any observations of bev_1-bev_5 then product should =2.
An example of my dataset is below. Variables product_1-product_5 are the new variables I am trying to create. Product_1 classifies the responses under bev_1, product_2 classifies responses of bev_2, etc.
ID | bev_1 | bev_2 | bev_3 | bev_4 | bev_5 | product_1 | product_2 | product_3 | product_4 | product_5 |
1 | Coke | Coke | Pepsi | Hamburger | Hamburger | 1 | 1 | 1 | 2 | 2 |
2 | Coke | Hamburger | Pepsi | Hamburger | Hamburger | 1 | 2 | 1 | 2 | 2 |
3 | Pepsi | Pepsi | Pepsi | Hamburger | Coke | 1 | 1 | 1 | 2 | 1 |
Here is an example of the progress I've made so far with my code. This code runs but ouputs only one variable and identifys which column "Coke" or "Pepsi" is located. I'm also not sure if an array is the most appropriate function to use in this case. Any help is appreciated!
data want;
set have;
array x{*} bev_1-bev_5;
product_type=whichc("Coke","Pepsi",of x{*});
run;