Hi all,
I have 5 variables (bev_1-bev_5) that are open text responses. I am trying to create a new variable for each bev_1-bev_5 which classifies the responses in each column as either 1 or 2. The first new variable that I want to create, called product_type should have 2 levels: if "Coca-Cola" or "Pepsi" are found in bev_1-bev_5 then the the variable product should =1. If "Hamburger" is found in any observations of bev_1-bev_5 then product should =2.
An example of my dataset is below. Variables product_1-product_5 are the new variables I am trying to create. Product_1 classifies the responses under bev_1, product_2 classifies responses of bev_2, etc.
| ID | bev_1 | bev_2 | bev_3 | bev_4 | bev_5 | product_1 | product_2 | product_3 | product_4 | product_5 |
| 1 | Coke | Coke | Pepsi | Hamburger | Hamburger | 1 | 1 | 1 | 2 | 2 |
| 2 | Coke | Hamburger | Pepsi | Hamburger | Hamburger | 1 | 2 | 1 | 2 | 2 |
| 3 | Pepsi | Pepsi | Pepsi | Hamburger | Coke | 1 | 1 | 1 | 2 | 1 |
Here is an example of the progress I've made so far with my code. This code runs but ouputs only one variable and identifys which column "Coke" or "Pepsi" is located. I'm also not sure if an array is the most appropriate function to use in this case. Any help is appreciated!
data want;
set have;
array x{*} bev_1-bev_5;
product_type=whichc("Coke","Pepsi",of x{*});
run;
data want;
set have;
array x{*} bev_1-bev_5;
array product_ {5};
do i=1 to 5;
if x{i} in ('Coke','Pepsi') then product_{i}=1;
else if x{i}='Hamburger' then product_{i}=2;
end;
drop i;
run;
ARRAY is the right tool to use, but you should also implement a do loop over the array elements:
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.