Re: Using array to recode multiple variables into one variable

monsterpie · Posted 05-26-2021 05:55 PM

Hi all,

I have 5 variables (bev_1-bev_5) that are open text responses. I am trying to create a new variable for each bev_1-bev_5 which classifies the responses in each column as either 1 or 2. The first new variable that I want to create, called product_type should have 2 levels: if "Coca-Cola" or "Pepsi" are found in bev_1-bev_5 then the the variable product should =1. If "Hamburger" is found in any observations of bev_1-bev_5 then product should =2.

An example of my dataset is below. Variables product_1-product_5 are the new variables I am trying to create. Product_1 classifies the responses under bev_1, product_2 classifies responses of bev_2, etc.

ID	bev_1	bev_2	bev_3	bev_4	bev_5	product_1	product_2	product_3	product_4	product_5
1	Coke	Coke	Pepsi	Hamburger	Hamburger	1	1	1	2	2
2	Coke	Hamburger	Pepsi	Hamburger	Hamburger	1	2	1	2	2
3	Pepsi	Pepsi	Pepsi	Hamburger	Coke	1	1	1	2	1

Here is an example of the progress I've made so far with my code. This code runs but ouputs only one variable and identifys which column "Coke" or "Pepsi" is located. I'm also not sure if an array is the most appropriate function to use in this case. Any help is appreciated!

data want;
set have;
array x{*} bev_1-bev_5;
product_type=whichc("Coke","Pepsi",of x{*});
run;

mkeintz · Posted 05-26-2021 06:02 PM

data want;
  set have;
  array x{*} bev_1-bev_5;
  array product_ {5};
  do i=1 to 5;
    if x{i} in ('Coke','Pepsi') then product_{i}=1;
    else if x{i}='Hamburger' then product_{i}=2;
  end;
  drop i;
run;

ARRAY is the right tool to use, but you should also implement a do loop over the array elements:

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Reeza · Posted 05-26-2021 06:30 PM

Do you have a mapping table for these values? If it's just the two sets of conditions it's not hard but this won't scale well if you have more data and categories. If you had a table that maps the value, Coke = 1, Pepsi=1, Hamburger=2 in a data set then this could be done nicely with a format and array.

monsterpie · Posted 05-26-2021 07:20 PM

I'm really new to SAS so I'm not exactly sure what a mapping table is? I will need to scale this up because I have more categories and will be creating more variables using the code based on this post.

Reeza · Posted 05-27-2021 10:51 AM

A look up table, that has the original values and the values you want them to be. Basically do you have a table that looks like:

Code Product
1 Coke
1 Pepsi
2 Hamburger
....

Use a format, see this document on how Formats work:
https://support.sas.com/resources/papers/proceedings/proceedings/sugi30/001-30.pdf
https://support.sas.com/resources/papers/proceedings/proceedings/sugi30/001-30.pdf

Using array to recode multiple variables into one variable