Hello all,
I have a dataset with 1000 observations and 6 variables (below is an example). The values of the variables bev_1-bev_5 are open ended text and contain different spelling variations. I'm creating additional variables which classify the responses in each bev_1-bev_5 column as 1=drink or 2=food and the type of drink. My goal is to set up a lookup table or essentially another dataset which includes all of the values of bev_1-bev_5 and the corresponding codes for my new variables. This lookup table will also have to include the spelling variations (e.g., Coke, Coca-cola, CocaCola, coke, etc. should all be coded as 1 for my product_Type variable, and should all be coded the same for any other variables I want to add).
Example dataset:
ID | bev_1 | bev_2 | bev_3 | bev_4 | bev_5 |
1 | Coke | Starbucks | Pepsi | CocaCola | fries |
2 | Coca-Cola | burger | pepsi | coke | redbull |
Question: I'm wondering what the best way would be to structure the lookup table so that for example, anytime SAS sees the word "Coke" or any of its variations (which I would include in my lookup table) in bev_1-bev_5, it will code it appropriately based on the new variables I'd like to add.
Lookup table that I want to link, essentially as a "dictionary" to my original dataset. Columns are the new variables I'll be adding:
Response and variations | product_type | type of drink |
Coke, Coca-cola, Cocacola | 1 | 1 |
burger | 2 | 0 |
redbull, Red Bull, RedBull | 1 | 3 |
As a pair for FORMATS (or INFORMATS) and not as a table at all.
Formats convert values to text. Informats convert text to values. If you want new PRODUCT_TYPE and TYPE_OF_DRINK values to be numbers then define INFORMATs and use the INPUT() function to convert the strings in BEV_1 to BEV_5 into numbers. But if you are ok with the results being strings you can just apply the format to BEV_1 to BEV_5 without having to modify the original table at all.
proc format ;
value $product
'Coke', 'Coca-cola', 'Cocacola'='1'
'burger'='2'
'redbull', 'Red Bull', 'RedBull'='1'
other = 'Unknown'
;
value $drink
'Coke', 'Coca-cola', 'Cocacola'='1'
'burger'='0'
'redbull', 'Red Bull', 'RedBull'='3'
other='Unknown'
;
run;
data have;
input ID (bev_1-bev_5) (:$10.) ;
cards;
1 Coke Starbucks Pepsi CocaCola fries
2 Coca-Cola burger pepsi coke redbull
;
proc print data=have;
format bev: $product.;
run;
Results:
Obs ID bev_1 bev_2 bev_3 bev_4 bev_5 1 1 1 Unknown Unknown Unknown Unknown 2 2 Unknown 2 Unknown Unknown 1
Thanks for your help Tom! As a follow up question, the reason why I was thinking of doing a table of some sort was because my dataset values are open text, with the different spelling variations I have thousands of unique responses. So i was trying to avoid having to manually type out each spelling variation as part of a SAS code but rather create a csv table which I could import into SAS and then link that csv to the following code that I was using to actually create the new variables I want.
data want;
set have;
array x{*} bev_1-bev_5;
array product_type {5};
if x{i} in ('Coke', 'Pepsi') then product_type{i}=1;
else if x{i}='burger' then product_type{i}=2;
end;
drop i;
run;
Thanks Reeza for your input. I've followed all of the steps as you described but I seem to still be stuck on linking the dataset I import from excel with my original dataset after using the cntlin function. This is the SAS code I've used:
proc transpose data=test out=bev_sample;
by id;
var bev_1-bev_5;
run;
*change all values to uppercase;
data bev_sample;
set bev_sample;
beverage=upcase(col1);
run;
*removing extra characters in values of COL1;
data bev_control;
set bev_sample;
beverage=compress(beverage);
run;
proc sort data=bev_control nodupkey;
by id;
run;
*setting fmtname, start, and label variables from bev_format dataset;
data bev_frmt;
rename id=start;
set bev_new;
label=map_code;
fmtname='$beverages';
run;
proc format cntlin=bev_frmt;
run;
*perform look up by reading in original dataset using the format I've created;
data bev_final;
set bev_control (where=(put(ID,$beverages.)='COKE'));
run;
Also just to clarify, this is an example of how I organized the excel file I read in as the format dataset:
ID | _NAME_ | COL1 | beverage | map_code | Response | product_type | drink_type |
03 | bev_1 | Coca cola | COCACOLA | COKE | 1 | 1 | 1 |
04 | bev_1 | Coke | COKE | COKE | 1 | 1 | 1 |
05 | bev_2 | Hamburger | HAMBURGER | HAMBURGER | 1 | 2 | 77 |
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.