BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
msastry
Fluorite | Level 6

Is there a way to group nominal observations from a column to create a new variable?

 

Example: 

 

Month Customer_ID Merchant       Amount    Count

Apr-18 12345            Walmart         $20.00     5

Apr-18 23456            Exxon            $35.00     1

May-18 67891           Home Depot  $60.14     2

May-18 34567           Lowes            $1200     5

Jun-18  45678           Shell               $25.00    3

Jun-18  56789           McDonalds     $10.00    1

 

In the above example, if I would like to create a new column 'Category' that has all gas charges as Gas (Exxon, Shell), all home improvement charges as Home Improvement (Lowes, Home Depot), Restaurants (McDonalds) from the above example.

 

The issue is that there are ~5000 merchants in several months of data and is there an optimum way to group these into specific categories for further analysis/reporting/analytics?

 

Thank you.    

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
kiranv_
Rhodochrosite | Level 12

In my view, best way is to have a lookup table, something like below and then picking up category by doing join on Merchant

You can update or add to lookup on regular basis. This helps to maintain things easily.

 

Merchant                     Category

Exxon                           Gas

Shell                             Gas

Lowes                          Home Improvement  

Home Depot               Home Improvement     

View solution in original post

6 REPLIES 6
Kurt_Bremser
Super User

Start with putting your relationships merchant-category into a dataset. Create a format from that dataset, and use it to create the category variable when you need it.

 

Edit: see some example code:

data lookup;
infile cards dlm=',';
input merchant :$20. category :$20.;
cards;
Walmart,Retail
Exxon,Gas
Home Depot,Home Improvement
Lowes,Home Improvement
Shell,Gas
McDonalds,Restaurants
;
run;

data cntlin;
set lookup (rename=(merchant=start category=label));
fmtname = 'category';
type = 'C';
run;

proc format cntlin=cntlin;
run;

data have;
infile cards dlm=',';
input month $ customer_id $ merchant :$20. amount count;
category = put(merchant,$category.);
cards;
Apr-18,12345,Walmart,20,5
;
run;
msastry
Fluorite | Level 6

Thank you, Mr. KurtBremser. I appreciate your quick response.

 

Just to clarify, did you mean to say I have to have a separate dataset that has 2 columns: 1. Category, and 2. Merchant? 

 

As the number of merchants are ~5000 in multiple categories, if there is a way to optimize adding the category column that has multiple merchants? Also, the list of merchants per category could be dynamic as the customers could go to new merchants to do their shopping, so the new merchants need to be added if they do not exist in the current dataset.

Kurt_Bremser
Super User

Setting up the lookup dataset is of course initial work, but once you have it, you have it, and it only needs maintenance. And it keeps data out of code.

And how does a customer changing merchants influence that merchant's category? That's highly illogical.

msastry
Fluorite | Level 6

Hi Mr. KurtBremser,

 

I finally used your idea on the lookup table. I appreciate your help.

 

Thank you,

 

 

kiranv_
Rhodochrosite | Level 12

In my view, best way is to have a lookup table, something like below and then picking up category by doing join on Merchant

You can update or add to lookup on regular basis. This helps to maintain things easily.

 

Merchant                     Category

Exxon                           Gas

Shell                             Gas

Lowes                          Home Improvement  

Home Depot               Home Improvement     

msastry
Fluorite | Level 6

Kiran V,

 

Thank you. I followed the lookup table advise to address the issue. I appreciate your help.

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 1362 views
  • 4 likes
  • 3 in conversation