SAS Programming

DATA Step, Macro, Functions and more
BookmarkSubscribeRSS Feed
arunvaibhav2
Calcite | Level 5

Hello All,

 

Problem:

 

I have a Data Cleaning problem. I have this data where There is a Product manufacturer and Product Brand.

Each manufacturer has several Brands. The brands data is manually entered and is messy. I need to check the data which is manually entered and create standard brand name.

 

Here is the example,

 

Product Manufacturer

Product Brand

Abila

Taquila

Abila

TAQUILA

Abila

tAqueela

Tambola

Daqueela

Tambola

Taqueela

Abila

daqueela

Tambola

Samila

 

Here is the desired output,

Product Manufacturer

Product Brand

Standard Brand

Abila

Taquila

Taquila

Abila

TAQUILA

Taquila

Abila

tAqueela

Taquila

Tambola

Daqueela

Daqueela

Tambola

Taqueela

Taquila

Abila

daqueela

Daqueela

Tambola

SAMmila

Samila

 

The next step would be get the count of unique combinations. Which I can achieve once I got the above output.

The problem here is, I need to do this for only 2 Manufacturers but all brands there are (~3,000) of them.

 

Approach:

 

I did run a select distinct brand name and got around 200,000 names which are different combinations of same name. For example, the result retuned as

Abila

abila

abla

cabla

Cabl

Cabla

..

So, Since I cannot extract all possible combinations, I extracted count as well, which gave me most frequent used notation, I ran a program to delete all less frequent notations of brand name.

Now I have list of Product Brands ~90,000

 

But, I cannot code multiple WHEN … THEN Statements like below (I should code (90,000) of them, which is impossible)

 

CASE

     WHEN MISC.PRD_BRAND_NM Like 'ABila%' OR MISC.PRD_BRAND_NM Like  'abila%' OR MISC.PRD_BRAND_NM Like  'ab%' THEN 'Abila' AS Standard_name

I’m looking is there a way that I create a macro and pass all variables through a function to create standard name?

 

Some thing like below,

 

CASE

     WHEN MISC.PRD_BRAND_NM Like &pass. OR MISC.PRD_BRAND_NM Like  &pass. OR MISC.PRD_BRAND_NM Like  &pass. THEN &pass. AS Standard_name

Do <this function> until <end of all brands>

<this function>

To replace macro variables with all combinations.

 

Or any other way I can achieve the objective?

Thanks a ton!

2 REPLIES 2
Reeza
Super User

Do you have a master list of what should be the names?

 

 

SASKiwi
PROC Star

You have two choices. You can either labouriously code a custom, do-it-yourself solution to your problem or you could consider a tool that has this functionality built-in such as SAS Data Quality / Dataflux. The latter solution is definitely a better one if you are doing a lot of data cleansing. Does your organisation have any such tools?

sas-innovate-white.png

Our biggest data and AI event of the year.

Don’t miss the livestream kicking off May 7. It’s free. It’s easy. And it’s the best seat in the house.

Join us virtually with our complimentary SAS Innovate Digital Pass. Watch live or on-demand in multiple languages, with translations available to help you get the most out of every session.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 906 views
  • 0 likes
  • 3 in conversation