BookmarkSubscribeRSS Feed
arunvaibhav2
Calcite | Level 5

Hello All,

 

Problem:

 

I have a Data Cleaning problem. I have this data where There is a Product manufacturer and Product Brand.

Each manufacturer has several Brands. The brands data is manually entered and is messy. I need to check the data which is manually entered and create standard brand name.

 

Here is the example,

 

Product Manufacturer

Product Brand

Abila

Taquila

Abila

TAQUILA

Abila

tAqueela

Tambola

Daqueela

Tambola

Taqueela

Abila

daqueela

Tambola

Samila

 

Here is the desired output,

Product Manufacturer

Product Brand

Standard Brand

Abila

Taquila

Taquila

Abila

TAQUILA

Taquila

Abila

tAqueela

Taquila

Tambola

Daqueela

Daqueela

Tambola

Taqueela

Taquila

Abila

daqueela

Daqueela

Tambola

SAMmila

Samila

 

The next step would be get the count of unique combinations. Which I can achieve once I got the above output.

The problem here is, I need to do this for only 2 Manufacturers but all brands there are (~3,000) of them.

 

Approach:

 

I did run a select distinct brand name and got around 200,000 names which are different combinations of same name. For example, the result retuned as

Abila

abila

abla

cabla

Cabl

Cabla

..

So, Since I cannot extract all possible combinations, I extracted count as well, which gave me most frequent used notation, I ran a program to delete all less frequent notations of brand name.

Now I have list of Product Brands ~90,000

 

But, I cannot code multiple WHEN … THEN Statements like below (I should code (90,000) of them, which is impossible)

 

CASE

     WHEN MISC.PRD_BRAND_NM Like 'ABila%' OR MISC.PRD_BRAND_NM Like  'abila%' OR MISC.PRD_BRAND_NM Like  'ab%' THEN 'Abila' AS Standard_name

I’m looking is there a way that I create a macro and pass all variables through a function to create standard name?

 

Some thing like below,

 

CASE

     WHEN MISC.PRD_BRAND_NM Like &pass. OR MISC.PRD_BRAND_NM Like  &pass. OR MISC.PRD_BRAND_NM Like  &pass. THEN &pass. AS Standard_name

Do <this function> until <end of all brands>

<this function>

To replace macro variables with all combinations.

 

Or any other way I can achieve the objective?

Thanks a ton!

2 REPLIES 2
Reeza
Super User

Do you have a master list of what should be the names?

 

 

SASKiwi
PROC Star

You have two choices. You can either labouriously code a custom, do-it-yourself solution to your problem or you could consider a tool that has this functionality built-in such as SAS Data Quality / Dataflux. The latter solution is definitely a better one if you are doing a lot of data cleansing. Does your organisation have any such tools?

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 589 views
  • 0 likes
  • 3 in conversation