Help using Base SAS procedures

deal with the variations in merge

Reply
N/A
Posts: 0

deal with the variations in merge

I think this is a very challenging problem I am facing and I have no idea how to deal with it
Suppose I have two csv files

A.csv
Toyota Camry,1998,blue
Honda Civic,1999,blue
Acura Inf,2000,yellow

B.csv
Toyota Inc. Camry, 2000km
Honda Corp Civic,1500km
HondaUSA Inf, 2000, 2300km

I want to generate C.csv
Toyota Camry,1998,blue ,2000km
Honda Civic,1999,blue,1500km
HondaUSA Inf,2000,yellow,2300km

The worst part of the task is that there needs to be error tolerance to deal with the variations in the company name
1.extra spaces
2.extra dots
3.phrases such as Inc, corp.
4.Create a list of manual translation tables(Acura translates to HondaUSA)

Is this mission impossible?
Contributor
Posts: 61

Re: deal with the variations in merge

Posted in reply to deleted_user
I Don't know how much it will help to you....

From A.csv file split the name in to two parts by using scan function like Toyota and carmy as name1 and name2.
from B.csv file also do same thing.

finally merge by either name1 or name2 ........

Thanks....
N/A
Posts: 0

Re: deal with the variations in merge

Posted in reply to venkatesh
It looks a little complicated but by no means mission impossible.

You obviously need to get a standardised variable in each dataset (having read them in from the csv files.

Have a look at the INDEXW function...

http://support.sas.com/documentation/cdl/en/lrdict/61724/HTML/default/a000212235.htm

This will search a string for a specific word.

You could create a list of words you are looking for e.g. Honda Acura Nissan

and use the indexw function to look for them in the variable. If you find the word then put that word into a variable. Do this on both datasets, then change any Accura to to Honda etc.

Remember the INDEXW function is looking for words, so it uses a delimiter. So it won't pick the Honda out of HondaUSA. It is looking for words, not character strings. You can use regular expressions for that if you are really keen.

When you have a standard variable in each data set you can merge them.
Super Contributor
Posts: 359

Re: deal with the variations in merge

Posted in reply to deleted_user
Just a random thought, but have you given the Sounds-Like (=*) opperator a try? Also, you can use Regular expressions to match oun similar text.
Ask a Question
Discussion stats
  • 3 replies
  • 110 views
  • 0 likes
  • 3 in conversation