BookmarkSubscribeRSS Feed
deleted_user
Not applicable
I think this is a very challenging problem I am facing and I have no idea how to deal with it
Suppose I have two csv files

A.csv
Toyota Camry,1998,blue
Honda Civic,1999,blue
Acura Inf,2000,yellow

B.csv
Toyota Inc. Camry, 2000km
Honda Corp Civic,1500km
HondaUSA Inf, 2000, 2300km

I want to generate C.csv
Toyota Camry,1998,blue ,2000km
Honda Civic,1999,blue,1500km
HondaUSA Inf,2000,yellow,2300km

The worst part of the task is that there needs to be error tolerance to deal with the variations in the company name
1.extra spaces
2.extra dots
3.phrases such as Inc, corp.
4.Create a list of manual translation tables(Acura translates to HondaUSA)

Is this mission impossible?
3 REPLIES 3
venkatesh
Calcite | Level 5
I Don't know how much it will help to you....

From A.csv file split the name in to two parts by using scan function like Toyota and carmy as name1 and name2.
from B.csv file also do same thing.

finally merge by either name1 or name2 ........

Thanks....
deleted_user
Not applicable
It looks a little complicated but by no means mission impossible.

You obviously need to get a standardised variable in each dataset (having read them in from the csv files.

Have a look at the INDEXW function...

http://support.sas.com/documentation/cdl/en/lrdict/61724/HTML/default/a000212235.htm

This will search a string for a specific word.

You could create a list of words you are looking for e.g. Honda Acura Nissan

and use the indexw function to look for them in the variable. If you find the word then put that word into a variable. Do this on both datasets, then change any Accura to to Honda etc.

Remember the INDEXW function is looking for words, so it uses a delimiter. So it won't pick the Honda out of HondaUSA. It is looking for words, not character strings. You can use regular expressions for that if you are really keen.

When you have a standard variable in each data set you can merge them.
Flip
Fluorite | Level 6
Just a random thought, but have you given the Sounds-Like (=*) opperator a try? Also, you can use Regular expressions to match oun similar text.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 696 views
  • 0 likes
  • 3 in conversation