02-20-2016 05:43 PM
I have two data sets. One contains a list of the formal names of the Fortune 500 companies. The other contains patent information for a large number of companies. It was extracted online, so the format of a company's name may differ from the first data set even when both refer to the same company.
What I want to do is select the patent information for the companies that are among the Fortune 500. To do that, I have to compare the names between the two data sets. I used the sounds-like (=*) operator and treated each of the Fortune 500 companies as a macro variable, as the following code shows:
call symput(cats('company',suffix), company);
where organization_name =* "&company1.";
&company1. stands for one of the Fortune 500 companies, but I have 500 macro variables like this.
One way I can think of is to wrap this in a macro program, but then I would have to write out 500 calls to that macro.
Is there an easier way to finish this task than calling the macro 500 times?
02-20-2016 06:52 PM
I usually recommend the solution in this thread, primarily FriedEgg's. He also suggests an open source tool called the Link King, if you can install software. That's usually not an option where I work.
Otherwise, you might want to consider a data step solution if that's an option. You can load the names into a temporary array and loop through them. There should be code that I wrote for that solution somewhere on here.
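A minimal sketch of that data step idea, not the code referred to above. It assumes the Fortune 500 list is in a data set called fortune500 with a character variable company (at most 500 rows), the patent data is in patents with organization_name, and patents has no variable named company, i, or fortune_name. All of those names are hypothetical. The =* operator is only valid in WHERE expressions, so the sketch compares SOUNDEX codes instead, which is close to but not guaranteed identical to =*.

```sas
data matched;
    /* Temporary array: held in memory, never written to the output */
    array names[500] $200 _temporary_;

    /* Load the Fortune 500 names once, on the first iteration */
    if _n_ = 1 then do i = 1 to n_names;
        set fortune500(keep=company) nobs=n_names;
        names[i] = company;
    end;

    /* Read one patent record per iteration and test it against every name */
    set patents;
    do i = 1 to n_names;
        if soundex(organization_name) = soundex(names[i]) then do;
            fortune_name = names[i];  /* keep the matching formal name */
            output;
            leave;                    /* stop at the first match */
        end;
    end;
    drop i company;
run;
```

This reads patents once instead of 500 times. If the list can grow past 500 names, the array dimension needs to grow with it.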
Calling a macro 500 times is easy if you use call execute, but not efficient.
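For completeness, a rough sketch of the call execute route, under the same assumptions (hypothetical data sets fortune500 with variable company, and patents with variable organization_name). Instead of creating 500 macro variables, it generates one INSERT statement per company name. Each name is wrapped in single quotes so that an ampersand in a name such as AT&T is not treated as a macro trigger.

```sas
/* Empty table with the same columns as PATENTS */
proc sql;
    create table matched like patents;
quit;

data _null_;
    set fortune500 end=last;
    length name_lit $ 420;

    /* Single-quoted literal; embedded single quotes are doubled */
    name_lit = "'" || tranwrd(strip(company), "'", "''") || "'";

    /* The generated statements run after this data step finishes */
    if _n_ = 1 then call execute('proc sql;');
    call execute(
        'insert into matched select * from patents ' ||
        'where organization_name =* ' || strip(name_lit) || ';'
    );
    if last then call execute('quit;');
run;
```

This still scans patents once per company, which is why it is not efficient. A patent record that sounds like two different names is also inserted twice.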
02-21-2016 12:43 PM
Yes, it's a straightforward SQL join, but the next step is typically how to do this 'fuzzy' comparison with different options. That is what FriedEgg's solution covers, using SOUNDEX, COMPGED and other comparison operators.
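As a sketch only (this is not FriedEgg's code), the plain SQL version and one fuzzy variant could look like the following. The data set and variable names (fortune500, company, patents, organization_name) are assumptions, and the COMPGED cutoff of 200 is an arbitrary starting point to tune against your own data.

```sas
proc sql;
    /* Sounds-like join: every patent row paired with each name it sounds like */
    create table matched_soundex as
    select f.company as fortune_name, p.*
    from fortune500 as f, patents as p
    where p.organization_name =* f.company;

    /* Generalized edit distance: a lower score means a closer match */
    create table matched_ged as
    select f.company as fortune_name, p.*,
           compged(upcase(strip(p.organization_name)),
                   upcase(strip(f.company))) as ged
    from fortune500 as f, patents as p
    where calculated ged <= 200
    order by fortune_name, ged;
quit;
```

Both queries compare every patent row against every company name, so they can be slow on a large patent table. Reviewing the rows with the highest ged scores is a quick way to judge whether the cutoff is too loose.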