dear all,
how to merge two tables based on the first character of the 'company name' variables.
bu using the 2 below tables as an example,
table 1
Type | NAME | COMPANY NAME | COMPANY NAME_1 | COMPANY NAME - SHORT | PREVIOUS NAME |
917574 | ABBOT GROUP DEAD - 07/03/08 | ABBOT GROUP PLC | ABBOT GROUP PLC. | ABBOT GROUP PLC | UNIGROUP |
991144 | ABBOTT LABS.GBP (LON) DEAD - DEAD | ABBOTT LABORATORIES | ABBOTT LABORATORIES | ABBOTT LABORATORIES | ABOTT LABS.GBP (LON) |
671863 | ABBOTT LABS.USD (LON) DEAD - 21/10/16 | ABBOTT LABORATORIES | ABBOTT LABORATORIES | ABBOTT LABORATORIES | ABBOTT LABS. (XSQ) |
926020 | ABERCOM GROUP (LON) | ABERCOM GROUP LTD. | ABERCOM GROUP | ABERCOM GROUP LTD | |
911147 | ABERDEEN STEAK HOUSES DEAD - 03/01/03 | ABDN.STEAK HSES.GP.PLC. | |||
902737 | ABERDEEN TRUST | ABERDEEN ASSET MAN.PLC. | |||
507161 | ABDN.TST.WTS.A EXPIRY 30/09/99 | ABERDEEN ASSET MAN.PLC. | |||
961337 | ABERFORTH SMALLER COS.C | ABERFORTH SMCOS.IT.PLC. | |||
928988 | ABERFORTH SMCOS.C | ABERFORTH SMCOS.IT.PLC. | |||
960130 | ABERFORTH SPL.C SHARE DELISTED 27/07/94 | ABERFORTH | |||
15295C | ABERTIS (IRS) | ABERTIS INFRAESTRUCTURAS | ABERTIS INFSTS.SA | ABERTIS INFRAEST | ACESA INTRAESTRUCTU(IRS) |
29253E | ABERTIS (IRS) BONUS RIGHTS 03/08/04 | ABERTIS INFRAESTRUCTURAS | ABERTIS INFSTS.SA | ABERTIS INFRAEST | ACESA INTRAESTRUCTU(IRS) |
256273 | ABGENIX (IRS) DEAD - 12/11/07 | ABGENIX, INC. | ABGENIX INCO. | ABGENIX, INC. |
table 2
code | psn_name |
1 | ABBOTT CCCC |
2 | ABERDEEN ww |
3 | ACESA des |
4 | APPLE |
5 | ABERFORTH ftrr |
6 | ABDN dggg |
logic:
1splite each observation based on the 'blank' ',' '.' etc. in both tables.
2merge 'psn_name' in table 2 with each company name variable in table 1(i.e., NAME, COMPANY_NAME, COMPANY_NAME_1,COMPANY_NAME_SHORT,PREVIOUS_NAME) if 1)both of the characters is the first character of the 'company name' variable,
and 2) both of the characters are same.
if the psn_name can be matched with any of the 'company name' variables, it can be recorded with the entire row.
3 record the matched table, non_matched table 1 and non_matched table 2 separately.
so, for the
code | psn_name |
1 | ABBOTT CCCC |
the first character of psn_name variable is 'ABBOTT'.
so it can match with
991144 | ABBOTT LABS.GBP (LON) DEAD - DEAD | ABBOTT LABORATORIES | ABBOTT LABORATORIES | ABBOTT LABORATORIES | ABOTT LABS.GBP (LON) |
671863 | ABBOTT LABS.USD (LON) DEAD - 21/10/16 | ABBOTT LABORATORIES | ABBOTT LABORATORIES | ABBOTT LABORATORIES | ABBOTT LABS. (XSQ) |
as the first character of the variable is 'ABBOTT' as well
the second example,
6 | ABDN dggg |
so it can match with
911147 | ABERDEEN STEAK HOUSES DEAD - 03/01/03 | ABDN.STEAK HSES.GP.PLC. |
as the 'ABDN' is as same as the 'ABDN' in 'ABDN.STEAK HSES.GP.PLC.'
Could you please give me some suggestions about this? the sample has been added to the attachment.
thanks in advance.
It'll reduce the number of errors. I think I listed out the steps previously you need to do when doing a comparison like this:
1. Remove all exact matches
2. Do a fuzzy match - remove those
3. etc......
If you're at a University though you may qualify for a cheaper license or educational license for SAS Data Management Studio or Dataflux which can do the matching in a more robust manner. I didn't have great luck, but it depends on your data.
Dear Reeza,
thanks for your message.
is it the advice?
This is a non-trivial problem because you end up doing N*M calculations which can be quite big.
Some tips:
1. Upcase everything
2. Change all & or other symbols such as @ from text
3. Ensure Co and/or LTD are standardized, i.e. LTD vs LTD. vs Limited vs Limited
4. Do an exact match and remove those records
5. Remove all spaces and try a second round of exact matches
6. Do fuzzy matching using COMPGED/COMPLEV/SOUNDS LIKE/SPEDIS and equality of first letter
7. Do fuzzy matching using COMPGED/COMPLEV/SOUNDS LIKE
COMPGED works best in my experience
https://communities.sas.com/t5/General-SAS-Programming/Matching-Company-Names/td-p/439674
however, I confuse about several steps.
1.'Change all & or other symbols such as @ from text' how to change the symbols? should I remove all the symbols? or replace them with the same symbol?
2.'Ensure Co and/or LTD are standardized, i.e. LTD vs LTD. vs Limited vs Limited' how to standardize these words? Is there any code to do this step? particularly, will you remove the 'Co and/or LTD' words in a dataset, if few 'Co and/or LTD' words are recorded in the database?
@France wrote:
dear all,
how to merge two tables based on the first character of the 'company name' variables.
bu using the 2 below tables as an example,
as the first character of the variable is 'ABBOTT' as well
the second example,
6 ABDN dggg so it can match with
911147 ABERDEEN STEAK HOUSES DEAD - 03/01/03 ABDN.STEAK HSES.GP.PLC. as the 'ABDN' is as same as the 'ABDN' in 'ABDN.STEAK HSES.GP.PLC.'
Could you please give me some suggestions about this? the sample has been added to the attachment.
thanks in advance.
You mean first word not character.
SCAN function will identify the first word of a string.
Do you have UNIGROUP in table 2? If so what would the output look like?
Is the program supposed to work for "ABOTT LABS.GBP (LON)" when it appears the match you are requesting would not work because ABOTT does not match the obvious ABBOTT in table 2?
Is the a preferred order of comparison between the 5 different name variables?
Hi,
you could use the LIKE SQL Operator but if you're in a professional environment or what you do can have a financial impact, I would rather recommend you to select distinctively the company names from table1 and 2, match them by hand in a separate dataset and afterwards to link all 3 tables together. From my point of view it would be much faster, reliable and less challenging than trying to identify afterwards which matches were inaccurate and why.
- Cheers -
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.