I have a list of companies that I have to cross to another list of companies by name.
As you can imagine, the main problem with this task is that usually, the companies' names end with letters as LLC in USA or SA or SL in Spain.
The problem with this topic is that in some cases LLC can be written LLC or L.L.C., so that I need to delete the dots, and in this case, if I use TRANWRD or COMPRESS, after deleting the dot, the process is including an space, so I obtain L L C when I need to obtain LLC
Some examples of my code:
OPTION 1: tranwrd(tranwrd(compbl(upcase(NAME)),'.',''),',','') as NEW_NAME OPTION 2: compress(compbl(upcase(NAME)),',.') as NEW_NAME
Any ideas on how can I resolve this matter?
I am not able to reproduce this
data have;
name = 'Paige Miller Corp, L.L.C.';
run;
data want;
set have;
newname=compress(name,'.');
run;
The result does not have L L C with spaces.
The bigger problem of matching corporation names that may indeed have different punctuation or spacing or spelling is one that might be solved by something called fuzzy matching.
I am not able to reproduce this
data have;
name = 'Paige Miller Corp, L.L.C.';
run;
data want;
set have;
newname=compress(name,'.');
run;
The result does not have L L C with spaces.
The bigger problem of matching corporation names that may indeed have different punctuation or spacing or spelling is one that might be solved by something called fuzzy matching.
Hello PaigeMiller, today the code is working, but I don't understand why wasn't working properly yesterday.
The final code that I'm using is this one:
compress(compbl(upcase(NAME)),',.') as NEW_NAME
Thanks a lot four your suggestion about fuzzy matching, I find very interesting the post, and it will be in my mind for the future.
@PaigeMiller's suggestion will remove all dots. This might be exactly what you need. OR, removing all might be too much. You could also try:
tranwrd('L.L.C', 'LLC')
Then you need a separate statement for each word such as S.A. or S.L.
Also note, if the length of your new variable has not been defined, many character functions assign a length of $200. You could overcome that by assigning a proper length at the beginning of the DATA step, before applying any functions.
@xavimjo wrote:
I have a list of companies that I have to cross to another list of companies by name.
As you can imagine, the main problem with this task is that usually, the companies' names end with letters as LLC in USA or SA or SL in Spain.
The problem with this topic is that in some cases LLC can be written LLC or L.L.C., so that I need to delete the dots, and in this case, if I use TRANWRD or COMPRESS, after deleting the dot, the process is including an space, so I obtain L L C when I need to obtain LLC
Are you sure the company's name doesn't contain L. L. C. with spaces in it as well as dots?
Can you provide us with a small portion of the data that illustrates the problem?
If you want to REMOVE the periods then why did you replace them with a single space by calling TRANWRD()?
tranwrd(XXX,',','')
Note that just typing the quotes enclosing a string literal next to each other does not create an empty string. Instead it creates a string with one space. Get in the habit of typing it that way to begin with to avoid confusion. Also it will make it easier to tell that it is two single quote characters and not one double quote character.
tranwrd(XXX,',',' ')
To remove individual characters (bytes) use COMPRESS().
compress(XXX,',')
To transform a multi-byte string to nothing you have to use TRANSTRN() instead of TRANWRD(). To generate an empty string you have to use TRIMN().
transtrn(XXX,',',trimn(' '))
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.