BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
xavimjo
Calcite | Level 5

I have a list of companies that I have to cross to another list of companies by name.

 

As you can imagine, the main problem with this task is that usually, the companies' names end with letters as LLC in USA or SA or SL in Spain.

 

The problem with this topic is that in some cases LLC can be written LLC or L.L.C., so that I need to delete the dots, and in this case, if I use TRANWRD or COMPRESS, after deleting the dot, the process is including an space, so I obtain L L C when I need to obtain LLC

 

Some examples of my code:

OPTION 1:
tranwrd(tranwrd(compbl(upcase(NAME)),'.',''),',','') as NEW_NAME

OPTION 2:
compress(compbl(upcase(NAME)),',.') as NEW_NAME

Any ideas on how can I resolve this matter?

1 ACCEPTED SOLUTION

Accepted Solutions
PaigeMiller
Diamond | Level 26

I am not able to reproduce this

 

data have;
    name = 'Paige Miller Corp, L.L.C.';
run;
data want;
    set have;
    newname=compress(name,'.');
run;

 

The result does not have L L C with spaces.

 

The bigger problem of matching corporation names that may indeed have different punctuation or spacing or spelling is one that might be solved by something called fuzzy matching.

--
Paige Miller

View solution in original post

5 REPLIES 5
PaigeMiller
Diamond | Level 26

I am not able to reproduce this

 

data have;
    name = 'Paige Miller Corp, L.L.C.';
run;
data want;
    set have;
    newname=compress(name,'.');
run;

 

The result does not have L L C with spaces.

 

The bigger problem of matching corporation names that may indeed have different punctuation or spacing or spelling is one that might be solved by something called fuzzy matching.

--
Paige Miller
xavimjo
Calcite | Level 5

Hello PaigeMiller, today the code is working, but I don't understand why wasn't working properly yesterday.

 

The final code that I'm using is this one:

compress(compbl(upcase(NAME)),',.') as NEW_NAME

Thanks a lot four your suggestion about fuzzy matching, I find very interesting the post, and it will be in my mind for the future.

Astounding
PROC Star

@PaigeMiller's suggestion will remove all dots.  This might be exactly what you need.  OR, removing all might be too much.  You could also try:

 

tranwrd('L.L.C', 'LLC')

 

Then you need a separate statement for each word such as S.A. or S.L.

 

Also note, if the length of your new variable has not been defined, many character functions assign a length of $200.  You could overcome that by assigning a proper length at the beginning of the DATA step, before applying any functions.

PaigeMiller
Diamond | Level 26

@xavimjo wrote:

I have a list of companies that I have to cross to another list of companies by name.

 

As you can imagine, the main problem with this task is that usually, the companies' names end with letters as LLC in USA or SA or SL in Spain.

 

The problem with this topic is that in some cases LLC can be written LLC or L.L.C., so that I need to delete the dots, and in this case, if I use TRANWRD or COMPRESS, after deleting the dot, the process is including an space, so I obtain L L C when I need to obtain LLC


Are you sure the company's name doesn't contain L. L. C. with spaces in it as well as dots?

 

Can you provide us with a small portion of the data that illustrates the problem?

--
Paige Miller
Tom
Super User Tom
Super User

If you want to REMOVE the periods then why did you replace them with a single space by calling TRANWRD()?

tranwrd(XXX,',','')

Note that just typing the quotes enclosing a string literal next to each other does not create an empty string. Instead it creates a string with one space.  Get in the habit of typing it that way to begin with to avoid confusion.  Also it will make it easier to tell that it is two single quote characters and not one double quote character.

tranwrd(XXX,',',' ')

To remove individual characters (bytes) use COMPRESS().

compress(XXX,',')

To transform a multi-byte string to nothing you have to use TRANSTRN() instead of TRANWRD().  To generate an empty string you have to use TRIMN().

transtrn(XXX,',',trimn(' '))

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 1317 views
  • 0 likes
  • 4 in conversation