jimmychoi
Obsidian | Level 7

Hi all,

 

I have a list of abbreviations, as below,

data abbrev;
infile datalines truncover;
input word $50.;
datalines;
AG
BV
CORPORATION
GMBH
INC
LIMITED
LLC
LP
LTD
PJSC
PLC
PTE
PTY
SA
SA/NV
SL
SPA
SRL
COMPANY
V LP
CO
NV
HOLDINGS
HOLDING
;
run;

I also have a list of firm names, some of which end with one of the abbreviations above.

For each firm name, I want to iterate through the dataset 'abbrev' and check whether the name ends with any of the abbreviations. If it does, I simply want to remove that abbreviation from the name.

 

Please help.

1 ACCEPTED SOLUTION

andreas_lds
Jade | Level 19

Can you post some firm names as data-step using datalines, so that we have something to play with?

 

To be 100% sure: if one of the abbreviations appears in the middle of a company name, the abbreviation is not removed?

 

EDIT: I would load the dataset abbrev into a hash object, defining word as the key. Then use

word = scan(firm_name, -1);

to get the last word of each name, check if that word is in the hash-object and finally use prxchange to remove the word from the name:

firm_name = prxchange(cats('s/(.*)\W(', word, ')/$1/'), 1, trim(firm_name));

That should not be too much code to write, and it should perform reasonably fast as long as the number of observations in abbrev is not too high.
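The hash lookup described above could be sketched roughly as follows. This is a minimal, untested sketch, not the poster's exact code: the dataset firms_demo, its sample rows, and the output name firms_clean are made up for illustration, and it swaps prxchange for a plain substr. It also splits on blanks only, because the default delimiters of scan include "/" and would turn SA/NV into NV. Multi-word entries such as V LP are not handled; only the final blank-delimited word is looked up.

```sas
/* Hypothetical sample data: names and values are assumptions */
data firms_demo;
    infile datalines truncover;
    input firm_name $50.;
    datalines;
SAS AG
BOSCH GMBH
not a match
my company LIMITED
;
run;

data firms_clean;
    /* key variable must exist in the PDV with the same length as in abbrev */
    length word $50;
    if _n_ = 1 then do;
        /* load the abbreviation list once, keyed on word */
        declare hash h(dataset: 'abbrev');
        h.defineKey('word');
        h.defineDone();
    end;
    set firms_demo;

    /* last blank-delimited word, upper-cased to match the list */
    word = upcase(scan(firm_name, -1, ' '));

    /* strip only if the name has more than one word and the last word is a known abbreviation */
    if countw(firm_name, ' ') > 1 and h.check() = 0 then
        firm_name = substr(firm_name, 1, length(firm_name) - length(word));

    drop word;
run;
```

With this sketch, "SAS AG" should come out as "SAS", while "not a match" is left unchanged because "MATCH" is not in abbrev.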

 

novinosrin
Tourmaline | Level 20

exactly +1 from me

noling
SAS Employee

Here's a brute-force method that will work if both of your datasets are small. It is similar to, but less efficient than, @andreas_lds's hash method:

 


data abbrev;
infile datalines truncover;
input word $50.;
datalines;
AG
BV
CORPORATION
GMBH
INC
LIMITED
LLC
LP
LTD
PJSC
PLC
PTE
PTY
SA
SA/NV
SL
SPA
SRL
COMPANY
V LP
CO
NV
HOLDINGS
HOLDING
;
run;

data firms;
infile datalines truncover;
input firm $50.;
firm=strip(firm);
datalines;
SAS AG
GOOGLEBV
not a match
BOSCHINC
my company LIMITED
also not a match
this has 2 abbrevs HOLDING INC
;
run;

*get abbreviations into a macro variable - assuming that you only have these 24;
*using a ^ as delimiter since some of your abbrevs contain spaces;
proc sql noprint;
	select word into :abbrevs separated by '^'
	from abbrev;
quit;
%put &abbrevs;
%let to_loop = %eval(%sysfunc(countc(&abbrevs., "^"))+1);
%put &to_loop;
data final_firms;
	set firms;
	*same length as firm so that long names are not truncated;
	length firm_updated $50;

	*default: keep the name unchanged when nothing matches;
	firm_updated=firm;
	match='false';
	i=0;
	*check each firm name against the list of abbreviations;
	do while (match='false' and i < &to_loop);
		i +1;

		*get current abbreviation and its length;
		abbrev=scan("&abbrevs", i, "^");
		abbrev_len=length(abbrev);

		*only compare when the name is longer than the abbreviation, so that substr never gets a start position below 1;
		if length(firm) > abbrev_len then do;
			*check for match at end of firm name;
			if upcase(substr(firm, length(firm) - abbrev_len +1)) = upcase(abbrev) then do;
				firm_updated = substr(firm, 1, length(firm) - abbrev_len);
				match='true';
			end;
		end;
	end;

	keep firm firm_updated;
run;
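
A note on the loop above: it matches the tail of the name without checking for a word boundary, so GOOGLEBV loses its BV, and it stops at the first abbreviation in list order, so a name ending in V LP would lose only LP. If that is not wanted, one possible variation (an untested sketch; abbrevs_sorted and final_firms_strict are made-up names) tries longer abbreviations first and requires a blank before the match:

```sas
/* longest abbreviations first, so V LP is tried before LP */
proc sql noprint;
    select word into :abbrevs_sorted separated by '^'
    from abbrev
    order by length(word) desc;
quit;

data final_firms_strict;
    set firms;
    length firm_updated abbrev $50;

    /* default: leave the name unchanged */
    firm_updated = firm;

    n = countw("&abbrevs_sorted", '^');
    do i = 1 to n;
        abbrev = scan("&abbrevs_sorted", i, '^');
        abbrev_len = length(abbrev);
        start = length(firm) - abbrev_len + 1;

        /* start > 2 leaves room for at least one character plus the blank */
        if start > 2 then do;
            if substr(firm, start - 1, 1) = ' '
               and upcase(substr(firm, start)) = upcase(abbrev) then do;
                firm_updated = substr(firm, 1, start - 2);
                leave;
            end;
        end;
    end;

    keep firm firm_updated;
run;
```

Under these assumptions GOOGLEBV and BOSCHINC stay as they are, and "my company LIMITED" becomes "my company". Only one trailing abbreviation is removed per name.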
