Solved: Removing postfix (e.g Inc, Limited) in corporate names

jimmychoi · Posted 02-04-2020 10:08 AM

Hi experts,

the data set I have bunch of post-fixes in the end of the company, example (post-fixes):

(canada)

(india)

ab

ag

bv

co

co ltd

gmbh

inc

india

limited

llc

ltd

mbh

nv

plc

private limited

pte ltd

pty ltd

pvt ltd

s.a

sa

sl

spa

srl

usa

corp

sro

a/s

what I want to achieve is, remove those from the comapnies names.

the postfixes and actual comapnies names, their are a single blank within. example:

180 Medical Inc

1ST Bio Therapeutics Inc

1ST Bio Therapeutics INC

1abtik ag

Camurus AB

Korning AG

PlasmaChem GmbH

PlasmaChem GMBH

how do I remove those post-fixes from original data set?

and some post fixes are written upper-case, and some are in lower-case, some are mixed.

ed_sas_member · Posted 02-04-2020 11:51 AM

Hi @jimmychoi

Here is an attempt to do this. Does that meet your expectations?

data word_list;
	input word $20.;
	if findc(word,"/") then word=tranwrd(word,"/",".");
	datalines;
(canada)
(india)
ab
ag
bv
co
co ltd
gmbh
inc
india
limited
llc
ltd
mbh
nv
plc
private limited
pte ltd
pty ltd
pvt ltd
s.a
sa
sl
spa
srl
usa
corp
sro
a/s
;
run;

data have;
	input company $80.;
	datalines;
180 Medical Inc
1ST Bio Therapeutics Inc
1ST Bio Therapeutics INC
1abtik ag
Camurus AB
Korning AG
PlasmaChem GmbH
PlasmaChem GMBH
;
run;

proc sql noprint;
	select quote(trim(word),"'")
	into:word_list separated by ","
	from word_list
	order by countw(word) desc;
quit;

data want;
	set have;
	company = tranwrd(company,"/",".");
	array _list(&sqlobs) $ _temporary_ (&word_list);
	do i=1 to dim(_list);
		if find(company,_list(i),"i")>0 then do;
			company = prxchange(cats("s/(\(?\b",_list(i),"\b\)?\s*)$//i"),-1,company);
		end;
	end;
	drop i;
run;

View solution in original post

ed_sas_member · Posted 02-04-2020 11:51 AM

Hi @jimmychoi

Here is an attempt to do this. Does that meet your expectations?

data word_list;
	input word $20.;
	if findc(word,"/") then word=tranwrd(word,"/",".");
	datalines;
(canada)
(india)
ab
ag
bv
co
co ltd
gmbh
inc
india
limited
llc
ltd
mbh
nv
plc
private limited
pte ltd
pty ltd
pvt ltd
s.a
sa
sl
spa
srl
usa
corp
sro
a/s
;
run;

data have;
	input company $80.;
	datalines;
180 Medical Inc
1ST Bio Therapeutics Inc
1ST Bio Therapeutics INC
1abtik ag
Camurus AB
Korning AG
PlasmaChem GmbH
PlasmaChem GMBH
;
run;

proc sql noprint;
	select quote(trim(word),"'")
	into:word_list separated by ","
	from word_list
	order by countw(word) desc;
quit;

data want;
	set have;
	company = tranwrd(company,"/",".");
	array _list(&sqlobs) $ _temporary_ (&word_list);
	do i=1 to dim(_list);
		if find(company,_list(i),"i")>0 then do;
			company = prxchange(cats("s/(\(?\b",_list(i),"\b\)?\s*)$//i"),-1,company);
		end;
	end;
	drop i;
run;

Removing postfix (e.g Inc, Limited) in corporate names

Re: Removing postfix (e.g Inc, Limited) in corporate names

Re: Removing postfix (e.g Inc, Limited) in corporate names

Removing postfix (e.g Inc, Limited) in corporate names

Re: Removing postfix (e.g Inc, Limited) in corporate names

Re: Removing postfix (e.g Inc, Limited) in corporate names

Click image to register for webinar

Classroom Training Available!