Hi experts,
the data set I have bunch of post-fixes in the end of the company, example (post-fixes):
(canada) |
(india) |
ab |
ag |
bv |
co |
co ltd |
gmbh |
inc |
india |
limited |
llc |
ltd |
mbh |
nv |
plc |
private limited |
pte ltd |
pty ltd |
pvt ltd |
s.a |
sa |
sl |
spa |
srl |
usa |
corp |
sro |
a/s |
what I want to achieve is, remove those from the comapnies names.
the postfixes and actual comapnies names, their are a single blank within. example:
180 Medical Inc
1ST Bio Therapeutics Inc
1ST Bio Therapeutics INC
1abtik ag
Camurus AB
Korning AG
PlasmaChem GmbH
PlasmaChem GMBH
how do I remove those post-fixes from original data set?
and some post fixes are written upper-case, and some are in lower-case, some are mixed.
Hi @jimmychoi
Here is an attempt to do this. Does that meet your expectations?
data word_list;
input word $20.;
if findc(word,"/") then word=tranwrd(word,"/",".");
datalines;
(canada)
(india)
ab
ag
bv
co
co ltd
gmbh
inc
india
limited
llc
ltd
mbh
nv
plc
private limited
pte ltd
pty ltd
pvt ltd
s.a
sa
sl
spa
srl
usa
corp
sro
a/s
;
run;
data have;
input company $80.;
datalines;
180 Medical Inc
1ST Bio Therapeutics Inc
1ST Bio Therapeutics INC
1abtik ag
Camurus AB
Korning AG
PlasmaChem GmbH
PlasmaChem GMBH
;
run;
proc sql noprint;
select quote(trim(word),"'")
into:word_list separated by ","
from word_list
order by countw(word) desc;
quit;
data want;
set have;
company = tranwrd(company,"/",".");
array _list(&sqlobs) $ _temporary_ (&word_list);
do i=1 to dim(_list);
if find(company,_list(i),"i")>0 then do;
company = prxchange(cats("s/(\(?\b",_list(i),"\b\)?\s*)$//i"),-1,company);
end;
end;
drop i;
run;
Hi @jimmychoi
Here is an attempt to do this. Does that meet your expectations?
data word_list;
input word $20.;
if findc(word,"/") then word=tranwrd(word,"/",".");
datalines;
(canada)
(india)
ab
ag
bv
co
co ltd
gmbh
inc
india
limited
llc
ltd
mbh
nv
plc
private limited
pte ltd
pty ltd
pvt ltd
s.a
sa
sl
spa
srl
usa
corp
sro
a/s
;
run;
data have;
input company $80.;
datalines;
180 Medical Inc
1ST Bio Therapeutics Inc
1ST Bio Therapeutics INC
1abtik ag
Camurus AB
Korning AG
PlasmaChem GmbH
PlasmaChem GMBH
;
run;
proc sql noprint;
select quote(trim(word),"'")
into:word_list separated by ","
from word_list
order by countw(word) desc;
quit;
data want;
set have;
company = tranwrd(company,"/",".");
array _list(&sqlobs) $ _temporary_ (&word_list);
do i=1 to dim(_list);
if find(company,_list(i),"i")>0 then do;
company = prxchange(cats("s/(\(?\b",_list(i),"\b\)?\s*)$//i"),-1,company);
end;
end;
drop i;
run;
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.