Dear all,
How can I split variable to different variables based on () and ''?
For example, for table a, I expect to get table b.
table a
me (DJI) Kim
juice 'apple' org
apple 'jan' five (ooo) KKK
apple 'jan' (ooo) KKK
I expect to get
table b
name, name_amonge_brackets, name_amonge_single_quotation_marks
me Kim, DJI
juice org,,apple
apple five KKK,ooo,jan
apple five KKK,ooo,jan
data a;
infile datalines dlm=',' truncover;
informat HRM_L2_Step6 $1000. ;
input HRM_L2_Step6 ;
datalines;
UNVL.CERAMIC MATL.
CATHAY INTL.HDG. PREV GOLDEN HDG.
PILLAR PR.
IRISH CONT.GP.UTS
ASHANTI GOLDFIELD (ISE)
ASUSTEK CMP. GDR (XSQ)REG S
ALCATEL 'A' (XSQ)
ALL NIPPON AIRWAYS (XSQ)
ALPS ELECTRIC (XSQ)
AMER 'A' (XSQ)
ATLAS COPCO 'B' (XYZ)
AU.MIKRO SYSTEME (XSQ)
AUTOLIV (XSQ)
AVESTA SHEFFIELD (XSQ)
BNP (XSQ)
BNP CERTS (XSQ)
BANCA DI ROMA (XSQ)
BANCAIRE CIE (XSQ)
BNC.BILBAO ADS (XSQ)
BARLOW (XSQ)
BCI (XSQ)
BENETTON (XSQ)
BERGESEN 'B'N/V (XSQ)
BLYVOORUITIZICHT (XSQ)
BMW (XSQ)DUPL.SEE 740095
BOUYGUES (XSQ)
BUFETE ADS (XSQ)
CS HDG.BR (XSQ)
CSM (XSQ)
CANAL + (XSQ)
;
run;
Could you please give me some suggestions about this? thanks in advance.
SCAN() will get you the word in the middle.
TRANWRD() will remove that word once you've found it.
Having demo variables as KKK probably isn't a great choice by the way.
@Alexxxxxxx wrote:
Dear all,
How can I split variable to different variables based on () and ''?
For example, for table a, I expect to get table b.
table a
me (DJI) Kim
juice 'apple' org
apple 'jan' five (ooo) KKK
apple 'jan' (ooo) KKK
I expect to get
table b
name, name_amonge_brackets, name_amonge_single_quotation_marks
me Kim, DJI
juice org,,apple
apple five KKK,ooo,jan
apple five KKK,ooo,jan
data a; infile datalines dlm=',' truncover; informat HRM_L2_Step6 $1000. ; input HRM_L2_Step6 ; datalines; UNVL.CERAMIC MATL. CATHAY INTL.HDG. PREV GOLDEN HDG. PILLAR PR. IRISH CONT.GP.UTS ASHANTI GOLDFIELD (ISE) ASUSTEK CMP. GDR (XSQ)REG S ALCATEL 'A' (XSQ) ALL NIPPON AIRWAYS (XSQ) ALPS ELECTRIC (XSQ) AMER 'A' (XSQ) ATLAS COPCO 'B' (XYZ) AU.MIKRO SYSTEME (XSQ) AUTOLIV (XSQ) AVESTA SHEFFIELD (XSQ) BNP (XSQ) BNP CERTS (XSQ) BANCA DI ROMA (XSQ) BANCAIRE CIE (XSQ) BNC.BILBAO ADS (XSQ) BARLOW (XSQ) BCI (XSQ) BENETTON (XSQ) BERGESEN 'B'N/V (XSQ) BLYVOORUITIZICHT (XSQ) BMW (XSQ)DUPL.SEE 740095 BOUYGUES (XSQ) BUFETE ADS (XSQ) CS HDG.BR (XSQ) CSM (XSQ) CANAL + (XSQ) ; run;
Could you please give me some suggestions about this? thanks in advance.
Below code works for your sample data. It might or might not work for your real data (i.e. if you've got an uneven number of single quotes in a string).
data have;
infile datalines dlm=',' truncover;
informat HRM_L2_Step6 $1000.;
input HRM_L2_Step6;
datalines;
UNVL.CERAMIC MATL.
CATHAY INTL.HDG. PREV GOLDEN HDG.
PILLAR PR.
IRISH CONT.GP.UTS
ASHANTI GOLDFIELD (ISE)
ASUSTEK CMP. GDR (XSQ)REG S
ALCATEL 'A' (XSQ)
ALL NIPPON 'b' AIRWAYS (XSQ) 'A' (XYZ)
ALPS ELECTRIC (XSQ)
AMER 'A' (XSQ)
ATLAS COPCO 'B' (XYZ)
AU.MIKRO SYSTEME (XSQ)
AUTOLIV (XSQ)
AVESTA SHEFFIELD (XSQ)
BNP (XSQ)
BNP CERTS (XSQ)
BANCA DI ROMA (XSQ)
BANCAIRE CIE (XSQ)
BNC.BILBAO ADS (XSQ)
BARLOW (XSQ)
BCI (XSQ)
BENETTON (XSQ)
BERGESEN 'B'N/V (XSQ)
BLYVOORUITIZICHT (XSQ)
BMW (XSQ)DUPL.SEE 740095
BOUYGUES (XSQ)
BUFETE ADS (XSQ)
CS HDG.BR (XSQ)
CSM (XSQ)
CANAL + (XSQ)
;
run;
data want(drop=_:);
set have;
HRM_L2_Step6_orig=HRM_L2_Step6;
length str_in_brackets str_in_quotes $1000;
if _n_=1 then
do;
_rx_bracket_id=prxparse('/\([^\)]*\)/oi');
_rx_quote_id=prxparse("/'[^']*'/oi");
retain _rx_bracket_id _rx_quote_id;
end;
do while(1);
call prxsubstr(_rx_bracket_id, left(hrm_l2_step6), _pos, _len);
if _len=0 then leave;
str_in_brackets=catx(' ',str_in_brackets,substrn(HRM_L2_Step6,_pos+1,_len-2));
HRM_L2_Step6=substrn(HRM_L2_Step6,1,_pos-1)||substrn(HRM_L2_Step6,_pos+_len);
end;
do while(1);
call prxsubstr(_rx_quote_id, left(hrm_l2_step6), _pos, _len);
if _len=0 then leave;
str_in_quotes=catx(' ',str_in_quotes,substrn(HRM_L2_Step6,_pos+1,_len-2));
HRM_L2_Step6=substrn(HRM_L2_Step6,1,_pos-1)||substrn(HRM_L2_Step6,_pos+_len);
end;
run;
If the code is not sufficient for your real data then try to amend the Regular Expressions to fit what you're dealing with.
You can go at it quite systematically with pattern matching functions:
data b;
if not patQ then patQ + prxParse("/'.*'/o");
if not patB then patB + prxParse("/\(.*\)/o");
set a;
length name nameQ nameB $1000;
name = HRM_L2_Step6;
start = 1;
call prxNext(patQ, start, -1, name, pos, len);
if len >= 2 then do;
nameQ = substrn(name, pos+1, len-2);
name = catx(" ", substrn(name, 1, pos-1), substr(name, pos+len));
end;
start = 1;
call prxNext(patB, start, -1, name, pos, len);
if len >= 2 then do;
nameB = substrn(name, pos+1, len-2);
name = catx(" ", substrn(name, 1, pos-1), substr(name, pos+len));
end;
drop patQ patB start pos len;
run;
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.