BookmarkSubscribeRSS Feed
Alexxxxxxx
Pyrite | Level 9

Dear all,

 

How can I split variable to different variables based on () and ''?

For example, for table a, I expect to get table b.

 

table a 

me (DJI) Kim

juice 'apple' org

apple 'jan' five (ooo) KKK

apple 'jan' (ooo) KKK

 

I expect to get 

table b

name, name_amonge_brackets, name_amonge_single_quotation_marks

me Kim, DJI

juice org,,apple

apple five KKK,ooo,jan

apple five KKK,ooo,jan

 

data a;
  infile datalines dlm=',' truncover;
informat HRM_L2_Step6 $1000. ;
input HRM_L2_Step6 ;
datalines;
UNVL.CERAMIC MATL.
CATHAY INTL.HDG. PREV GOLDEN HDG.
PILLAR PR.
IRISH CONT.GP.UTS
ASHANTI GOLDFIELD (ISE)
ASUSTEK CMP. GDR (XSQ)REG S
ALCATEL 'A' (XSQ)
ALL NIPPON AIRWAYS (XSQ)
ALPS ELECTRIC (XSQ)
AMER 'A' (XSQ)
ATLAS COPCO 'B' (XYZ)
AU.MIKRO SYSTEME (XSQ)
AUTOLIV (XSQ)
AVESTA SHEFFIELD (XSQ)
BNP (XSQ)
BNP CERTS (XSQ)
BANCA DI ROMA (XSQ)
BANCAIRE CIE (XSQ)
BNC.BILBAO ADS (XSQ)
BARLOW (XSQ)
BCI (XSQ)
BENETTON (XSQ)
BERGESEN 'B'N/V (XSQ)
BLYVOORUITIZICHT (XSQ)
BMW (XSQ)DUPL.SEE 740095
BOUYGUES (XSQ)
BUFETE ADS (XSQ)
CS HDG.BR (XSQ)
CSM (XSQ)
CANAL + (XSQ)
;
run;

Could you please give me some suggestions about this? thanks in advance.

 

3 REPLIES 3
Reeza
Super User

SCAN() will get you the word in the middle. 

TRANWRD() will remove that word once you've found it. 

 

Having demo variables as KKK probably isn't a great choice by the way. 

 


@Alexxxxxxx wrote:

Dear all,

 

How can I split variable to different variables based on () and ''?

For example, for table a, I expect to get table b.

 

table a 

me (DJI) Kim

juice 'apple' org

apple 'jan' five (ooo) KKK

apple 'jan' (ooo) KKK

 

I expect to get 

table b

name, name_amonge_brackets, name_amonge_single_quotation_marks

me Kim, DJI

juice org,,apple

apple five KKK,ooo,jan

apple five KKK,ooo,jan

 

data a;
  infile datalines dlm=',' truncover;
informat HRM_L2_Step6 $1000. ;
input HRM_L2_Step6 ;
datalines;
UNVL.CERAMIC MATL.
CATHAY INTL.HDG. PREV GOLDEN HDG.
PILLAR PR.
IRISH CONT.GP.UTS
ASHANTI GOLDFIELD (ISE)
ASUSTEK CMP. GDR (XSQ)REG S
ALCATEL 'A' (XSQ)
ALL NIPPON AIRWAYS (XSQ)
ALPS ELECTRIC (XSQ)
AMER 'A' (XSQ)
ATLAS COPCO 'B' (XYZ)
AU.MIKRO SYSTEME (XSQ)
AUTOLIV (XSQ)
AVESTA SHEFFIELD (XSQ)
BNP (XSQ)
BNP CERTS (XSQ)
BANCA DI ROMA (XSQ)
BANCAIRE CIE (XSQ)
BNC.BILBAO ADS (XSQ)
BARLOW (XSQ)
BCI (XSQ)
BENETTON (XSQ)
BERGESEN 'B'N/V (XSQ)
BLYVOORUITIZICHT (XSQ)
BMW (XSQ)DUPL.SEE 740095
BOUYGUES (XSQ)
BUFETE ADS (XSQ)
CS HDG.BR (XSQ)
CSM (XSQ)
CANAL + (XSQ)
;
run;

Could you please give me some suggestions about this? thanks in advance.

 


 

Patrick
Opal | Level 21

Below code works for your sample data. It might or might not work for your real data (i.e. if you've got an uneven number of single quotes in a string).

data have;
  infile datalines dlm=',' truncover;
  informat HRM_L2_Step6 $1000.;
  input HRM_L2_Step6;
  datalines;
UNVL.CERAMIC MATL.
CATHAY INTL.HDG. PREV GOLDEN HDG.
PILLAR PR.
IRISH CONT.GP.UTS
ASHANTI GOLDFIELD (ISE)
ASUSTEK CMP. GDR (XSQ)REG S
ALCATEL 'A' (XSQ)
ALL NIPPON 'b' AIRWAYS (XSQ) 'A' (XYZ)  
ALPS ELECTRIC (XSQ)
AMER 'A' (XSQ)
ATLAS COPCO 'B' (XYZ)
AU.MIKRO SYSTEME (XSQ)
AUTOLIV (XSQ)
AVESTA SHEFFIELD (XSQ)
BNP (XSQ)
BNP CERTS (XSQ)
BANCA DI ROMA (XSQ)
BANCAIRE CIE (XSQ)
BNC.BILBAO ADS (XSQ)
BARLOW (XSQ)
BCI (XSQ)
BENETTON (XSQ)
BERGESEN 'B'N/V (XSQ)
BLYVOORUITIZICHT (XSQ)
BMW (XSQ)DUPL.SEE 740095
BOUYGUES (XSQ)
BUFETE ADS (XSQ)
CS HDG.BR (XSQ)
CSM (XSQ)
CANAL + (XSQ)
;
run;


data want(drop=_:);

  set have;
  HRM_L2_Step6_orig=HRM_L2_Step6;
  length str_in_brackets str_in_quotes $1000;

  if _n_=1 then
    do;
      _rx_bracket_id=prxparse('/\([^\)]*\)/oi');
      _rx_quote_id=prxparse("/'[^']*'/oi");
      retain _rx_bracket_id _rx_quote_id;
    end;

  do while(1);
    call prxsubstr(_rx_bracket_id, left(hrm_l2_step6), _pos, _len);
    if _len=0 then leave;
    str_in_brackets=catx(' ',str_in_brackets,substrn(HRM_L2_Step6,_pos+1,_len-2));
    HRM_L2_Step6=substrn(HRM_L2_Step6,1,_pos-1)||substrn(HRM_L2_Step6,_pos+_len);
  end;

  do while(1);
    call prxsubstr(_rx_quote_id, left(hrm_l2_step6), _pos, _len);
    if _len=0 then leave;
    str_in_quotes=catx(' ',str_in_quotes,substrn(HRM_L2_Step6,_pos+1,_len-2));
    HRM_L2_Step6=substrn(HRM_L2_Step6,1,_pos-1)||substrn(HRM_L2_Step6,_pos+_len);
  end;

run;

If the code is not sufficient for your real data then try to amend the Regular Expressions to fit what you're dealing with.

PGStats
Opal | Level 21

You can go at it quite systematically with pattern matching functions:

 

data b;
if not patQ then patQ + prxParse("/'.*'/o");
if not patB then patB + prxParse("/\(.*\)/o");
set a;
length name nameQ nameB $1000;
name = HRM_L2_Step6;
start = 1;
call prxNext(patQ, start, -1, name, pos, len);
if len >= 2 then do;
    nameQ = substrn(name, pos+1, len-2);
    name = catx(" ", substrn(name, 1, pos-1), substr(name, pos+len));
    end;
start = 1;
call prxNext(patB, start, -1, name, pos, len);
if len >= 2 then do;
    nameB = substrn(name, pos+1, len-2);
    name = catx(" ", substrn(name, 1, pos-1), substr(name, pos+len));
    end;
drop patQ patB start pos len;
run;
PG

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 436 views
  • 0 likes
  • 4 in conversation