Alexxxxxxx
Pyrite | Level 9

Dear all,

 

How can I find all strings enclosed in brackets or quotes, that is (), [], {}, <>, '' and "" (such as <BR>, [FONT], {BODY}, 'A', "JUICE"), and extract them into a new variable?

 

In particular, for 'JUICE<BR>apple<footer>' I expect a blank to be inserted between 'JUICE' and 'apple'.

I am using the following code:


data have;
  infile datalines truncover;
  input name $100.;
  datalines;
JUICE<BR>apple[footer]
juice <BR> apple
juice<BODY> 'apple'
juice{BODY} apple
[BR]juice apple
<figure> "juice" LTD
;
run;

data want1;
   set have;
   RegExID = prxparse('/<\w*>/');
   start=1;
   call prxnext(RegExID, start, length(name), name, pos, length);
   do while (pos > 0);
      html = substr(name, pos, length);
      newname=prxchange('s/<\w*>//', -1, name);
      output;
      call prxnext(RegExID, start, length(name), name, pos, length);
   end;
   keep name html newname;
run;

I get 

name | html | newname
JUICE<BR>apple<footer> | <BR> | JUICEapple
JUICE<BR>apple<footer> | <footer> | JUICEapple

 

However, I expect a blank between 'JUICE' and 'apple':

name | html | newname
JUICE<BR>apple<footer> | BR | JUICE apple
JUICE<BR>apple<footer> | footer | JUICE apple

 

 

 

 

Could you please give me some suggestions about this?

Thanks in advance.

1 ACCEPTED SOLUTION

PeterClemmensen
Tourmaline | Level 20

Ah, I see what the problem is. The old Quotes Within Quotes problem 🙂

 

This

 

data have ;
  infile datalines truncover;
  input name $100.;
  datalines;
JUICE<BR>apple[footer] 
juice <BR> apple 
juice<BODY> 'apple'
<figure> "juice" LTD
;

data want;
   format name html newname;
   set have;
   /* In a single-quoted SAS literal, '' stands for one literal single quote */
   RegExID = prxparse('/<\w*>|\[\w*\]|\(\w*\)|"\w*"|''\w*''/');
   start=1;
   call prxnext(RegExID, start, length(name), name, pos, length);
   /* Replace every delimited token with a blank */
   newname=prxchange('s/<\w*>|\[\w*\]|\(\w*\)|"\w*"|''\w*''/ /', -1, name);
   do while (pos > 0);
      /* Skip the opening delimiter and drop the closing one */
      html = substr(name, pos+1, length-2);
      output;
      call prxnext(RegExID, start, length(name), name, pos, length);
   end;
   keep name html newname;
run;

proc print data=want;
run;

gives you

 

 

[Image: Capture.PNG, screenshot of the PROC PRINT output]
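
To illustrate the quoting point, here is a minimal sketch (not taken from the original posts) of two things that differ between the failing pattern and the working one: a single quote inside a single-quoted SAS literal has to be doubled, and square brackets have to be escaped to be matched literally. The variable names and test strings are assumptions chosen for illustration.

```sas
data _null_;
   /* Inside a single-quoted literal, '' stands for one single quote */
   pattern = '/''\w*''/';
   put pattern=;                         /* writes pattern=/'\w*'/ */

   /* The doubled quotes let the pattern match a quoted word */
   pos_quote = prxmatch(pattern, "juice<BODY> 'apple'");

   /* Unescaped [\w*] is a character class: one word character or * */
   pos_class = prxmatch('/[\w*]/', 'apple[footer]');

   /* Escaped brackets match a literal [ ... ] token */
   pos_brackets = prxmatch('/\[\w*\]/', 'apple[footer]');

   put pos_quote= pos_class= pos_brackets=;
run;
```

The log should show pos_quote=13, pos_class=1 and pos_brackets=6: the unescaped class matches the very first letter, while the escaped version finds the bracketed token itself.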


5 REPLIES 5
PeterClemmensen
Tourmaline | Level 20

I just made a very small change to the PRXCHANGE call in your program: the replacement is now a blank instead of an empty string. See if this does the trick.

 

data have;
  infile datalines truncover;
  input name $100.;
  datalines;
JUICE<BR>apple[footer]
juice <BR> apple
juice<BODY> 'apple'
juice{BODY} apple
[BR]juice apple
<figure> "juice" LTD
;
run;

data want1;
   set have;
   RegExID = prxparse('/<\w*>/');
   start=1;
   call prxnext(RegExID, start, length(name), name, pos, length);
      do while (pos > 0);
         html = substr(name, pos, length);
         newname=prxchange('s/<\w*>/ /', -1, name);
         output;
         call prxnext(RegExID, start, length(name), name, pos, length);
      end;
   keep name html newname;
run;
Alexxxxxxx
Pyrite | Level 9

Thanks, draycut,

but I get

name | html | newname
JUICE<BR>apple[footer] | <BR> | JUICE apple[footer]
juice <BR> apple | <BR> | juice apple
juice<BODY> 'apple' | <BODY> | juice 'apple'
<figure> "juice" LTD | <figure> | "juice" LTD

Besides, I cannot get the expected result with the following code:

data want1;
   set have;
   RegExID = prxparse('/<\w*>|[\w*]|{\w*}|'\w*'|"\w*"/');
   start=1;
   call prxnext(RegExID, start, length(name), name, pos, length);
      do while (pos > 0);
         html = substr(name, pos, length);
         newname=prxchange('s/<\w*>|[\w*]|{\w*}|'\w*'|"\w*"/ /', -1, name);
         output;
         call prxnext(RegExID, start, length(name), name, pos, length);
      end;
   keep name html newname;
run
Alexxxxxxx
Pyrite | Level 9

for example,

 

name
JUICE<BR>apple[footer] 
juice <BR> apple 
juice<BODY> 'apple'
<figure> "juice" LTD

I expect to get

name | html | newname
JUICE<BR>apple[footer] | BR | JUICE apple
JUICE<BR>apple[footer] | footer | JUICE apple
juice <BR> apple | BR | juice  apple
juice<BODY> 'apple' | BODY | juice
juice<BODY> 'apple' | apple | juice
<figure> "juice" LTD | figure | LTD
<figure> "juice" LTD | juice | LTD
Alexxxxxxx
Pyrite | Level 9

Hello @PeterClemmensen 

 

I have a follow-up question.

The value

HARDY(FRNS.)'A'

is not processed correctly by the code.

I expect to get

name | COMPANY_NAME_inB | COMPANY_NAME_noB
HARDY(FRNS.)'A' | FRNS. | HARDY
HARDY(FRNS.)'A' | A | HARDY

However, I only get

name | COMPANY_NAME_inB | COMPANY_NAME_noB
HARDY(FRNS.)'A' | A | HARDY(FRNS.)

 

Could you please give me some suggestions?
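
A possible direction, offered as a sketch and not as an answer from the thread: \w matches only letters, digits and underscores, so \(\w*\) cannot match (FRNS.) because of the period. Replacing \w* with a negated character class such as [^)]* accepts any characters up to the closing delimiter. The curly-brace alternative, the LENGTH statement, the variable name len and the guard against empty tokens are assumptions added for illustration.

```sas
data have;
   infile datalines truncover;
   input name $100.;
   datalines;
HARDY(FRNS.)'A'
;

data want;
   set have;
   length html $100 newname $200;
   /* [^x]* means: any run of characters other than the closing delimiter x */
   RegExID = prxparse('/<[^>]*>|\[[^\]]*\]|\([^)]*\)|\{[^}]*\}|"[^"]*"|''[^'']*''/');
   /* Replace every delimited token with a blank */
   newname = prxchange('s/<[^>]*>|\[[^\]]*\]|\([^)]*\)|\{[^}]*\}|"[^"]*"|''[^'']*''/ /', -1, name);
   start = 1;
   call prxnext(RegExID, start, length(name), name, pos, len);
   do while (pos > 0);
      /* Guard against empty tokens such as (), where len - 2 would be 0 */
      if len > 2 then html = substr(name, pos + 1, len - 2);
      else html = ' ';
      output;
      call prxnext(RegExID, start, length(name), name, pos, len);
   end;
   keep name html newname;
run;
```

For HARDY(FRNS.)'A' this should produce two rows, with html equal to FRNS. and A, and newname equal to HARDY in both. Note that a negated class also accepts blanks and punctuation inside the delimiters, which may be broader than intended for other data.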
