Alexxxxxxx
Pyrite | Level 9

Dear all,

 

How can I find all strings enclosed in (), [], {}, <>, or quotes (such as <BR>, [FONT], {BODY}, 'A', "JUICE") and extract them into a new variable?

 

In particular, for 'JUICE<BR>apple<footer>' I expect a blank to be inserted between 'JUICE' and 'apple'.

I am using the following code:


data have;
  infile datalines truncover;
  input name $100.;
  datalines;
JUICE<BR>apple[footer]
juice <BR> apple
juice<BODY> 'apple'
juice{BODY} apple
[BR]juice apple
<figure> "juice" LTD
;
run;

data want1;
   set have;
   RegExID = prxparse('/<\w*>/');
   start=1;
   call prxnext(RegExID, start, length(name), name, pos, length);
      do while (pos > 0);
         html = substr(name, pos, length);
         newname=prxchange('s/<\w*>//', -1, name);
         output;
         call prxnext(RegExID, start, length(name), name, pos, length);
      end;
   keep name html newname;
run;

I get 

name                    html      newname
JUICE<BR>apple<footer>  <BR>      JUICEapple
JUICE<BR>apple<footer>  <footer>  JUICEapple

 

However, I expect a blank between 'JUICE' and 'apple':

name                    html    newname
JUICE<BR>apple<footer>  BR      JUICE apple
JUICE<BR>apple<footer>  footer  JUICE apple

 

 

 

 

Could you please give me some suggestions about this?

thanks in advance.

1 ACCEPTED SOLUTION

Accepted Solutions
PeterClemmensen
Tourmaline | Level 20

Ah, I see what the problem is. The old Quotes Within Quotes problem 🙂

 

This

 

data have ;
  infile datalines truncover;
  input name $100.;
  datalines;
JUICE<BR>apple[footer] 
juice <BR> apple 
juice<BODY> 'apple'
<figure> "juice" LTD
;

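data want;
   format name html newname;
   set have;
   /* One alternative per delimiter pair; single quotes are doubled inside the SAS literal */
   RegExID = prxparse('/<\w*>|\[\w*\]|\(\w*\)|"\w*"|''\w*''/');
   start=1;
   /* Find the first delimited token */
   call prxnext(RegExID, start, length(name), name, pos, length);
   /* Replace every delimited token with a blank */
   newname=prxchange('s/<\w*>|\[\w*\]|\(\w*\)|"\w*"|''\w*''/ /', -1, name);
      do while (pos > 0);
         /* Keep only the text between the delimiters */
         html = substr(name, pos+1, length-2);
         output;
         /* Look for the next token */
         call prxnext(RegExID, start, length(name), name, pos, length);
      end;
   keep name html newname;
run;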

proc print data=want;
run;

gives you the output shown in the attached screenshot (Capture.PNG).
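
The "quotes within quotes" point above can be checked in isolation. The following is only a minimal sketch, not part of the original answer; the variable names (pat1, pat2, same, pos_class, pos_literal) are made up for illustration. It shows two ways of putting a single quote inside a SAS string literal, and why the brackets need a backslash: unescaped, [\w*] is read as a character class rather than as literal brackets.

```sas
data _null_;
   /* Single-quoted literal: each embedded single quote is doubled */
   pat1 = '/''\w*''/';
   /* Double-quoted literal: single quotes can be written as-is */
   pat2 = "/'\w*'/";
   /* 1 if both literals hold the same pattern text */
   same = (pat1 = pat2);

   /* Unescaped brackets form a character class: one word character or an asterisk */
   pos_class = prxmatch('/[\w*]/', 'JUICE[BR]');
   /* Escaped brackets match the literal text [BR] */
   pos_literal = prxmatch('/\[\w*\]/', 'JUICE[BR]');

   put pat1= pat2= same= pos_class= pos_literal=;
run;
```

Both literals should hold the same pattern text, so same should be 1. The unescaped class should match at position 1 (the letter J), while the escaped version should match at position 6, where [BR] starts.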

5 REPLIES 5
PeterClemmensen
Tourmaline | Level 20

Just made a very small change to your program in the PRXCHANGE Function. See if this does the trick

 

data have ;
  infile datalines truncover;
  input name $100.;
  datalines;
JUICE<BR>apple[footer]
juice <BR> apple
juice<BODY> 'apple'
juice{BODY} apple
[BR]juice apple
<figure> "juice" LTD
;
run;
data want1;
   set have;
   RegExID = prxparse('/<\w*>/');
   start=1;
   call prxnext(RegExID, start, length(name), name, pos, length);
      do while (pos > 0);
         html = substr(name, pos, length);
         newname=prxchange('s/<\w*>/ /', -1, name);
         output;
         call prxnext(RegExID, start, length(name), name, pos, length);
      end;
   keep name html newname;
run;
Alexxxxxxx
Pyrite | Level 9

thanks draycut,

 

but I get 

name                    html      newname
JUICE<BR>apple[footer]  <BR>      JUICE apple[footer]
juice <BR> apple        <BR>      juice apple
juice<BODY> 'apple'     <BODY>    juice 'apple'
<figure> "juice" LTD    <figure>  "juice" LTD

Besides, I cannot get the expected result with the following code:

data want1;
   set have;
   RegExID = prxparse('/<\w*>|[\w*]|{\w*}|'\w*'|"\w*"/');
   start=1;
   call prxnext(RegExID, start, length(name), name, pos, length);
      do while (pos > 0);
         html = substr(name, pos, length);
         newname=prxchange('s/<\w*>|[\w*]|{\w*}|'\w*'|"\w*"/ /', -1, name);
         output;
         call prxnext(RegExID, start, length(name), name, pos, length);
      end;
   keep name html newname;
run
Alexxxxxxx
Pyrite | Level 9

for example,

 

name
JUICE<BR>apple[footer] 
juice <BR> apple 
juice<BODY> 'apple'
<figure> "juice" LTD

I expect to get 

name                    html    newname
JUICE<BR>apple[footer]  BR      JUICE apple
JUICE<BR>apple[footer]  footer  JUICE apple
juice <BR> apple        BR      juice  apple
juice<BODY> 'apple'     BODY    juice
juice<BODY> 'apple'     apple   juice
<figure> "juice" LTD    figure  LTD
<figure> "juice" LTD    juice   LTD
Alexxxxxxx
Pyrite | Level 9

Hello @PeterClemmensen 

 

I have run into a new problem. The value

HARDY(FRNS.)'A'

is not processed correctly by the code.

I expect to get

name             COMPANY_NAME_inB  COMPANY_NAME_noB
HARDY(FRNS.)'A'  FRNS.             HARDY
HARDY(FRNS.)'A'  A                 HARDY

However, I only get 

name             COMPANY_NAME_inB  COMPANY_NAME_noB
HARDY(FRNS.)'A'  A                 HARDY(FRNS.)

 

Could you please give me some suggestions?
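
A likely cause is that \w only matches letters, digits and the underscore, so \(\w*\) cannot match (FRNS.) because of the period. One possible approach, sketched here under assumptions and not taken from the thread, is to replace \w* with a negated character class that accepts anything except the closing delimiter. The data set names have2 and want2 and the variables rx, stop and len are hypothetical. The {} alternative is added because the original question asked for it, and COMPBL is used to collapse the blanks left behind by the replacement.

```sas
data have2;
   infile datalines truncover;
   input name $100.;
   datalines;
HARDY(FRNS.)'A'
JUICE<BR>apple{BODY}
;

data want2;
   set have2;
   length COMPANY_NAME_inB $100 COMPANY_NAME_noB $200;
   retain rx;
   /* Compile the pattern once; each alternative accepts any character except its closing delimiter */
   if _n_ = 1 then
      rx = prxparse('/<[^>]*>|\[[^\]]*\]|\([^)]*\)|\{[^}]*\}|"[^"]*"|''[^'']*''/');
   /* Replace every delimited token with a blank, then collapse repeated blanks */
   COMPANY_NAME_noB = compbl(prxchange(
      's/<[^>]*>|\[[^\]]*\]|\([^)]*\)|\{[^}]*\}|"[^"]*"|''[^'']*''/ /', -1, name));
   start = 1;
   stop  = length(name);
   call prxnext(rx, start, stop, name, pos, len);
   do while (pos > 0);
      /* Drop the opening and closing delimiter; an empty pair such as () yields a blank */
      if len > 2 then COMPANY_NAME_inB = substr(name, pos + 1, len - 2);
      else COMPANY_NAME_inB = ' ';
      output;
      call prxnext(rx, start, stop, name, pos, len);
   end;
   keep name COMPANY_NAME_inB COMPANY_NAME_noB;
run;

proc print data=want2;
run;
```

For HARDY(FRNS.)'A' this should produce two rows, with FRNS. and A as the extracted values and HARDY as the remainder. One caveat: the single-quote alternative treats any pair of apostrophes as delimiters, so names containing apostrophes (for example possessives) may need a stricter pattern.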
