Alexxxxxxx
Pyrite | Level 9

Dear all,

 

How can I find all HTML tags (such as '<BR>', '<FONT>', '<BODY>') in a variable and remove them?

data have ;
  infile datalines truncover;
  input name $100.;
  datalines;
JUICE<BR>apple<footer> 
juice <BR> apple 
juice<BODY>apple 
juice<BODY> apple 
<BR>juice apple
<figure> juice 
;
run;

Could you please give me some suggestions about this?

Thanks in advance.

ACCEPTED SOLUTION
PeterClemmensen
Tourmaline | Level 20

Here is a PRXNEXT example. I have written two programs. The first outputs one observation for each HTML tag found. The second concatenates the tags found in each value, so the output has the same number of observations as the input data.

 

@Alexxxxxxx Let me know if it works for you 🙂

 

data have ;
  infile datalines truncover;
  input name $100.;
  datalines;
JUICE<BR>apple<footer> 
juice <BR> apple 
juice<BODY>apple 
juice<BODY> apple 
<BR>juice apple
<figure> juice 
;
run;

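/* want1: one output observation per HTML tag found */
data want1;
   set have;
   RegExID = prxparse('/<\w*>/');                /* pattern for a bare tag such as <BR> */
   newname = prxchange('s/<\w*>//', -1, name);   /* name with every tag removed */
   start = 1;
   /* len (not "length") avoids confusion with the LENGTH function */
   call prxnext(RegExID, start, length(name), name, pos, len);
   do while (pos > 0);
      html = substr(name, pos, len);             /* the tag just matched */
      output;
      call prxnext(RegExID, start, length(name), name, pos, len);
   end;
   keep name html newname;
run;

/* want2: one output observation per input observation, tags in a comma-separated list */
data want2;
   set have;
   length html $200;
   RegExID = prxparse('/<\w*>/');
   /* computed outside the loop so that values without tags also get a newname */
   newname = prxchange('s/<\w*>//', -1, name);
   start = 1;
   html = "";
   call prxnext(RegExID, start, length(name), name, pos, len);
   do while (pos > 0);
      html = catx(',', html, substr(name, pos, len));
      call prxnext(RegExID, start, length(name), name, pos, len);
   end;
   keep name html newname;
run;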
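
One thing to keep in mind: the pattern <\w*> only matches bare tags like <BR> or <BODY>. Closing tags such as </a> and tags with attributes would be left in place. A minimal sketch of a broader pattern follows, assuming that kind of input may occur; the data set name demo_general and the sample value are made up for illustration, and this is still a regular expression, not a real HTML parser.

```sas
data demo_general;
   length name newname $100;
   /* hypothetical sample value with a self-closing tag, an attribute and a closing tag */
   name = 'juice<br/> <a href="x.html">apple</a>';
   /* <  optional /  tag name  anything except >  > */
   newname = prxchange('s/<\/?\w+[^>]*>//', -1, name);
   put newname=;   /* writes newname=juice apple to the log */
run;
```

The same pattern could be passed to PRXPARSE if the tags need to be extracted with PRXNEXT rather than only removed.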

 


8 REPLIES
PeterClemmensen
Tourmaline | Level 20

Something like this?

 

data have ;
  infile datalines truncover;
  input name $100.;
  datalines;
JUICE<BR>apple<footer> 
juice <BR> apple 
juice<BODY>apple 
juice<BODY> apple 
<BR>juice apple
<figure> juice 
;
run;

data want;
   set have;
   new=prxchange('s/<\w*>//', -1, name);
run;
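
For anyone wondering about the -1 in the PRXCHANGE call: the second argument is the number of times to search and replace, and -1 means replace every match. A small sketch that compares the two, using the same pattern; the data set name demo_times and the variables first_only and all_tags are made up for illustration.

```sas
data demo_times;
   length name first_only all_tags $100;
   name = 'JUICE<BR>apple<footer>';
   first_only = prxchange('s/<\w*>//', 1, name);    /* replace only the first match */
   all_tags   = prxchange('s/<\w*>//', -1, name);   /* replace every match */
   put first_only= all_tags=;
   /* log: first_only=JUICEapple<footer> all_tags=JUICEapple */
run;
```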
Alexxxxxxx
Pyrite | Level 9

Dear draycut,

 

I appreciate your reply and kind advice.

 

May I ask one more question, please? How can I find the HTML codes themselves?

 

Thanks for your attention to this matter.

PeterClemmensen
Tourmaline | Level 20

@Alexxxxxxx , when you say HTML Code, do you mean the text inside the <> or including the <>?

 

Also, what do you want to do with it? Put them in a separate variable or?

Alexxxxxxx
Pyrite | Level 9
@PeterClemmensen, I mean both the text and the <>. I expect to find them and put them in a separate variable. Could you please give me some suggestions about this?
Alexxxxxxx
Pyrite | Level 9

Dear draycut,

 

For the value 'JUICE<BR>apple<footer>', running the first program:

data want1;
   set have;
   RegExID = prxparse('/<\w*>/');
   start=1;
   call prxnext(RegExID, start, length(name), name, pos, length);
      do while (pos > 0);
         html = substr(name, pos, length);
         newname=prxchange('s/<\w*>//', -1, name);
         output;
         call prxnext(RegExID, start, length(name), name, pos, length);
      end;
   keep name html newname;
run;

I get

name                     html       newname
JUICE<BR>apple<footer>   <BR>       JUICEapple
JUICE<BR>apple<footer>   <footer>   JUICEapple

 

However, I would like a blank between 'JUICE' and 'apple':

name                     html       newname
JUICE<BR>apple<footer>   <BR>       JUICE apple
JUICE<BR>apple<footer>   <footer>   JUICE apple

Could you please give me some suggestions about this?

andreas_lds
Jade | Level 19
Have you tried the code posted by @PeterClemmensen?
Ksharp
Super User
data have ;
  infile datalines truncover;
  input name $100.;
  datalines;
JUICE<BR>apple<footer> 
juice <BR> apple 
juice<BODY>apple 
juice<BODY> apple 
<BR>juice apple
<figure> juice 
;
run;

data want;
   set have;
   new=prxchange('s/<.*?>/ /', -1, name);
run;
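
Replacing each tag with a blank can leave leading blanks or runs of several blanks, for example in 'juice <BR> apple'. A possible follow-up, sketched here as an assumption about what the final result should look like: wrap the call in COMPBL to collapse repeated blanks and STRIP to drop leading and trailing ones. The data set name demo_spaces is made up for illustration.

```sas
data demo_spaces;
   infile datalines truncover;
   input name $100.;
   length new $100;
   /* replace tags with a blank, collapse repeated blanks, trim both ends */
   new = strip(compbl(prxchange('s/<.*?>/ /', -1, name)));
   datalines;
JUICE<BR>apple<footer>
juice <BR> apple
<BR>juice apple
;
run;
```

With these three rows, new should come out as 'JUICE apple', 'juice apple' and 'juice apple'.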
