Dear all,
How can I find all HTML tags (such as '<BR>', '<FONT>', '<BODY>') in a variable and remove them?
data have ;
infile datalines truncover;
input name $100.;
datalines;
JUICE<BR>apple<footer>
juice <BR> apple
juice<BODY>apple
juice<BODY> apple
<BR>juice apple
<figure> juice
;
run;
Could you please give me some suggestions?
Thanks in advance.
Something like this?
data have ;
infile datalines truncover;
input name $100.;
datalines;
JUICE<BR>apple<footer>
juice <BR> apple
juice<BODY>apple
juice<BODY> apple
<BR>juice apple
<figure> juice
;
run;
data want;
set have;
new=prxchange('s/<\w*>//', -1, name);
run;
Dear draycut,
I appreciate your reply and kind advice.
May I ask one more question, please? How can I find the HTML tags themselves?
Thanks for your attention to this matter.
@Alexxxxxxx, when you say HTML code, do you mean the text inside the <> or the text including the <>?
Also, what do you want to do with it? Put it in a separate variable, or something else?
Here is a PRXNEXT example. I have written two programs. The first outputs one observation for each HTML tag found. The second concatenates the tags found in each value into a comma-separated list, so the output has the same number of observations as the input data.
@Alexxxxxxx Let me know if it works for you 🙂
data have ;
infile datalines truncover;
input name $100.;
datalines;
JUICE<BR>apple<footer>
juice <BR> apple
juice<BODY>apple
juice<BODY> apple
<BR>juice apple
<figure> juice
;
run;
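data want1;
   set have;
   /* Match a simple tag such as <BR> or <footer>: '<', word characters, '>' */
   RegExID = prxparse('/<\w*>/');
   start = 1;
   /* Find the first tag; pos is 0 when nothing matches */
   call prxnext(RegExID, start, length(name), name, pos, length);
   do while (pos > 0);
      html = substr(name, pos, length);              /* the matched tag */
      newname = prxchange('s/<\w*>//', -1, name);    /* name with all tags removed */
      output;                                        /* one observation per tag */
      call prxnext(RegExID, start, length(name), name, pos, length);
   end;
   keep name html newname;
run;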
data want2;
   set have;
   length html $200;
   RegExID = prxparse('/<\w*>/');
   start = 1;
   html = "";   /* reset the list for each observation */
   call prxnext(RegExID, start, length(name), name, pos, length);
   do while (pos > 0);
      /* Append each matched tag to a comma-separated list */
      html = catx(',', html, substr(name, pos, length));
      newname = prxchange('s/<\w*>//', -1, name);
      call prxnext(RegExID, start, length(name), name, pos, length);
   end;
   keep name html newname;
   retain html;
run;
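On the question of whether the tag should include the angle brackets: if only the text inside them is wanted, one possible variation is to start SUBSTR one character later and shorten the length by two. This is a minimal sketch, not part of the programs above. The dataset names sample and tags and the variables id, len, and tag are made up for illustration, and the pattern uses \w+ instead of \w* so that an empty <> can never produce a zero-length SUBSTR.

```sas
data sample;
   /* Hypothetical sample data, mirroring two rows of the HAVE dataset */
   infile datalines truncover;
   input name $100.;
datalines;
JUICE<BR>apple<footer>
<figure> juice
;
run;

data tags;
   set sample;
   length tag $32;
   id = prxparse('/<\w+>/');   /* \w+ requires at least one character inside <> */
   start = 1;
   call prxnext(id, start, length(name), name, pos, len);
   do while (pos > 0);
      /* Skip the leading '<' and drop the trailing '>' */
      tag = substr(name, pos + 1, len - 2);
      output;
      call prxnext(id, start, length(name), name, pos, len);
   end;
   keep name tag;
run;
```

With this sample, the tag values should be BR, footer, and figure, one per observation.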
Dear draycut,
For the value
'JUICE<BR>apple<footer>'
using the first program,
data want1;
set have;
RegExID = prxparse('/<\w*>/');
start=1;
call prxnext(RegExID, start, length(name), name, pos, length);
do while (pos > 0);
html = substr(name, pos, length);
newname=prxchange('s/<\w*>//', -1, name);
output;
call prxnext(RegExID, start, length(name), name, pos, length);
end;
keep name html newname;
run;
I get
name | html | newname |
JUICE<BR>apple<footer> | <BR> | JUICEapple |
JUICE<BR>apple<footer> | <footer> | JUICEapple |
However, I would like a blank between 'JUICE' and 'apple':
name | html | newname |
JUICE<BR>apple<footer> | <BR> | JUICE apple |
JUICE<BR>apple<footer> | <footer> | JUICE apple |
Could you please give me some suggestions about this?
data have ;
infile datalines truncover;
input name $100.;
datalines;
JUICE<BR>apple<footer>
juice <BR> apple
juice<BODY>apple
juice<BODY> apple
<BR>juice apple
<figure> juice
;
run;
data want;
set have;
new=prxchange('s/<.*?>/ /', -1, name);
run;
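Replacing each tag with a blank can leave several blanks in a row (for example in 'juice <BR> apple') and a leading blank when the value starts with a tag. If single-spaced output is wanted, one way to tidy that up is to wrap the PRXCHANGE call in COMPBL and STRIP. This is a minimal sketch under that assumption; the dataset names sample and clean and the variable new_clean are hypothetical.

```sas
data sample;
   /* Hypothetical sample data, mirroring two rows of the HAVE dataset */
   infile datalines truncover;
   input name $100.;
datalines;
juice <BR> apple
<BR>juice apple
;
run;

data clean;
   set sample;
   length new_clean $100;
   /* 1. PRXCHANGE replaces every <...> with a blank
      2. COMPBL collapses runs of blanks into a single blank
      3. STRIP removes leading and trailing blanks */
   new_clean = strip(compbl(prxchange('s/<.*?>/ /', -1, name)));
run;
```

Both sample rows should come out as 'juice apple'.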