My SAS table contains two columns: id and my_text. Each observation of the my_text variable is a complete HTML string, something like this:

<html> . . . <head> . . . <meta name="generator" content="HTML Tidy, see www.w3.org" /> <table style="WIDTH: 360.0pt;BORDER-COLLAPSE: collapse;" border="0" cellspacing="0" cellpadding="0" width="480"> <tr style="HEIGHT: 15.0pt;"> <td style="BORDER-BOTTOM: rgb(236,233,216); BORDER-LEFT: rgb(236,233,216); BACKGROUND-COLOR: transparent; WIDTH: 360.0pt;HEIGHT: 15.0pt; " width="480">

So it contains <table>, <tr>, <td> . . . and other complex HTML structures. How can I parse this kind of HTML into plain text?

____________________________________________________________________________________________________________________

I need to get a SAS table like this:

id   my_text                                                      plain_text
1    <html> . . . <head> . . . <meta blah ... blah ... blah ...   ONLY the "blah ... blah ... blah ... " part of my_text
2    ...                                                          ...

____________________________________________________________________________________________________________________

PS: I have looked everywhere online for good parsing code, but all the examples are very trivial. The following Perl regular expression works fine only for 5 bytes, so this approach is okay for very simple tags. In my case it is useless. Please help.

rx1=prxparse("s/<.*?>//");
call prxchange(rx1,99,my_text);