About piton

piton · 06-26-2014

Thank you. I have a sample program. Here is what I tried:

    data t;
    text1='<html> <head> <meta name=''generator'' content=''HTML Tidy, see www.w3.org'' /> <title></title> </head> <body> <p>Test</p> <p></p> <table style=''WIDTH: 360.0pt;BORDER-COLLAPSE: collapse;'' border=''0'' cellspacing=''0'' cellpadding=''0'' width=''480''>';
    regex = prxparse('s/<\s+.*?>/ /');
    call prxchange(regex,-1,text1);
    put text1;
    run;

But it did not work.
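A likely reason the pattern above matches nothing: `\s+` requires at least one whitespace character immediately after `<`, and none of the tags in the sample string has one. The following is only a minimal sketch of one way around that, not a full HTML parser. It assumes a shortened copy of the sample string, and the output variable name `plain_text` and its length of 400 are illustrative choices, not something from the original program.

```sas
data t;
    /* plain_text and its length are assumptions made for this sketch */
    length plain_text $ 400;

    /* shortened version of the sample string from the post */
    text1 = '<html> <head> <title></title> </head> <body> <p>Test</p> <p></p> <table border=''0'' cellspacing=''0'' width=''480''>';

    /* "<", then any run of characters that are not ">", then ">" */
    regex = prxparse('s/<[^>]*>/ /');

    /* -1 replaces every match; the result is written to plain_text */
    call prxchange(regex, -1, text1, plain_text);

    /* collapse the runs of blanks left where the tags were */
    plain_text = compbl(plain_text);

    put plain_text=;
run;
```

On this input the step should leave only the text between the tags (here, Test). A tag-stripping regex like this does not decode entities such as `&nbsp;` and does not drop the contents of `<script>` or `<style>` elements, so real pages may need extra passes.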

piton · 06-25-2014

My SAS table contains two columns: id and my_text. Each observation of the my_text variable is a complete HTML string, something like this:

    <html> . . . <head> . . . <meta name="generator" content="HTML Tidy, see www.w3.org" /> <table style="WIDTH: 360.0pt;BORDER-COLLAPSE: collapse;" border="0" cellspacing="0" cellpadding="0" width="480"> <tr style="HEIGHT: 15.0pt;"> <td style="BORDER-BOTTOM: rgb(236,233,216); BORDER-LEFT: rgb(236,233,216); BACKGROUND-COLOR: transparent; WIDTH: 360.0pt;HEIGHT: 15.0pt; " width="480">

So it contains <table>, <tr>, <td> . . . and other complex HTML structures. How can I parse this kind of HTML into plain text?

I need to get a SAS table like this:

    id   my_text                                                        plain_text
    1    <html> . . . <head> . . . <meta blah ... blah ... blah ...     ONLY the "blah ... blah ... blah ... " part of my_text
    2    ...                                                            ...

PS: I was looking everywhere online for good parsing code, but all the examples are very trivial. The following Perl regular expression works fine only for 5 bytes, so this approach is okay for very simple tags, but in my case it is useless. Please help.

    rx1=prxparse("s/<.*?>//");
    call prxchange(rx1,99,my_text);

Online Status	Offline
Date Last Visited	09-01-2015 07:11 AM

Re: How to parse html?

How to parse html?
