Dear Cynthia,

Here is the code (after the filename url statement):

data Test;
  length Record $1000;
  infile foo lrecl=32700;
  input;
  do until (_Top);
    _Top=find(_infile_,'<article>');
    if not _Top then input;
  end;
  do until (_Bottom);
    input;
    _Bottom=find(_infile_,' </article>');
    if _Bottom then STOP;
    else do;
      Record=_infile_;
      /*This removes all the html tags*/
      *rx1=prxparse("s/<.*?>//");
      *call prxchange(rx1,99,Record);
      output;
    end;
  end;
  Drop _:;
  Drop rx1;
run;

-------------------------------------------------------------------------------------------------------------------

Here is the html sample (as it is huge, I have taken only a portion of it).

<article>
<div xmlns="http://www.w3.org/1999/xhtml" xmlns:h="http://www.w3.org/1999/xhtml" id="mainContent"><header><h1 class="entryType">Definition of <b>test</b> in English </h1><br/><h2 class="pageTitle entryTitle">test<span class="homograph">1</span></h2><div class="entryPronunciation headpron">Pronunciation: <a href="http://oxforddictionaries.com/words/key-to-pronunciation"> /tɛst/</a></div>........
</article>

---------------------------------------------------------------------------------------------------------

I believe Tom's answer will work. However, thanks in advance for any insight you can offer.

Jijil Ramakrishnan