Cynthia, thanks for your response.
I'd like to provide a bit more context about how this scenario came about.
First, the content we are dealing with is narrative commentary that accompanies performance results. People provide this commentary by typing into a text box on a web form, and while doing so they may use a range of formatting options such as paragraphs, numbering, bullets, and text styling. When they click save, their narrative is saved to a database field with HTML tags that preserve the formatting. Hence, it is "data" rather than web pages that I am parsing.
Once the data is stored, it gets presented in any of a number of ways. First, it may be displayed as part of a web page that presents information for a particular KPI. In this case, it's as simple as retrieving the content of that field into the right place in a web page, or in a frame on a web page. Similarly, it may be included in an email message, in which case the email is sent as an HTML document and the content slides in nicely. Finally, there are a lot of PDF reports that present this information, and that's where I'm up to now. The PDFs are generated on the fly using stored processes, and may be filtered based on user preferences. These PDF documents usually present a table of KPIs, with the commentary included as the contents of a cell in the row of the KPI to which it belongs.
Obviously the HTML tags can't be shown in the cell, nor can we just strip out the tags and leave the unformatted text. So, I am working on creating an approximation of the HTML formatting using inline style statements. Only a limited number of formatting tags can be used, so I'm not trying to reproduce whole pages of HTML (with positioned images and so forth), just the tags relating to text content. As for the P tags, perhaps the text input box conversion is putting them in when it ought not to. I'll look into that.
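To illustrate the kind of conversion I mean, here is a minimal sketch in Python rather than in the stored process itself. Everything in it is an assumption for illustration only: the tag list, the function name convert_commentary, the "- " bullet placeholder, and the output syntax, which follows ODS-style inline formatting with ^ assumed as the escape character. It is not the actual code.

```python
from html.parser import HTMLParser

# Hypothetical mapping from stored HTML tags to inline style attributes.
# Assumes ODS-style inline formatting with '^' as the escape character.
INLINE_STYLES = {
    "b": "font_weight=bold",
    "strong": "font_weight=bold",
    "i": "font_style=italic",
    "em": "font_style=italic",
}
NEWLINE = "^{newline 1}"


class CommentaryConverter(HTMLParser):
    """Rewrites a small set of text-formatting tags as inline styles."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.lists = []  # stack of [list_tag, item_counter]

    def handle_starttag(self, tag, attrs):
        if tag in INLINE_STYLES:
            self.out.append("^{style [" + INLINE_STYLES[tag] + "]")
        elif tag in ("ul", "ol"):
            self.lists.append([tag, 0])
        elif tag == "li":
            if self.lists and self.lists[-1][0] == "ol":
                self.lists[-1][1] += 1
                self.out.append(f"{self.lists[-1][1]}. ")
            else:
                self.out.append("- ")  # placeholder bullet marker
        elif tag == "br":
            self.out.append(NEWLINE)
        # Any other tag is dropped; its text content is still kept.

    def handle_endtag(self, tag):
        if tag in INLINE_STYLES:
            self.out.append("}")
        elif tag in ("ul", "ol"):
            if self.lists:
                self.lists.pop()
        elif tag in ("p", "li"):
            self.out.append(NEWLINE)

    def handle_data(self, data):
        self.out.append(data)


def convert_commentary(html_text):
    """Return the commentary with tags replaced by inline style markup."""
    parser = CommentaryConverter()
    parser.feed(html_text)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.out)


sample = "<p>Sales were <b>up</b>.</p><ol><li>One</li><li>Two</li></ol>"
print(convert_commentary(sample))
```

The point of the sketch is only that each supported tag maps to one inline style or line break, and anything outside the supported list is dropped while its text is kept.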
I tried including some demo code with datalines containing text with tags, but of course it all gets mangled by the markup for this forum. Is there some way I can send demo code directly to you?
Thanks!
TF