06-15-2012 01:10 PM
First off, thanks for reading this. This is a re-post to get new exposure for a frustrating problem.
Here's the deal: when I read an XML CDATA field into SAS, I lose the '0D0A'x characters, even when I specify a $charw. informat. I've searched everywhere for a way to retain these characters when reading character data that contains them, but to no avail.
For example, if the XML raw data is as shown below:
<Item_Response><![CDATA[ This is the data I want.
I just pressed Enter.
Instead of this: "This is the data I want.'0D0A'xI just pressed Enter.'0D0A'xAgain."
I get this: "This is the data I want.'2020'xI just pressed Enter.'2020'xAgain."
Below is the part of the XML Map corresponding to the misbehaving field:
Below is the SAS libname statement (I'm using SAS 9.2):
libname XML_lib xml "&workhere.\MyFile.xml" xmlmap="&workhere.\XML_MAP.map";
11-08-2013 01:39 PM
The XML specification (Extensible Markup Language (XML) 1.0 (Fifth Edition)) doesn't directly allow that:
To simplify the tasks of applications, the XML processor MUST behave as if it normalized all line breaks in external parsed entities (including the document entity) on input, before parsing, by translating both the two-character sequence #xD #xA and any #xD that is not followed by #xA to a single #xA character.
Note that the EOL translation occurs before parsing, so it makes no difference that it happens in a CDATA section.
You should be able to escape the CR and LF characters as follows, but you'll have to handle the un-escaping as well.
<![CDATA[ This is the data I want.
I just pressed Enter.
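For the un-escaping side, here is a minimal sketch in a DATA step. It assumes the line breaks were written into the CDATA text as the literal strings &#13;&#10; (the exact escape form is not visible in the post as rendered, so treat that token as an assumption). The variable names raw, fixed, and has_crlf are made up for illustration.

```sas
data _null_;
  length raw fixed $ 200;
  /* Assumed escaped form of the field value; single quotes keep the */
  /* macro processor from trying to resolve the ampersands.          */
  raw = 'This is the data I want.&#13;&#10;I just pressed Enter.&#13;&#10;Again.';
  /* Swap each literal escape token for a real CR+LF pair. */
  fixed = tranwrd(raw, '&#13;&#10;', '0D0A'x);
  /* Flag whether a real CR+LF is now present in the value. */
  has_crlf = (index(fixed, '0D0A'x) > 0);
  put has_crlf=;
run;
```

In a real job the same TRANWRD call would be applied to the character variable read through the XML libref rather than to a hard-coded string.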
04-23-2014 08:58 AM
This is an issue with the older SAS XML engine: whitespace clusters are replaced with a single space. A notable exception is the linefeed character entity, which is replaced with the linefeed character; no CDATA block is needed.
Since you're using SAS 9.2, you have access to the newer XML92 engine, which leaves whitespace (including but not necessarily limited to newlines, tabs, and multiple spaces) intact. Simply replace "xml" with "xml92" in your libname statement. See the following paper for some discussion of this and other changes in the XML engine (including a new directive to permit or reject some literal characters like quotation marks): http://www2.sas.com/proceedings/sugi28/173-28.pdf
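Applied to the libname statement from the original post, the change would look roughly like this. It is a sketch that reuses the poster's libref, macro variable, and file names unchanged; whether the existing XML map works as-is under the newer engine is an assumption worth checking.

```sas
/* Same libref and files as the original post; only the engine name changes. */
libname XML_lib xml92 "&workhere.\MyFile.xml" xmlmap="&workhere.\XML_MAP.map";
```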
If you're using LoadXL.sas to read a Microsoft Excel XML file, amend the libname statement on line 138 of the program. (Note also that the ExcelXP.map XML map truncates strings at 1024 characters; edit line 92 to increase the length.) See the following paper for details on reading and writing Excel workbooks: http://www2.sas.com/proceedings/sugi31/115-31.pdf (Note the download referenced in the paper is now in the 2006 archives.) If anyone is aware of an updated paper, please let me know. Thanks!