JMarkW
Fluorite | Level 6
In a data step I'm reading large XML files that may have correctly formed tags for many records and then a record with a malformed tag mixed in. I would like to know which error variables SAS sets to indicate that it has encountered a malformed tag.

The program currently just writes "ERROR: Unexpected or unmatched end of element tag encountered during XMLInput parsing" to the log when the bad XML is encountered.

I need to know when this happens during the data step so I can programmatically handle the error record. I need error handling that reports the XML issues to the sending systems and keeps data from malformed files from contaminating the data from good files.
8 REPLIES
Cynthia_sas
SAS Super FREQ
Hi:
You may want to work with Tech Support on this question. I don't actually know whether the SAS XML Libname Engine (SXLE) uses _ERROR_ or not. And if SXLE uses _ERROR_, I don't know whether you can resume reading an XML file once an error is encountered. I have some memory that the file has to be well-formed in order to be read successfully. You may have to clean up the malformed tags -BEFORE- you use SXLE.

To open a track with Tech Support, go to:
http://support.sas.com/ctx/supportform/createForm

cynthia
JMarkW
Fluorite | Level 6
I've displayed _ERROR_ for runs on good and bad files. Its value is the same whether or not the XML parses cleanly.

&SYSFILRC and END=XYZ both behave like the error record is the end of the file.

Reading the files twice is not a favored solution. The volume of data is too high. The legacy SAS program does that and can take over 24 hours to complete a daily run. Not good. I've had good success on similar applications by rewriting the SAS programs to pass over the records within each file once, using a single data step wrapped in macro language code.
Cynthia_sas
SAS Super FREQ
Hi:
You said:
&SYSFILRC and END=XYZ both behave like the error record is the end of the file.

I thought the XML specification REQUIRED that any application that reads malformed XML must report the problem and stop processing immediately. The W3C site has this information, but the Wikipedia article states the rules more succinctly:

http://en.wikipedia.org/wiki/XML#Well-formedness_and_error-handling
[quote]
Well-formedness and error-handling
The XML specification defines an XML document as a text which is well-formed, i.e., it satisfies a list of syntax rules provided in the specification. The list is fairly lengthy; some key points are:

--It contains only properly-encoded legal Unicode characters.
--None of the special syntax characters such as "<" and "&" appear except when performing their markup-delineation roles.
--The begin, end, and empty-element tags which delimit the elements are correctly nested, with none missing and none overlapping.
--The element tags are case-sensitive; the beginning and end tags must match exactly.
--There is a single "root" element which contains all the other elements.

The definition of an XML document excludes texts which contain violations of well-formedness rules; they are simply not XML. An XML processor which encounters such a violation is required to report such errors and to cease normal processing.
[endquote]
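
As a small illustration of the rule quoted above, here is a sketch in Python (not SAS, and not a description of how SXLE works internally) of a conforming parser reporting the first well-formedness violation and then stopping. The function name `check_well_formed` and the two sample documents are made up for this example.

```python
import xml.sax


def check_well_formed(xml_bytes):
    """Return (True, None) if the bytes parse as well-formed XML,
    otherwise (False, message) describing the first violation."""
    try:
        # The default ContentHandler ignores all content; we only care
        # whether the parser reaches the end without a fatal error.
        xml.sax.parseString(xml_bytes, xml.sax.ContentHandler())
    except xml.sax.SAXParseException as exc:
        return False, "line %d: %s" % (exc.getLineNumber(), exc.getMessage())
    return True, None


# Hypothetical sample data: the second document closes <id> with </rec>.
good = b"<records><rec><id>1</id></rec></records>"
bad = b"<records><rec><id>1</id></rec><rec><id>2</rec></records>"

print(check_well_formed(good))
print(check_well_formed(bad))
```

The parser does not try to recover or skip the bad record; it raises on the first violation, which matches the "report and cease normal processing" behavior the specification requires.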

That's why I suggested Tech Support. I thought that SXLE could only report on malformed XML -- not let you pause to fix it. Tech Support will be your best resource for this question.

cynthia
JMarkW
Fluorite | Level 6
I just need to know an incorrect record was found to clean up the partially processed file. Going along with the W3C specifications, it would be preferable to eliminate all data from the file instead of accepting the records read before the error was encountered.
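
The all-or-nothing handling described above can be sketched outside SAS as follows. This is only an illustration in Python under stated assumptions: the function name `load_records`, the record tag `rec`, and the sample documents are hypothetical, and nothing here reflects how the SAS XML engine behaves. Records are staged during a single pass and are only handed back if the whole file parses.

```python
import io
import xml.etree.ElementTree as ET


def load_records(xml_source, record_tag="rec"):
    """Parse the source once. Return (records, None) only if the whole
    document is well-formed; otherwise return ([], error_message)."""
    staged = []
    try:
        for _event, elem in ET.iterparse(xml_source, events=("end",)):
            if elem.tag == record_tag:
                # Flatten the record's child elements into a dict.
                staged.append({child.tag: child.text for child in elem})
                elem.clear()  # release memory when files are large
    except ET.ParseError as exc:
        # Discard everything that was read before the bad record.
        return [], str(exc)
    return staged, None


# Hypothetical sample data: the second document has a mismatched end tag.
good = io.BytesIO(b"<records><rec><id>1</id></rec></records>")
bad = io.BytesIO(b"<records><rec><id>1</id></rec><rec><id>2</rec></records>")

print(load_records(good))
print(load_records(bad))
```

The file is read only once, and the returned error message is what would be reported back to the sending system.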

Currently I don't see any feedback from SAS except the ERROR: log message to indicate that a record level error occurred. I dumped dictionary.macros to print to see if any variable indicated the error. None did.

Reading the XML files as text files to count the number of record start tags is an option, but it would badly slow down the process.
Cynthia_sas
SAS Super FREQ
You said:
I just need to know an incorrect record was found to clean up the partially processed file.

I think that you are asking for some kind of processing capability with SXLE that it was not designed to do. I do not believe you can "clean up" a partially processed, malformed XML file "on the fly". You need to take this question to Tech Support, where they can find out the definitive answer.

To open a track with Tech Support, go to:
http://support.sas.com/ctx/supportform/createForm

cynthia
JMarkW
Fluorite | Level 6
I'm not cleaning up the input file. I'm cleaning up the output files and macro counters.

I'll contact Tech Support.
Cynthia_sas
SAS Super FREQ
The output file from SXLE is a SAS dataset...unless you are using it to Export an XML file -- in which case, SXLE should not create malformed XML.

Tech Support really is your best bet here.

cynthia
JMarkW
Fluorite | Level 6
The response from tech support:

There is an existing defect where the return code did not get set appropriately with the XML Engine. Because of how invasive the fix was, it was not hot-fixed for 9.1.3, but it was fixed in SAS 9.2.
