In a data step I'm reading large XML files that may have correctly formed tags for many records and then a record with a malformed tag mixed in. I would like to know which error variables SAS sets to indicate that it has encountered a malformed tag.
The program currently just writes "ERROR: Unexpected or unmatched end of element tag encountered during XMLInput parsing" to the log when the bad XML is encountered.
I need to know when this happens during the data step so I can handle the error record programmatically. I need error handling that reports the XML issues to the sending systems and keeps the data in malformed files from contaminating the data from good files.
You may want to work with Tech Support on this question. I don't actually know whether the SAS XML Libname Engine (SXLE) uses _ERROR_ or not. And if SXLE uses _ERROR_, I don't know whether you can resume reading an XML file once an error is encountered. I have some memory that the file has to be well-formed in order to be read successfully. You may have to clean up the malformed tags -BEFORE- you use SXLE.
I've displayed _ERROR_ for runs on good and bad files. It is the same regardless of the XML parsing.
&SYSFILRC and END=XYZ both behave like the error record is the end of the file.
Reading the files twice is not a favored solution; the volume of data is too high. The legacy SAS program does that and can take over 24 hours for a daily run. Not good. I've had good success on similar applications by rewriting the SAS programs to pass through the records in each file only once, by using a single data step wrapped in macro language code.
You said: &SYSFILRC and END=XYZ both behave like the error record is the end of the file.
I thought the XML specification REQUIRED that any application that read malformed XML had to report the problem and stop processing immediately. The W3C site has this information, but the Wikipedia article states the well-formedness rules more succinctly:
--It contains only properly-encoded legal Unicode characters.
--None of the special syntax characters such as "<" and "&" appear except when performing their markup-delineation roles.
--The begin, end, and empty-element tags which delimit the elements are correctly nested, with none missing and none overlapping.
--The element tags are case-sensitive; the beginning and end tags must match exactly.
--There is a single "root" element which contains all the other elements.
The definition of an XML document excludes texts which contain violations of well-formedness rules; they are simply not XML. An XML processor which encounters such a violation is required to report such errors and to cease normal processing.
That's why I suggested Tech Support. I thought that SXLE could only report on malformed XML -- not let you pause to fix it. Tech Support will be your best resource for this question.
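To illustrate the "report and cease normal processing" rule quoted above, here is a minimal sketch outside SAS, using Python's standard-library expat parser. It says nothing about how SXLE works internally; the function name `first_wellformedness_error` and the sample documents are hypothetical and exist only for this illustration.

```python
import xml.parsers.expat


def first_wellformedness_error(xml_text):
    """Return None if xml_text is well-formed, else (line, column, message).

    The parser stops at the first violation, as the XML spec requires.
    """
    parser = xml.parsers.expat.ParserCreate()
    try:
        # The second argument marks this as the final chunk of the document.
        parser.Parse(xml_text, True)
    except xml.parsers.expat.ExpatError as err:
        message = xml.parsers.expat.ErrorString(err.code)
        return (err.lineno, err.offset, message)
    return None


# Hypothetical sample documents: the second record in `bad` closes
# <order> before it has closed <id>.
good = "<orders><order><id>1</id></order></orders>"
bad = "<orders><order><id>1</id></order><order><id>2</order></orders>"

print(first_wellformedness_error(good))
print(first_wellformedness_error(bad))
```

The point of the sketch is that a conforming parser gives you the position of the first violation and nothing after it, which matches the behavior you are seeing with END= and &SYSFILRC.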
I just need to know an incorrect record was found to clean up the partially processed file. Going along with the W3C specifications, it would be preferable to eliminate all data from the file instead of accepting the records read before the error was encountered.
Currently I don't see any feedback from SAS except the ERROR: log message to indicate that a record level error occurred. I dumped dictionary.macros to print to see if any variable indicated the error. None did.
Reading the XML files as text files to count the record start tags is an option, but it would badly slow down the process.
Message was edited by: JMarkW
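The "all or nothing" idea from the post above (discard every record from a file once a malformed record is found) can be done in a single pass by staging records and committing them only if the parse finishes cleanly. The following is a sketch in Python rather than SAS, under assumed names: `load_records`, the `order` record tag, and the sample documents are all hypothetical. The same pattern in SAS would presumably mean writing to a staging table and appending it only after a clean read, but whether the engine sets a usable return code is exactly the open question in this thread.

```python
import io
import xml.etree.ElementTree as ET


def load_records(source, record_tag="order"):
    """Read records from a binary file-like object in one pass.

    Returns (records, error). If the XML is malformed anywhere, every
    record staged so far is discarded and records is an empty list.
    """
    staged = []
    try:
        for _event, elem in ET.iterparse(source, events=("end",)):
            if elem.tag == record_tag:
                # Flatten the record's child elements into a dict.
                staged.append({child.tag: child.text for child in elem})
                elem.clear()  # free memory for large files
    except ET.ParseError as err:
        # Malformed XML: report the problem and keep nothing from this file.
        return [], str(err)
    return staged, None


# Hypothetical sample documents for illustration only.
good_doc = b"<orders><order><id>1</id></order><order><id>2</id></order></orders>"
bad_doc = b"<orders><order><id>1</id></order><order><id>2</order></orders>"

print(load_records(io.BytesIO(good_doc)))
print(load_records(io.BytesIO(bad_doc)))
```

Note that the first record in the bad document is parsed successfully before the error is hit, yet it is still dropped, so nothing from a bad file reaches the good data and the file is only read once.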
You said: I just need to know an incorrect record was found to clean up the partially processed file.
I think that you are asking for some kind of processing capability with SXLE that it was not designed to do. I do not believe you can "clean up" a partially processed, malformed XML file "on the fly". You need to take this question to Tech Support, where they can find out the definitive answer.
There is a known defect in the XML engine where the return code did not get set appropriately. Because of how invasive the fix was, it was not hot-fixed for 9.1.3, but it is fixed in SAS 9.2.