
Christmas Tip #12: Merry XMLAS - validating generated XML against a given XML schema from a SAS session

by SAS Employee SteenBue on 12-16-2015 04:05 AM

Following up on the crisp, crunchy Christmas Tip #5, "Use Java in your DATA step", today's tip is another practical example of how to combine SAS and Java code.

 

Of course there is no reason to do this just because it is possible, but in today's tip it makes good sense, because we need to validate an XML file generated by SAS code against a given XML schema:
Base SAS currently has no built-in XML validator, whereas standard Java classes exist that can do the job.
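
As a rough illustration of the standard Java classes in question, here is a minimal sketch of the javax.xml.validation API working on in-memory strings. The class name MinimalXsdCheck, the method isValid and the one-element price schema are made up for this illustration and are not part of the tip's own program; note also that the sketch reports an unusable schema the same way as an invalid document.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

/* Hypothetical helper, for illustration only. */
class MinimalXsdCheck {

    /* Returns true when xmlText validates against xsdText.
       Returns false on the first validation error, and also when the
       schema itself cannot be compiled. */
    public static boolean isValid(String xsdText, String xmlText) {
        SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        try {
            Schema schema = factory.newSchema(new StreamSource(new StringReader(xsdText)));
            /* Without a custom ErrorHandler the validator throws on the first error. */
            schema.newValidator().validate(new StreamSource(new StringReader(xmlText)));
            return true;
        } catch (SAXException | IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        /* Assumed toy schema: a single <price> element of type xs:decimal. */
        String xsd = "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
                   + "<xs:element name=\"price\" type=\"xs:decimal\"/>"
                   + "</xs:schema>";
        System.out.println(isValid(xsd, "<price>8.99</price>")); // decimal point
        System.out.println(isValid(xsd, "<price>8,99</price>")); // decimal comma
    }
}
```

The decimal comma in the second call is the same kind of error that test_fail_1.xml further down is built around.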

 

For example, in Denmark, reports of suspected money laundering or terrorist financing must be submitted to the Money Laundering Secretariat (Hvidvasksekretariatet) at the State Prosecutor for Serious Economic and International Crime (Statsadvokaten for Særlig Økonomisk og International Kriminalitet) as 'goAML'-compatible XML files.


goAML is a strategic tool in the United Nations Office on Drugs and Crime (UNODC) effort to combat this type of global crime; see more at http://www.hvidvask.dk and https://goaml.unodc.org/goaml/en/introduction.html.

 

The XML is often generated from data extracted from banking systems or from money laundering monitoring solutions such as SAS® Anti-Money Laundering, and goAML compatibility can be ensured by validating the XML against the current goAML XML schema using the following method:

 

The following Java source code must be compiled:

/* ../Juletip_12/XMLSchemavalidation.java
 * XMLSchemavalidation.java
 - Copyright (c) 2015, Steen Bue Frederiksen, SAS Institute, All Rights Reserved.
 This method could be used to do generic testing of XML against Schemas and list any errors found
 */

import java.io.File;
import java.io.IOException;
import java.io.PrintStream;
import java.util.*;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.ErrorHandler;

class XMLSchemavalidation {
	private static String mXSDFileName;
           private static String mXMLFileName;
	static int instanceErrorCounter = 0;
	static int ErrorCounter = 0;
	 
  public static void main (String[] args) {
         System.out.println("Main section");
  }
  
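  public void ValidateXML (String mXSDFileName, String mXMLFileName) {
	 /* The counters are static, so reset them here; otherwise errors from earlier calls in the same JVM would be counted again */
	 instanceErrorCounter = 0;
	 ErrorCounter = 0;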
	 SchemaFactory localSchemaFactory = SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");
	 File localFile1 = new File(mXSDFileName);
    File localFile2 = new File(mXMLFileName);
	                           	  
	 System.out.println("NOTE:    XMLfile - "+mXMLFileName);
	 System.out.println("NOTE:     Schema - "+mXSDFileName );
	 System.out.println("NOTE:    Starting the validate process.");
	 System.out.println("NOTE:    .................. validating.");
	 System.out.println();
	
    try
    {
      Schema localSchema = localSchemaFactory.newSchema(localFile1);
      Validator localValidator = localSchema.newValidator(); 
	  /* Override exception routines to be able to continue the test when errors found */ 
	  final List<SAXParseException> exceptions = new LinkedList<SAXParseException>();
		localValidator.setErrorHandler(new ErrorHandler()
		{
		 @Override
		 public void warning(SAXParseException localSAXException2) throws SAXException
		{
		  exceptions.add(localSAXException2);
		}

		@Override
		public void fatalError(SAXParseException localSAXException2) throws SAXException
		{
		 exceptions.add(localSAXException2);
		}

		@Override
		public void error(SAXParseException localSAXException2) throws SAXException
		{
		 exceptions.add(localSAXException2);
		 System.out.println("ERROR:   Line "+localSAXException2.getLineNumber()+" - "+localSAXException2.getMessage());
		 instanceErrorCounter++;
		 ErrorCounter = ErrorCounter + 1;
		}
		});
		
      StreamSource localStreamSource = new StreamSource(localFile2);  
	  
	  /* Perform the validation routines */
      try
      { 
        localValidator.validate(localStreamSource);
      }
      catch (SAXParseException localSAXException2)
      {
	instanceErrorCounter++;
          ErrorCounter = ErrorCounter + 1;  
          System.out.println("ERROR:   "+mXMLFileName + " fails to validate because: \n");
          System.out.println("ERROR:   Line "+localSAXException2.getLineNumber()+" - "+localSAXException2.getMessage());
          System.out.println();
	System.out.println("ERROR:   This is the first error found - XMLfile text may contain several similar errors");
      }
      catch (IOException localIOException)
      {
        instanceErrorCounter++;
         ErrorCounter = ErrorCounter + 1;
         System.err.println("ERROR:  Invalid or missing XML source: " + mXMLFileName);
         System.err.println("ERROR:  "+localIOException.getMessage());
      }
    }
    catch (SAXException localSAXException1)
    {
	 instanceErrorCounter++;
	 ErrorCounter = ErrorCounter + 1;
      	 System.err.println("ERROR:  Invalid or missing XML Schema: " + mXSDFileName);
	 System.err.println("ERROR:  "+localSAXException1.getMessage());
    }
    if (ErrorCounter > 0) {
	  System.out.println();
	  System.out.println("ERROR:   "+ErrorCounter+" error(s) were found.");
    } else {
       	  System.out.println("SUCCESS: No error was found during validation.");
    }
	  System.out.println();
    System.out.println("NOTE:    Ending   the validate process.");	
  }
  
 }
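
The program above overrides the validator's error handler so that validation continues after the first error. The sketch below isolates that technique and returns the number of recorded errors instead of printing them, which could be handy if the SAS side should react to the result (the javaobj component also offers call methods with typed return values). The class name XmlErrorCount and the method countErrors are hypothetical, the -1 return code is an assumption of this sketch, and the exact count depends on the parser, since a single bad value can produce more than one message.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

/* Hypothetical helper, for illustration only. */
class XmlErrorCount {

    /* Convenience wrapper taking file paths, as the SAS calls in this tip do. */
    public static int countErrors(String xsdPath, String xmlPath) {
        return countErrors(new StreamSource(new File(xsdPath)),
                           new StreamSource(new File(xmlPath)));
    }

    /* Returns the number of validation problems found in xmlSource,
       or -1 when the schema cannot be compiled or the XML cannot be read. */
    public static int countErrors(Source xsdSource, Source xmlSource) {
        Schema schema;
        try {
            schema = SchemaFactory
                    .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(xsdSource);
        } catch (SAXException e) {
            return -1; /* missing or invalid schema */
        }

        final List<SAXParseException> problems = new ArrayList<SAXParseException>();
        Validator validator = schema.newValidator();
        validator.setErrorHandler(new ErrorHandler() {
            @Override
            public void warning(SAXParseException e) {
                /* warnings are not counted in this sketch */
            }
            @Override
            public void error(SAXParseException e) {
                problems.add(e); /* record the error and keep validating */
            }
            @Override
            public void fatalError(SAXParseException e) throws SAXException {
                throw e; /* not well-formed: parsing cannot continue */
            }
        });

        try {
            validator.validate(xmlSource);
        } catch (SAXParseException e) {
            problems.add(e); /* the fatal error that stopped the parse */
        } catch (SAXException e) {
            return -1;
        } catch (IOException e) {
            return -1; /* missing or unreadable XML source */
        }
        return problems.size();
    }
}
```

Building the schema in its own try block keeps a broken XSD from being counted as an error in the XML file.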
  

  Place the class file somewhere the SAS session can access, for example ../Juletip_12/XMLSchemavalidation/lib/XMLvalidation.

 

From the SAS session, the Java program can now be called with the locations of the XML schema and the XML file as parameters (schema first):

%let XMLSchemavalidation_path=../Juletip_12;
options nonotes nosource nosource2;
%put ;
%let session_classpath=%SYSGET(CLASSPATH);
%put *** Current CLASSPATH=%sysget(CLASSPATH);
OPTIONS SET=CLASSPATH "&XMLSchemavalidation_path./lib/";
%put ;%put ;
%put *** Current CLASSPATH=%sysget(CLASSPATH);
%put ;%put ;

%put ** 1. Sample with no errors;%put  ;
data _null_;
 dcl javaobj j("XMLvalidation.XMLSchemavalidation");
  j.callVoidMethod( "ValidateXML" 
  		,"&XMLSchemavalidation_path./XML/test.xsd"
		,"&XMLSchemavalidation_path./XML/test.xml"
	        );
  j.flushJavaOutput();
  j.DELETE();
run;
%put  ;%put  ;

%put ** 2. Sample with errors;%put  ;
data _null_;
 dcl javaobj j("XMLvalidation.XMLSchemavalidation");
  j.callVoidMethod( "ValidateXML" 
		,"&XMLSchemavalidation_path./XML/test.xsd"
		,"&XMLSchemavalidation_path./XML/test_fail_1.xml"
	        );
  j.flushJavaOutput();
  j.DELETE();
run;
%put  ;%put  ;

%put ** 3. Sample with invalid XML character;%put  ;
data _null_;
 dcl javaobj j("XMLvalidation.XMLSchemavalidation");
  j.callVoidMethod( "ValidateXML" 
		,"&XMLSchemavalidation_path./XML/test.xsd"
		,"&XMLSchemavalidation_path./XML/test_fail_2.xml"
	       );
  j.flushJavaOutput();
  j.DELETE();
run;
%put  ;%put  ;

%put ** 4. Sample with non-existing XML file;%put  ;
data _null_;
 dcl javaobj j("XMLvalidation.XMLSchemavalidation");
  j.callVoidMethod( "ValidateXML" 
		,"&XMLSchemavalidation_path./XML/test.xsd"
		,"&XMLSchemavalidation_path./XML/test_noexist.xml"
	        );
  j.flushJavaOutput();
  j.DELETE();
run;
%put  ;%put  ;

%put ** 5. Sample with non-existing XSD file;%put  ;
data _null_;
 dcl javaobj j("XMLvalidation.XMLSchemavalidation");
  j.callVoidMethod( "ValidateXML" 
		,"&XMLSchemavalidation_path./XML/test_noexist.xsd"
		,"&XMLSchemavalidation_path./XML/test.xml"
	        );
  j.flushJavaOutput();
  j.DELETE();
run;
%put  ;%put  ;

%put ** 6. Sample with invalid XSD file;%put  ;
data _null_;
 dcl javaobj j("XMLvalidation.XMLSchemavalidation");
  j.callVoidMethod( "ValidateXML" 
		,"&XMLSchemavalidation_path./XML/test_fail_2.xml"
		,"&XMLSchemavalidation_path./XML/test.xml"
	        );
  j.flushJavaOutput();
  j.DELETE();
run;
%put ;%put ;


%put ** 7. Sample with invalid XML character using dummy XSD;%put  ;
data _null_;
 dcl javaobj j("XMLvalidation.XMLSchemavalidation");
  j.callVoidMethod( "ValidateXML" 
		,"&XMLSchemavalidation_path./XML/test_dummy.xsd"
		,"&XMLSchemavalidation_path./XML/test_fail_2.xml"
	        );
  j.flushJavaOutput();
  j.DELETE();
run;
%put  ;%put  ;

OPTIONS SET=CLASSPATH "&session_classpath";
%put *** Current CLASSPATH=%sysget(CLASSPATH);
options notes source source2;

 

 

If you do not feel like compiling a Java program and keeping track of the class file, you can use the SAS procedure PROC GROOVY, where the Java source can be included, compiled, and then called in a single step:

%let XMLSchemavalidation_path=../Juletip_12;
proc groovy ;
  execute  parseonly "&XMLSchemavalidation_path./XMLSchemavalidation.java";
  eval "x = new XMLSchemavalidation(); 
  x.ValidateXML( ""&XMLSchemavalidation_path./XML/test.xsd"", ""&XMLSchemavalidation_path./XML/test.xml"" )";
quit;

 

 

PROC GROOVY is best suited to stand-alone SAS sessions, for example batch jobs, because it requires system privileges that are normally not enabled in SAS server sessions; see the PROC GROOVY documentation for more.
 
If you cannot wait to unwrap this Christmas tip and put it to use, here are the XML files referenced in the example calls from SAS:

../Juletip_12/XML/test.xsd

<?xml version="1.0" encoding="utf-8"?>
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" targetNamespace="http://www.contoso.com/books" xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element name="bookstore">
        <xs:complexType>
            <xs:sequence>
                <xs:element maxOccurs="unbounded" name="book">
                    <xs:complexType>
                        <xs:sequence>
                            <xs:element name="title" type="xs:string" />
                            <xs:element name="author">
                                <xs:complexType>
                                    <xs:sequence>
                                        <xs:element minOccurs="0" name="name" type="xs:string" />
                                        <xs:element minOccurs="0" name="first-name" type="xs:string" />
                                        <xs:element minOccurs="0" name="last-name" type="xs:string" />
                                    </xs:sequence>
                                </xs:complexType>
                            </xs:element>
                            <xs:element name="price" type="xs:decimal" />
                        </xs:sequence>
                        <xs:attribute name="genre" type="xs:string" use="required" />
                        <xs:attribute name="publicationdate" type="xs:date" use="required" />
                        <xs:attribute name="ISBN" type="xs:string" use="required" />
                    </xs:complexType>
                </xs:element>
            </xs:sequence>
        </xs:complexType>
    </xs:element>
</xs:schema>

../Juletip_12/XML/test_dummy.xsd

<?xml version="1.0" encoding="utf-8"?>
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" targetNamespace="http://www.contoso.com/books" xmlns:xs="http://www.w3.org/2001/XMLSchema">
</xs:schema>

../Juletip_12/XML/test.xml

<bookstore xmlns="http://www.contoso.com/books">
  <book genre="autobiography" publicationdate="1981-03-22" ISBN="1-861003-11-0">
    <title>The Autobiography of Benjamin Franklin</title>
    <author>
      <first-name>Benjamin</first-name>
      <last-name>Franklin</last-name>
    </author>
    <price>8.99</price>
  </book>
  <book genre="novel" publicationdate="1967-11-17" ISBN="0-201-63361-2">
    <title>The Confidence Man</title>
    <author>
      <first-name>Herman</first-name>
      <last-name>Melville</last-name>
    </author>
    <price>11.99</price>
  </book>
  <book genre="philosophy" publicationdate="1991-02-15" ISBN="1-861001-57-6">
    <title>The Gorgias</title>
    <author>
      <name>Plato</name>
    </author>
    <price>9.99</price>
  </book>
</bookstore>

../Juletip_12/XML/test_fail_1.xml

<bookstore xmlns="http://www.contoso.com/books">
  <book genre="autobiography" publicationdate="1981-03-22" ISBN="1-861003-11-0">
    <title>The Autobiography of Benjamin Franklin</title>
    <author>
      <first-name>Benjamin</first-name>
      <last-name>Franklin</last-name>
    </author>
    <price>8.98</price>
  </book>
  <book publicationdate="1967-11-17" ISBN="0-201-63361-2">
    <title>The Confidence Man</title>
    <author>
      <first-name>Herman</first-name>
      <last-name>Melville</last-name>
    </author>
    <price>11,99</price>
  </book>
  <book genre="philosophy" publicationdate="1991-02-15" ISBN="1-861001-57-6">
    <title>The Gorgias</title>
    <author>
      <name>Plato</name>
    </author>
    <price>9,99</price>
  </book>
</bookstore>

../Juletip_12/XML/test_fail_2.xml

<bookstore xmlns="http://www.contoso.com/books">
  <book genre="autobiography" publicationdate="1981-03-22" ISBN="1-861003-11-0">
    <title>The Autobiography of Benjamin Franklin</title>
    <author>
      <first-name>Benjamin</first-name>
      <last-name>Franklin</last-name>
    </author>
    <price>8.99</price>
  </book>
  <book genre="novel" publicationdate="1967-11-17" ISBN="0-201-63361-2">
    <title>The Confidence Man</title>
    <author>
      <first-name>Herman</first-name>
      <last-name>Melville</last-name>
    </author>
    <price>11.99</price>
  </book>
  <book genre="philosophy" publicationdate="1991-02-15" ISBN="1-861001-57-6">
    <title>The &Gorgias</title>
    <author>
      <name>Plato</name>
    </author>
    <price>9.99</price>
  </book>
</bookstore>
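
The bare ampersand in "The &Gorgias" makes test_fail_2.xml not well-formed, which is a different kind of problem from a schema violation: the parser gives up before the schema comes into play, so the file also fails against test_dummy.xsd (sample 7). A minimal sketch of a pure well-formedness check, with no schema involved, could look like this. The class name WellFormedCheck and the method isWellFormed are hypothetical and not part of the program above.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/* Hypothetical helper, for illustration only. */
class WellFormedCheck {

    /* Returns true when xmlText can be parsed as XML at all.
       No schema is used; only well-formedness is checked. */
    public static boolean isWellFormed(String xmlText) {
        try {
            /* DefaultHandler rethrows fatal errors, so a malformed
               document ends up in the catch block below. */
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xmlText)), new DefaultHandler());
            return true;
        } catch (SAXException | IOException | ParserConfigurationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<title>The Gorgias</title>"));      // plain text
        System.out.println(isWellFormed("<title>The &Gorgias</title>"));     // bare ampersand
        System.out.println(isWellFormed("<title>The &amp;Gorgias</title>")); // escaped ampersand
    }
}
```

If an ampersand is really meant to be part of the text, it has to be written as &amp; in the generated XML.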


 

 Merry XMLAS and a Happy New Year.....