XML to SAS Parallel importing/parsing/copying

New Contributor SK4
Posts: 2

I am trying to import thousands of XML records (each with a distinct XML mapping) into SAS. I wrote a SAS program that loops through the records one at a time: it writes each record out, parses it, collapses the resulting tables, and merges the XML results back with the original record. Everything is done at the individual-record level, so it is very slow. I have applied the common I/O efficiency techniques, but each record still takes from about half a second to 1 minute to process, and 15,000 records take about 16 hours. I am thinking of implementing this with parallel processing. Can you point me to guidance on how to run these steps in parallel, or to any other technique that would shorten the run time?

Many Thanks!

 

 


/*Options OBS=500;*/
dm 'out;clear;log;clear;' ;
options  mprint /*SPOOL OBS=10 bufno=1000*/;

Libname dsnin "P:\Users\XML\XMLEngine\MaxData\Input";
Libname dsn   "P:\Users\XML\XMLEngine\MaxData\Output";

 

%Let Infile = g5_items_sample;


Proc PrintTo Log  ="C:\XML\XMLROW3\XMLToSAS3.LOG" New;Run;

 

 data xmlrow3;
  set dsnin.xmlrow3(drop=xmlrowx);
    Length RowId  4.;
  rowid+1;
  run; 

   Proc Sql NoPrint;
    Select max(RowId) InTo: nobsxml
    From xmlrow3;
   Quit;


/**
  Loop through the entire file, outputting/importing/processing single record at a time
**/

%Macro ReadXML;
 %Do i=1 %To &nobsxml.;
    Data admin;
     file "C:\XML\XMLROW3\xmlrowx3.xml";
    Set xmlrow3;
    where rowid=&i.;
      Put xmlrow;
    Run;

/* Create the XMLMap file in temporary area and use this map to read data*/

   FileName mapfile  "C:\XML\XMLROW3\xmlrowx3.map";run;
   FileName testfile "C:\XML\XMLROW3\xmlrowx3.xml" ;run;

   Libname testfile xmlv2 xmlmap=mapfile auTomap=replace;run;
 

  /**
    Create macro variables for each table created by XML record
  **/
   Proc Sql NoPrint;
   Select
    memname, memname, memname, cats(memname,"x"), count(*)
    InTo
        :cp_list separated By ' ',
        :oDataSet1-,
        :oDslist separated By ' ' ,
        :oDslistx separated By ' ',
 :numDataSet
    From Dictionary.tables
    Where Libname='TESTFILE'
      ;
  Quit;


  /** Copy xml tables from testfile to work lib **/

  Proc DataSets Lib=work NoList;
   Copy In=testfile Out=work;
   Select &cp_list;
  Run;
  Quit;


/** Create macro variables for columns in each xml table **/

 %Do n=1 %To &numDataSet.;

   Proc Sql NoPrint;
      Select  trim(name), count(name)
    InTo :var1 - :var&SysMaxLong., :nvar
    From Dictionary.Columns
    Where
       /**reverse(substr(reverse(strip(name)),1, 7)) ne "ORDINAL" And **/
    libname = "WORK" And   /* look only at the copies in WORK, not at every assigned library */
    memname =  "&&oDataSet&n.";

    Select count(*) InTo :z From &&oDataSet&n.;
  Quit;

  %let z=&z.;


  Data &&oDataSet&n. /*(drop=&dropcol.)*/;
   Set &&oDataSet&n.;
   rowid=&i.;
  Run;
  Proc Sort;By rowid;Run;

/** if there are multiple records in the dataset, collapse these records into a single record **/
   %If &z. gt 1 %Then
    %Do;
      Data &&oDataSet&n..x;
       Set &&oDataSet&n.;
 By rowid;

  %Do  m=1 %To &nvar.;
   Length &&var&m.._1-&&var&m.._&z. $100.;
   Array &&var&m..x $100. &&var&m.._1-&&var&m.._&z.;
      Retain &&var&m.._1-&&var&m.._&Z.  ;
     %End;
    If first.rowid Then
     Do;
      flg=0;
      Do i=1 To &z.;
       %Do c=1 %To &nvar.;
         &&var&c..x(i)="";
       %End;
     End;
   End;

   flg+1;
   If flg le &z. Then
    Do;
  %Do c=1 %To &nvar.;
    &&var&c..x(flg) =  &&var&c.. ;
     %End;
    End;
  %Do c=1 %To &nvar.;
    drop &&var&c..  ;
     %End;
     If Last.rowid Then Output;
   Run;
 %End;
 %else
   %Do;
    Data &&oDataSet&n..x;
     Set &&oDataSet&n.;
    Run;
  %End;
%End;


/** Merge results with the original data record **/

Data adminx;
  merge admin
        &oDslistx.
  ;By rowid;
Run;


/**Clear out SAS Temp **/
 proc datasets lib=work memtype=data nolist;
  delete &oDslistx. &ODslist.;
 quit;

 

/***/
 %If &i eq 1 %Then
  %Do;
    Data nadmin;
     Set adminx;
    Run;
  %End;
 %Else %If &i gt 1 %Then
  %Do;

    proc sql NoPrint;
      create table nadmin as
       select *
       from nadmin
        OUTER UNION CORR
       select *
       from adminx;
    quit;

   
  %End;
 /***/
%End;
%Mend ReadXML;
%ReadXML;


/** create permanent dataset **/
data dsn.xmlrow3;
 set nadmin;
run;
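
Since each record is processed independently, one way to approach the parallel-processing question is to split the rowid range across several SAS sessions. The following is only a minimal sketch, assuming SAS/CONNECT is licensed (MP CONNECT) and that the per-record loop has been moved into a separate worker program. The file ReadXMLSlice.sas, the macro variables first, last, and tag, and the output names dsn.xmlrow3_t1 and dsn.xmlrow3_t2 are all hypothetical. The 7500/15000 split assumes the 15,000 records mentioned above. Each session would also need its own temporary .xml and .map file names so that the sessions do not overwrite each other's files.

```sas
/* Assumption: SAS/CONNECT is available, so RSUBMIT can start extra   */
/* SAS sessions on the same machine (MP CONNECT).                     */
options autosignon sascmd="!sascmd";

/* Session 1: rows 1-7500. ReadXMLSlice.sas is a hypothetical worker  */
/* program that runs the per-record loop for rows first through last  */
/* and writes its result to dsn.xmlrow3_<tag>.                        */
rsubmit task1 wait=no;
  %let first = 1;
  %let last  = 7500;
  %let tag   = t1;   /* also used to suffix this session's temp .xml/.map files */
  %include "C:\XML\XMLROW3\ReadXMLSlice.sas";
endrsubmit;

/* Session 2: rows 7501-15000, same hypothetical worker program.      */
rsubmit task2 wait=no;
  %let first = 7501;
  %let last  = 15000;
  %let tag   = t2;
  %include "C:\XML\XMLROW3\ReadXMLSlice.sas";
endrsubmit;

/* Block until both sessions finish, then close them.                 */
waitfor _all_ task1 task2;
signoff _all_;

/* Combine the per-session outputs (hypothetical names).              */
data dsn.xmlrow3;
  set dsn.xmlrow3_t1 dsn.xmlrow3_t2;
run;
```

More sessions can be added in the same way. How much this helps depends on the number of available cores and on disk I/O, since every session still writes and reads its own XML and map files.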

 

SAS Employee
Posts: 6

Re: XML to SAS Parallel importing/parsing/copying

Have you tried the XML Mapper tool? It's free to download, and I've used it successfully about half a dozen times. XML Mapper creates .map file(s) that you can reference from a SAS program. If the XML files follow a consistent naming convention, you can loop through them and convert them to SAS data sets fairly quickly.

New Contributor SK4
Posts: 2

Re: XML to SAS Parallel importing/parsing/copying

Posted in reply to ChanceTGardener

Hi, thank you for your response.

Yes, my program already generates a map file for each record and uses that map file to read in the XML:

 

 FileName mapfile  "C:\XML\XMLROW3\xmlrowx3.map";run;
   FileName testfile "C:\XML\XMLROW3\xmlrowx3.xml" ;run;

   Libname testfile xmlv2 xmlmap=mapfile auTomap=replace;run;

 

The processing time grows during the append step, because the accumulated file gets larger with each iteration. Moving just that part into parallel processing might help.
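
The growth comes from the OUTER UNION CORR step, which rewrites all of nadmin on every iteration, so the total work rises roughly with the square of the number of records. A minimal sketch of one alternative, assuming the per-record results can be kept in WORK until the end: save each adminx under a numbered name and concatenate once after the loop. The part_ prefix and the macro names KeepPart and CombineParts are hypothetical.

```sas
/* Called inside the %Do i=... loop in place of the OUTER UNION CORR  */
/* step: rename this record's result instead of rebuilding nadmin.    */
%macro KeepPart(i);
  proc datasets lib=work nolist;
    change adminx=part_&i.;
  quit;
%mend KeepPart;

/* Called once after the loop: a single pass over all saved parts.    */
%macro CombineParts;
  data nadmin;
    set work.part_: ;   /* name-prefix list: every WORK data set whose name starts with part_ */
  run;

  /* Restore record order; the prefix list is not read in numeric order. */
  proc sort data=nadmin;
    by rowid;
  run;

  proc datasets lib=work nolist;
    delete part_: ;
  quit;
%mend CombineParts;
```

Two caveats apply. With SET, a character variable takes its length from the first data set that contains it, so longer values in later parts can be truncated unless a LENGTH statement precedes the SET. SET also opens every listed data set up front, so with many thousands of parts it may be necessary to combine them in batches.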

 

Thanks

 
