SK4
Fluorite | Level 6

I am trying to import thousands of XML records (each record with a distinct XML mapping) into SAS. I wrote a SAS program that loops through each record, copying, parsing, combining, and merging the XML results back with the original record, all at the individual record level. Needless to say, this takes very long. I have applied the common I/O efficiency techniques, but it still takes roughly half a second to a minute per record, or about 16 hours for 15,000 records. I was thinking of implementing this with parallel processing (a rough sketch of what I had in mind follows the program below). Can you point me to some guidance on how to implement these steps in parallel, or to any other technique that would shorten the run time?

Many Thanks!

 

 


/*Options OBS=500;*/
dm 'out;clear;log;clear;' ;
options  mprint /*SPOOL OBS=10 bufno=1000*/;

Libname dsnin "P:\Users\XML\XMLEngine\MaxData\Input";
Libname dsn   "P:\Users\XML\XMLEngine\MaxData\Output";

 

%Let Infile = g5_items_sample;


Proc PrintTo Log  ="C:\XML\XMLROW3\XMLToSAS3.LOG" New;Run;

 

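/** Assign a sequential RowId to every XML record and capture the total record count **/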
 data xmlrow3;
  set dsnin.xmlrow3(drop=xmlrowx);
    Length RowId 4;
  rowid+1;
  run; 

   Proc Sql NoPrint;
    Select max(RowId) Into :nobsxml
    From xmlrow3;
   Quit;


/**
  Loop through the entire file, outputting/importing/processing single record at a time
**/

%Macro ReadXML;
 %Do i=1 %To &nobsxml.;
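    /** Subset record &i. into WORK.ADMIN and write its XML text out to xmlrowx3.xml
        so the XMLV2 engine can read it **/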
    Data admin;
     file "C:\XML\XMLROW3\xmlrowx3.xml";
    Set xmlrow3;
    where rowid=&i.;
      Put xmlrow;
    Run;

/* Create the XMLMap file in temporary area and use this map to read data*/

   FileName mapfile  "C:\XML\XMLROW3\xmlrowx3.map";run;
   FileName testfile "C:\XML\XMLROW3\xmlrowx3.xml" ;run;

   Libname testfile xmlv2 xmlmap=mapfile auTomap=replace;run;
 

  /**
    Create macro variables for each table created by XML record
  **/
   Proc Sql NoPrint;
   Select
    memname, memname, memname, cats(memname,"x"), count(*)
    InTo
        :cp_list   separated By ' ',
        :oDataSet1-,
        :ODslist   separated By ' ',
        :ODslistx  separated By ' ',
        :numDataSet
    From Dictionary.tables
    Where Libname='TESTFILE'
    ;
  Quit;


  /** Copy xml tables from testfile to work lib **/

  Proc DataSets Lib=work NoList;
   Copy In=testfile Out=work;
   Select &cp_list;
  Run;
  Quit;


/** Create macro variables for columns in each xml table **/

 %Do n=1 %To &numDataSet.;

   Proc Sql NoPrint;
      Select  trim(name), count(name)
    InTo :var1 - :var&SysMaxLong., :nvar
    From Dictionary.Columns
    Where
       /**reverse(substr(reverse(strip(name)),1, 7)) ne "ORDINAL" And **/
    memname =  "&&oDataSet&n.";

    Select count(*) InTo :z From &&oDataSet&n.;
  Quit;
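  /* re-assign &z. to itself to strip the leading blanks left by COUNT(*) INTO */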

  %let z=&z.;


  Data &&oDataSet&n. /*(drop=&dropcol.)*/;
   Set &&oDataSet&n.;
   rowid=&i.;
  Run;
  Proc Sort Data=&&oDataSet&n.; By rowid; Run;

/** if there are multiple records in the dataset, collapse these records into a single record **/
   %If &z. gt 1 %Then
    %Do;
      Data &&oDataSet&n..x;
       Set &&oDataSet&n.;
 By rowid;
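    /* For each source variable, build &z. numbered copies (VAR_1-VAR_&z.), retain them
       across the RowId group, reset them at FIRST.ROWID, fill slot FLG on each incoming
       row, and output one collapsed row at LAST.ROWID */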

  %Do  m=1 %To &nvar.;
   Length &&var&m.._1-&&var&m.._&z. $100.;
   Array &&var&m..x $100. &&var&m.._1-&&var&m.._&z.;
      Retain &&var&m.._1-&&var&m.._&Z.  ;
     %End;
    If first.rowid Then
     Do;
      flg=0;
      Do i=1 To &z.;
       %Do c=1 %To &nvar.;
         &&var&c..x(i)="";
       %End;
     End;
   End;

   flg+1;
   If flg le &z. Then
    Do;
  %Do c=1 %To &nvar.;
    &&var&c..x(flg) =  &&var&c.. ;
     %End;
    End;
  %Do c=1 %To &nvar.;
    drop &&var&c..  ;
     %End;
     If Last.rowid Then Output;
   Run;
 %End;
 %else
   %Do;
    Data &&oDataSet&n..x;
     Set &&oDataSet&n.;
    Run;
  %End;
%End;


/** Merge results with the original data record **/

Data adminx;
  merge admin
        &oDslistx.
  ;By rowid;
Run;


/**Clear out SAS Temp **/
 proc datasets lib=work memtype=data nolist;
  delete &oDslistx. &ODslist.;
 quit;

 

/** Append this record's result (ADMINX) to the accumulated dataset NADMIN **/
 %If &i eq 1 %Then
  %Do;
    Data nadmin;
     Set adminx;
    Run;
  %End;
 %Else %If &i gt 1 %Then
  %Do;

    proc sql NoPrint;
      create table nadmin as
       select *
       from nadmin
        OUTER UNION CORR
       select *
       from adminx;
    quit;

   
  %End;
 /***/
%End;
%Mend ReadXML;
%ReadXML;


/** create permanent dataset **/
data dsn.xmlrow3;
 set nadmin;
run;
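This is roughly the kind of split I had in mind, using SAS/CONNECT (MP CONNECT) to run slices of the RowId range in separate local SAS sessions. It is only a sketch: ReadXML_slice.sas would be a hypothetical, parameterised copy of the %ReadXML loop above that processes RowIds &rlo through &rhi and writes its result to dsn.nadmin_&slice.

/* Sketch only - requires SAS/CONNECT. Each TASKn session runs an independent slice
   of the RowId range; ReadXML_slice.sas is assumed to read the macro variables
   RLO, RHI and SLICE and to write dsn.nadmin_&slice. */

options sascmd="!sascmd";     /* spawn sessions with the same command as this one */

%Macro ParReadXML(sessions=4, nobs=15000);
  %local s lo hi tasklist;

  %Do s=1 %To &sessions.;
    /* carve the RowId range 1-&nobs. into &sessions. slices */
    %let lo=%eval((&s.-1)*&nobs./&sessions.+1);
    %let hi=%eval(&s.*&nobs./&sessions.);
    %if &s.=&sessions. %then %let hi=&nobs.;
    %let tasklist=&tasklist. task&s.;

    signon task&s.;
    %syslput rlo=&lo.    /remote=task&s.;
    %syslput rhi=&hi.    /remote=task&s.;
    %syslput slice=&s.   /remote=task&s.;

    rsubmit task&s. wait=no;
      %include "P:\Users\XML\XMLEngine\ReadXML_slice.sas";
    endrsubmit;
  %End;

  /* wait for every slice to finish, then close the sessions */
  waitfor _all_ &tasklist.;
  %Do s=1 %To &sessions.;
    signoff task&s.;
  %End;

  /* stack the per-slice outputs back into one permanent dataset */
  Data dsn.xmlrow3;
    Set
      %Do s=1 %To &sessions.;
        dsn.nadmin_&s.
      %End;
    ;
  Run;
%Mend ParReadXML;

%ParReadXML(sessions=4, nobs=15000);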

 

2 REPLIES
ChanceTGardener
SAS Employee

Have you tried the XML Mapper tool? It's free to download, and I've used it half a dozen times or so with success. XML Mapper creates .map file(s), and you can reference those from code in a SAS program. If the XML files have a consistent naming convention, you can loop through them and convert them to SAS data sets pretty quickly.
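Something along these lines is what I usually end up with. It's only a rough sketch, assuming the files are named record1.xml, record2.xml, and so on, and that they can all be read with a single map built in XML Mapper; the paths, file names, and the ITEMS table name are just placeholders:

/* Rough sketch: loop over consistently named XML files and stack them.
   record&i..xml, record.map and the ITEMS table are placeholder names. */
%macro read_all_xml(n=);
  filename mymap "C:\XML\record.map";          /* map built once in XML Mapper */

  %do i=1 %to &n.;
    filename xmlin "C:\XML\record&i..xml";
    libname  xmlin xmlv2 xmlmap=mymap;

    /* append this file's rows to the combined data set */
    proc append base=work.all_items data=xmlin.items force;
    run;

    libname  xmlin clear;
    filename xmlin clear;
  %end;

  filename mymap clear;
%mend read_all_xml;

%read_all_xml(n=15000);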

SK4
Fluorite | Level 6

Hi, thank you for your response.

Yes, I actually include code to generate the map file for each record and use that map file to read in the XML:

 

 FileName mapfile  "C:\XML\XMLROW3\xmlrowx3.map";run;
   FileName testfile "C:\XML\XMLROW3\xmlrowx3.xml" ;run;

   Libname testfile xmlv2 xmlmap=mapfile auTomap=replace;run;

 

The processing time increases during the append step, because the accumulated file grows with every iteration. If I could run just that part in parallel, that alone might help.
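For the growing-file problem specifically, I was also thinking of swapping the OUTER UNION CORR step for PROC APPEND, so that NADMIN is no longer re-read and re-written on every pass. A rough, untested sketch of that change at the bottom of %ReadXML (note that FORCE drops any column of ADMINX that is not already in NADMIN, which is not quite what OUTER UNION CORR does):

/** Sketch: replace the OUTER UNION CORR rebuild inside %ReadXML **/
%If &i. eq 1 %Then
 %Do;
   Data nadmin;
    Set adminx;
   Run;
 %End;
%Else
 %Do;
   /* append only the new rows; NADMIN itself is not re-read each iteration */
   Proc Append Base=nadmin Data=adminx Force NoWarn;
   Run;
 %End;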

 

Thanks

 

