SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Quickest load method for XML

Reply
Occasional Contributor
Posts: 7

Quickest load method for XML

Hi All,

 

I am currently trying to load around 20 SAS work tables from a single XML file into SAS D.I studio for further validation and transforming.

 

I have used SAS XML mapper on a local Schema we use and am now currently loading the file through a SAS data step.

 

Filename XML_Input "&Filename.";
Filename XML_Map "&MappingFile";


libname XML_Input xml xmlmap=XML_Map;

data work.TestingFileA;
  set XML_In.TestTableA;
run;

 

The problem I have is that this process is taking around 20-25 seconds per file. There is only a single variable within the data but there are 45 columns which I imagine is where the bulk of the processing is getting caught. Unfortunately all 45 columns are later required for downstream processing.

 

Ideally I would like to re-shape the data (via the XML map?) to be long and thin rather than wide and short but I'm not quite sure how I go about this. Any help would be really appreciated - either improvements to the code itself or how to re-structure the XML MAP.

 

 

Super User
Posts: 5,257

Re: Quickest load method for XML

I haven't worked with XML and DI Studio. But whenever I'm fronted with XML file options, I always ask for another file format. XML wasn't created for DW data integration in mind, rather for short messages between real time applications.

Data never sleeps
Super User
Posts: 5,257

Re: Quickest load method for XML

i moved the post, since the question involves DI Studio, so the Data Mangement forum is probably a better place.

Data never sleeps
Occasional Contributor
Posts: 7

Re: Quickest load method for XML

Thanks alot - Unfortunately we don't have any say in the input method in this instance so XML will have to do but will note your advice for future reference!

Super User
Posts: 10,500

Re: Quickest load method for XML

I understand that you may have procedures but sometimes it is appropriate to question those. It sounds like the data was already in SAS since you say SAS work tables. Why not use one of the SAS file transport procedures?

 

If this is the case of "some one else needs the XML version" the computers don't care if you tell to provide the output in multiple forms. If the files are "very large" you may have issues with storage but the XML will be the culprit not SAS transport files.

Another advantage of something like CPORT is that all of the data sets are in a single file and the extract will recreate all of the files. So if the process later adds more files you really don't need to make much change to the transport part of the code by you would have to add additional steps for each XML.

Occasional Contributor
Posts: 7

Re: Quickest load method for XML

Unfortunately the data arrives through an external XML file and it is my job to then create a number of datasets in SAS.

 

After some investigation I've found that using a data step to create a table which contains a single observation (from the file header say) takes the same length of time to create a table with ~8000 observations.


So instead of my original hypothesis that the data processing time was so high because of the number of variables I now believe the process is taking a while because no matter even if I'm pulling out a single table with 5 columns and 1 row, or 50 columns with 8000 rows the processing time will always be the same.

 

My next train of thought is if there is a way to use a single data step statement (or an alternative way to read in an entire XML file the once) and set all the tables from the XML map I need in one go, rather than having to scan the entire contents of the file either time...

Super User
Posts: 10,500

Re: Quickest load method for XML

If I understand the size of the files is likely not the issue. If the file is located on a network resource it may be a systems communication issue slowing things down such as permissions or the priority the job runs under, or just plain bandwidth bottlenecks. If you have multiple servers involved, one for the file, one for a SAS server and another for execution or similar then any bandwidth or communication problem is likely magnified.

 

I don't use the XML mapper very often so I'm not sure if that doesn't create an additional performance penalty.

Occasional Contributor
Posts: 7

Re: Quickest load method for XML

The only performance penalty I can speak of is with regards to the fact the datastep appears to have to scan the entire XML file for each datastep even if the resulting table I wish to create is only a single row in length.

 

Again for example a table with a single row will take the same amount of time as a table with thousands of rows. I was just curious if there was a way to mitigate this waste in processing - to load the XML file into SAS in it's entirety once and then set the various tables as they are specified in the XML map.

PROC Star
Posts: 1,093

Re: Quickest load method for XML

My feeling is very similar to Reeza's...this is an inherent behaviour of XML, which to me is better suited to small volumes of complex data rather than statistical data volumes. You might be able to speed things up, but at the cost of putting yourself behind the eight-ball when the data structure changes (which it always does). If I'm reading your posts right, you're only losing about ten minutes for all of this irritation; would you be best off just putting up with it?

 

Let us know what you figure out,

  Tom

SAS Employee
Posts: 31

Re: Quickest load method for XML

I see you are using the XML LIBNAME engine. If you are running on SAS 9.2 or later, you could be using the XMLV2 LIBNAME engine. Being the most recent version of the XML engine, it may have improvements that could speed up the loading of the data.

 

Ask a Question
Discussion stats
  • 9 replies
  • 499 views
  • 1 like
  • 5 in conversation