02-11-2016 06:29 AM
I am currently trying to load around 20 SAS work tables from a single XML file into SAS DI Studio for further validation and transformation.
I have used SAS XML Mapper on a local schema we use, and am now loading the file through a SAS DATA step.
Filename XML_Input "&Filename.";        /* the incoming XML file            */
Filename XML_Map "&MappingFile.";       /* the XMLMap from SAS XML Mapper   */
libname XML_Input xml xmlmap=XML_Map;   /* libref resolves to the fileref   */
                                        /* of the same name                 */
The problem I have is that this process is taking around 20-25 seconds per file. There is very little data in each table, but there are 45 columns, which I imagine is where the bulk of the processing is getting caught. Unfortunately, all 45 columns are required later for downstream processing.
Ideally I would like to re-shape the data (via the XML map?) to be long and thin rather than wide and short, but I'm not quite sure how to go about this. Any help would be really appreciated, either improvements to the code itself or advice on how to re-structure the XML map.
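For what it's worth, I know I could reshape after the read with something like the sketch below, but that wouldn't avoid the slow parse itself. The table name MyTable, the key RecordId, and the range Col1-Col45 are placeholders for my real names, the data would need sorting by the key first, and the 45 columns would need to share a type for a single transpose:

/* Reshape one loaded table from wide (45 columns) to long:        */
/* one row per column per record. All names are placeholders, and  */
/* I'm assuming one input row per RecordId.                        */
proc transpose data=XML_Input.MyTable
               out=work.MyTable_long(rename=(_name_=Field Col1=Value));
    by RecordId;        /* hypothetical record key; data sorted by it */
    var Col1-Col45;     /* the 45 wide columns                        */
run;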
02-11-2016 06:45 AM
I haven't worked with XML and DI Studio. But whenever I'm confronted with XML file options, I always ask for another file format. XML wasn't created with DW data integration in mind; rather, it was designed for short messages between real-time applications.
02-11-2016 08:55 AM
I understand that you may have established procedures, but sometimes it is appropriate to question those. It sounds like the data was already in SAS, since you say SAS work tables. Why not use one of the SAS file transport procedures?
If this is a case of "someone else needs the XML version", the computers don't care if you're asked to provide the output in multiple forms. If the files are very large you may have issues with storage, but the XML will be the culprit, not the SAS transport files.
Another advantage of something like PROC CPORT is that all of the data sets go into a single file, and the extract will recreate all of them. So if the process later adds more data sets, you don't need to change the transport part of the code much, but you would have to add additional steps for each XML table.
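For example, a minimal sketch (the transport file path is a placeholder):

proc cport library=work file='/path/alltables.cpo';     /* pack every WORK data set into one transport file */
run;

proc cimport library=work infile='/path/alltables.cpo'; /* recreate all of them on the receiving side */
run;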
02-11-2016 09:06 AM
Unfortunately, the data arrives as an external XML file, and it is my job to create a number of SAS datasets from it.
After some investigation I've found that using a DATA step to create a table containing a single observation (from the file header, say) takes the same length of time as creating a table with ~8000 observations.
So, instead of my original hypothesis that the processing time was high because of the number of variables, I now believe the entire file is being scanned each time: whether I'm pulling out a single table with 5 columns and 1 row, or one with 50 columns and 8000 rows, the processing time is always the same.
My next train of thought is whether there is a way to use a single DATA step (or an alternative way to read the entire XML file just once) and set all the tables I need from the XML map in one go, rather than having to scan the entire contents of the file every time...
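For instance, would a single PROC COPY against the libref materialise everything in one pass? A sketch of what I mean (I don't know whether the engine would still re-parse the XML for each member):

/* Copy every table defined in the XML map into WORK in one step, */
/* then have downstream steps read the WORK copies instead.       */
proc copy in=XML_Input out=work;
run;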
02-11-2016 11:08 AM
If I understand correctly, the size of the files is likely not the issue. If the file is located on a network resource, it may be a systems communication issue slowing things down, such as permissions, the priority the job runs under, or plain bandwidth bottlenecks. If you have multiple servers involved (one for the file, one for a SAS server, and another for execution, or similar), then any bandwidth or communication problem is likely magnified.
I don't use the XML Mapper very often, so I'm not sure whether it creates an additional performance penalty.
02-12-2016 02:23 AM
The only performance penalty I can speak to is that the DATA step appears to scan the entire XML file on every step, even if the resulting table is only a single row long.
Again, for example, a table with a single row takes the same amount of time as a table with thousands of rows. I was just curious whether there is a way to mitigate this wasted processing: to load the XML file into SAS in its entirety once, and then set the various tables as they are specified in the XML map.
02-17-2016 07:40 PM
My feeling is very similar to Reeza's... this is an inherent behaviour of XML, which to me is better suited to small volumes of complex data than to statistical data volumes. You might be able to speed things up, but at the cost of putting yourself behind the eight-ball when the data structure changes (which it always does). If I'm reading your posts right, you're only losing about ten minutes to all of this irritation; would you be best off just putting up with it?
Let us know what you figure out,
04-23-2018 10:24 AM
We can create an XML map that produces a single combined output table instead of 20 small tables that then have to be joined to get the data.
The retain functionality can be used to achieve the one-table output; a made-up fragment is below.
But I am not sure it is really going to save any time.
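As a rough illustration only (the XPaths, names, and lengths are invented, not taken from the original poster's schema), retain="YES" on a COLUMN carries a value read from one part of the document onto every subsequent row of the combined table:

<SXLEMAP name="COMBINED" version="2.1">
  <TABLE name="Combined">
    <TABLE-PATH syntax="XPath">/Root/Record</TABLE-PATH>
    <COLUMN name="HeaderId" retain="YES">
      <PATH syntax="XPath">/Root/Header/Id</PATH>
      <TYPE>character</TYPE>
      <DATATYPE>string</DATATYPE>
      <LENGTH>20</LENGTH>
    </COLUMN>
    <COLUMN name="RecordValue">
      <PATH syntax="XPath">/Root/Record/Value</PATH>
      <TYPE>character</TYPE>
      <DATATYPE>string</DATATYPE>
      <LENGTH>50</LENGTH>
    </COLUMN>
  </TABLE>
</SXLEMAP>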
XML is a very bad kind of data for data analysis.
02-17-2016 04:59 PM
I see you are using the XML LIBNAME engine. If you are running SAS 9.2 or later, you could use the XMLV2 LIBNAME engine instead. Being the most recent version of the XML engine, it may have improvements that speed up loading the data.
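Assuming the same filerefs as in the original post, only the engine name changes (the XMLMap generated by XML Mapper should work with both engines, but that is worth verifying against your map's version):

/* Same filerefs as before; just swap the engine name. */
libname XML_Input xmlv2 xmlmap=XML_Map;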