Hi All,
I am currently trying to load around 20 SAS work tables from a single XML file into SAS DI Studio for further validation and transformation.
I have used SAS XML Mapper on a local schema we use, and am now loading the file through a SAS data step.
Filename XML_Input "&Filename.";      /* the XML data file */
Filename XML_Map "&MappingFile.";     /* the XMLMap generated by SAS XML Mapper */
libname XML_Input xml xmlmap=XML_Map; /* libref shares its name with the fileref above */

data work.TestingFileA;
    set XML_Input.TestTableA;         /* libref must match the one assigned above */
run;
The problem I have is that this process is taking around 20-25 seconds per file. There is only a small amount of data in each table, but there are 45 columns, which I imagine is where the bulk of the processing time is going. Unfortunately, all 45 columns are required for downstream processing.
Ideally I would like to reshape the data (via the XML map?) to be long and thin rather than wide and short, but I'm not quite sure how to go about this. Any help would be really appreciated - either improvements to the code itself or to how the XMLMap could be restructured.
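A rough sketch of a post-read transpose, in case reshaping in the XMLMap itself proves awkward - RecordID is a hypothetical key column, and only character columns are handled here:

proc sort data=work.TestingFileA;
    by RecordID;                 /* hypothetical key column - replace with the real one */
run;

/* Reshape from wide (45 columns) to long (one row per column value) */
proc transpose data=work.TestingFileA
               out=work.TestingFileA_long (rename=(_name_=ColumnName col1=Value));
    by RecordID;
    var _character_;             /* transposes the character columns only */
run;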
I haven't worked with XML in DI Studio, but whenever I'm presented with XML as a file option, I always ask for another format. XML wasn't designed with DW data integration in mind; it was intended for short messages between real-time applications.
I moved the post: since the question involves DI Studio, the Data Management forum is probably a better place for it.
Thanks a lot - unfortunately we don't have any say in the input format in this instance, so XML will have to do, but I will note your advice for future reference!
I understand that you may have set procedures, but sometimes it is appropriate to question them. It sounds like the data was already in SAS, since you mention SAS work tables. Why not use one of the SAS file transport procedures?
If this is a case of "someone else needs the XML version", the computers don't mind being told to provide the output in multiple forms. If the files are very large you may have storage issues, but the XML will be the culprit, not the SAS transport files.
Another advantage of something like CPORT is that all of the data sets go into a single file, and the import will recreate all of them. So if the process later adds more data sets, you don't need to change the transport part of the code much, whereas you would have to add additional steps for each new XML table.
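For reference, a rough sketch of what the transport approach could look like - the library names and paths here are illustrative, not taken from the original post:

libname srclib "/path/to/source/data";     /* assumed library holding the source datasets */
filename tranfile "/path/to/alldata.cpo";  /* single transport file */

/* Write every dataset in SRCLIB into one transport file */
proc cport library=srclib file=tranfile;
run;

/* On the receiving side, recreate all of the datasets in one step */
libname destlib "/path/to/restored/data";
proc cimport library=destlib infile=tranfile;
run;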
Unfortunately the data arrives as an external XML file, and it is my job to then create a number of datasets in SAS from it.
After some investigation, I've found that using a data step to create a table containing a single observation (from the file header, say) takes the same length of time as creating a table with ~8000 observations.
So, instead of my original hypothesis that the processing time was high because of the number of variables, I now believe each step takes a while because the whole file is read regardless: whether I'm pulling out a single table with 5 columns and 1 row, or one with 50 columns and 8000 rows, the processing time is always the same.
My next train of thought is whether there is a way to use a single data step (or an alternative way to read the entire XML file just once) and set all the tables I need from the XML map in one go, rather than having to scan the entire contents of the file each time...
If I understand correctly, the size of the files is likely not the issue. If the file is located on a network resource, it may be a systems communication issue slowing things down, such as permissions, the priority the job runs under, or plain bandwidth bottlenecks. If you have multiple servers involved - one for the file, one for a SAS server, and another for execution, or similar - then any bandwidth or communication problem is likely magnified.
I don't use the XML Mapper very often, so I'm not sure whether it creates an additional performance penalty.
The only performance penalty I can speak to is that the data step appears to scan the entire XML file for each step, even if the resulting table I wish to create is only a single row long.
Again, for example, a table with a single row takes the same amount of time as a table with thousands of rows. I was just curious whether there is a way to mitigate this wasted processing - to load the XML file into SAS in its entirety once and then set the various tables as they are specified in the XML map.
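One thing that might be worth trying - only a sketch, and it may or may not avoid re-parsing the file per member, depending on how the engine behaves - is to materialise every table defined in the XMLMap with a single PROC COPY against the XML libref, instead of one data step per table:

/* Sketch: pull every table defined in the XMLMap into WORK in one procedure call.
   XML_Input and XML_Map are the filerefs/libref from the earlier code. */
libname XML_Input xml xmlmap=XML_Map;
proc copy in=XML_Input out=work;
run;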
My feeling is very similar to Reeza's...this is an inherent behaviour of XML, which to me is better suited to small volumes of complex data rather than statistical data volumes. You might be able to speed things up, but at the cost of putting yourself behind the eight-ball when the data structure changes (which it always does). If I'm reading your posts right, you're only losing about ten minutes for all of this irritation; would you be best off just putting up with it?
Let us know what you figure out,
Tom
We could create an XML map that produces only one output table instead of 20 small tables that then have to be joined to get the data.
There is RETAIN functionality in the XMLMap that can be used to achieve the single-table output.
But I am not sure if it is really going to save any time.
For data analysis, XML is a very poor data format.
I see you are using the XML LIBNAME engine. If you are running SAS 9.2 or later, you could use the XMLV2 LIBNAME engine instead. Being the most recent version of the XML engine, it may have improvements that speed up loading the data.
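The change itself is minimal - reusing the filerefs from the original post, only the engine name on the LIBNAME statement differs:

Filename XML_Input "&Filename.";
Filename XML_Map "&MappingFile.";
libname XML_Input xmlv2 xmlmap=XML_Map;  /* XMLV2 engine instead of XML */

data work.TestingFileA;
    set XML_Input.TestTableA;
run;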