10-25-2013 01:56 AM
I"m trying to convert a 5 GB XML file to SAS dataset. When I initially trying to use XML Mapper to parse the data, the program report the file is too big to process.
So I found the following code online to do the conversion:
libname myxml xml 'E:\to\path\sample.xml';
libname dat 'E:\destination\path\';
proc copy in=myxml out=dat;
However, the SAS file generated contains a lot of empty columns and some columns with values. it could be the case that conversion is done correctly. But I saw the following in the log:
NOTE: BUFSIZE is not cloned when copying across different engines.
System Option for BUFSIZE was used.
For some of the file, it reports:
ERROR: This DBMS table or view cannot be accessed by the SAS System because it contains column
names that are not unique when a SAS normalized (uppercased) compare is performed. See
"Naming Conventions" in the SAS/ACCESS documentation.
May I ask whether my original no-brainer conversion is correct?
11-08-2013 12:47 PM
The conversion process you used is correct in the sense that there are situations in which it will work, but it transfers the data in a very simplistic way, and it is unlikely that using it for a large complex XML file will provide the results you want.
There are several options you can use to import the data properly:
If you have SAS 9.3 or later, in many cases, you can use the "AUTOMAP" feature of the XML Libname engine to create a map automatically. It is more memory-efficient than using the interactive XML Mapper, but it is still likely to run out of memory on a 5GB file. Here's the doc: SAS(R) 9.4 XML LIBNAME Engine: User's Guide http://support.sas.com/documentation/cdl/en/engxml/65362/HTML/default/viewer.htm#n0km8qcf0hv1kon1o7b...
If you have reduced the size of the XML, but are still running out of memory, or are seeing poor performance, you can increase the amount of memory available to XML Mapper by setting an option on the Java VM. There are instructions for doing so in this SAS Note: 39804 - Opening large XML files using the XML Mapper might generate errors - I would recommend "-Xmx1024m" for most machines these days.
There is more good info in the online help.
04-30-2015 04:54 PM
If useful, I tried a working around solution for a a few xml files, one being 2.5GB big. I had the same issue as the size of these files I was trying to import (via proc import or libname functions) was too big for SAS to deal with it. MS Excel couldn't either do the job of getting these xml files into proper datatables...
As I needed desperately to convert them, I tried Access which allowed me to do so extremely quick and easy. I only had then to import the whole access database (made of the xml files needed) into SAS via a proc import with dbms=ACCESSCS (SAS PC to server files installed), repeated for each file.
For sure, it's not a 100% SAS solution but job done for a 2.5GB file. It could also do it for 5GB ?
Hope it will help.