Hi Experts,
I am having a little trouble tuning an import process from XML to SAS.
I am using a Linux system with SAS 9.4 installed. I am using XML Mapper 9.43 to interpret the data.
For each XML tag we create a table, and then join the tables in SQL on the system-generated ordinal columns to get the final table.
I am using PROC COPY to import all the XML tags into SAS tables.
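For readers following along, the process described above might look roughly like the sketch below. All paths, table names and column names are hypothetical placeholders, not taken from the actual job, and the libname is assumed to point at a single XML file with a map created in XML Mapper.

```sas
/* All paths, table names and column names below are hypothetical. */
libname src xmlv2 '/data/xml/orders.xml'
        xmlmap='/data/maps/orders.map' access=readonly;
libname tgt '/data/sas/orders';

/* Import: one SAS table per mapped XML element. */
proc copy in=src out=tgt;
run;

/* Join child rows to their parent on the generated ordinal key. */
proc sql;
  create table tgt.order_items as
  select o.order_id, i.sku, i.quantity
  from tgt.orders as o
       inner join tgt.item as i
       on o.orders_ordinal = i.orders_ordinal;
quit;
```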
The problem is in PROC COPY, as it is slow. In the log I can see the following:
INFO: Data set block I/O cannot be used because:
INFO: - The data sets use different engines, have different variables or have attributes that may differ.
So basically there is no use in increasing the buffer size or BUFNO.
Could you please suggest any other practice that could boost performance?
Note: I tried COMPRESS=NO; that did not help.
Regards
Satish
What do you mean by copying XML? XML files are really just text files, so a copy is easy if all you want is another physical copy: use the FCOPY() function.
Or do you mean importing an XML file instead, i.e. reading/parsing an XML file into a SAS data set?
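If a plain physical copy is all that is needed, a minimal sketch with FCOPY() could look like this. The two paths and the fileref names are hypothetical; RECFM=N makes the copy byte-for-byte instead of record-by-record.

```sas
/* Hypothetical source and destination paths. */
filename src '/data/xml/input.xml'        recfm=n;
filename dst '/data/backup/input_copy.xml' recfm=n;

data _null_;
  length msg $256;
  rc = fcopy('src', 'dst');   /* returns 0 on success */
  if rc ne 0 then do;
    msg = sysmsg();           /* text of the last error message */
    put 'FCOPY failed: ' rc= msg=;
  end;
run;

filename src clear;
filename dst clear;
```

This only duplicates the file; it does no parsing, so it is not a replacement for importing through an XML map.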
I mean importing XML data using a mapper (question edited).
I am using PROC COPY now, which should be efficient for this work, but because block I/O is not being used, it is slow.
We probably need a lot more detail about the file to provide any specific advice.
In general an XML file is not an efficient storage format, since it is normally very verbose. If you want speed do not use XML.
You might be able to find a non-SAS utility that could convert the XML into a more usable format, like a simple CSV file, that SAS can read more efficiently.
O.K. So what you're doing is basically:
1. You've got two libnames, one for your source and one for your target library.
2. Your source library points to a folder containing XML files (and you're using the XMLV2 engine)
3. You use Proc Copy to copy your data from source to target.
What's most likely the bottleneck in this process is not the copy process as such but the parsing of the source XML.
Your current process reads, parses and copies one XML file after the other (single-threaded). If you want to speed it up, change your code:
1. Create a list of all XML files in the folder.
2. Point the source libname at one specific XML file instead of the folder containing the XML files (set dynamically from the list created).
3. Implement parallel processing, ideally via RSUBMIT on the same machine or, if you don't have SAS/CONNECT licensed, via %SYSEXEC.
4. Run the child processes in parallel (1 per XML to be converted to a SAS table).
Servers are often not used to capacity, but if that's a problem for you, limit the number of parallel processes to something acceptable.
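The four steps above could be sketched roughly as follows. This is only an outline under several assumptions: every path, the map file, the child program import_one.sas and the macro name are hypothetical; the `sas` command is assumed to be on the PATH; and XCMD must be allowed. Note that the sketch uses SYSTASK COMMAND with NOWAIT, which is a substitution of mine for the RSUBMIT or %SYSEXEC options named above, because it starts each batch session asynchronously without needing SAS/CONNECT.

```sas
/* Hypothetical locations; adjust to your environment. */
%let xmldir = /data/xml;                  /* folder with the source XML files */
%let child  = /data/code/import_one.sas;  /* program that imports ONE file    */
%let logdir = /data/logs;

/* Hypothetical contents of import_one.sas (file path arrives in SYSPARM):
     %let xmlfile = &sysparm;
     %let base    = %scan(%scan(&xmlfile, -1, %str(/)), 1, %str(.));
     options dlcreatedir;
     libname src xmlv2 "&xmlfile" xmlmap="/data/maps/mymap.map" access=readonly;
     libname tgt "/data/sas/&base";
     proc copy in=src out=tgt; run;
   Each file gets its own target folder so parallel sessions do not collide. */

/* Step 1: list the XML files in the folder. */
data work.xml_files;
  length fname $256;
  rc  = filename('xdir', "&xmldir");
  did = dopen('xdir');
  if did > 0 then do;
    do i = 1 to dnum(did);
      fname = dread(did, i);
      if lowcase(scan(fname, -1, '.')) = 'xml' then output;
    end;
    rc = dclose(did);
  end;
  rc = filename('xdir');   /* deassign the fileref */
  keep fname;
run;

/* Steps 2-4: one batch SAS session per file, at most &maxpar at a time. */
%macro import_parallel(maxpar=4);
  %local i n tasks;
  proc sql noprint;
    select fname into :f1- from work.xml_files;
  quit;
  %let n = &sqlobs;
  %let tasks =;
  %do i = 1 %to &n;
    systask command
      "sas -sysin &child -sysparm ""&xmldir/&&f&i"" -log &logdir/import_&i..log"
      taskname=t&i nowait;
    %let tasks = &tasks t&i;
    /* After each full batch (or the last file), wait for the batch to finish. */
    %if %sysfunc(mod(&i, &maxpar)) = 0 or &i = &n %then %do;
      waitfor _all_ &tasks;
      %let tasks =;
    %end;
  %end;
%mend import_parallel;

%import_parallel(maxpar=4)
```

Waiting for a whole batch before starting the next one is the simplest way to cap the load; a slow file holds up its batch, so a rolling scheduler would use the server better at the cost of more code. The sketch does not check return codes, so the per-file logs need to be reviewed afterwards.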